Tech Talent Source

Overview

  • Founded Date December 3, 1926
  • Sectors Education Training
  • Posted Jobs 0
  • Viewed 108

Company Description

DeepSeek’s First-generation Reasoning Models

DeepSeek’s first-generation reasoning models achieve performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Models

DeepSeek-R1

Distilled models

The DeepSeek team has demonstrated that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance than the reasoning patterns discovered through RL on small models.

Below are the models created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. The evaluation results demonstrate that the distilled smaller dense models perform well on benchmarks.

DeepSeek-R1-Distill-Qwen-1.5B

DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Qwen-14B

DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Llama-70B
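
The distillation approach described above, fine-tuning a smaller dense model on reasoning data generated by DeepSeek-R1, starts by turning teacher outputs into supervised training pairs. The following is a minimal sketch of that data-preparation step only, not DeepSeek's actual pipeline: the `TeacherSample` type, the function names, the `<think>` delimiters, and the filtering rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One response produced by the larger (teacher) model. Hypothetical type."""
    prompt: str
    reasoning: str
    answer: str


def build_sft_example(sample):
    """Format one teacher sample as a prompt/completion pair for fine-tuning.

    The <think> delimiters are an illustrative assumption; any consistent
    separator between reasoning and final answer would serve the same purpose.
    """
    completion = f"<think>\n{sample.reasoning}\n</think>\n{sample.answer}"
    return {"prompt": sample.prompt, "completion": completion}


def build_sft_dataset(samples):
    """Keep only samples that contain both a reasoning trace and an answer."""
    return [
        build_sft_example(s)
        for s in samples
        if s.reasoning.strip() and s.answer.strip()
    ]


if __name__ == "__main__":
    samples = [
        TeacherSample("What is 2 + 3?", "2 plus 3 equals 5.", "5"),
        TeacherSample("Incomplete sample", "", "n/a"),  # dropped: no reasoning
    ]
    for example in build_sft_dataset(samples):
        print(example)
```

The resulting prompt/completion pairs would then be passed to an ordinary supervised fine-tuning loop for the smaller model; that training step is outside the scope of this sketch.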

License

The model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs.
