Verified Performance Data

DeepSeek

The DeepSeek model family, from DeepSeek-V2 to R1: open-source reasoning models challenging frontier performance, with benchmark data across knowledge, reasoning, coding, math, multimodal, and instruction following.

Models tracked: 4
Founded: 2023
Best knowledge (MMLU): 90.8
Best math (AIME 2024): 79.8

About DeepSeek

DeepSeek is a Chinese AI research lab that has made waves with its open-source, high-performance language models. Founded in 2023, DeepSeek has rapidly produced models that compete with, and sometimes exceed, those of Western frontier labs on key benchmarks, while using novel training-efficiency techniques.

DeepSeek-R1 (2025) introduced a reasoning model that rivals OpenAI’s o1, achieving 90.8% on MMLU and 79.8% on AIME 2024. The company’s V3 base model also demonstrated strong performance with 88.5% on MMLU.

DeepSeek Model Timeline

Verified scores from the DeepSeek-R1 paper and the V3 technical report.

January 2025

DeepSeek-R1

DeepSeek’s reasoning model, rivaling OpenAI o1. Achieves 90.8% on MMLU, 84.0% on MMLU-Pro, 71.5% on GPQA Diamond, and 79.8% on AIME 2024. Also scores 97.3% on MATH-500.

December 2024

DeepSeek-V3

A strong general-purpose model. Achieves 88.5% on MMLU, 75.9% on MMLU-Pro, and 59.1% on GPQA Diamond. Trained with efficiency techniques, including FP8 mixed-precision training, that significantly reduce compute costs.

September 2024

DeepSeek-V2.5

An incremental update to V2 with improved instruction following and coding capabilities.

May 2024

DeepSeek-V2

DeepSeek’s second-generation model, introducing mixture-of-experts architecture for efficient scaling.
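The core idea of mixture-of-experts is that a router sends each token to only a few of the model's expert sub-networks, so compute per token stays low while total parameters scale up. A toy sketch of top-k routing (illustrative only: the expert count, gating scheme, and "experts" below are made up for demonstration, not DeepSeek's actual DeepSeekMoE design):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class ToyMoELayer:
    """A router scores each token against every expert; only the top-k
    experts run, and their outputs are combined weighted by the
    renormalized router probabilities."""

    def __init__(self, dim, n_experts, top_k=2):
        self.top_k = top_k
        # Router: one random score vector per expert (toy weights).
        self.router = [[random.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Each "expert" is just a per-dimension scaling here.
        self.experts = [[random.uniform(0.5, 1.5) for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        # Score the token against every expert, then keep the top-k.
        scores = [sum(w * xi for w, xi in zip(r, x)) for r in self.router]
        probs = softmax(scores)
        chosen = sorted(range(len(probs)), key=lambda i: -probs[i])[:self.top_k]
        norm = sum(probs[i] for i in chosen)
        # Weighted sum of only the chosen experts' outputs.
        out = [0.0] * len(x)
        for i in chosen:
            gate = probs[i] / norm
            for d, xi in enumerate(x):
                out[d] += gate * self.experts[i][d] * xi
        return out, chosen

layer = ToyMoELayer(dim=4, n_experts=8, top_k=2)
y, used = layer.forward([1.0, -0.5, 0.3, 0.2])
print(used)  # only 2 of the 8 experts ran for this token
```

Only `top_k` of the `n_experts` sub-networks execute per token, which is what lets MoE models grow parameter counts without a proportional increase in per-token compute.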

Benchmark Performance

DeepSeek-R1 scores across verified benchmark categories.

About This Data

Scores sourced from DeepSeek's arXiv technical reports (arXiv papers are not formally peer-reviewed). The R1 paper also provides cross-reference scores for GPT-4o, Claude 3.5 Sonnet, and o1, which are used to verify those providers' data.