Verified Performance Data

DeepSeek

The DeepSeek model family, from DeepSeek-V2 to R1: open-source reasoning models challenging frontier performance, with benchmark data across knowledge, reasoning, coding, math, multimodal, and instruction following.

Models tracked: 4
Founded: 2023
Best knowledge (MMLU): 90.8
Best math (AIME 2024): 79.8

About DeepSeek

DeepSeek is a Chinese AI research lab that has made waves with its open-source, high-performance language models. Founded in 2023, DeepSeek has rapidly produced models that compete with, and sometimes exceed, those of Western frontier labs on key benchmarks, while using novel training-efficiency techniques.

DeepSeek-R1 (2025) introduced a reasoning model that rivals OpenAI’s o1, achieving 90.8% on MMLU and 79.8% on AIME 2024. The company’s V3 base model also demonstrated strong performance with 88.5% on MMLU.

DeepSeek Model Timeline

Verified scores from the DeepSeek-R1 paper and the V3 technical report.

January 2025

DeepSeek-R1

DeepSeek’s reasoning model, rivaling OpenAI o1. Achieves 90.8% on MMLU, 84.0% on MMLU-Pro, 71.5% on GPQA Diamond, and 79.8% on AIME 2024. Also scores 97.3% on MATH-500.

December 2024

DeepSeek-V3

A strong general-purpose model. Achieves 88.5% on MMLU, 75.9% on MMLU-Pro, and 59.1% on GPQA Diamond. Trained with efficiency techniques, including FP8 mixed-precision training, that significantly reduce compute costs.

September 2024

DeepSeek-V2.5

An incremental update to V2 with improved instruction following and coding capabilities.

May 2024

DeepSeek-V2

DeepSeek’s second-generation model, introducing mixture-of-experts architecture for efficient scaling.
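The core idea of mixture-of-experts is that a router sends each token to only a few of the model's expert sub-networks, so compute per token stays low while total parameters scale up. A toy sketch of top-k routing (illustrative only: the expert count, gating scheme, and "experts" below are made up for demonstration, not DeepSeek's actual DeepSeekMoE design):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class ToyMoELayer:
    """A router scores each token against every expert; only the top-k
    experts run, and their outputs are combined weighted by the
    renormalized router probabilities."""

    def __init__(self, dim, n_experts, top_k=2):
        self.top_k = top_k
        # Router: one random score vector per expert (toy weights).
        self.router = [[random.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Each "expert" is just a per-dimension scaling here.
        self.experts = [[random.uniform(0.5, 1.5) for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        # Score the token against every expert, then keep the top-k.
        scores = [sum(w * xi for w, xi in zip(r, x)) for r in self.router]
        probs = softmax(scores)
        chosen = sorted(range(len(probs)), key=lambda i: -probs[i])[:self.top_k]
        norm = sum(probs[i] for i in chosen)
        # Weighted sum of only the chosen experts' outputs.
        out = [0.0] * len(x)
        for i in chosen:
            gate = probs[i] / norm
            for d, xi in enumerate(x):
                out[d] += gate * self.experts[i][d] * xi
        return out, chosen

layer = ToyMoELayer(dim=4, n_experts=8, top_k=2)
y, used = layer.forward([1.0, -0.5, 0.3, 0.2])
print(used)  # only 2 of the 8 experts ran for this token
```

Only `top_k` of the `n_experts` sub-networks execute per token, which is what lets MoE models grow parameter counts without a proportional increase in per-token compute.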

Benchmark Performance

DeepSeek-R1 scores across verified benchmark categories.

About This Data

Scores sourced from DeepSeek's arXiv technical reports (arXiv papers are not formally peer-reviewed). The R1 paper also provides cross-reference scores for GPT-4o, Claude 3.5 Sonnet, and o1, which are used to verify those providers' data.