Google DeepMind: Gemini Model Benchmarks

Models Tracked: 6

DeepMind Merged: 2023

Best Knowledge: 89.8%

Best Math (AIME): 92.0%

About Google DeepMind

Google DeepMind was formed in 2023 by merging Google Brain and DeepMind, creating one of the largest AI research organizations in the world. The combined entity develops the Gemini family of multimodal AI models, which power Google’s AI products including Gemini (chat), Google Search AI Overviews, and Android AI features.

The Gemini model family is natively multimodal—designed from the ground up to understand text, images, audio, and video together. Gemini 2.5 Pro (2025) represents Google’s most capable model, achieving leading scores in mathematics and reasoning benchmarks.

Gemini Model Timeline

The evolution of Google’s Gemini family. Verified scores from official Google announcements.

March 2025

Gemini 2.5 Pro

Google’s most capable model. Gemini 2.5 Pro achieves 89.8% on Global MMLU (Lite), 84.0% on GPQA Diamond, and 92.0% on AIME 2024, among the highest math scores reported for any AI model. It features a built-in “thinking” mode for extended reasoning.

Global MMLU (Lite): 89.8 | GPQA Diamond: 84.0 | AIME 2024: 92.0 | AIME 2025: 86.7

December 2024

Gemini 2.0 Flash

A speed-optimized model in the Gemini 2.0 generation, designed for fast inference while maintaining strong performance across multimodal tasks.

May 2024

Gemini 1.5 Pro

Introduced a 1 million token context window, far larger than that of other models available at the time. It enabled processing of entire codebases, hour-long videos, and large document collections in a single prompt.

May 2024

Gemini 1.5 Flash

A lightweight, cost-efficient version of Gemini 1.5 designed for high-throughput applications. It keeps the long context window while offering significantly faster inference.

February 2024

Gemini 1.0 Ultra

Google’s first frontier-class model in the Gemini family. It was built as a natively multimodal model, a significant architectural departure from prior text-first approaches.

December 2023

Gemini 1.0 Pro

The first widely available model in the Gemini family. Designed as a balanced mid-tier option, it initially powered Bard, Google’s chat product, which was rebranded as Gemini in February 2024.

Benchmark Performance

Gemini 2.5 Pro scores across verified benchmark categories: Global MMLU (Lite) 89.8%, GPQA Diamond 84.0%, AIME 2024 92.0%, and AIME 2025 86.7%.

Key Strengths

Mathematics Leadership

Gemini 2.5 Pro achieves 92.0% on AIME 2024 and 86.7% on AIME 2025, placing it among the very top models for competition-level mathematical reasoning.

Long Context Pioneer

Gemini 1.5 Pro introduced a 1 million token context window, enabling processing of entire codebases, long videos, and large document collections in a single prompt.

Native Multimodality

Gemini models are designed from the ground up to understand text, images, audio, and video together, rather than adding modalities to a text-first model.

About This Data

Benchmark scores are sourced from official Google DeepMind announcements. Only Gemini 2.5 Pro has verified benchmark scores in our database; older models are listed for historical context.
