About Alibaba Cloud (Qwen)
Alibaba Cloud’s Qwen team develops the Qwen series of large language models, one of the most widely adopted open-weight model families globally. Based in Hangzhou, China, the team has rapidly iterated from Qwen 1.5 through Qwen 2.5, with each generation bringing significant improvements in multilingual understanding, mathematical reasoning, and coding.
The Qwen 2.5 family includes models ranging from 0.5B to 72B parameters, all released under open licenses. QwQ-32B (2025) introduced dedicated reasoning capabilities, rivaling much larger closed-source models on mathematical and scientific benchmarks while remaining open-weight.
Qwen Model Timeline
The evolution of the Qwen model family. Verified scores from official Qwen team announcements.
QwQ-32B
A dedicated reasoning model trained with reinforcement learning for extended chain-of-thought. QwQ-32B achieves 65.2% on GPQA Diamond at only 32B parameters, demonstrating that smaller reasoning-optimized models can compete with much larger general-purpose ones.
Qwen 2.5 72B
The flagship of the Qwen 2.5 family. Achieves 86.1% on MMLU, 49.0% on GPQA, 86.6% on HumanEval, and 84.1% on IFEval (strict prompt), making it one of the strongest open-weight models across these benchmarks.
Qwen 2 72B
A major architecture upgrade from Qwen 1.5, introducing improved tokenization, longer context windows, and significantly better multilingual understanding. Released as open weights across multiple size variants.
Qwen 1.5 72B
Alibaba’s first widely competitive open-weight model. Qwen 1.5 introduced the family’s foundation of strong multilingual capabilities and demonstrated that open models from China could compete with Western counterparts.
Benchmark Performance
Qwen 2.5 72B scores across verified benchmark categories.
Key Strengths
The entire Qwen 2.5 family is released under open licenses, from 0.5B to 72B parameters, making frontier-class AI accessible to researchers and developers worldwide.
Qwen 2.5 72B posts competitive scores in every category measured here (86.1% MMLU, 86.6% HumanEval, 84.1% IFEval), making it one of the most balanced open models available.
QwQ-32B demonstrates that small, reasoning-focused models can reach 65.2% on GPQA Diamond, competitive with models many times its size, through dedicated reinforcement learning training.
About This Data
Benchmark scores are sourced from official Qwen team blog posts and announcements. Qwen 2.5 72B scores use the Instruct variant for GPQA, HumanEval, and IFEval; the MMLU score (86.1%) is from the base model.
Explore More Providers
Compare Alibaba’s Qwen models against other frontier AI systems.