AI Benchmarks

AI benchmark rankings, model scores, and performance data.

Track live AI benchmark rankings, including coding and math scores, across leading models from OpenAI, Anthropic, Google, Meta, DeepSeek, and other labs.

24 of 24 models
Rank | Model | Org | Intelligence | Coding | Math | MMLU Pro | GPQA | LiveCodeBench | AIME 2025 | MATH 500 | SciCode | IFBench | HLE
1 Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) Anthropic 64.9 62.0 9260.0 6020.0 6346.9 5330.0
2 Claude Opus 4.8 (Adaptive Reasoning, Max Effort) Anthropic 61.4 56.7 9200.0 5350.0 6224.5 4570.0
3 GPT-5.5 (xhigh) OpenAI 60.2 59.1 9350.0 5610.0 7585.0 4430.0
4 GPT-5.5 (high) OpenAI 58.9 58.5 9320.0 5590.0 7163.3 4300.0
5 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) Anthropic 57.3 52.5 9140.0 5450.0 5863.9 3960.0
6 Gemini 3.1 Pro Preview Google 57.2 55.5 9410.0 5890.0 7714.3 4470.0
7 GPT-5.4 (xhigh) OpenAI 56.8 57.2 9200.0 5660.0 7394.6 4160.0
8 GPT-5.5 (medium) OpenAI 56.7 56.2 9260.0 5350.0 7095.2 4060.0
9 Qwen3.7 Max Alibaba 56.6 50.1 9230.0 4880.0 8054.4 3810.0
10 Gemini 3.5 Flash (high) Google 55.3 45.0 9220.0 5310.0 7632.7 4100.0
11 Gemini 3.5 Flash (medium) Google 54.8 43.9 9210.0 5300.0 7455.8 3990.0
12 MiniMax-M3 MiniMax 54.7 43.4 9290.0 4540.0 8285.7 3710.0
13 Kimi K2.6 Kimi 53.9 47.1 9110.0 5350.0 7598.6 3590.0
14 MiMo-V2.5-Pro Xiaomi 53.8 45.5 8660.0 5020.0 7986.4 3380.0
15 GPT-5.3 Codex (xhigh) OpenAI 53.6 53.1 9150.0 5320.0 7537.4 3990.0
16 Qwen3.7 Plus Alibaba 53.3 46.5 9000.0 4550.0 7795.9 3340.0
17 Grok 4.3 (high) xAI 53.2 41.0 9010.0 4730.0 8129.3 3500.0
18 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) Anthropic 52.9 48.1 8960.0 5190.0 5312.9 3670.0
19 Muse Spark Meta 52.2 47.5 8840.0 5150.0 7591.8 3990.0
20 Qwen3.6 Max Preview Alibaba 51.8 44.9 8880.0 4690.0 7659.9 2890.0
21 Claude Opus 4.7 (Non-reasoning, High Effort) Anthropic 51.8 53.1 8850.0 5010.0 4360.5 3120.0
22 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) Anthropic 51.7 50.9 8750.0 4680.0 5659.9 3000.0
23 DeepSeek V4 Pro (Reasoning, Max Effort) DeepSeek 51.5 47.5 8880.0 5000.0 7646.3 3590.0
24 GLM-5.1 (Reasoning) Z AI 51.4 43.4 8680.0 4380.0 7625.9 2800.0