| 1 | Gemini 3.1 Pro Preview | Google | 57.2 | 110.4 | $4.50 |
| 2 | GPT-5.4 (xhigh) | OpenAI | 57.0 | 78.5 | $5.63 |
| 3 | GPT-5.3 Codex (xhigh) | OpenAI | 54.0 | 68.3 | $4.81 |
| 4 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 53.0 | 55.1 | $10.00 |
| 5 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 51.7 | 68.5 | $6.00 |
| 6 | GPT-5.2 (xhigh) | OpenAI | 51.3 | 68.5 | $4.81 |
| 7 | GLM-5 (Reasoning) | Z AI | 49.8 | 50.0 | $1.55 |
| 8 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.7 | 60.5 | $10.00 |
| 9 | GPT-5.2 Codex (xhigh) | OpenAI | 49.0 | 64.7 | $4.81 |
| 10 | Gemini 3 Pro Preview (high) | Google | 48.4 | 114.0 | $4.50 |
| 11 | GPT-5.1 (high) | OpenAI | 47.7 | 95.5 | $3.44 |
| 12 | GPT-5.2 (medium) | OpenAI | 46.6 | 0.0 | $4.81 |
| 13 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 46.5 | 49.1 | $10.00 |
| 14 | Gemini 3 Flash Preview (Reasoning) | Google | 46.4 | 163.8 | $1.13 |
| 15 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 45.0 | 56.0 | $1.35 |
| 16 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 45.0 | 54.5 | $1.35 |
| 17 | GPT-5 Codex (high) | OpenAI | 44.6 | 182.4 | $3.44 |
| 18 | GPT-5 (high) | OpenAI | 44.6 | 66.7 | $3.44 |
| 19 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 44.4 | 53.1 | $6.00 |
| 20 | GPT-5.1 Codex (high) | OpenAI | 43.1 | 123.4 | $3.44 |