# AI Models
Tiered ranking of large language models optimized for agentic workflows. Updated continuously.

## Tier 1: Complex reasoning · Strategy · Planning · External dev only
| Model | Cost /1M tokens (in / out) | Context / Max Output | Benchmarks | Why This Model |
|---|---|---|---|---|
| Claude Opus 4.7 (Anthropic · Apr 2026) | $5 / $25 (flat pricing) | 1M / 128k out | SWE-Verified 87.6% · SWE-Pro 64.3% · MCP-Atlas 77.3% | Best publicly available model as of Apr 2026. 3× vision resolution. 87.6% SWE-Verified leads all non-preview models. Same price as 4.6. |
| GLM-5.1 (Z.AI · Apr 2026) | $1.40 / $4.40 | 200K / 131K out | SWE-Bench Pro 58.4% · SWE-Verified 77.8% · GPQA Diamond 83.9% | #1 on SWE-Bench Pro globally at launch (58.4%). Open weights on Hugging Face. One third the cost of Opus. Self-reported benchmarks; independent verification pending. |
| Kimi K2.6 (Moonshot AI · Apr 2026) | $0.95 / $4 (via Kimi API) | 256K | SWE-Verified 80.2% · SWE-Pro 58.6% · Terminal-Bench 2.0 66.7% | Open-weight at near-frontier level. 300 sub-agents in parallel. Competitive with Opus 4.6 at a fraction of the cost. Native video input. |
| Claude Opus 4.6 (Anthropic · Feb 2026) | $5 / $25 (flat pricing) | 1M / 128k out | SWE-Verified 80.8% · Terminal-Bench 2.0 65.4% · ARC-AGI-2 68.9% | #1 at agentic terminal coding. Recovers from its own errors over long sessions. API only; no subscription for agents. Superseded by 4.7. |
| GPT-5.4 (OpenAI · Mar 2026) | $2.50 / $15 (2+ above $72k/day) | 1.05M | SWE-Pro 57.7% · OSWorld 75.0% · GPQA Diamond 92.8% | Multi-hour autonomous execution with real planning. Only worth the premium when the task genuinely needs frontier judgment. |
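Since every price above is quoted per 1M tokens, per-task cost is simple arithmetic. A minimal sketch using the Tier 1 prices (the 200k-in / 20k-out task size is an illustrative assumption, not from this ranking):

```python
# Per-1M-token prices (USD, input/output) from the Tier 1 table.
TIER1_PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GLM-5.1":         (1.40, 4.40),
    "Kimi K2.6":       (0.95, 4.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4":         (2.50, 15.00),
}

def task_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """USD cost of one task, given per-1M-token prices."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Example: a 200k-token-input, 20k-token-output agent task on each model.
for name, (pin, pout) in TIER1_PRICES.items():
    print(f"{name}: ${task_cost(200_000, 20_000, pin, pout):.3f}")
```

At that task size, Opus 4.7 costs $1.50 against $0.27 for Kimi K2.6, which is the "fraction of the cost" claim made concrete.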
## Tier 2: Tool calls · Long task chains · Multi-step pipelines
| Model | Cost /1M tokens (in / out) | Context / Max Output | Benchmarks | Why This Model |
|---|---|---|---|---|
| Gemini 3.1 Pro (Google · Feb 2026) | $2 / $12 | 1M / 84k out | ARC-AGI-2 77.1% · GPQA Diamond 94.3% · SWE-Verified 80.6% | 7.5× cheaper than Opus on input. Leads most vision benchmarks. No separate pipeline for media. |
| MiniMax M2.7 (MiniMax · Mar 2026) | $0.30 / $1.20 ($19/mo plan: 1,500 calls per 5h) | 205K / 131k out | SWE-Pro 56.2% · Terminal-Bench 2.0 57.0% · Vibe-Pro 55.6% | Best price-to-agent-capability ratio in the stack. 97% skill adherence, critical for OpenClaw's skill ecosystems. The $19 plan is outstanding value. |
| Kimi K2.5 (Moonshot · Feb 2026) | $0.60 / $3.00 | 256K | HLE w/ tools 50.2% · BrowseComp 79.4% · SWE-Verified 76.8% | Best long-context stability for extended tasks. Emits ~6× more output tokens than peers; budget accordingly. |
| DeepSeek V3.2 (DeepSeek · Dec 2025) | $0.27 / $0.41 | 164K | SWE-Verified 70.0% · Aider Polyglot 74.2% | ~90% of GPT-5.4's performance at 1/60th the cost. Best price-performance in this tier via OpenRouter. |
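The output-volume warning on Kimi K2.5 matters more than its sticker price suggests, because output is priced 5× input. A sketch of the effect (the baseline 100k-in / 20k-out task is an assumed illustration; the 6× multiplier and prices come from the table above):

```python
def task_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """USD cost of one task at per-1M-token prices."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

base_in, base_out = 100_000, 20_000          # illustrative task size
kimi     = task_cost(base_in, base_out * 6, 0.60, 3.00)  # ~6x output volume
deepseek = task_cost(base_in, base_out,     0.27, 0.41)

print(f"Kimi K2.5: ${kimi:.4f}  DeepSeek V3.2: ${deepseek:.4f}")
```

Here the verbose model ends up roughly 12× more expensive per task than DeepSeek V3.2 despite a listed price only ~2× higher, which is why the table says to budget carefully.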
## Tier 3: Content · Code · Research · Day-to-day tasks
| Model | Cost /1M tokens (in / out) | Context / Max Output | Benchmarks | Why This Model |
|---|---|---|---|---|
| Claude Sonnet 4.6 (Anthropic · Feb 2026) | $3 / $15 (API only) | 1M / 64k out | SWE-Verified 79.9% · Computer Use 94.0% · AI Index 52/100 | 98% of Opus coding at 1/5 the cost. API only; no $10/mo plan exists for this model. |
| GPT-5.4 mini (OpenAI · various) | $0.75 / $4.50 (OAuth via subscription) | 400K | SWE-Pro 54.4% · Tool-call reliability 93.4% · GSWorld 72.1% | Smart enough to run the entire system. 93.4% tool-call reliability. ChatGPT OAuth means no API billing needed. |
| Qwen3.6 Plus (Alibaba · Apr 2026) | Free ($0 / $0) | 1M | SWE-Verified 78.8% | Best free model available. 1M context. Near-frontier coding. Free until the preview window closes. |
| Llama 4 Maverick (Meta · 2026) | $0.19–$0.49 ($0 self-hosted) | 1M | MMLU 85.5% · SWE-Verified ~68% | Only serious open-weight option at this level. Self-hosting means zero ongoing cost. Best for data-sovereignty needs. |
| Mistral Small 4 (Mistral · Mar 2026) | $0.15 / $0.60 | 256K | AA Intelligence Index 27/100 · AA LCR 0.72 · MATH-500 ~93.6% | Replaces three separate models with a single Apache 2.0 weights file. 162 tok/s output. Best simplicity-to-capability ratio in Tier 3. |
## Tier 4: Edge Models for Hermes Agent
Local inference · Zero API cost · Full privacy

Local inference models that run entirely on your hardware: zero API cost, full privacy, no internet required.
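"No API cost" in practice means pointing the agent at a local inference server instead of a billed endpoint. A hedged sketch, assuming an Ollama-style runner on its default port; the model name is a placeholder and Hermes Agent's actual wiring is not specified in this ranking:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # Non-streaming payload shape for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def generate_local(model: str, prompt: str) -> str:
    """Run a prompt against the local server: no API key, no per-token billing."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running local server
        return json.loads(resp.read())["response"]

# Example (needs a locally pulled model):
#   generate_local("llama3", "Summarize this diff.")
```

Because the request never leaves localhost, this is also what delivers the privacy and offline properties this tier advertises.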