Hermes Guide
Benchmarks

AI Models

A tiered ranking of large language models by their suitability for agentic workflows, updated continuously.

Tier 1: Frontier

Complex reasoning · Strategy · Planning · External dev only

5 models

Claude Opus 4.7 · Anthropic · Apr 2026
Cost per 1M tokens: $5 in / $25 out
Context: 1M tokens
SWE-bench Verified: 87.6%
SWE-bench Pro: 64.3%
MCP-Atlas: 77.3%

GLM-5.1 · Z.AI · Apr 2026
Cost per 1M tokens: $1.40 in / $4.40 out
Context: 200K tokens
SWE-bench Pro: 58.4%
SWE-bench Verified: 77.8%
GPQA Diamond: 83.9%

Kimi K2.6 · Moonshot AI · Apr 2026
Cost per 1M tokens: $0.95 in / $4 out
Context: 256K tokens
SWE-bench Verified: 80.2%
SWE-bench Pro: 58.6%
Terminal-Bench 2.0: 66.7%

Claude Opus 4.6 · Anthropic · Feb 2026
Cost per 1M tokens: $5 in / $25 out
Context: 1M tokens
SWE-bench Verified: 80.8%
Terminal-Bench 2.0: 65.4%
ARC-AGI-2: 68.9%

GPT-5.4 · OpenAI · Mar 2026
Cost per 1M tokens: $2.50 in / $15 out
Context: 1.05M tokens
SWE-bench Pro: 57.7%
OSWorld: 75.0%
GPQA Diamond: 92.8%

Tier 2: Agent Execution

Tool calls · Long task chains · Multi-step pipelines

4 models

Gemini 3.1 Pro · Google · Feb 2026
Cost per 1M tokens: $2 in / $12 out
Context: 1M tokens
ARC-AGI-2: 77.1%
GPQA Diamond: 94.3%
SWE-bench Verified: 80.6%

MiniMax M2.7 · MiniMax · Mar 2026
Cost per 1M tokens: $0.30 in / $1.20 out
Context: 205K tokens
SWE-bench Pro: 56.2%
Terminal-Bench 2.0: 57.0%
Vibe-Pro: 55.6%

Kimi K2.5 · Moonshot AI · Feb 2026
Cost per 1M tokens: $0.60 in / $3.00 out
Context: 256K tokens
HLE (with tools): 50.2%
BrowseComp: 79.4%
SWE-bench Verified: 76.8%

DeepSeek V3.2 · DeepSeek · Dec 2025
Cost per 1M tokens: $0.27 in / $0.41 out
Context: 164K tokens
SWE-bench Verified: 70.0%
Aider Polyglot: 74.2%

Tier 3: Balanced

Content · Code · Research · Day-to-day tasks

5 models

Claude Sonnet 4.6 · Anthropic · Feb 2026
Cost per 1M tokens: $3 in / $15 out
Context: 1M tokens
SWE-bench Verified: 79.9%
Computer use: 94.0%
AI Index: 52/100

GPT-5.4 mini · OpenAI · Various
Cost per 1M tokens: $0.75 in / $4.50 out
Context: 400K tokens
SWE-bench Pro: 54.4%
Tool call r1: 93.4%
OSWorld: 72.1%

Qwen3.6 Plus · Alibaba · Apr 2026
Cost per 1M tokens: Free (for now)
Context: 1M tokens
SWE-bench Verified: 78.8%

Llama 4 Maverick · Meta · 2026
Cost per 1M tokens: $0.19–$0.49
Context: 1M tokens
MMLU: 85.5%
SWE-bench Verified: ~68%

Mistral Small 4 · Mistral · Mar 2026
Cost per 1M tokens: $0.15 in / $0.60 out
Context: 256K tokens
AA Intelligence Index: 27/100
AA-LCR score: 0.72
MATH-500: ~93.6%
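
To compare the listed prices on a concrete workload, the per-1M figures can be turned into a per-task estimate. The sketch below is a minimal illustration, assuming the first figure on each card is the input price and the second the output price, both in USD per million tokens. The `Pricing` class, the `task_cost` helper, and the 200K-in / 20K-out token counts are hypothetical choices made for this example; they are not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per 1M tokens (assumed order: input, output)."""
    input_per_m: float
    output_per_m: float


# Prices as listed on the model cards above (USD per 1M tokens).
PRICES = {
    "Claude Opus 4.7": Pricing(5.00, 25.00),
    "GPT-5.4": Pricing(2.50, 15.00),
    "Gemini 3.1 Pro": Pricing(2.00, 12.00),
    "Claude Sonnet 4.6": Pricing(3.00, 15.00),
    "Kimi K2.6": Pricing(0.95, 4.00),
    "MiniMax M2.7": Pricing(0.30, 1.20),
    "DeepSeek V3.2": Pricing(0.27, 0.41),
}


def task_cost(price: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at flat list prices.

    Ignores prompt caching, batch discounts and any long-context surcharge.
    """
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent run: 200K tokens read, 20K tokens generated.
    IN_TOK, OUT_TOK = 200_000, 20_000
    ranked = sorted(PRICES.items(), key=lambda kv: task_cost(kv[1], IN_TOK, OUT_TOK))
    for name, price in ranked:
        print(f"{name:<18} ${task_cost(price, IN_TOK, OUT_TOK):.4f}")
```

Under these assumptions the same run costs $1.50 on Claude Opus 4.7 and about $0.06 on DeepSeek V3.2, roughly a 24x spread. Actual bills can differ, since the sketch uses flat list prices and ignores caching, batching, and tiered long-context pricing.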

Open Source: Runs on Device

Local inference · Zero API cost · Full privacy

Coming Soon

Edge Models for Hermes Agent

Local inference models that run entirely on your hardware. Zero API cost. Full privacy. No internet required.