# Model Comparison Matrix

**Source:** r/hermesagent community testing and discussion (May 2026)
**Based on:** 121 comments from "What model are you running?" thread + multiple setup discussions

---

## Quick Reference: Best Models by Use Case

| Use Case | Recommended Model | Provider | Cost Tier |
|----------|------------------|----------|-----------|
| Daily driver (general tasks) | Qwen 3.6-27B | Local/vLLM or OpenRouter | Free-Paid |
| Budget option | MiniMax M2.7 | AIStudio (\\$10/mo plan) | \\$ |
| Best value cloud model | DeepSeek V4 Pro | DeepSeek API directly | \\$$ |
| Complex reasoning tasks | Qwen 3.6-35B or GPT-5.5 | OpenRouter/Cloud | \\$$$ |
| Coding assistant | Qwen 3.6-27B (local) + Claude/GPT for review | Mixed | \\$$-\\$$$ |
| Vision/image analysis | DeepSeek V4 Flash or Gemini 3.1 Flash Preview | Various | \\$$-\\$$ |
| Auxiliary tasks (search, extraction) | DeepSeek V4 Flash or OSS 120B | AIStudio/OpenRouter | \\$ |

---

## Detailed Model Reviews

### Qwen 3.6 Series

**Qwen 3.6-27B** - Community favorite, "custom-made for Hermes"
- **Strengths:** Excellent tool calling, agentic workflows, reasoning
- **Context:** Up to 128k (some users report degradation past this point)
- **Local setup:** vLLM recommended over Ollama for full context support. FP8 quant uses ~60GB VRAM. Q8 GGUF via llama.cpp also viable.
- **Performance:** 90+ TPS on single Pro 6000 with MTP=3 (u/trashacct383)
- **Community verdict:** "Absolute workhorse" - best balance of capability and cost

**Qwen 3.6-35B** - Step up from 27B
- **Strengths:** Better reasoning, handles complex multi-step tasks
- **Local setup:** Requires more VRAM. Q4 quant on RTX 3090 (24GB) gets ~45 TPS with 200k context (u/ObjectiveMediocre748)
- **Community verdict:** "122b for tasks that need more detail" - use as upgrade path from 27B

**Qwen 3.6 Plus 35B** - Cloud variant
- **Strengths:** Full capability without local hardware requirements
- **Cost:** Competitive on OpenRouter and DeepSeek platforms
- **Community verdict:** u/Jonathan_Rivera's favorite for weeks before switching to DS V4 Pro

### MiniMax M2.7

**MiniMax M2.7** - Budget champion with caveats
- **Strengths:** Cheap (\\$10/mo token plan), decent for basic tasks, good auxiliary model
- **Weaknesses:** "All over the place" consistency (u/idefix1515), not top-tier intelligence
- **Best use:** Auxiliary tasks, paired with stronger main model for reasoning
- **Community verdict:** "Forces me to think more and learn twice" - good for learning, not for complex work

### DeepSeek Series

**DeepSeek V4 Pro** - Current community favorite for cloud
- **Strengths:** Excellent capability, cheap via direct API (not OpenRouter), great caching
- **Cost:** \\$1-1.5/day vs \\$2-3/day on OpenRouter for same usage (u/Almarma)
- **Community verdict:** "Really cheap and really efficient using cache" - best cloud value

**DeepSeek V4 Flash** - Lightweight option
- **Strengths:** Very cheap, good for auxiliary tasks and vision
- **Best use:** Vision-only tasks, search/extraction, delegated simple work
- **Community verdict:** Good auxiliary model, not recommended as main driver

### Gemma 4 Series

**Gemma 4 (all variants)** - Generally NOT recommended for Hermes
- **Weaknesses:** Poor agentic performance, weak tool calling
- **Context limitation:** Limited context size on local hardware
- **Community verdict:** "Tried all Gemma4 models, none was great at Agentic" (u/EmuHefty)

### Kimi K2.6

**Kimi K2.6** - Solid alternative
- **Strengths:** Good general reasoning and tool handling
- **Best use:** Medium-tier tasks, monitoring, scraping
- **Community verdict:** "Solid all-around" but not the top pick (u/Fair-Yogurtcloset-21)

### GPT Series

**GPT-5.4 Mini / GPT-5.5** - Premium option
- **Strengths:** High capability, reliable tool calling
- **Weaknesses:** "Very chatty" (u/Ryankolp), expensive for daily use
- **Best use:** Complex tasks where quality matters more than cost
- **Community verdict:** Good for specific high-value tasks, not as daily driver

### GLM 5.1

**GLM 5.1** - Mixed results
- **Issues:** "Model generated invalid tool call" errors reported (u/bef349)
- **Status:** Overloaded/unstable at time of writing
- **Community verdict:** Avoid for now, wait for stability improvements

---

## Provider Comparison

### Direct API vs OpenRouter

| Factor | Direct API | OpenRouter |
|--------|-----------|------------|
| Cost | Usually cheaper (no markup) | Slightly higher prices |
| Caching | Native caching support | Caching may not work as well |
| Model variety | Limited to one provider | Access to many models |
| Reliability | Direct connection, fewer hops | Additional routing layer |
| Best for | Single-model setups | Multi-model experimentation |

**Community recommendation:** Use direct API when you've settled on a model. Use OpenRouter during exploration phase. (u/Maleficent-Anything2)

### Ollama Cloud

- **Cost:** \\$20/mo Pro subscription
- **Models:** Access to many high-end models
- **Missing:** Image generation at time of writing
- **Community verdict:** "Great for complex tasks" but image gen gap is a limitation (u/aaronmcbaron)

---

## Model Routing Strategies

### Community Pattern 1: Tiered Approach (Most Popular)
- **Main model:** Qwen 3.6-27B or DeepSeek V4 Pro
- **Auxiliary model:** DeepSeek V4 Flash or MiniMax M2.7
- **Upgrade path:** Bump to Qwen 3.6-35B or Claude/GPT for complex tasks

### Community Pattern 2: Local + Cloud Hybrid (u/trashacct383)
- **Local:** Qwen 3.6-27B via vLLM for daily work
- **Cloud:** Claude or GPT for planning and review phases
- **Workflow:** Plan with local model -> execute locally -> QC with cloud model

### Community Pattern 3: Orchestrator + Worker (u/An-R-Nguyen)
- **Orchestrator profile:** Main model handles planning and QC
- **Coder profile:** Dedicated coding agent, one-shots requests
- **Pattern:** If quality < 80%, nuke and restart rather than fix

### Community Pattern 4: Free-Tier Pooling (u/azzbeeter)
- **Tool:** llm-keypool proxy
- **Strategy:** Rotate across multiple free-tier API keys from different providers
- **Benefit:** Zero cost, pooled rate limits
- **Warning:** Multiple keys for same provider may violate ToS

---

## Hardware Requirements for Local Models

| Model | Minimum VRAM | Recommended VRAM | Quantization |
|-------|-------------|-----------------|--------------|
| Qwen 3.6-27B (FP8) | 48GB | 60GB+ | FP8 via vLLM |
| Qwen 3.6-27B (Q8) | 32GB | 48GB | Q8 GGUF via llama.cpp |
| Qwen 3.6-35B (Q4) | 16GB | 24GB | Q4 GGUF via Ollama/llama.cpp |
| MiniMax M2.7 | Varies | Check provider docs | Provider-dependent |

**Note on MoE models:** You can offload expert layers to CPU for more context, but expect ~50% TPS reduction. (u/Asleep-Land-3914, u/xeeff)

---

## Model-Specific Issues

### Censored vs Uncensored Models
- **Issue:** Some Qwen variants refuse browser automation on external portals (e.g., school parent portals)
- **Solution:** Use abliterated/uncensored variants for tasks requiring unrestricted access
- **Trade-off:** Uncensored models may have slightly reduced accuracy
- See: Model Variants memory note for specific model names

### Context Window Limits
- **Qwen 3.6-27B:** Handles 128k well, gradual degradation past that point
- **Ollama reported context:** May show lower than actual (e.g., 64k instead of full context)
- **vLLM advantage:** Full advertised context available locally

### Token Usage Optimization
- Switch models less frequently - each switch requires re-reading chat history
- Keep conversations shorter or start new sessions when switching models
- Use caching-enabled providers (DeepSeek direct API excels here)
- Set compression at ~70% for long-running sessions

---

## Community Model Testing Results

From the "What model are you running?" thread (121 responses):

**Most mentioned models:**
1. MiniMax M2.7 - Budget favorite, widely tested
2. Qwen 3.6-27B - Local deployment champion
3. DeepSeek V4 Flash/Pro - Cloud value leader
4. Kimi K2.6 - Solid alternative
5. GPT variants - Premium option for specific tasks

**Least recommended:**
1. Gemma 4 series - Consistently poor agentic performance
2. GLM 5.1 - Stability issues at time of writing