Field Manual · Apr 22, 2026 · 8 min read

Local LLM Setup Guide in Hermes Agent for Beginners

Step-by-step guide to installing Ollama, downloading local models, setting up Hermes Agent, and launching the web dashboard. No prior LLM experience required.

Introduction

Hermes Agent is an open-source AI agent framework by Nous Research that runs entirely on your local machine. Unlike cloud-based assistants, Hermes gives you full privacy, zero API costs, and complete control over your data.

This guide walks you through the entire setup process—from installing Ollama to running your first query in the Hermes web dashboard. No prior experience with local LLMs required.

What You'll Need

Before starting, make sure your system meets these requirements:

  • macOS, Linux, or Windows (with WSL2)
  • At least 16GB RAM for smaller models (9B–12B parameters)
  • 32GB+ RAM recommended for larger models (27B–31B parameters)
  • Git installed on your system
  • Node.js v20+ (for browser tools)
  • A stable internet connection for downloading models

Note: Hermes uses Ollama as its local inference engine. Every model you run is downloaded to your machine, so ensure you have enough disk space (each model ranges from 5GB to 25GB).
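As a rough pre-flight check, the snippet below reports free disk space in your home directory (where Ollama keeps downloaded models by default, under ~/.ollama/models). The 25 GB threshold is just the upper end of the model-size range mentioned above, not an official requirement:

```shell
# Report free disk space where Ollama stores models (~/.ollama/models
# by default). -P prevents df from wrapping long filesystem names.
avail_kb=$(df -Pk "$HOME" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "Free space: ${avail_gb} GB"
if [ "$avail_gb" -lt 25 ]; then
  echo "Warning: less than 25 GB free; the largest models may not fit."
fi
```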


Step 1: Install Ollama

Ollama is the engine that runs large language models locally. Installing it takes less than a minute.

Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads and installs Ollama along with the CLI tools you need to pull and run models.

Ollama homepage showing the install command
The Ollama homepage at ollama.com—copy the install command or download the desktop app.

Verify the installation:

ollama --version

You should see the Ollama version printed. If not, restart your terminal and try again.
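If you want a scripted version of this check (handy in setup scripts), this sketch prints the installed version or a hint when the binary is not on your PATH:

```shell
# Confirm the Ollama CLI is on PATH; print a hint if it is not.
if command -v ollama >/dev/null 2>&1; then
  status="installed: $(ollama --version)"
else
  status="missing (restart your terminal or check your PATH)"
fi
echo "ollama $status"
```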


Step 2: Choose Your Model

Hermes Agent supports both cloud models (via API keys) and local models (via Ollama). For this guide, we focus on local models—because that's where the real power of Hermes shines.

Head to the Ollama model library and search for your preferred model. The two most popular families for Hermes are Qwen 3.5 and Gemma 4.

Ollama library page for qwen3.5:9b
The Qwen 3.5 model page on Ollama—showing all available parameter sizes from 0.8B to 397B.

Model Recommendations by RAM

| RAM | Recommended Models | Size | Best For |
|---|---|---|---|
| 16GB | gemma4:e4b | ~4B effective | Fast responses, coding, everyday tasks |
| 16GB | qwen3.5:9b | ~9B parameters | Reasoning, multimodal, agentic workflows |
| 32GB | gemma4:31b | ~31B parameters | Best raw quality, deep reasoning, complex coding |
| 32GB | qwen3.5:27b | ~27B parameters | Advanced reasoning, long-context tasks, research |

Why these models?

  • Gemma 4 (Google): Apache 2.0 licensed, natively multimodal, optimized for reasoning and coding. The E4B variant is the "sweet spot" for desktop users—fast and capable.
  • Qwen 3.5 (Alibaba): Open-weight, 256K context window, strong at agentic tool use and visual understanding. The 9B and 27B variants balance speed and intelligence.

Tip: If you're unsure, start with qwen3.5:9b. It's the most forgiving beginner model and handles a wide range of tasks well.
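The table above can be turned into a quick suggestion script. This is a sketch: RAM detection differs between Linux (/proc/meminfo) and macOS (sysctl hw.memsize), and the model names simply echo the recommendations above:

```shell
# Suggest a model tier from the table above based on total RAM.
if [ -r /proc/meminfo ]; then
  # Linux: MemTotal is reported in kB.
  ram_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
  ram_gb=$((ram_kb / 1024 / 1024))
else
  # macOS: hw.memsize is reported in bytes.
  ram_gb=$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024))
fi
if [ "$ram_gb" -ge 32 ]; then
  echo "Detected ${ram_gb} GB RAM: try qwen3.5:27b or gemma4:31b"
else
  echo "Detected ${ram_gb} GB RAM: try qwen3.5:9b or gemma4:e4b"
fi
```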


Step 3: Download Your Model

Once you've chosen a model, download it with a single command. Ollama handles everything—downloading weights, verifying integrity, and preparing the model for inference.

For 16GB RAM systems:

# Option A: Gemma 4 E4B (fast, lightweight)
ollama run gemma4:e4b

# Option B: Qwen 3.5 9B (balanced, capable)
ollama run qwen3.5:9b

For 32GB RAM systems:

# Option A: Gemma 4 31B (best quality, dense architecture)
ollama run gemma4:31b

# Option B: Qwen 3.5 27B (advanced reasoning, 256K context)
ollama run qwen3.5:27b
Terminal showing ollama run qwen3.5:9b command
Downloading a model is as simple as running ollama run [model]—Ollama pulls the weights automatically.

First-time downloads take 5–20 minutes depending on your internet speed and model size. The 9B model is ~6GB; the 27B model is ~16GB.

Verify the model is ready:

ollama list

You should see your downloaded model in the list with its size and modification date.
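For a non-interactive check (e.g. in a setup script), you can grep the `ollama list` output for the model you pulled. The model name below is just an example from this guide; swap in whichever one you downloaded:

```shell
# Check whether a given model appears in `ollama list`.
# Falls back gracefully when Ollama itself is not installed.
model="qwen3.5:9b"   # example from this guide; change as needed
if ! command -v ollama >/dev/null 2>&1; then
  result="ollama is not installed"
elif ollama list | grep -q "^${model}"; then
  result="${model} is ready"
else
  result="${model} not found; re-run: ollama pull ${model}"
fi
echo "$result"
```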


Step 4: Install Hermes Agent

With Ollama and your model ready, it's time to install Hermes Agent.

Run the official installer script:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Hermes Agent installer running in terminal
The Hermes installer checks for dependencies—uv, Python 3.11, Git, Node.js, ripgrep, and ffmpeg.

The installer will:

  1. Detect your operating system
  2. Check for required dependencies (uv, Python 3.11+, Git, Node.js, ripgrep, ffmpeg)
  3. Install any missing tools automatically
  4. Clone the Hermes Agent repository to ~/.hermes/hermes-agent
  5. Set up the Python virtual environment

If you already have Hermes installed, the script updates your existing installation to the latest version.


Step 5: Select Your Model in Hermes

Launch Hermes to open the model picker:

hermes

Hermes Agent supports multiple models simultaneously. When you first run this command, it presents a model picker where you choose which model to use for your session.

Hermes model picker showing local and cloud options
The Hermes model picker—showing recommended cloud models (top) and local Ollama models (bottom).

Navigating the picker:

  • Use ↑/↓ arrow keys to navigate
  • Press Enter to select
  • Press Esc to go back

Recommended setup for beginners:

  • Choose qwen3.5:9b or gemma4:e4b from the local section
  • Cloud models (kimi-k2.5, glm-5.1, etc.) require API keys and are optional

Cloud vs Local: Cloud models offer frontier-level intelligence but cost money per request. Local models are free forever after download but require more RAM. For this guide, stick with local.


Step 6: Start Hermes

With your model selected, start Hermes Agent:

hermes
Terminal showing the hermes command being typed
Simply type hermes in your terminal to launch the agent.

Hermes will:

  1. Load the selected model into memory
  2. Initialize its tool ecosystem (web search, file operations, code execution)
  3. Present you with a chat interface

First launch takes 30–60 seconds as the model is loaded into RAM. Subsequent launches are faster.


Step 7: Test Your Setup

Time for your first query. Hermes supports natural language—just type what you want.

Try asking something that requires real-time information:

> can you give me current world news quick
Hermes CLI responding to a world news query
Hermes searching DuckDuckGo for current world news and summarizing headlines.

Hermes will:

  1. Analyze your request
  2. Use the DuckDuckGo search tool to find current information
  3. Summarize the results into readable headlines

What just happened? Hermes didn't rely on its training data (which has a cutoff date). Instead, it used a tool—DuckDuckGo search—to fetch real-time information. This is the core power of agentic AI.

Other things to try:

  • "Write a Python script that fetches weather data"
  • "Explain quantum computing like I'm five"
  • "Analyze this CSV file and plot the trends" (drag a file into the terminal)

Step 8: Launch the Web Dashboard

While the CLI is powerful, the web dashboard offers a richer experience—especially for long conversations and visual workflows.

Launch it with:

hermes dashboard
Terminal building and launching the Hermes web dashboard
Running hermes dashboard builds the web UI and opens it at http://127.0.0.1:9119.

Hermes will:

  1. Build the web UI (takes ~10 seconds on first run)
  2. Start a local server
  3. Open your browser automatically

The dashboard runs entirely locally—no data leaves your machine.
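You can verify the dashboard is up without switching to the browser. This sketch probes the address from the article (http://127.0.0.1:9119); adjust the port if your install uses a different one:

```shell
# Probe the local dashboard port mentioned in this guide (9119).
port=9119
if command -v curl >/dev/null 2>&1 \
   && curl -sf -o /dev/null --max-time 2 "http://127.0.0.1:${port}/"; then
  dash_status="dashboard reachable on port ${port}"
else
  dash_status="nothing answering on port ${port}; is 'hermes dashboard' running?"
fi
echo "$dash_status"
```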


Step 9: Explore the Dashboard

The Hermes dashboard is your mission control center. Here's what each tab does:

Full Hermes web dashboard showing sessions, chat, and tool usage
The Hermes dashboard—sessions tab showing a live conversation with tool calls visible.
| Tab | Purpose |
|---|---|
| Status | System health, model load status, active tools |
| Sessions | Chat history, manage multiple conversations |
| Analytics | Token usage, response times, cost tracking |
| Logs | Debug output, tool execution traces |
| Cron | Schedule recurring tasks and automations |
| Skills | Browse and install community skill packs |
| Example | Pre-built prompts and workflow templates |
| Config | Model settings, tool preferences, API keys |
| Keys | Manage cloud provider API keys (optional) |

Key features to explore:

  • Search your conversation history
  • Switch models mid-session without restarting
  • View tool calls in real-time (see exactly what Hermes is doing)
  • Export conversations as Markdown or JSON

Troubleshooting

"Ollama not found"

Restart your terminal after installing Ollama. If it still fails, add Ollama to your PATH:

export PATH="$PATH:/usr/local/bin"

"Model download is stuck"

Large models can take 20+ minutes. If it freezes, cancel (Ctrl+C) and retry:

ollama pull qwen3.5:9b

"Hermes says 'model not found'"

Make sure you've downloaded the model via Ollama first:

ollama list

If the model isn't listed, re-run the download command.

"Out of memory errors"

Your model is too large for your RAM. Switch to a smaller variant:

  • 16GB RAM → use 9B or E4B models
  • 32GB RAM → use up to 27B models

You can also close other applications to free up RAM.

"Dashboard won't open"

Check if port 9119 is already in use:

lsof -i :9119

If something is using it, kill the process or wait for it to finish.
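If you want to see the offending process IDs directly, lsof's -t flag prints bare PIDs, which makes the cleanup scriptable. This sketch only reports what it finds; the kill line is left commented out so you can review the PIDs first:

```shell
# Find whatever is holding port 9119 (lsof -t prints bare PIDs).
pids=$(lsof -ti :9119 2>/dev/null || true)
if [ -n "$pids" ]; then
  port_msg="port 9119 held by PID(s): $pids"
  # kill $pids   # uncomment to actually free the port
else
  port_msg="port 9119 is free"
fi
echo "$port_msg"
```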


Next Steps

Now that Hermes is running, here's what to explore next:

  1. Add API keys for cloud models (Claude Opus 4.7, Kimi K2.6, GPT 5.4) in the Config tab
  2. Install skills from the community repository to extend capabilities
  3. Set up cron jobs for automated research and monitoring
  4. Read the AI Models guide to understand which model fits each task
  5. Compare coding plans to decide if you need cloud API access

Hermes Agent is designed to grow with you. Start local, add cloud models when you need frontier-level reasoning, and build custom skills for your specific workflows.

Welcome to local-first AI.