OpenClaw with Ollama: Running Local AI Models
Configure OpenClaw with Ollama to run AI models locally. Complete guide for privacy-first setups with no API costs — setup, model recommendations, and performance.
OpenClaw with Ollama enables a fully local AI agent: no API costs, no data sent to external servers, and no dependency on third-party service availability. Ollama runs large language models directly on your hardware and exposes an OpenAI-compatible API that OpenClaw connects to seamlessly. This guide covers setup, model selection, and realistic performance expectations.
Requirements
How capable a model you can run locally is determined mainly by available RAM. As a rough guide:
| RAM | Best Model | Quality Level |
|---|---|---|
| 8GB | Llama 3.2 3B, Phi-3.5 mini | Good for simple tasks |
| 16GB | Llama 3.1 8B, Mistral 7B | Very good general use |
| 32GB | Llama 3.1 70B Q4, Qwen 2.5 32B | Near GPT-3.5 quality |
| 64GB+ | Llama 3.1 70B full, Qwen 2.5 72B | Excellent for most tasks |
Apple Silicon (M2/M3/M4) unified memory makes Macs particularly efficient at local inference: a Mac Mini M4 Pro with 24GB runs Llama 3.1 8B very comfortably, and 64GB configurations can handle Llama 3.1 70B at 4-bit quantization (roughly 40GB of weights).
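If you are not sure how much memory a machine has, check before pulling a large model; a failed or unusable download wastes tens of gigabytes. These are standard system commands:
# macOS: total memory in bytes
sysctl -n hw.memsize
# Linux: human-readable memory summary
free -h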
Installing Ollama
# macOS or Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Windows: download installer from ollama.ai
# Verify installation
ollama --version
Pulling Models
# General purpose, excellent for most agent tasks
ollama pull llama3.1:8b
# Lightweight and fast, good for high-frequency tasks
ollama pull phi3.5
# Strong coding-focused model
ollama pull qwen2.5-coder:7b
# More capable, needs 32GB+ RAM
ollama pull llama3.1:70b
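After a pull finishes, it is worth confirming what is installed and running a quick smoke test; the prompt below is just an illustrative example:
# List downloaded models and their sizes
ollama list
# One-off prompt to confirm the model loads and responds
ollama run llama3.1:8b "Explain what a webhook is in one sentence."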
Running Ollama as a Service
# Start Ollama server (it listens on localhost:11434)
ollama serve
On macOS, Ollama starts automatically as a menu bar app. On Linux, enable the systemd service:
sudo systemctl enable ollama
sudo systemctl start ollama
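Before pointing OpenClaw at the server, confirm it is actually listening on port 11434. Ollama's HTTP API includes version and model-listing endpoints you can hit with curl:
# Should return the server version as JSON
curl http://localhost:11434/api/version
# Lists the models the server can serve
curl http://localhost:11434/api/tags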
Configuring OpenClaw for Ollama
{
  "llm": {
    "provider": "ollama",
    "base_url": "http://localhost:11434",
    "model": "llama3.1:8b",
    "temperature": 0.7,
    "max_tokens": 4096
  }
}
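OpenClaw talks to Ollama through its OpenAI-compatible endpoint, so you can reproduce roughly what the agent sends with a plain curl request. A minimal sketch using the same model and base_url as the config above (the prompt is illustrative):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
    "temperature": 0.7,
    "max_tokens": 256
  }'
If this returns a completion, any remaining connection problem is in the OpenClaw config rather than in Ollama.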
Performance Expectations
Local models are slower than cloud APIs for generation. On a Mac Mini M4 16GB:
| Model | Tokens/second | Typical response time |
|---|---|---|
| Phi-3.5 mini | 60–80 tok/s | 1–3 seconds |
| Llama 3.1 8B | 40–55 tok/s | 3–8 seconds |
| Llama 3.1 70B Q4 | 8–15 tok/s | 20–60 seconds |
For an agent that processes a few messages per hour, these speeds are perfectly acceptable.
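To measure what your own hardware does rather than relying on the table, ollama run can print timing statistics; the eval rate line is the tokens-per-second figure:
# --verbose prints load time, prompt eval rate, and eval rate (tok/s)
ollama run llama3.1:8b --verbose "Write a haiku about local inference."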
Hybrid Setup: Local + Cloud Fallback
Many users combine Ollama for routine tasks with a cloud LLM for complex reasoning:
{
  "llm": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "fallback_provider": "anthropic",
    "fallback_model": "claude-3-5-sonnet-20241022",
    "fallback_api_key": "sk-ant-...",
    "fallback_on_complexity": true
  }
}
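How the complexity-based routing behaves is up to OpenClaw; the cloud credential itself can be sanity-checked independently with a direct call to the Anthropic Messages API (the key is read from an environment variable here rather than hard-coded):
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-3-5-sonnet-20241022", "max_tokens": 64, "messages": [{"role": "user", "content": "ping"}]}'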
nacre.sh with Ollama
nacre.sh supports connecting to an external Ollama instance via the dashboard's LLM provider settings. Your Ollama endpoint must be publicly accessible or within the same network as your nacre.sh instance.
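By default Ollama binds only to localhost, so a remote nacre.sh instance cannot reach it. The bind address is controlled by the OLLAMA_HOST environment variable; a sketch for exposing it on a Linux host (handle firewalling and access control yourself, and avoid exposing an unauthenticated Ollama port to the open internet):
# One-off: listen on all interfaces instead of only localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# Persistent with systemd: add an override, then restart
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama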
Frequently Asked Questions
Which Ollama model is best for OpenClaw tasks?
For general agent use, Llama 3.1 8B is the best combination of quality and speed on 16GB+ systems. For coding tasks, Qwen 2.5 Coder 7B is excellent.
Can Ollama use my GPU?
Yes. Ollama automatically uses NVIDIA GPUs (via CUDA), AMD GPUs (via ROCm), and Apple Silicon's Metal framework. GPU acceleration significantly increases generation speed.
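To check where a loaded model is actually running, ollama ps reports the placement for each loaded model:
# Shows loaded models and whether they run on GPU, CPU, or a mix
ollama ps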
Is a locally running agent completely private?
Yes. Messages never leave your device when using Ollama with a local OpenClaw instance. This is the maximum privacy configuration available.
nacre.sh
Run OpenClaw without the server headaches
Dedicated instance, automatic TLS, nightly backups, and 290+ LLM integrations. Live in under 90 seconds from $12/month.
Deploy your agent →