What if your AI assistant never sent a single byte of your data to the cloud?
With Ollama and OpenClaw, that’s not just possible—it’s surprisingly easy. No API keys. No monthly bills. No privacy concerns. Just you and your AI, running entirely on your own hardware.
Here’s how to set it up.
Why Run AI Locally?
Privacy: Your conversations, files, and data never leave your machine. No corporate server ever sees your work.
Cost: Zero API fees. Run as many interactions as you want without watching a meter tick up.
Speed: No network latency. Local inference can feel snappier for many tasks.
Reliability: No outages, rate limits, or “ChatGPT is at capacity” messages. Your AI works when you do.
Control: You choose the model. You choose when to update. No one can change the behavior of your assistant without your consent.
Hardware Requirements
Let’s be realistic about what you need:
Minimum (7B models):
- 8GB RAM
- Any modern CPU
- ~5GB disk space
Recommended (13B-34B models):
- 16-32GB RAM
- Apple Silicon Mac or decent GPU
- 20GB+ disk space
Optimal (70B+ models):
- 64GB+ RAM or good GPU (RTX 3090+)
- Fast SSD
- 50GB+ disk space
The good news? A MacBook Air M1 runs 7B models beautifully. You don’t need a data center.
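A quick rule of thumb behind those numbers: a quantized model needs roughly (parameters × bits-per-weight ÷ 8) bytes of RAM for its weights, plus some overhead for the KV cache and runtime. This sketch makes that arithmetic concrete; the 20% overhead factor is an assumption, not a measured constant:

```python
def model_ram_gb(params_billion: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model:
    weights take params x bits/8 bytes, plus ~20% assumed overhead
    for the KV cache and runtime."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

print(model_ram_gb(7))   # a 4-bit 7B model: ~4.2 GB
print(model_ram_gb(70))  # a 4-bit 70B model: ~42 GB
```

That's why a 7B model fits comfortably on an 8GB machine while 70B models demand workstation-class RAM.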
Step 1: Install Ollama
Ollama makes running local AI models trivially easy.
macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh
macOS (Homebrew):
brew install ollama
Windows: Download from ollama.com
Start the Ollama service:
ollama serve
Leave this running in the background.
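To verify the service is actually up, you can hit Ollama's root endpoint, which responds with a plain "Ollama is running" message. A minimal health-check sketch:

```python
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers on its default port."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server isn't running.
        return False
```

Run it after `ollama serve` starts; `False` means the service isn't listening yet.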
Step 2: Download a Model
Ollama supports dozens of models. Here are the best for assistant use:
Fast & Capable (3B):
ollama pull llama3.2
Great balance of speed and intelligence for its size. Perfect for most everyday tasks.
Smarter (8B):
ollama pull llama3.1:8b
Noticeably better reasoning. Still fast on Apple Silicon.
Maximum Intelligence (70B):
ollama pull llama3.1:70b
Approaches GPT-4-class quality on many tasks. Needs serious hardware.
Coding Specialist:
ollama pull codellama
Optimized for code generation and review.
For your first setup, start with llama3.2. You can always add more later.
Step 3: Configure OpenClaw
Point OpenClaw at your local Ollama instance:
ai:
  provider: ollama
  model: llama3.2
  baseUrl: http://localhost:11434  # Default Ollama port
If you haven’t installed OpenClaw yet:
npm install -g openclaw
openclaw init
Step 4: Test It
Start OpenClaw:
openclaw start
Send a message through your configured channel (Telegram, WhatsApp, etc.):
“Explain the difference between local and cloud AI in one sentence.”
Your assistant should respond—entirely locally. Check your network monitor if you want proof. Zero bytes sent to the cloud.
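If you want to see what's happening under the hood, you can query Ollama's REST API directly with its `/api/generate` endpoint, which is presumably what OpenClaw calls on your behalf. A minimal sketch of building that request:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST against Ollama's /api/generate endpoint.
    stream=False asks for one complete JSON reply instead of a token stream."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# With `ollama serve` running, send it like this:
#   req = build_generate_request("llama3.2", "Say hello in five words.")
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

Everything travels over localhost; no packet ever leaves your machine.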
Optimizing Performance
Model Selection Strategy
Use different models for different tasks:
ai:
  provider: ollama
  model: llama3.2        # Default
  models:
    coding: codellama    # For code tasks
    fast: llama3.2:1b    # For quick responses
    smart: llama3.1:8b   # For complex reasoning
OpenClaw can automatically route tasks to the best model.
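The routing logic itself is conceptually simple. This hypothetical sketch shows the idea, a task category mapped to a model name with a safe default; the category names and model tags are illustrative, not OpenClaw internals:

```python
# Hypothetical task-to-model routing table; entries are illustrative.
MODELS = {"coding": "codellama"}
DEFAULT_MODEL = "llama3.2"

def pick_model(task_category: str) -> str:
    """Return the configured model for a task category,
    falling back to the default for anything unrecognized."""
    return MODELS.get(task_category, DEFAULT_MODEL)
```

A code-review request would land on `codellama`; everything else runs on the default model.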
Context Window
Local models are often run with smaller context windows: Ollama defaults to a few thousand tokens (typically 2,048-4,096) even when the underlying model supports more, versus 128K for frontier cloud models. Configure appropriately:
ai:
  provider: ollama
  model: llama3.2
  maxTokens: 2048
  contextStrategy: sliding  # Keeps recent context
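A sliding context strategy just keeps the most recent messages that fit the token budget, dropping the oldest first. Here is a minimal sketch; the chars-divided-by-four token count is a crude stand-in for a real tokenizer:

```python
def sliding_context(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within max_tokens,
    dropping the oldest first. Token cost is estimated as len/4,
    a rough heuristic; a real implementation would use the model's tokenizer."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-to-oldest
        cost = len(msg["content"]) // 4
        if used + cost > max_tokens:
            break                           # budget exhausted; drop the rest
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order
```

With a 2,048-token budget, a long conversation silently sheds its earliest turns while recent exchanges stay intact.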
GPU Acceleration
If you have a compatible GPU:
NVIDIA (Linux/Windows): Ollama detects CUDA-capable GPUs automatically once the NVIDIA drivers are installed; there is no separate flag to pass. Confirm your GPU is visible with:
nvidia-smi
Apple Silicon: GPU acceleration is automatic. Metal handles everything.
Memory Management
Running out of RAM? Limit model memory:
OLLAMA_MAX_LOADED_MODELS=1 ollama serve
This keeps at most one model loaded in memory at a time. You can also set OLLAMA_KEEP_ALIVE to control how long an idle model stays resident before being unloaded.
Combining Local and Cloud
You don’t have to go fully local. A hybrid approach often works best:
ai:
  primary:
    provider: ollama
    model: llama3.2
  fallback:
    provider: anthropic
    model: claude-3-opus
    apiKey: sk-ant-xxx
  routing:
    local:
      - quick questions
      - file operations
      - simple tasks
    cloud:
      - complex reasoning
      - long documents
      - creative writing
Simple stuff runs locally (free, private). Complex stuff falls back to cloud (when you need the power).
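The fallback decision can be sketched as a few lines of routing logic. This is a hypothetical illustration of the config above, not OpenClaw's actual implementation; the category strings mirror the YAML:

```python
# Hypothetical hybrid router: categories listed under `local` in the
# config stay on Ollama; everything else (or a local outage) goes to cloud.
LOCAL_TASKS = {"quick questions", "file operations", "simple tasks"}

def route(task_category: str, local_available: bool = True) -> str:
    """Return which provider should handle this task."""
    if local_available and task_category in LOCAL_TASKS:
        return "ollama"
    return "anthropic"
```

Note the second condition: if the local server is down, even simple tasks fall through to the cloud, so your assistant keeps working either way.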
Model Recommendations by Use Case
| Use Case | Best Model | Why |
|---|---|---|
| General assistant | llama3.2 | Best all-rounder |
| Code review | codellama | Trained on code |
| Creative writing | mistral | Good prose quality |
| Long context | yarn-llama | Extended context window |
| Math/reasoning | deepseek-coder | Strong analytical ability |
| Fast responses | tinyllama | Sub-second inference |
Troubleshooting
“Connection refused” errors: Make sure Ollama is running:
ollama serve
Slow responses:
- Try a smaller model (7B instead of 13B)
- Reduce context window
- Check RAM usage
Model download fails: Re-run the pull; interrupted downloads resume where they left off:
ollama pull llama3.2
Also check your free disk space and network connection.
OpenClaw can’t find Ollama: Verify the port and URL in your config:
ai:
  baseUrl: http://127.0.0.1:11434
The Privacy Payoff
Once you’re running fully local:
✅ Every conversation is private
✅ Every file analysis stays on-device
✅ No API costs, ever
✅ No dependency on cloud services
✅ Complete control over your assistant’s behavior
This isn’t just a technical achievement. It’s a fundamental shift in who controls your AI relationship.
Next Steps
- Experiment with models: Try different options for different tasks
- Add skills: Install OpenClaw skills that work offline
- Optimize: Fine-tune model selection for your hardware
- Consider hybrid: Keep cloud as a fallback for complex tasks
Ready to try? Start with the 10-minute setup, then switch your config to Ollama. For a deep dive on managing costs when you do use cloud APIs, read our API cost reduction guide. If you want the best hardware for local models, check out the Mac Mini setup guide. Your private AI assistant awaits.