What if your AI assistant never sent a single byte of your data to the cloud?

With Ollama and OpenClaw, that’s not just possible—it’s surprisingly easy. No API keys. No monthly bills. No privacy concerns. Just you and your AI, running entirely on your own hardware.

Here’s how to set it up.

Why Run AI Locally?

Privacy: Your conversations, files, and data never leave your machine. No corporate server ever sees your work.

Cost: Zero API fees. Run as many interactions as you want without watching a meter tick up.

Speed: No network latency. Local inference can feel snappier for many tasks.

Reliability: No outages, rate limits, or “ChatGPT is at capacity” messages. Your AI works when you do.

Control: You choose the model. You choose when to update. No one can change the behavior of your assistant without your consent.

Hardware Requirements

Let’s be realistic about what you need:

Minimum (7B models):

  • 8GB RAM
  • Any modern CPU
  • ~5GB disk space

Recommended (13B-34B models):

  • 16-32GB RAM
  • Apple Silicon Mac or decent GPU
  • 20GB+ disk space

Optimal (70B+ models):

  • 64GB+ RAM or good GPU (RTX 3090+)
  • Fast SSD
  • 50GB+ disk space

The good news? A MacBook Air M1 runs 7B models beautifully. You don’t need a data center.

Step 1: Install Ollama

Ollama makes running local AI models trivially easy.

macOS/Linux:

curl -fsSL https://ollama.com/install.sh | sh

macOS (Homebrew):

brew install ollama

Windows: Download from ollama.com

Start the Ollama service:

ollama serve

Leave this running in the background.
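
To confirm the server is actually listening before you wire anything to it, you can hit Ollama’s /api/tags endpoint, which lists your installed models. A minimal check using only the Python standard library:

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            data = json.load(resp)
            # A healthy server responds with {"models": [...]}
            return "models" in data
    except (URLError, OSError, ValueError):
        return False

if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

Returning False here usually just means `ollama serve` isn’t running yet.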

Step 2: Download a Model

Ollama supports dozens of models. Here are the best for assistant use:

Fast & Capable (3B):

ollama pull llama3.2

Great balance of speed and intelligence. Perfect for most tasks.

Smarter (8B):

ollama pull llama3.1:8b

Noticeably better reasoning. Still fast on Apple Silicon.

Maximum Intelligence (70B):

ollama pull llama3.1:70b

Approaches GPT-4 quality. Needs serious hardware.

Coding Specialist:

ollama pull codellama

Optimized for code generation and review.

For your first setup, start with llama3.2. You can always add more later.
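
Under the hood, every prompt goes through Ollama’s local REST API, the same endpoint OpenClaw will use. As a sketch, here is a single non-streaming request to /api/generate using only the standard library; it returns None if the server is down or the model isn’t pulled:

```python
import json
import urllib.request
from typing import Optional
from urllib.error import URLError

def generate(prompt: str, model: str = "llama3.2",
             base_url: str = "http://localhost:11434") -> Optional[str]:
    """Send one non-streaming prompt to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["response"]
    except (URLError, OSError, KeyError, ValueError):
        return None  # server down, model missing, or malformed response

if __name__ == "__main__":
    print(generate("Say hello in five words."))
```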

Step 3: Configure OpenClaw

Point OpenClaw at your local Ollama instance:

ai:
  provider: ollama
  model: llama3.2
  baseUrl: http://localhost:11434  # Default Ollama port

If you haven’t installed OpenClaw yet:

npm install -g openclaw
openclaw init

Step 4: Test It

Start OpenClaw:

openclaw start

Send a message through your configured channel (Telegram, WhatsApp, etc.). Pick a prompt the model can answer without live data:

“Explain the difference between a process and a thread.”

Your assistant should respond, with inference running entirely on your machine. Check a network monitor if you want proof: nothing goes to an AI provider (the messages themselves still transit your chat channel’s servers, of course).

Optimizing Performance

Model Selection Strategy

Use different models for different tasks:

ai:
  provider: ollama
  model: llama3.2          # Default
  models:
    coding: codellama      # For code tasks
    fast: llama3.2:1b      # For quick responses
    smart: llama3.1:8b     # For complex reasoning

OpenClaw can automatically route tasks to the best model.
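
The routing itself can be as simple as keyword matching. This sketch is illustrative only; the task labels and keyword lists are hypothetical, not OpenClaw’s actual logic:

```python
# Illustrative keyword-based model routing; the routes and keywords
# below are hypothetical examples, not OpenClaw's implementation.
MODEL_ROUTES = {
    "coding": ("codellama", ["code", "function", "bug", "refactor"]),
    "smart": ("llama3.1:8b", ["analyze", "compare", "explain why"]),
}
DEFAULT_MODEL = "llama3.2"

def pick_model(message: str) -> str:
    """Choose a model by scanning the message for task keywords."""
    text = message.lower()
    for _task, (model, keywords) in MODEL_ROUTES.items():
        if any(kw in text for kw in keywords):
            return model
    return DEFAULT_MODEL
```

A message like “fix this bug” would route to the coding model, while everything unmatched falls through to the default.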

Context Window

Local models typically run with a much smaller context window than cloud APIs: Ollama loads models with a modest default (on the order of 2048-4096 tokens, versus 128K for GPT-4), even when the underlying model supports more. Configure appropriately:

ai:
  provider: ollama
  model: llama3.2
  maxTokens: 2048
  contextStrategy: sliding  # Keeps recent context
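
A “sliding” strategy generally means keeping only the most recent messages that fit the token budget. A rough sketch, with a naive word count standing in for a real tokenizer:

```python
def sliding_context(messages, max_tokens=2048):
    """Keep the most recent messages whose naive token count fits the budget.

    Word count stands in for a real tokenizer here; actual token
    counts depend on the model's tokenizer.
    """
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["first message " * 3, "second message", "third message"]
print(sliding_context(history, max_tokens=5))
# → ['second message', 'third message']
```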

GPU Acceleration

If you have a compatible GPU:

NVIDIA (Linux/Windows):

# No special flag is needed: Ollama uses CUDA automatically
# when the NVIDIA drivers are installed
ollama serve

Apple Silicon: GPU acceleration is automatic. Metal handles everything.

Memory Management

Running out of RAM? Limit how many models stay loaded:

OLLAMA_MAX_LOADED_MODELS=1 ollama serve

This keeps at most one model in memory at a time. To control how long an idle model stays loaded before being evicted, set OLLAMA_KEEP_ALIVE (for example, OLLAMA_KEEP_ALIVE=5m).

Combining Local and Cloud

You don’t have to go fully local. A hybrid approach often works best:

ai:
  primary:
    provider: ollama
    model: llama3.2
  fallback:
    provider: anthropic
    model: claude-3-opus
    apiKey: sk-ant-xxx
  
  routing:
    local:
      - quick questions
      - file operations
      - simple tasks
    cloud:
      - complex reasoning
      - long documents
      - creative writing

Simple stuff runs locally (free, private). Complex stuff falls back to cloud (when you need the power).
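
The fallback behavior in the config above boils down to: send designated heavy tasks to the cloud, try the local model for everything else, and fall back only if local fails. A sketch with hypothetical call_local/call_cloud stand-ins:

```python
# Sketch of local-first routing with a cloud fallback. call_local and
# call_cloud are hypothetical stand-ins for the real backend calls.
CLOUD_TASKS = {"complex reasoning", "long documents", "creative writing"}

def answer(task: str, prompt: str, call_local, call_cloud) -> str:
    """Route cloud-worthy tasks to the cloud; try local first for the rest."""
    if task in CLOUD_TASKS:
        return call_cloud(prompt)
    try:
        return call_local(prompt)
    except ConnectionError:      # local server down: fall back to cloud
        return call_cloud(prompt)

# Usage with dummy backends:
local = lambda p: f"[local] {p}"
cloud = lambda p: f"[cloud] {p}"
print(answer("simple tasks", "list my files", local, cloud))
# → [local] list my files
```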

Model Recommendations by Use Case

Use Case            Best Model       Why
General assistant   llama3.2         Best all-rounder
Code review         codellama        Trained on code
Creative writing    mistral          Good prose quality
Long context        yarn-llama       Extended context window
Math/reasoning      deepseek-coder   Strong analytical ability
Fast responses      tinyllama        Sub-second inference

Troubleshooting

“Connection refused” errors: Make sure Ollama is running:

ollama serve

Slow responses:

  • Try a smaller model (7B instead of 13B)
  • Reduce context window
  • Check RAM usage

Model download fails: Retry the pull; interrupted downloads resume where they left off:

ollama pull llama3.2

Check disk space and your network connection.

OpenClaw can’t find Ollama: Verify the port and URL in your config:

ai:
  baseUrl: http://127.0.0.1:11434

The Privacy Payoff

Once you’re running fully local:

✅ Every conversation is private
✅ Every file analysis stays on-device
✅ No API costs, ever
✅ No dependency on cloud services
✅ Complete control over your assistant’s behavior

This isn’t just a technical achievement. It’s a fundamental shift in who controls your AI relationship.

Next Steps

  1. Experiment with models: Try different options for different tasks
  2. Add skills: Install OpenClaw skills that work offline
  3. Optimize: Fine-tune model selection for your hardware
  4. Consider hybrid: Keep cloud as a fallback for complex tasks

Ready to try? Start with the 10-minute setup, then switch your config to Ollama. For a deep dive on managing costs when you do use cloud APIs, read our API cost reduction guide. If you want the best hardware for local models, check out the Mac Mini setup guide. Your private AI assistant awaits.