Ollama Docker Setup 🦙 (WSL2 + Windows 11)

Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.

📋 Table of Contents

  • 🔧 Prerequisites
  • 🪟 WSL2 Setup
  • 🚀 Installation
  • ▶️ Starting Ollama
  • 📦 Model Management
  • 💡 Usage Examples
  • 🔌 API Reference
  • 🐛 Troubleshooting
  • Performance Tips
  • 📊 Monitoring
  • 🛑 Stopping and Cleanup
  • 🎯 Quick Reference
  • 📝 Notes for WSL2 Users

🔧 Prerequisites

Required Software

  • Windows 11 with WSL2 enabled
  • Ubuntu 24.04 on WSL2
  • Docker Desktop for Windows with WSL2 backend
  • NVIDIA GPU with CUDA support (RTX series recommended)
  • NVIDIA Driver for Windows (latest version)

System Requirements

  • Windows 11 Build 22000 or higher
  • 16GB RAM minimum (32GB recommended for larger models)
  • 50GB+ free disk space for models
  • NVIDIA GPU with 8GB+ VRAM
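
To quickly check the GPU and disk side of these requirements, you can query the driver and the target drive (standard nvidia-smi and df options; this assumes the NVIDIA Windows driver is already installed):

# GPU name, total VRAM and driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv

# Free space on the drive that will hold the models
df -h /mnt/e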

🪟 WSL2 Setup

1. Enable WSL2 (if not already done)

# Run in PowerShell as Administrator
wsl --install
wsl --set-default-version 2

# Install Ubuntu 24.04
wsl --install -d Ubuntu-24.04

# Verify WSL2 is active
wsl --list --verbose

2. Install Docker Desktop for Windows

  1. Download Docker Desktop from https://www.docker.com/products/docker-desktop/
  2. Install and enable WSL2 backend in settings
  3. Enable integration with Ubuntu-24.04 distro in: Settings → Resources → WSL Integration
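
After applying the settings, a quick sanity check from the Ubuntu-24.04 terminal confirms the docker CLI is wired up to Docker Desktop (these are standard Docker commands used purely for verification):

# Client and server versions should both be reported
docker version

# The engine should identify itself as Docker Desktop
docker info | grep -i "operating system"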

3. Verify GPU Support in WSL2

# Open WSL2 Ubuntu terminal
wsl

# Check NVIDIA driver
nvidia-smi

# You should see your GPU listed

Important: You do NOT need to install NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.

4. Test Docker GPU Access

# In WSL2 terminal
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this works, you're ready to go! 🎉

🚀 Installation

1. Create Project Structure in WSL2

# Open WSL2 terminal
wsl

# Create project directory
mkdir -p ~/ollama-docker
cd ~/ollama-docker

2. Create docker-compose.yml

Use the provided docker-compose.yml file with the WSL2 path (a reference sketch is shown after the paths below):

  • Windows path: E:\volumes\ollama\data
  • WSL2 path: /mnt/e/volumes/ollama/data
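
If you don't already have the file, a minimal docker-compose.yml along these lines should work. Treat it as a sketch, not the canonical file from this repo: ollama/ollama, port 11434, and the /root/.ollama model directory are the upstream defaults, while the container name, restart policy, and latest tag are assumptions you can adjust.

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      # Windows E:\volumes\ollama\data as seen from WSL2
      - /mnt/e/volumes/ollama/data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]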

3. Create Volume Directory

# From WSL2 terminal
sudo mkdir -p /mnt/e/volumes/ollama/data

# Or from Windows PowerShell
mkdir E:\volumes\ollama\data

▶️ Starting Ollama

# Navigate to project directory
cd ~/ollama-docker

# Start the service
docker compose up -d

# Check logs
docker compose logs -f ollama

# Verify service is running
curl http://localhost:11434

Expected response: Ollama is running

Access from Windows

Ollama is accessible from both WSL2 and Windows:

  • WSL2: http://localhost:11434
  • Windows: http://localhost:11434

📦 Model Management

List Available Models

# Inside container
docker exec -it ollama ollama list

# Or from WSL2 (if the Ollama CLI is installed locally)
ollama list

Pull/Download Models

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Popular models
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull phi3
docker exec -it ollama ollama pull llama3.2:70b

Model Sizes Reference

| Model        | Parameters | Size    | RAM Required | VRAM Required |
|--------------|------------|---------|--------------|---------------|
| phi3         | 3.8B       | ~2.3 GB | 8 GB         | 4 GB          |
| llama3.2     | 8B         | ~4.7 GB | 8 GB         | 6 GB          |
| mistral      | 7B         | ~4.1 GB | 8 GB         | 6 GB          |
| llama3.2:70b | 70B        | ~40 GB  | 64 GB        | 48 GB         |
| codellama    | 7B         | ~3.8 GB | 8 GB         | 6 GB          |

Remove/Unload Models

# Remove a model from disk
docker exec -it ollama ollama rm llama3.2

# Stop a running model (unload from memory)
docker exec -it ollama ollama stop llama3.2

# Show running models
docker exec -it ollama ollama ps

Copy Models Between Systems

# Export model
docker exec ollama ollama show llama3.2 --modelfile > Modelfile

# Import on another system (note: the exported Modelfile only references the base
# model weights, so those must already be pulled or copied on the target system)
cat Modelfile | docker exec -i ollama ollama create my-model -f -

💡 Usage Examples

Interactive Chat

# Start interactive session
docker exec -it ollama ollama run llama3.2

# Chat with specific model
docker exec -it ollama ollama run mistral "Explain quantum computing"

Using the API

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Chat Completion

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Hello! Can you help me with Python?"
    }
  ],
  "stream": false
}'

Streaming Response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}'

Python Example (from Windows or WSL2)

import requests
import json

def chat_with_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Usage
result = chat_with_ollama("What is Docker?")
print(result)
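
If you set "stream": true instead, the API returns one JSON object per line. A minimal sketch for consuming that stream with requests, using the same endpoint as above:

import json
import requests

def stream_from_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}

    with requests.post(url, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a piece of the answer; "done" marks the final one
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break

# Usage
stream_from_ollama("Write a haiku about programming")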

JavaScript Example (from Windows or WSL2)

async function chatWithOllama(prompt, model = "llama3.2") {
    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: model,
            prompt: prompt,
            stream: false
        })
    });
    
    const data = await response.json();
    return data.response;
}

// Usage
chatWithOllama("Explain REST APIs").then(console.log);

🔌 API Reference

Main Endpoints

| Endpoint        | Method | Description                               |
|-----------------|--------|-------------------------------------------|
| /api/generate   | POST   | Generate text completion                  |
| /api/chat       | POST   | Chat completion with conversation history |
| /api/tags       | GET    | List available models                     |
| /api/pull       | POST   | Download a model                          |
| /api/push       | POST   | Upload a custom model                     |
| /api/embeddings | POST   | Generate embeddings                       |
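
The list and embeddings endpoints follow the same pattern as the examples above. Two quick illustrations (the request body for /api/embeddings uses the documented model/prompt shape; the model name is just the one used elsewhere in this guide):

# List locally available models
curl http://localhost:11434/api/tags

# Generate an embedding vector for a piece of text
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "Docker makes deployment easier"
}'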

Generate Parameters

{
  "model": "llama3.2",
  "prompt": "Your prompt here",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 128,
    "stop": ["\n"]
  }
}

🐛 Troubleshooting

Container Won't Start

# Check logs
docker compose logs ollama

# Common issues:
# 1. GPU not accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# 2. Port already in use
netstat -ano | findstr :11434  # From Windows PowerShell
ss -tulpn | grep 11434         # From WSL2

GPU Not Detected in WSL2

# Update NVIDIA driver (from Windows)
# Download latest driver from: https://www.nvidia.com/Download/index.aspx

# Restart WSL2 (from PowerShell)
wsl --shutdown
wsl

# Verify GPU
nvidia-smi

Model Download Fails

# Check disk space
docker exec ollama df -h /root/.ollama

# Check WSL2 disk space
df -h /mnt/e

# Retry the pull and watch the server logs for details
docker exec -it ollama ollama pull llama3.2
docker compose logs -f ollama

Out of Memory Errors

# Check GPU memory
nvidia-smi

# Use a smaller model, or reduce the context window from inside an interactive session
docker exec -it ollama ollama run llama3.2
# then, at the prompt:
/set parameter num_ctx 2048

WSL2 Disk Space Issues

# Compact WSL2 virtual disk (from PowerShell as Admin; Optimize-VHD requires the Hyper-V PowerShell module)
wsl --shutdown
Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full

Docker Desktop Integration Issues

  1. Open Docker Desktop
  2. Go to Settings → Resources → WSL Integration
  3. Enable integration with Ubuntu-24.04
  4. Click Apply & Restart

Permission Denied on Volume

# From WSL2
sudo chmod -R 755 /mnt/e/volumes/ollama/data

Performance Tips

1. WSL2 Memory Configuration

Create/edit .wslconfig in Windows user directory (C:\Users\YourName\.wslconfig):

[wsl2]
memory=16GB
processors=8
swap=8GB

Apply changes:

wsl --shutdown
wsl

2. GPU Memory Optimization

# In docker-compose.yml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - OLLAMA_NUM_GPU=1

3. Concurrent Requests

# In docker-compose.yml
environment:
  - OLLAMA_MAX_LOADED_MODELS=3
  - OLLAMA_NUM_PARALLEL=4

4. Context Window

# The context window is set per session; start the model interactively
docker exec -it ollama ollama run llama3.2

# then, at the prompt: reduce for faster responses
/set parameter num_ctx 2048

# or increase for longer conversations
/set parameter num_ctx 8192

5. Model Quantization

Use quantized models for better performance:

# 4-bit quantization (faster, less accurate)
docker exec ollama ollama pull llama3.2:q4_0

# 8-bit quantization (balanced)
docker exec ollama ollama pull llama3.2:q8_0

6. Store Models on SSD

For best performance, ensure E:\volumes is on an SSD, not HDD.
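
If model load times over /mnt/e are still a bottleneck, an alternative worth considering is a Docker named volume, which is stored on Docker Desktop's WSL2 ext4 disk and avoids the Windows drive translation layer. The trade-off is that models no longer live on E:. A sketch of the relevant docker-compose.yml excerpt (the volume name is arbitrary):

services:
  ollama:
    volumes:
      - ollama-models:/root/.ollama

volumes:
  ollama-models: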

📊 Monitoring

Check Resource Usage

# Container stats
docker stats ollama

# GPU utilization (from WSL2 or Windows)
nvidia-smi

# Continuous monitoring
watch -n 1 nvidia-smi

Model Status

# Show running models
docker exec ollama ollama ps

# Model information
docker exec ollama ollama show llama3.2

WSL2 Resource Usage

# From Windows PowerShell
wsl --list --verbose

🛑 Stopping and Cleanup

# Stop service
docker compose down

# Stop and remove volumes
docker compose down -v

# Remove all models
docker exec ollama sh -c "rm -rf /root/.ollama/models/*"

# Shutdown WSL2 (from Windows PowerShell)
wsl --shutdown

🎯 Quick Reference

Common Commands

# Start Ollama
docker compose up -d

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Run interactive chat
docker exec -it ollama ollama run llama3.2

# List models
docker exec -it ollama ollama list

# Check GPU
nvidia-smi

# Stop Ollama
docker compose down

📝 Notes for WSL2 Users

  • Path Conversion: Windows E:\folder = WSL2 /mnt/e/folder (see the wslpath example below)
  • Performance: Models stored on Windows drives (/mnt/*) work, but load noticeably slower than the WSL2 ext4 filesystem because of the drive translation layer
  • GPU Passthrough: Handled automatically by Docker Desktop
  • Networking: localhost works from both Windows and WSL2
  • Memory: Configure WSL2 memory in .wslconfig for large models
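
For quick conversions between the two path styles, WSL2 ships with the wslpath utility:

# Windows path → WSL2 path
wslpath 'E:\volumes\ollama\data'        # /mnt/e/volumes/ollama/data

# WSL2 path → Windows path
wslpath -w /mnt/e/volumes/ollama/data   # E:\volumes\ollama\data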

Need help? Open an issue or check the Ollama Discord