Ollama Docker Setup 🦙 (WSL2 + Windows 11)

Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.

📋 Table of Contents

  • 🔧 Prerequisites
  • 🪟 WSL2 Setup
  • 🚀 Installation
  • ▶️ Starting Ollama
  • 📦 Model Management
  • 💡 Usage Examples
  • 🔌 API Reference
  • 🐛 Troubleshooting
  • Performance Tips
  • 📊 Monitoring
  • 🛑 Stopping and Cleanup
  • 🎯 Quick Reference
  • 📝 Notes for WSL2 Users

🔧 Prerequisites

Required Software

  • Windows 11 with WSL2 enabled
  • Ubuntu 24.04 on WSL2
  • Docker Desktop for Windows with WSL2 backend
  • NVIDIA GPU with CUDA support (RTX series recommended)
  • NVIDIA Driver for Windows (latest version)

System Requirements

  • Windows 11 Build 22000 or higher
  • 16GB RAM minimum (32GB recommended for larger models)
  • 50GB+ free disk space for models
  • NVIDIA GPU with 8GB+ VRAM
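
To quickly check the GPU and disk side of these requirements, you can query the driver and the target drive (standard nvidia-smi and df options; this assumes the NVIDIA Windows driver is already installed):

# GPU name, total VRAM and driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv

# Free space on the drive that will hold the models
df -h /mnt/e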

🪟 WSL2 Setup

1. Enable WSL2 (if not already done)

# Run in PowerShell as Administrator
wsl --install
wsl --set-default-version 2

# Install Ubuntu 24.04
wsl --install -d Ubuntu-24.04

# Verify WSL2 is active
wsl --list --verbose

2. Install Docker Desktop for Windows

  1. Download Docker Desktop from https://www.docker.com/products/docker-desktop/
  2. Install and enable WSL2 backend in settings
  3. Enable integration with Ubuntu-24.04 distro in: Settings → Resources → WSL Integration
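
After applying the settings, a quick sanity check from the Ubuntu-24.04 terminal confirms the docker CLI is wired up to Docker Desktop (these are standard Docker commands used purely for verification):

# Client and server versions should both be reported
docker version

# The engine should identify itself as Docker Desktop
docker info | grep -i "operating system"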

3. Verify GPU Support in WSL2

# Open WSL2 Ubuntu terminal
wsl

# Check NVIDIA driver
nvidia-smi

# You should see your GPU listed

Important: You do NOT need to install NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.

4. Test Docker GPU Access

# In WSL2 terminal
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this works, you're ready to go! 🎉

🚀 Installation

1. Create Project Structure in WSL2

# Open WSL2 terminal
wsl

# Create project directory
mkdir -p ~/ollama-docker
cd ~/ollama-docker

2. Create docker-compose.yml

Use the provided docker-compose.yml file with the WSL2 path (a reference sketch is shown after the paths below):

  • Windows path: E:\volumes\ollama\data
  • WSL2 path: /mnt/e/volumes/ollama/data
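
If you don't already have the file, a minimal docker-compose.yml along these lines should work. Treat it as a sketch, not the canonical file from this repo: ollama/ollama, port 11434, and the /root/.ollama model directory are the upstream defaults, while the container name, restart policy, and latest tag are assumptions you can adjust.

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      # Windows E:\volumes\ollama\data as seen from WSL2
      - /mnt/e/volumes/ollama/data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]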

3. Create Volume Directory

# From WSL2 terminal
sudo mkdir -p /mnt/e/volumes/ollama/data

# Or from Windows PowerShell
mkdir E:\volumes\ollama\data

▶️ Starting Ollama

# Navigate to project directory
cd ~/ollama-docker

# Start the service
docker compose up -d

# Check logs
docker compose logs -f ollama

# Verify service is running
curl http://localhost:11434

Expected response: Ollama is running

Access from Windows

Ollama is accessible from both WSL2 and Windows:

  • WSL2: http://localhost:11434
  • Windows: http://localhost:11434

📦 Model Management

List Available Models

# Inside container
docker exec -it ollama ollama list

# Or from WSL2 (if the Ollama CLI is installed locally)
ollama list

Pull/Download Models

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Popular models
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull phi3
docker exec -it ollama ollama pull llama3.2:70b

Model Sizes Reference

| Model        | Parameters | Size    | RAM Required | VRAM Required |
|--------------|------------|---------|--------------|---------------|
| phi3         | 3.8B       | ~2.3 GB | 8 GB         | 4 GB          |
| llama3.2     | 8B         | ~4.7 GB | 8 GB         | 6 GB          |
| mistral      | 7B         | ~4.1 GB | 8 GB         | 6 GB          |
| llama3.2:70b | 70B        | ~40 GB  | 64 GB        | 48 GB         |
| codellama    | 7B         | ~3.8 GB | 8 GB         | 6 GB          |

Remove/Unload Models

# Remove a model from disk
docker exec -it ollama ollama rm llama3.2

# Stop a running model (unload from memory)
docker exec -it ollama ollama stop llama3.2

# Show running models
docker exec -it ollama ollama ps

Copy Models Between Systems

# Export model
docker exec ollama ollama show llama3.2 --modelfile > Modelfile

# Import on another system (note: the exported Modelfile only references the base
# model weights, so those must already be pulled or copied on the target system)
cat Modelfile | docker exec -i ollama ollama create my-model -f -

💡 Usage Examples

Interactive Chat

# Start interactive session
docker exec -it ollama ollama run llama3.2

# Chat with specific model
docker exec -it ollama ollama run mistral "Explain quantum computing"

Using the API

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Chat Completion

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Hello! Can you help me with Python?"
    }
  ],
  "stream": false
}'

Streaming Response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}'

Python Example (from Windows or WSL2)

import requests
import json

def chat_with_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Usage
result = chat_with_ollama("What is Docker?")
print(result)
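
If you set "stream": true instead, the API returns one JSON object per line. A minimal sketch for consuming that stream with requests, using the same endpoint as above:

import json
import requests

def stream_from_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}

    with requests.post(url, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a piece of the answer; "done" marks the final one
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break

# Usage
stream_from_ollama("Write a haiku about programming")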

JavaScript Example (from Windows or WSL2)

async function chatWithOllama(prompt, model = "llama3.2") {
    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: model,
            prompt: prompt,
            stream: false
        })
    });
    
    const data = await response.json();
    return data.response;
}

// Usage
chatWithOllama("Explain REST APIs").then(console.log);

🔌 API Reference

Main Endpoints

| Endpoint        | Method | Description                               |
|-----------------|--------|-------------------------------------------|
| /api/generate   | POST   | Generate text completion                  |
| /api/chat       | POST   | Chat completion with conversation history |
| /api/tags       | GET    | List available models                     |
| /api/pull       | POST   | Download a model                          |
| /api/push       | POST   | Upload a custom model                     |
| /api/embeddings | POST   | Generate embeddings                       |
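
The list and embeddings endpoints follow the same pattern as the examples above. Two quick illustrations (the request body for /api/embeddings uses the documented model/prompt shape; the model name is just the one used elsewhere in this guide):

# List locally available models
curl http://localhost:11434/api/tags

# Generate an embedding vector for a piece of text
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "Docker makes deployment easier"
}'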

Generate Parameters

{
  "model": "llama3.2",
  "prompt": "Your prompt here",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 128,
    "stop": ["\n"]
  }
}

🐛 Troubleshooting

Container Won't Start

# Check logs
docker compose logs ollama

# Common issues:
# 1. GPU not accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# 2. Port already in use
netstat -ano | findstr :11434  # From Windows PowerShell
ss -tulpn | grep 11434         # From WSL2

GPU Not Detected in WSL2

# Update NVIDIA driver (from Windows)
# Download latest driver from: https://www.nvidia.com/Download/index.aspx

# Restart WSL2 (from PowerShell)
wsl --shutdown
wsl

# Verify GPU
nvidia-smi

Model Download Fails

# Check disk space
docker exec ollama df -h /root/.ollama

# Check WSL2 disk space
df -h /mnt/e

# Retry the pull and watch the server logs for details
docker exec -it ollama ollama pull llama3.2
docker compose logs -f ollama

Out of Memory Errors

# Check GPU memory
nvidia-smi

# Use a smaller model, or reduce the context window from inside an interactive session
docker exec -it ollama ollama run llama3.2
# then, at the prompt:
/set parameter num_ctx 2048

WSL2 Disk Space Issues

# Compact WSL2 virtual disk (from PowerShell as Admin; Optimize-VHD requires the Hyper-V PowerShell module)
wsl --shutdown
Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full

Docker Desktop Integration Issues

  1. Open Docker Desktop
  2. Go to Settings → Resources → WSL Integration
  3. Enable integration with Ubuntu-24.04
  4. Click Apply & Restart

Permission Denied on Volume

# From WSL2
sudo chmod -R 755 /mnt/e/volumes/ollama/data

Performance Tips

1. WSL2 Memory Configuration

Create/edit .wslconfig in Windows user directory (C:\Users\YourName\.wslconfig):

[wsl2]
memory=16GB
processors=8
swap=8GB

Apply changes:

wsl --shutdown
wsl

2. GPU Memory Optimization

# In docker-compose.yml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - OLLAMA_NUM_GPU=1

3. Concurrent Requests

# In docker-compose.yml
environment:
  - OLLAMA_MAX_LOADED_MODELS=3
  - OLLAMA_NUM_PARALLEL=4

4. Context Window

# The context window is set per session; start the model interactively
docker exec -it ollama ollama run llama3.2

# then, at the prompt: reduce for faster responses
/set parameter num_ctx 2048

# or increase for longer conversations
/set parameter num_ctx 8192

5. Model Quantization

Use quantized models for better performance:

# 4-bit quantization (faster, less accurate)
docker exec ollama ollama pull llama3.2:q4_0

# 8-bit quantization (balanced)
docker exec ollama ollama pull llama3.2:q8_0

6. Store Models on SSD

For best performance, ensure E:\volumes is on an SSD, not HDD.
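
If model load times over /mnt/e are still a bottleneck, an alternative worth considering is a Docker named volume, which is stored on Docker Desktop's WSL2 ext4 disk and avoids the Windows drive translation layer. The trade-off is that models no longer live on E:. A sketch of the relevant docker-compose.yml excerpt (the volume name is arbitrary):

services:
  ollama:
    volumes:
      - ollama-models:/root/.ollama

volumes:
  ollama-models: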

📊 Monitoring

Check Resource Usage

# Container stats
docker stats ollama

# GPU utilization (from WSL2 or Windows)
nvidia-smi

# Continuous monitoring
watch -n 1 nvidia-smi

Model Status

# Show running models
docker exec ollama ollama ps

# Model information
docker exec ollama ollama show llama3.2

WSL2 Resource Usage

# From Windows PowerShell
wsl --list --verbose

🛑 Stopping and Cleanup

# Stop service
docker compose down

# Stop and remove volumes
docker compose down -v

# Remove all models
docker exec ollama sh -c "rm -rf /root/.ollama/models/*"

# Shutdown WSL2 (from Windows PowerShell)
wsl --shutdown

🎯 Quick Reference

Common Commands

# Start Ollama
docker compose up -d

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Run interactive chat
docker exec -it ollama ollama run llama3.2

# List models
docker exec -it ollama ollama list

# Check GPU
nvidia-smi

# Stop Ollama
docker compose down

📝 Notes for WSL2 Users

  • Path Conversion: Windows E:\folder = WSL2 /mnt/e/folder (see the wslpath example below)
  • Performance: Models stored on Windows drives (/mnt/*) work, but load noticeably slower than the WSL2 ext4 filesystem because of the drive translation layer
  • GPU Passthrough: Handled automatically by Docker Desktop
  • Networking: localhost works from both Windows and WSL2
  • Memory: Configure WSL2 memory in .wslconfig for large models
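
For quick conversions between the two path styles, WSL2 ships with the wslpath utility:

# Windows path → WSL2 path
wslpath 'E:\volumes\ollama\data'        # /mnt/e/volumes/ollama/data

# WSL2 path → Windows path
wslpath -w /mnt/e/volumes/ollama/data   # E:\volumes\ollama\data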

Need help? Open an issue or check the Ollama Discord