Ollama Docker Setup 🦙 (WSL2 + Windows 11)
Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.
📋 Table of Contents
- Prerequisites
- WSL2 Setup
- Installation
- Starting Ollama
- Model Management
- Usage Examples
- API Reference
- Troubleshooting
- Performance Tips
🔧 Prerequisites
Required Software
- Windows 11 with WSL2 enabled
- Ubuntu 24.04 on WSL2
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with CUDA support (RTX series recommended)
- NVIDIA Driver for Windows (latest version)
System Requirements
- Windows 11 Build 22000 or higher
- 16GB RAM minimum (32GB recommended for larger models)
- 50GB+ free disk space for models
- NVIDIA GPU with 8GB+ VRAM
🪟 WSL2 Setup
1. Enable WSL2 (if not already done)
# Run in PowerShell as Administrator
wsl --install
wsl --set-default-version 2
# Install Ubuntu 24.04
wsl --install -d Ubuntu-24.04
# Verify WSL2 is active
wsl --list --verbose
2. Install Docker Desktop for Windows
- Download from Docker Desktop
- Install and enable WSL2 backend in settings
- Enable integration with Ubuntu-24.04 distro in: Settings → Resources → WSL Integration
3. Verify GPU Support in WSL2
# Open WSL2 Ubuntu terminal
wsl
# Check NVIDIA driver
nvidia-smi
# You should see your GPU listed
Important: You do NOT need to install NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.
4. Test Docker GPU Access
# In WSL2 terminal
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
If this works, you're ready to go! 🎉
🚀 Installation
1. Create Project Structure in WSL2
# Open WSL2 terminal
wsl
# Create project directory
mkdir -p ~/ollama-docker
cd ~/ollama-docker
2. Create docker-compose.yml
Use the provided docker-compose.yml file with the WSL2 path:
- Windows path: E:\volumes\ollama\data
- WSL2 path: /mnt/e/volumes/ollama/data
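If you don't have the provided file at hand, a minimal sketch along these lines is a reasonable starting point (the ollama/ollama image, container name, and restart policy here are assumptions; adjust them to match the provided file):
services:
  ollama:
    image: ollama/ollama                          # official Ollama image
    container_name: ollama                        # name used by the docker exec commands in this guide
    ports:
      - "11434:11434"                             # Ollama API port
    volumes:
      - /mnt/e/volumes/ollama/data:/root/.ollama  # model storage on the E: drive
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia                      # GPU passthrough handled by Docker Desktop
              count: all
              capabilities: [gpu]
    restart: unless-stopped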
3. Create Volume Directory
# From WSL2 terminal
sudo mkdir -p /mnt/e/volumes/ollama/data
# Or from Windows PowerShell
mkdir E:\volumes\ollama\data
▶️ Starting Ollama
# Navigate to project directory
cd ~/ollama-docker
# Start the service
docker compose up -d
# Check logs
docker compose logs -f ollama
# Verify service is running
curl http://localhost:11434
Expected response: Ollama is running
Access from Windows
Ollama is accessible from both WSL2 and Windows:
- WSL2: http://localhost:11434
- Windows: http://localhost:11434
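As a quick sanity check from either Windows or WSL2, you can also hit the API directly, for example listing installed models via the /api/tags endpoint described in the API Reference below:
# Returns installed models as JSON (an empty "models" array is expected before any pull)
curl http://localhost:11434/api/tags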
📦 Model Management
List Available Models
# Inside container
docker exec -it ollama ollama list
# Or from WSL2 (if ollama CLI installed)
ollama list
Pull/Download Models
# Pull a model
docker exec -it ollama ollama pull llama3.2
# Popular models
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull phi3
docker exec -it ollama ollama pull llama3.1:70b
Model Sizes Reference
| Model | Parameters | Size | RAM Required | VRAM Required |
|---|---|---|---|---|
| phi3 | 3.8B | ~2.3 GB | 8 GB | 4 GB |
| llama3.1 | 8B | ~4.7 GB | 8 GB | 6 GB |
| mistral | 7B | ~4.1 GB | 8 GB | 6 GB |
| llama3.1:70b | 70B | ~40 GB | 64 GB | 48 GB |
| codellama | 7B | ~3.8 GB | 8 GB | 6 GB |
Remove/Unload Models
# Remove a model from disk
docker exec -it ollama ollama rm llama3.2
# Stop a running model (unload from memory)
docker exec -it ollama ollama stop llama3.2
# Show running models
docker exec -it ollama ollama ps
Copy Models Between Systems
# Export model
docker exec ollama ollama show llama3.2 --modelfile > Modelfile
# Import on another system
cat Modelfile | docker exec -i ollama ollama create my-model -f -
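Note that the exported Modelfile's FROM line typically points at a local blob path, so on its own it may not carry the model weights. An alternative (a sketch, assuming the volume path used in this guide on both machines) is to copy the model store directory itself:
# Archive the model store from the host-side volume
tar -czf ollama-models.tar.gz -C /mnt/e/volumes/ollama/data models
# On the target machine, extract into its Ollama data volume and restart the container
tar -xzf ollama-models.tar.gz -C /mnt/e/volumes/ollama/data
docker compose restart ollama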
💡 Usage Examples
Interactive Chat
# Start interactive session
docker exec -it ollama ollama run llama3.2
# Chat with specific model
docker exec -it ollama ollama run mistral "Explain quantum computing"
Using the API
Generate Completion
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
Chat Completion
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Hello! Can you help me with Python?"
}
],
"stream": false
}'
Streaming Response
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Write a haiku about programming",
"stream": true
}'
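With stream set to true, the server sends one JSON object per line, each carrying a fragment of the response and a final object with done set to true. If you have jq installed (an extra tool, not part of the setup above), you can watch just the generated text:
# Concatenate the streamed "response" fragments as they arrive
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}' | jq -j '.response'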
Python Example (from Windows or WSL2)
import requests

def chat_with_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Usage
result = chat_with_ollama("What is Docker?")
print(result)
JavaScript Example (from Windows or WSL2)
async function chatWithOllama(prompt, model = "llama3.2") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: model,
      prompt: prompt,
      stream: false
    })
  });
  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Explain REST APIs").then(console.log);
🔌 API Reference
Main Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/generate | POST | Generate text completion |
| /api/chat | POST | Chat completion with conversation history |
| /api/tags | GET | List available models |
| /api/pull | POST | Download a model |
| /api/push | POST | Upload a custom model |
| /api/embeddings | POST | Generate embeddings |
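For example, the embeddings endpoint takes a model and a prompt and returns a vector (shown here with the llama3.2 model pulled earlier; a dedicated embedding model from the library will generally produce better vectors):
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "Docker makes deployment easier"
}'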
Generate Parameters
{
"model": "llama3.2",
"prompt": "Your prompt here",
"stream": false,
"options": {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"num_predict": 128,
"stop": ["\n"]
}
}
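These options are applied per request. For example, capping the context window and the output length for a quicker reply:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize what WSL2 is in one sentence",
  "stream": false,
  "options": { "num_ctx": 2048, "num_predict": 64 }
}'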
🐛 Troubleshooting
Container Won't Start
# Check logs
docker compose logs ollama
# Common issues:
# 1. GPU not accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# 2. Port already in use
netstat -ano | findstr :11434 # From Windows PowerShell
ss -tulpn | grep 11434 # From WSL2
GPU Not Detected in WSL2
# Update NVIDIA driver (from Windows)
# Download latest driver from: https://www.nvidia.com/Download/index.aspx
# Restart WSL2 (from PowerShell)
wsl --shutdown
wsl
# Verify GPU
nvidia-smi
Model Download Fails
# Check disk space
docker exec ollama df -h /root/.ollama
# Check WSL2 disk space
df -h /mnt/e
# Retry the pull (download progress is shown in the terminal)
docker exec -it ollama ollama pull llama3.2
Out of Memory Errors
# Check GPU memory
nvidia-smi
# Use a smaller model, or reduce the context window from an interactive session:
docker exec -it ollama ollama run llama3.2
# then, inside the session:
/set parameter num_ctx 2048
# The same limit can be set per API request via "options": {"num_ctx": 2048}
WSL2 Disk Space Issues
# Compact WSL2 virtual disk (from PowerShell as Admin)
wsl --shutdown
Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full
Docker Desktop Integration Issues
- Open Docker Desktop
- Go to Settings → Resources → WSL Integration
- Enable integration with Ubuntu-24.04
- Click Apply & Restart
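After Apply & Restart, a quick check from inside the Ubuntu-24.04 shell confirms the docker CLI is talking to Docker Desktop:
# Should print both Client and Server sections without errors
docker version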
Permission Denied on Volume
# From WSL2
sudo chmod -R 755 /mnt/e/volumes/ollama/data
⚡ Performance Tips
1. WSL2 Memory Configuration
Create/edit .wslconfig in Windows user directory (C:\Users\YourName\.wslconfig):
[wsl2]
memory=16GB
processors=8
swap=8GB
Apply changes:
wsl --shutdown
wsl
2. GPU Memory Optimization
# In docker-compose.yml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - OLLAMA_NUM_GPU=1
3. Concurrent Requests
# In docker-compose.yml
environment:
  - OLLAMA_MAX_LOADED_MODELS=3
  - OLLAMA_NUM_PARALLEL=4
4. Context Window
# The context window is set inside an interactive session (docker exec -it ollama ollama run llama3.2)
# Reduce for faster responses:
/set parameter num_ctx 2048
# Increase for longer conversations:
/set parameter num_ctx 8192
# Or set it per API request via "options": {"num_ctx": 8192}
5. Model Quantization
Use quantized models for better performance; exact tag names vary by model, so check the tags list in the Ollama model library:
# 4-bit quantization (faster, less accurate)
docker exec ollama ollama pull llama3.2:q4_0
# 8-bit quantization (balanced)
docker exec ollama ollama pull llama3.2:q8_0
6. Store Models on SSD
For best performance, ensure E:\volumes is on an SSD, not an HDD.
📊 Monitoring
Check Resource Usage
# Container stats
docker stats ollama
# GPU utilization (from WSL2 or Windows)
nvidia-smi
# Continuous monitoring
watch -n 1 nvidia-smi
Model Status
# Show running models
docker exec ollama ollama ps
# Model information
docker exec ollama ollama show llama3.2
WSL2 Resource Usage
# From Windows PowerShell
wsl --list --verbose
🛑 Stopping and Cleanup
# Stop service
docker compose down
# Stop and remove volumes
docker compose down -v
# Remove all models
docker exec ollama sh -c "rm -rf /root/.ollama/models/*"
# Shutdown WSL2 (from Windows PowerShell)
wsl --shutdown
🔗 Useful Links
- Ollama Official Documentation
- Ollama Model Library
- API Documentation
- WSL2 GPU Documentation
- Docker Desktop WSL2 Backend
🎯 Quick Reference
Common Commands
# Start Ollama
docker compose up -d
# Pull a model
docker exec -it ollama ollama pull llama3.2
# Run interactive chat
docker exec -it ollama ollama run llama3.2
# List models
docker exec -it ollama ollama list
# Check GPU
nvidia-smi
# Stop Ollama
docker compose down
📝 Notes for WSL2 Users
- Path Conversion: Windows E:\folder = WSL2 /mnt/e/folder (see the wslpath example below)
- Performance: Models stored on Windows drives are accessible but slightly slower than the WSL2 filesystem
- GPU Passthrough: Handled automatically by Docker Desktop
- Networking: localhost works from both Windows and WSL2
- Memory: Configure WSL2 memory in .wslconfig for large models
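WSL2 ships with a wslpath helper that performs this path conversion for you:
# Windows path -> WSL2 path
wslpath -u 'E:\volumes\ollama\data'
# WSL2 path -> Windows path
wslpath -w /mnt/e/volumes/ollama/data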
Need help? Open an issue or check the Ollama Discord