# Ollama Docker Setup 🦙 (WSL2 + Windows 11)

Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.

## 📋 Table of Contents

- [Prerequisites](#prerequisites)
- [WSL2 Setup](#wsl2-setup)
- [Installation](#installation)
- [Starting Ollama](#starting-ollama)
- [Model Management](#model-management)
- [Usage Examples](#usage-examples)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Performance Tips](#performance-tips)

## 🔧 Prerequisites

### Required Software

- **Windows 11** with WSL2 enabled
- **Ubuntu 24.04** on WSL2
- **Docker Desktop for Windows** with WSL2 backend
- **NVIDIA GPU** with CUDA support (RTX series recommended)
- **NVIDIA Driver** for Windows (latest version)

### System Requirements

- Windows 11 Build 22000 or higher
- 16GB RAM minimum (32GB recommended for larger models)
- 50GB+ free disk space for models
- NVIDIA GPU with 8GB+ VRAM

## 🪟 WSL2 Setup

### 1. Enable WSL2 (if not already done)

```powershell
# Run in PowerShell as Administrator
wsl --install
wsl --set-default-version 2

# Install Ubuntu 24.04
wsl --install -d Ubuntu-24.04

# Verify WSL2 is active
wsl --list --verbose
```

### 2. Install Docker Desktop for Windows

1. Download from [Docker Desktop](https://www.docker.com/products/docker-desktop)
2. Install and enable the **WSL2 backend** in settings
3. Enable integration with the **Ubuntu-24.04** distro under **Settings → Resources → WSL Integration**

### 3. Verify GPU Support in WSL2

```bash
# Open WSL2 Ubuntu terminal
wsl

# Check NVIDIA driver
nvidia-smi

# You should see your GPU listed
```

**Important**: You do NOT need to install the NVIDIA Container Toolkit inside WSL2. Docker Desktop handles GPU passthrough automatically.

### 4. Test Docker GPU Access

```bash
# In WSL2 terminal
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

If this works, you're ready to go! 🎉

## 🚀 Installation

### 1. Create Project Structure in WSL2

```bash
# Open WSL2 terminal
wsl

# Create project directory
mkdir -p ~/ollama-docker
cd ~/ollama-docker
```

### 2. Create `docker-compose.yml`

Use the provided `docker-compose.yml` file with the WSL2 path (a minimal sketch follows the path mapping below):

- Windows path: `E:\volumes\ollama\data`
- WSL2 path: `/mnt/e/volumes/ollama/data`
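
If you don't already have the file, the sketch below shows the general shape such a compose file can take. The image tag, container name, restart policy, and GPU reservation block are assumptions to adapt, not the project's canonical file:

```yaml
# Minimal sketch of ~/ollama-docker/docker-compose.yml (adjust to your setup)
services:
  ollama:
    image: ollama/ollama:latest        # official Ollama image
    container_name: ollama
    ports:
      - "11434:11434"                  # expose the API on localhost:11434
    volumes:
      - /mnt/e/volumes/ollama/data:/root/.ollama   # persist models on the Windows drive
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]      # GPU access via Docker Desktop's WSL2 backend
    restart: unless-stopped
```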

### 3. Create Volume Directory

```bash
# From WSL2 terminal
sudo mkdir -p /mnt/e/volumes/ollama/data

# Or from Windows PowerShell
mkdir E:\volumes\ollama\data
```

## ▶️ Starting Ollama

```bash
# Navigate to project directory
cd ~/ollama-docker

# Start the service
docker compose up -d

# Check logs
docker compose logs -f ollama

# Verify service is running
curl http://localhost:11434
```

Expected response: `Ollama is running`

### Access from Windows

Ollama is accessible from both WSL2 and Windows (a quick check from PowerShell is shown after this list):

- **WSL2**: `http://localhost:11434`
- **Windows**: `http://localhost:11434`
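
To confirm the Windows side really reaches the container, one quick check from PowerShell (illustrative only; any HTTP client works):

```powershell
# From Windows PowerShell: should print "Ollama is running"
(Invoke-WebRequest -Uri "http://localhost:11434" -UseBasicParsing).Content
```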

## 📦 Model Management

### List Available Models

```bash
# Inside container
docker exec -it ollama ollama list

# Or from WSL2 (if the ollama CLI is installed)
ollama list
```

### Pull/Download Models

```bash
# Pull a model
docker exec -it ollama ollama pull llama3.2

# Popular models
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull phi3
docker exec -it ollama ollama pull llama3.1:70b
```

### Model Sizes Reference

| Model | Parameters | Size | RAM Required | VRAM Required |
|-------|-----------|------|--------------|---------------|
| `phi3` | 3.8B | ~2.3 GB | 8 GB | 4 GB |
| `llama3.2` | 3B | ~2.0 GB | 8 GB | 4 GB |
| `llama3.1` | 8B | ~4.7 GB | 8 GB | 6 GB |
| `mistral` | 7B | ~4.1 GB | 8 GB | 6 GB |
| `llama3.1:70b` | 70B | ~40 GB | 64 GB | 48 GB |
| `codellama` | 7B | ~3.8 GB | 8 GB | 6 GB |

Sizes are approximate and refer to the default (4-bit quantized) tags in the Ollama library.

### Remove/Unload Models

```bash
# Remove a model from disk
docker exec -it ollama ollama rm llama3.2

# Stop a running model (unload it from memory)
docker exec -it ollama ollama stop llama3.2

# Show running models
docker exec -it ollama ollama ps
```

### Copy Models Between Systems

```bash
# Export model configuration
docker exec ollama ollama show llama3.2 --modelfile > Modelfile

# Import on another system
cat Modelfile | docker exec -i ollama ollama create my-model -f -
```

Note that the exported Modelfile references the model weights by local blob path, so the target system must already have the base model (or be able to pull it). To move the weights themselves, copy the models directory inside the data volume (here `/mnt/e/volumes/ollama/data`) to the other machine.

## 💡 Usage Examples

### Interactive Chat

```bash
# Start interactive session
docker exec -it ollama ollama run llama3.2

# Chat with a specific model
docker exec -it ollama ollama run mistral "Explain quantum computing"
```

### Using the API

#### Generate Completion

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

#### Chat Completion

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Hello! Can you help me with Python?"
    }
  ],
  "stream": false
}'
```

#### Streaming Response

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}'
```
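
With `"stream": true` the API returns one JSON object per line, each carrying a `response` fragment plus a final object with `"done": true`. A minimal way to reassemble the text on the command line, assuming `jq` is installed in your WSL2 distro:

```bash
# Stream the completion and concatenate the "response" fragments as they arrive
curl -sN http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}' | jq -rj '.response'
echo   # trailing newline
```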

### Python Example (from Windows or WSL2)

```python
import requests

def chat_with_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }

    response = requests.post(url, json=payload)
    return response.json()["response"]

# Usage
result = chat_with_ollama("What is Docker?")
print(result)
```

### JavaScript Example (from Windows or WSL2)

```javascript
async function chatWithOllama(prompt, model = "llama3.2") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: model,
      prompt: prompt,
      stream: false
    })
  });

  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Explain REST APIs").then(console.log);
```

## 🔌 API Reference

### Main Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/generate` | POST | Generate text completion |
| `/api/chat` | POST | Chat completion with conversation history |
| `/api/tags` | GET | List available models |
| `/api/pull` | POST | Download a model |
| `/api/push` | POST | Upload a custom model |
| `/api/embeddings` | POST | Generate embeddings |
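
The generate and chat endpoints are demonstrated above. Two of the remaining endpoints can be exercised the same way with `curl`; the exact response fields are described in the official API documentation linked at the end of this guide:

```bash
# List models that are available locally
curl http://localhost:11434/api/tags

# Generate an embedding vector for a piece of text
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "Docker makes deployment easier"
}'
```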

### Generate Parameters

```json
{
  "model": "llama3.2",
  "prompt": "Your prompt here",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 128,
    "stop": ["\n"]
  }
}
```

## 🐛 Troubleshooting

### Container Won't Start

```bash
# Check logs
docker compose logs ollama

# Common issues:
# 1. GPU not accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# 2. Port already in use
netstat -ano | findstr :11434   # From Windows PowerShell
ss -tulpn | grep 11434          # From WSL2
```

### GPU Not Detected in WSL2

```powershell
# Update the NVIDIA driver (from Windows)
# Download the latest driver from: https://www.nvidia.com/Download/index.aspx

# Restart WSL2 (from PowerShell)
wsl --shutdown
wsl

# Verify GPU
nvidia-smi
```

### Model Download Fails

```bash
# Check disk space inside the container
docker exec ollama df -h /root/.ollama

# Check WSL2 disk space
df -h /mnt/e

# Retry with verbose logging
docker exec -it ollama ollama pull llama3.2 --verbose
```

### Out of Memory Errors

```bash
# Check GPU memory
nvidia-smi

# Use a smaller model, or reduce the context window from inside an
# interactive session before prompting:
docker exec -it ollama ollama run llama3.2
>>> /set parameter num_ctx 2048
```

### WSL2 Disk Space Issues

```powershell
# Compact the WSL2 virtual disk (from PowerShell as Admin)
wsl --shutdown
Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full
```

### Docker Desktop Integration Issues

1. Open Docker Desktop
2. Go to **Settings → Resources → WSL Integration**
3. Enable integration with **Ubuntu-24.04**
4. Click **Apply & Restart**, then verify from WSL2 as shown below
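
A quick sanity check that the integration is active: if it is, the `docker` CLI inside WSL2 reaches Docker Desktop's engine without errors.

```bash
# From a WSL2 terminal
docker version     # both the Client and Server sections should appear
docker context ls  # lists available contexts; the active one is marked with *
```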

### Permission Denied on Volume

```bash
# From WSL2
sudo chmod -R 755 /mnt/e/volumes/ollama/data
```

## ⚡ Performance Tips

### 1. WSL2 Memory Configuration

Create or edit `.wslconfig` in your Windows user directory (`C:\Users\YourName\.wslconfig`):

```ini
[wsl2]
memory=16GB
processors=8
swap=8GB
```

Apply the changes:

```powershell
wsl --shutdown
wsl
```

### 2. GPU Memory Optimization

```yaml
# In docker-compose.yml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - OLLAMA_NUM_GPU=1
```

### 3. Concurrent Requests

```yaml
# In docker-compose.yml
environment:
  - OLLAMA_MAX_LOADED_MODELS=3
  - OLLAMA_NUM_PARALLEL=4
```
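
After changing environment variables in `docker-compose.yml`, recreate the container and confirm the values actually reached it:

```bash
# Recreate the container with the new environment
docker compose up -d

# Confirm the OLLAMA_* variables are set inside the container
docker exec ollama env | grep OLLAMA
```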

### 4. Context Window

The context window is set per session (or per request via the API's `options.num_ctx`). From an interactive session:

```bash
# Start an interactive session, then adjust the context size
docker exec -it ollama ollama run llama3.2

# Reduce for faster responses
>>> /set parameter num_ctx 2048

# Increase for longer conversations
>>> /set parameter num_ctx 8192
```

### 5. Model Quantization

Use quantized models for better performance (exact tag names vary by model; check the model's tags in the Ollama library):

```bash
# 4-bit quantization (faster, less accurate)
docker exec ollama ollama pull llama3.2:q4_0

# 8-bit quantization (balanced)
docker exec ollama ollama pull llama3.2:q8_0
```

### 6. Store Models on SSD

For best performance, ensure `E:\volumes` is on an SSD, not an HDD.

## 📊 Monitoring

### Check Resource Usage

```bash
# Container stats
docker stats ollama

# GPU utilization (from WSL2 or Windows)
nvidia-smi

# Continuous monitoring
watch -n 1 nvidia-smi
```

### Model Status

```bash
# Show running models
docker exec ollama ollama ps

# Model information
docker exec ollama ollama show llama3.2
```

### WSL2 Resource Usage

```powershell
# From Windows PowerShell
wsl --list --verbose
```

## 🛑 Stopping and Cleanup

```bash
# Stop the service
docker compose down

# Stop and remove volumes
docker compose down -v

# Remove all models
docker exec ollama sh -c "rm -rf /root/.ollama/models/*"

# Shut down WSL2 (from Windows PowerShell)
wsl --shutdown
```

## 🔗 Useful Links

- [Ollama Official Documentation](https://github.com/ollama/ollama)
- [Ollama Model Library](https://ollama.com/library)
- [API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [WSL2 GPU Documentation](https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
- [Docker Desktop WSL2 Backend](https://docs.docker.com/desktop/wsl/)

## 🎯 Quick Reference

### Common Commands

```bash
# Start Ollama
docker compose up -d

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Run interactive chat
docker exec -it ollama ollama run llama3.2

# List models
docker exec -it ollama ollama list

# Check GPU
nvidia-smi

# Stop Ollama
docker compose down
```

## 📝 Notes for WSL2 Users

- **Path Conversion**: Windows `E:\folder` = WSL2 `/mnt/e/folder` (see the `wslpath` example after this list)
- **Performance**: Models stored on Windows drives are accessible but slightly slower than models stored inside the WSL2 filesystem
- **GPU Passthrough**: Handled automatically by Docker Desktop
- **Networking**: `localhost` works from both Windows and WSL2
- **Memory**: Configure WSL2 memory in `.wslconfig` for large models
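
If you are ever unsure of a path mapping, WSL ships with `wslpath` to convert in either direction:

```bash
# Windows → WSL2
wslpath -u 'E:\volumes\ollama\data'     # prints /mnt/e/volumes/ollama/data

# WSL2 → Windows
wslpath -w /mnt/e/volumes/ollama/data   # prints E:\volumes\ollama\data
```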

---

**Need help?** Open an issue or check the [Ollama Discord](https://discord.gg/ollama)