Added whisper

2025-10-11 23:12:43 +02:00
parent d77c0ba525
commit 2b218eeddc
2 changed files with 0 additions and 574 deletions
@@ -1,541 +0,0 @@
-# Ollama Docker Setup 🦙 (WSL2 + Windows 11)
-
-Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.
-
-## 📋 Table of Contents
-
- [Prerequisites](#prerequisites)
- [WSL2 Setup](#wsl2-setup)
- [Installation](#installation)
- [Starting Ollama](#starting-ollama)
- [Model Management](#model-management)
- [Usage Examples](#usage-examples)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Performance Tips](#performance-tips)
-
-## 🔧 Prerequisites
-
-### Required Software
-
- **Windows 11** with WSL2 enabled
- **Ubuntu 24.04** on WSL2
- **Docker Desktop for Windows** with WSL2 backend
- **NVIDIA GPU** with CUDA support (RTX series recommended)
- **NVIDIA Driver** for Windows (latest version)
-
-### System Requirements
-
- Windows 11 Build 22000 or higher
- 16GB RAM minimum (32GB recommended for larger models)
- 50GB+ free disk space for models
- NVIDIA GPU with 8GB+ VRAM
-
-## 🪟 WSL2 Setup
-
-### 1. Enable WSL2 (if not already done)
-
-```powershell
-# Run in PowerShell as Administrator
-wsl --install
-wsl --set-default-version 2
-
-# Install Ubuntu 24.04
-wsl --install -d Ubuntu-24.04
-
-# Verify WSL2 is active
-wsl --list --verbose
-```
-
-### 2. Install Docker Desktop for Windows
-
-1. Download from [Docker Desktop](https://www.docker.com/products/docker-desktop)
-2. Install and enable **WSL2 backend** in settings
-3. Enable integration with Ubuntu-24.04 distro in: Settings → Resources → WSL Integration
-
-### 3. Verify GPU Support in WSL2
-
-```bash
-# Open WSL2 Ubuntu terminal
-wsl
-
-# Check NVIDIA driver
-nvidia-smi
-
-# You should see your GPU listed
-```
-
-**Important**: You do NOT need to install NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.
-
-### 4. Test Docker GPU Access
-
-```bash
-# In WSL2 terminal
-docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
-```
-
-If this works, you're ready to go! 🎉
-
-## 🚀 Installation
-
-### 1. Create Project Structure in WSL2
-
-```bash
-# Open WSL2 terminal
-wsl
-
-# Create project directory
-mkdir -p ~/ollama-docker
-cd ~/ollama-docker
-```
-
-### 2. Create `docker-compose.yml`
-
-Use the provided `docker-compose.yml` file with the WSL2 path:
- Windows path: `E:\volumes\ollama\data`
- WSL2 path: `/mnt/e/volumes/ollama/data`
-
-### 3. Create Volume Directory
-
-```bash
-# From WSL2 terminal
-sudo mkdir -p /mnt/e/volumes/ollama/data
-
-# Or from Windows PowerShell
-mkdir E:\volumes\ollama\data
-```
-
-## ▶️ Starting Ollama
-
-```bash
-# Navigate to project directory
-cd ~/ollama-docker
-
-# Start the service
-docker compose up -d
-
-# Check logs
-docker compose logs -f ollama
-
-# Verify service is running
-curl http://localhost:11434
-```
-
-Expected response: `Ollama is running`
-
-### Access from Windows
-
-Ollama is accessible from both WSL2 and Windows:
- **WSL2**: `http://localhost:11434`
- **Windows**: `http://localhost:11434`
-
-## 📦 Model Management
-
-### List Available Models
-
-```bash
-# Inside container
-docker exec -it ollama ollama list
-
-# Or from WSL2 (if ollama CLI installed)
-ollama list
-```
-
-### Pull/Download Models
-
-```bash
-# Pull a model
-docker exec -it ollama ollama pull llama3.2
-
-# Popular models
-docker exec -it ollama ollama pull mistral
-docker exec -it ollama ollama pull codellama
-docker exec -it ollama ollama pull phi3
-docker exec -it ollama ollama pull llama3.2:70b
-```
-
-### Model Sizes Reference
-
-| Model | Parameters | Size | RAM Required | VRAM Required |
-|-------|-----------|------|--------------|---------------|
-| `phi3` | 3.8B | ~2.3 GB | 8 GB | 4 GB |
-| `llama3.2` | 8B | ~4.7 GB | 8 GB | 6 GB |
-| `mistral` | 7B | ~4.1 GB | 8 GB | 6 GB |
-| `llama3.2:70b` | 70B | ~40 GB | 64 GB | 48 GB |
-| `codellama` | 7B | ~3.8 GB | 8 GB | 6 GB |
-
-### Remove/Unload Models
-
-```bash
-# Remove a model from disk
-docker exec -it ollama ollama rm llama3.2
-
-# Stop a running model (unload from memory)
-docker exec -it ollama ollama stop llama3.2
-
-# Show running models
-docker exec -it ollama ollama ps
-```
-
-### Copy Models Between Systems
-
-```bash
-# Export model
-docker exec ollama ollama show llama3.2 --modelfile > Modelfile
-
-# Import on another system
-cat Modelfile | docker exec -i ollama ollama create my-model -f -
-```
-
-## 💡 Usage Examples
-
-### Interactive Chat
-
-```bash
-# Start interactive session
-docker exec -it ollama ollama run llama3.2
-
-# Chat with specific model
-docker exec -it ollama ollama run mistral "Explain quantum computing"
-```
-
-### Using the API
-
-#### Generate Completion
-
-```bash
-curl http://localhost:11434/api/generate -d '{
-  "model": "llama3.2",
-  "prompt": "Why is the sky blue?",
-  "stream": false
-}'
-```
-
-#### Chat Completion
-
-```bash
-curl http://localhost:11434/api/chat -d '{
-  "model": "llama3.2",
-  "messages": [
-    {
-      "role": "user",
-      "content": "Hello! Can you help me with Python?"
-    }
-  ],
-  "stream": false
-}'
-```
-
-#### Streaming Response
-
-```bash
-curl http://localhost:11434/api/generate -d '{
-  "model": "llama3.2",
-  "prompt": "Write a haiku about programming",
-  "stream": true
-}'
-```
-
-### Python Example (from Windows or WSL2)
-
-```python
-import requests
-import json
-
-def chat_with_ollama(prompt, model="llama3.2"):
-    url = "http://localhost:11434/api/generate"
-    payload = {
-        "model": model,
-        "prompt": prompt,
-        "stream": False
-    }
-    
-    response = requests.post(url, json=payload)
-    return response.json()["response"]
-
-# Usage
-result = chat_with_ollama("What is Docker?")
-print(result)
-```
-
-### JavaScript Example (from Windows or WSL2)
-
-```javascript
-async function chatWithOllama(prompt, model = "llama3.2") {
-    const response = await fetch("http://localhost:11434/api/generate", {
-        method: "POST",
-        headers: { "Content-Type": "application/json" },
-        body: JSON.stringify({
-            model: model,
-            prompt: prompt,
-            stream: false
-        })
-    });
-    
-    const data = await response.json();
-    return data.response;
-}
-
-// Usage
-chatWithOllama("Explain REST APIs").then(console.log);
-```
-
-## 🔌 API Reference
-
-### Main Endpoints
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/generate` | POST | Generate text completion |
-| `/api/chat` | POST | Chat completion with conversation history |
-| `/api/tags` | GET | List available models |
-| `/api/pull` | POST | Download a model |
-| `/api/push` | POST | Upload a custom model |
-| `/api/embeddings` | POST | Generate embeddings |
-
-### Generate Parameters
-
-```json
-{
-  "model": "llama3.2",
-  "prompt": "Your prompt here",
-  "stream": false,
-  "options": {
-    "temperature": 0.7,
-    "top_p": 0.9,
-    "top_k": 40,
-    "num_predict": 128,
-    "stop": ["\n"]
-  }
-}
-```
-
-## 🐛 Troubleshooting
-
-### Container Won't Start
-
-```bash
-# Check logs
-docker compose logs ollama
-
-# Common issues:
-# 1. GPU not accessible
-docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
-
-# 2. Port already in use
-netstat -ano | findstr :11434  # From Windows PowerShell
-ss -tulpn | grep 11434         # From WSL2
-```
-
-### GPU Not Detected in WSL2
-
-```powershell
-# Update NVIDIA driver (from Windows)
-# Download latest driver from: https://www.nvidia.com/Download/index.aspx
-
-# Restart WSL2 (from PowerShell)
-wsl --shutdown
-wsl
-
-# Verify GPU
-nvidia-smi
-```
-
-### Model Download Fails
-
-```bash
-# Check disk space
-docker exec ollama df -h /root/.ollama
-
-# Check WSL2 disk space
-df -h /mnt/e
-
-# Retry with verbose logging
-docker exec -it ollama ollama pull llama3.2 --verbose
-```
-
-### Out of Memory Errors
-
-```bash
-# Check GPU memory
-nvidia-smi
-
-# Use smaller model or reduce context
-docker exec ollama ollama run llama3.2 --num-ctx 2048
-```
-
-### WSL2 Disk Space Issues
-
-```powershell
-# Compact WSL2 virtual disk (from PowerShell as Admin)
-wsl --shutdown
-Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full
-```
-
-### Docker Desktop Integration Issues
-
-1. Open Docker Desktop
-2. Go to **Settings → Resources → WSL Integration**
-3. Enable integration with **Ubuntu-24.04**
-4. Click **Apply & Restart**
-
-### Permission Denied on Volume
-
-```bash
-# From WSL2
-sudo chmod -R 755 /mnt/e/volumes/ollama/data
-```
-
-## ⚡ Performance Tips
-
-### 1. WSL2 Memory Configuration
-
-Create/edit `.wslconfig` in Windows user directory (`C:\Users\YourName\.wslconfig`):
-
-```ini
-[wsl2]
-memory=16GB
-processors=8
-swap=8GB
-```
-
-Apply changes:
-```powershell
-wsl --shutdown
-wsl
-```
-
-### 2. GPU Memory Optimization
-
-```yaml
-# In docker-compose.yml
-environment:
-  - CUDA_VISIBLE_DEVICES=0
-  - OLLAMA_NUM_GPU=1
-```
-
-### 3. Concurrent Requests
-
-```yaml
-# In docker-compose.yml
-environment:
-  - OLLAMA_MAX_LOADED_MODELS=3
-  - OLLAMA_NUM_PARALLEL=4
-```
-
-### 4. Context Window
-
-```bash
-# Reduce for faster responses
-docker exec ollama ollama run llama3.2 --num-ctx 2048
-
-# Increase for longer conversations
-docker exec ollama ollama run llama3.2 --num-ctx 8192
-```
-
-### 5. Model Quantization
-
-Use quantized models for better performance:
-```bash
-# 4-bit quantization (faster, less accurate)
-docker exec ollama ollama pull llama3.2:q4_0
-
-# 8-bit quantization (balanced)
-docker exec ollama ollama pull llama3.2:q8_0
-```
-
-### 6. Store Models on SSD
-
-For best performance, ensure `E:\volumes` is on an SSD, not HDD.
-
-## 📊 Monitoring
-
-### Check Resource Usage
-
-```bash
-# Container stats
-docker stats ollama
-
-# GPU utilization (from WSL2 or Windows)
-nvidia-smi
-
-# Continuous monitoring
-watch -n 1 nvidia-smi
-```
-
-### Model Status
-
-```bash
-# Show running models
-docker exec ollama ollama ps
-
-# Model information
-docker exec ollama ollama show llama3.2
-```
-
-### WSL2 Resource Usage
-
-```powershell
-# From Windows PowerShell
-wsl --list --verbose
-```
-
-## 🛑 Stopping and Cleanup
-
-```bash
-# Stop service
-docker compose down
-
-# Stop and remove volumes
-docker compose down -v
-
-# Remove all models
-docker exec ollama sh -c "rm -rf /root/.ollama/models/*"
-
-# Shutdown WSL2 (from Windows PowerShell)
-wsl --shutdown
-```
-
-## 🔗 Useful Links
-
- [Ollama Official Documentation](https://github.com/ollama/ollama)
- [Ollama Model Library](https://ollama.com/library)
- [API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [WSL2 GPU Documentation](https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
- [Docker Desktop WSL2 Backend](https://docs.docker.com/desktop/wsl/)
-
-## 🎯 Quick Reference
-
-### Common Commands
-
-```bash
-# Start Ollama
-docker compose up -d
-
-# Pull a model
-docker exec -it ollama ollama pull llama3.2
-
-# Run interactive chat
-docker exec -it ollama ollama run llama3.2
-
-# List models
-docker exec -it ollama ollama list
-
-# Check GPU
-nvidia-smi
-
-# Stop Ollama
-docker compose down
-```
-
-## 📝 Notes for WSL2 Users
-
- **Path Conversion**: Windows `E:\folder` = WSL2 `/mnt/e/folder`
- **Performance**: Models stored on Windows drives are accessible but slightly slower
- **GPU Passthrough**: Handled automatically by Docker Desktop
- **Networking**: `localhost` works from both Windows and WSL2
- **Memory**: Configure WSL2 memory in `.wslconfig` for large models
-
---
-
-**Need help?** Open an issue or check the [Ollama Discord](https://discord.gg/ollama)
@@ -1,33 +0,0 @@
-services:
-  ollama:
-    image: ollama/ollama:latest
-    container_name: ollama
-    restart: unless-stopped
-    ports:
-      - "11434:11434"
-    volumes:
-      - /mnt/e/volumes/ollama/data:/root/.ollama
-    environment:
-      - OLLAMA_HOST=0.0.0.0:11434
-      # Optional: Set GPU device if you have multiple GPUs
-      # - NVIDIA_VISIBLE_DEVICES=0
-    command: serve
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-    networks:
-      - app-network
-    healthcheck:
-      test: ["CMD", "ollama", "list"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 40s
-
-networks:
-  app-network:
-    driver: bridge