diff --git a/Readme_FasterWhisper.md b/Readme_FasterWhisper.md
new file mode 100644
index 0000000..c5e05b4
--- /dev/null
+++ b/Readme_FasterWhisper.md
@@ -0,0 +1,302 @@
+# Faster Whisper - Audio Transcription Service
+
+Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).
+
+## πŸ“‹ Prerequisites
+
+- Windows with WSL2 (Ubuntu 24.04)
+- Docker Desktop for Windows with WSL2 backend
+- NVIDIA GPU with drivers installed on Windows
+- NVIDIA Container Toolkit configured in WSL2 (only for a native Docker Engine, see below)
+- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)
+
+### WSL2 GPU Setup
+
+Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:
+
+```bash
+# Check GPU availability in WSL2
+nvidia-smi
+```
+
+With Docker Desktop's WSL2 backend, GPU passthrough is handled automatically and no toolkit install is needed. Installing the NVIDIA Container Toolkit is only required if you run Docker Engine natively inside WSL2. In that case, use the current keyring-based repository (the legacy `nvidia-docker` repository is deprecated, and `apt-key` is no longer available on Ubuntu 24.04):
+
+```bash
+# Only needed for a native Docker Engine inside WSL2
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
+  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+sudo apt-get update
+sudo apt-get install -y nvidia-container-toolkit
+sudo systemctl restart docker  # only applies to a native Docker Engine, not Docker Desktop
+```
+
+## πŸš€ Quick Start
+
+```bash
+# Start the service
+docker compose up -d faster-whisper
+
+# Check logs
+docker logs faster-whisper -f
+
+# Stop the service
+docker compose down
+```
+
+## βš™οΈ Configuration
+
+### Environment Variables
+
+| Variable | Value | Description |
+|----------|-------|-------------|
+| `PUID` | 1000 | User ID for file permissions |
+| `PGID` | 1000 | Group ID for file permissions |
+| `TZ` | Europe/Paris | Timezone |
+| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
+| `WHISPER_LANG` | fr | Transcription language |
+| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs speed tradeoff) |
+
+### Available Models
+
+| Model | Size | VRAM | Speed | Accuracy |
+|-------|------|------|-------|----------|
+| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
+| `base` | ~142 MB | ~1 GB | Fast | Medium |
+| `small` | ~466 MB | ~2 GB | Medium | Good |
+| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
+| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
+| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |
+
+> **Note:** The `turbo` model is an excellent compromise for an RTX 4060 Ti (8 GB VRAM).
+
+### Volumes
+
+- `/mnt/e/volumes/faster-whisper/audio` β†’ `/app`: directory of audio files to transcribe
+- `/mnt/e/volumes/faster-whisper/models` β†’ `/root/.cache/whisper`: cache for downloaded models
+
+> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to the `E:\` drive on Windows.
+
+## 🎯 Usage
+
+### REST API
+
+The service exposes a REST API on port **10300**.
+
+> **Note:** The linuxserver image is designed around the Wyoming protocol on port 10300 (as used by Home Assistant); verify that your tag actually exposes the HTTP endpoints shown below.
+
+#### Transcribe an audio file
+
+```bash
+# Place the file in /mnt/e/volumes/faster-whisper/audio/
+# Or on Windows: E:\volumes\faster-whisper\audio\
+
+# From WSL2:
+curl -X POST http://localhost:10300/transcribe \
+  -F "file=@audio.mp3"
+
+# From Windows PowerShell:
+curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
+```
+
+#### Check service status
+
+```bash
+curl http://localhost:10300/health
+```
+
+### Web Interface
+
+Access the web interface at `http://localhost:10300`.
+
+The interface is accessible from both Windows and WSL2.
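+
+### Python Example
+
+For scripting, the API can also be called from Python. This is a minimal sketch that assumes the `/transcribe` endpoint and port documented above (adjust if your image exposes a different interface); it requires the `requests` package:
+
+```python
+import requests
+
+def transcribe(path: str, url: str = "http://localhost:10300/transcribe") -> str:
+    """Send an audio file as multipart/form-data, mirroring the curl example."""
+    with open(path, "rb") as f:
+        # Long files can take a while to transcribe, hence the generous timeout
+        response = requests.post(url, files={"file": f}, timeout=600)
+    response.raise_for_status()
+    return response.text
+
+# Usage
+print(transcribe("audio.mp3"))
+```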
+
+## πŸ”§ Administration
+
+### Check GPU Usage
+
+```bash
+# From WSL2 host
+nvidia-smi
+
+# From inside the container
+docker exec faster-whisper nvidia-smi
+
+# Monitor GPU in real-time
+watch -n 1 nvidia-smi
+```
+
+### Update the Image
+
+```bash
+docker compose pull faster-whisper
+docker compose up -d faster-whisper
+```
+
+### Change Model
+
+1. Edit `WHISPER_MODEL` in docker-compose.yml
+2. Restart the container:
+   ```bash
+   docker compose up -d faster-whisper
+   ```
+
+The new model will be downloaded automatically on first startup.
+
+### Performance Optimization
+
+#### Adjust Beam Search
+
+- `WHISPER_BEAM=1`: Maximum speed, reduced accuracy
+- `WHISPER_BEAM=5`: Good compromise (default)
+- `WHISPER_BEAM=10`: Maximum accuracy, slower
+
+#### Monitor Memory Usage
+
+```bash
+docker stats faster-whisper
+```
+
+### Clean Old Models
+
+Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).
+
+```bash
+# From WSL2 - List downloaded models
+ls -lh /mnt/e/volumes/faster-whisper/models/
+
+# Delete an unused model (replace <model_name> with a directory from the listing;
+# omitting it would wipe the entire cache)
+rm -rf /mnt/e/volumes/faster-whisper/models/<model_name>
+```
+
+```powershell
+# From Windows PowerShell
+Get-ChildItem E:\volumes\faster-whisper\models\
+
+# Delete an unused model (replace <model_name> with a directory from the listing)
+Remove-Item -Recurse E:\volumes\faster-whisper\models\<model_name>
+```
+
+## πŸ“Š Monitoring
+
+### Real-time Logs
+
+```bash
+docker logs faster-whisper -f --tail 100
+```
+
+### Check Container Status
+
+```bash
+docker ps | grep faster-whisper
+```
+
+### Restart on Issues
+
+```bash
+docker restart faster-whisper
+```
+
+## πŸ› Troubleshooting
+
+### Container Won't Start
+
+1. Verify that Docker can see the GPU from WSL2:
+   ```bash
+   docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
+   ```
+
+2. Check permissions on volumes:
+   ```bash
+   ls -la /mnt/e/volumes/faster-whisper/
+   ```
+
+3. Ensure Docker Desktop WSL2 integration is enabled:
+   - Open Docker Desktop β†’ Settings β†’ Resources β†’ WSL Integration
+   - Enable integration with Ubuntu-24.04
+
+### "Out of Memory" Error
+
+- Switch to a smaller model (e.g., from `turbo` to `small`)
+- Reduce `WHISPER_BEAM` to 3 or 1
+- Close other GPU-intensive applications on Windows
+- Check GPU memory usage: `nvidia-smi`
+
+### Poor Transcription Quality
+
+- Switch to a larger model (e.g., from `small` to `turbo`)
+- Increase `WHISPER_BEAM` to 7 or 10
+- Check the audio quality of the source file
+- Verify the correct language is set in `WHISPER_LANG`
+
+### WSL2-Specific Issues
+
+#### GPU Not Detected
+
+```bash
+# Check Windows GPU driver version (from PowerShell)
+nvidia-smi
+
+# Update the WSL2 kernel
+wsl --update
+
+# Restart WSL2
+wsl --shutdown
+# Then reopen Ubuntu
+```
+
+#### Volume Access Issues
+
+```bash
+# Check if the drive is mounted in WSL2
+ls /mnt/e/
+
+# If not mounted, edit /etc/wsl.conf
+sudo nano /etc/wsl.conf
+```
+
+Add these lines to `/etc/wsl.conf`:
+
+```ini
+[automount]
+enabled = true
+options = "metadata,uid=1000,gid=1000"
+```
+
+Then restart WSL2:
+
+```bash
+wsl --shutdown
+```
+
+## πŸ“ File Structure
+
+```
+Windows: E:\volumes\faster-whisper\
+WSL2:    /mnt/e/volumes/faster-whisper/
+β”œβ”€β”€ audio/    # Audio files to transcribe
+└── models/   # Whisper models cache
+```
+
+## πŸͺŸ Windows Integration
+
+### Access Files from Windows Explorer
+
+- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
+- Or directly to `E:\volumes\faster-whisper\`
+
+### Copy Files to Transcribe
+
+From Windows:
+```powershell
+Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
+```
+
+From WSL2:
+```bash
+cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
+```
+
+## πŸ”— Useful Links
+
+- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
+- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
+- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
+- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
+
+## πŸ“ Notes
+
+- The service restarts automatically unless manually stopped (`restart: unless-stopped`)
+- On first startup, the model is downloaded (this may take a few minutes)
+- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
+- The service runs in WSL2 but is accessible from Windows
+- GPU computations are performed on the Windows NVIDIA GPU
\ No newline at end of file
diff --git a/Readme_Ollama.md b/Readme_Ollama.md
new file mode 100644
index 0000000..b4055ab
--- /dev/null
+++ b/Readme_Ollama.md
@@ -0,0 +1,541 @@
+# Ollama Docker Setup πŸ¦™ (WSL2 + Windows 11)
+
+Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.
+
+## πŸ“‹ Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [WSL2 Setup](#wsl2-setup)
+- [Installation](#installation)
+- [Starting Ollama](#starting-ollama)
+- [Model Management](#model-management)
+- [Usage Examples](#usage-examples)
+- [API Reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+- [Performance Tips](#performance-tips)
+
+## πŸ”§ Prerequisites
+
+### Required Software
+
+- **Windows 11** with WSL2 enabled
+- **Ubuntu 24.04** on WSL2
+- **Docker Desktop for Windows** with WSL2 backend
+- **NVIDIA GPU** with CUDA support (RTX series recommended)
+- **NVIDIA Driver** for Windows (latest version)
+
+### System Requirements
+
+- Windows 11 Build 22000 or higher
+- 16GB RAM minimum (32GB recommended for larger models)
+- 50GB+ free disk space for models
+- NVIDIA GPU with 8GB+ VRAM
+
+## πŸͺŸ WSL2 Setup
+
+### 1. Enable WSL2 (if not already done)
+
+```powershell
+# Run in PowerShell as Administrator
+wsl --install
+wsl --set-default-version 2
+
+# Install Ubuntu 24.04
+wsl --install -d Ubuntu-24.04
+
+# Verify WSL2 is active
+wsl --list --verbose
+```
+
+### 2. Install Docker Desktop for Windows
+
+1. Download from [Docker Desktop](https://www.docker.com/products/docker-desktop)
+2. Install and enable **WSL2 backend** in settings
+3. Enable integration with Ubuntu-24.04 distro in: Settings β†’ Resources β†’ WSL Integration
+
+### 3. Verify GPU Support in WSL2
+
+```bash
+# Open WSL2 Ubuntu terminal
+wsl
+
+# Check NVIDIA driver
+nvidia-smi
+
+# You should see your GPU listed
+```
+
+**Important**: You do NOT need to install NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.
+
+### 4. Test Docker GPU Access
+
+```bash
+# In WSL2 terminal
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+```
+
+If this works, you're ready to go! πŸŽ‰
+
+## πŸš€ Installation
+
+### 1. Create Project Structure in WSL2
+
+```bash
+# Open WSL2 terminal
+wsl
+
+# Create project directory
+mkdir -p ~/ollama-docker
+cd ~/ollama-docker
+```
+
+### 2. Create `docker-compose.yml`
+
+Use the provided `docker-compose.yml` file with the WSL2 path:
+- Windows path: `E:\volumes\ollama\data`
+- WSL2 path: `/mnt/e/volumes/ollama/data`
+
+### 3. Create Volume Directory
+
+```bash
+# From WSL2 terminal
+sudo mkdir -p /mnt/e/volumes/ollama/data
+
+# Or from Windows PowerShell
+mkdir E:\volumes\ollama\data
+```
+
+## ▢️ Starting Ollama
+
+```bash
+# Navigate to project directory
+cd ~/ollama-docker
+
+# Start the service
+docker compose up -d
+
+# Check logs
+docker compose logs -f ollama
+
+# Verify service is running
+curl http://localhost:11434
+```
+
+Expected response: `Ollama is running`
+
+### Access from Windows
+
+Ollama is accessible from both WSL2 and Windows:
+- **WSL2**: `http://localhost:11434`
+- **Windows**: `http://localhost:11434`
+
+## πŸ“¦ Model Management
+
+### List Available Models
+
+```bash
+# Inside container
+docker exec -it ollama ollama list
+
+# Or from WSL2 (if ollama CLI installed)
+ollama list
+```
+
+### Pull/Download Models
+
+```bash
+# Pull a model
+docker exec -it ollama ollama pull llama3.2
+
+# Popular models
+docker exec -it ollama ollama pull mistral
+docker exec -it ollama ollama pull codellama
+docker exec -it ollama ollama pull phi3
+docker exec -it ollama ollama pull llama3.1:70b
+```
+
+### Model Sizes Reference
+
+| Model | Parameters | Size | RAM Required | VRAM Required |
+|-------|-----------|------|--------------|---------------|
+| `phi3` | 3.8B | ~2.3 GB | 8 GB | 4 GB |
+| `llama3.2` | 3B | ~2.0 GB | 8 GB | 4 GB |
+| `mistral` | 7B | ~4.1 GB | 8 GB | 6 GB |
+| `llama3.1:70b` | 70B | ~43 GB | 64 GB | 48 GB |
+| `codellama` | 7B | ~3.8 GB | 8 GB | 6 GB |
+
+> **Note:** Llama 3.2 only ships in 1B and 3B text variants; for a 70B-class model use `llama3.1:70b` or similar from the library.
+
+### Remove/Unload Models
+
+```bash
+# Remove a model from disk
+docker exec -it ollama ollama rm llama3.2
+
+# Stop a running model (unload from memory)
+docker exec -it ollama ollama stop llama3.2
+
+# Show running models
+docker exec -it ollama ollama ps
+```
+
+### Copy Models Between Systems
+
+```bash
+# Export the Modelfile (it references the base model, which must also be available on the target system)
+docker exec ollama ollama show llama3.2 --modelfile > Modelfile
+
+# Import on another system
+docker cp Modelfile ollama:/tmp/Modelfile
+docker exec -it ollama ollama create my-model -f /tmp/Modelfile
+```
+
+## πŸ’‘ Usage Examples
+
+### Interactive Chat
+
+```bash
+# Start interactive session
+docker exec -it ollama ollama run llama3.2
+
+# Chat with specific model
+docker exec -it ollama ollama run mistral "Explain quantum computing"
+```
+
+### Using the API
+
+#### Generate Completion
+
+```bash
+curl http://localhost:11434/api/generate -d '{
+  "model": "llama3.2",
+  "prompt": "Why is the sky blue?",
+  "stream": false
+}'
+```
+
+#### Chat Completion
+
+```bash
+curl http://localhost:11434/api/chat -d '{
+  "model": "llama3.2",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! Can you help me with Python?"
+    }
+  ],
+  "stream": false
+}'
+```
+
+#### Streaming Response
+
+```bash
+curl http://localhost:11434/api/generate -d '{
+  "model": "llama3.2",
+  "prompt": "Write a haiku about programming",
+  "stream": true
+}'
+```
+
+### Python Example (from Windows or WSL2)
+
+```python
+import requests
+
+def chat_with_ollama(prompt, model="llama3.2"):
+    url = "http://localhost:11434/api/generate"
+    payload = {
+        "model": model,
+        "prompt": prompt,
+        "stream": False
+    }
+
+    response = requests.post(url, json=payload)
+    return response.json()["response"]
+
+# Usage
+result = chat_with_ollama("What is Docker?")
+print(result)
+```
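+
+When `"stream": true`, the API returns one JSON object per line as tokens are generated. A minimal sketch of consuming that stream in Python (same assumptions as the example above):
+
+```python
+import json
+import requests
+
+def stream_ollama(prompt, model="llama3.2"):
+    url = "http://localhost:11434/api/generate"
+    payload = {"model": model, "prompt": prompt, "stream": True}
+
+    # stream=True keeps the connection open while chunks arrive
+    with requests.post(url, json=payload, stream=True) as response:
+        for line in response.iter_lines():
+            if not line:
+                continue
+            chunk = json.loads(line)
+            # Each chunk carries a fragment of the answer; the final one has "done": true
+            print(chunk.get("response", ""), end="", flush=True)
+            if chunk.get("done"):
+                break
+
+# Usage
+stream_ollama("Write a haiku about programming")
+```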
+
+### JavaScript Example (from Windows or WSL2)
+
+```javascript
+async function chatWithOllama(prompt, model = "llama3.2") {
+  const response = await fetch("http://localhost:11434/api/generate", {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({
+      model: model,
+      prompt: prompt,
+      stream: false
+    })
+  });
+
+  const data = await response.json();
+  return data.response;
+}
+
+// Usage
+chatWithOllama("Explain REST APIs").then(console.log);
+```
+
+## πŸ”Œ API Reference
+
+### Main Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/generate` | POST | Generate text completion |
+| `/api/chat` | POST | Chat completion with conversation history |
+| `/api/tags` | GET | List available models |
+| `/api/pull` | POST | Download a model |
+| `/api/push` | POST | Upload a custom model |
+| `/api/embeddings` | POST | Generate embeddings |
+
+### Generate Parameters
+
+```json
+{
+  "model": "llama3.2",
+  "prompt": "Your prompt here",
+  "stream": false,
+  "options": {
+    "temperature": 0.7,
+    "top_p": 0.9,
+    "top_k": 40,
+    "num_predict": 128,
+    "stop": ["\n"]
+  }
+}
+```
+
+## πŸ› Troubleshooting
+
+### Container Won't Start
+
+```bash
+# Check logs
+docker compose logs ollama
+
+# Common issues:
+# 1. GPU not accessible
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+
+# 2. Port already in use
+netstat -ano | findstr :11434   # From Windows PowerShell
+ss -tulpn | grep 11434          # From WSL2
+```
+
+### GPU Not Detected in WSL2
+
+```powershell
+# Update NVIDIA driver (from Windows)
+# Download latest driver from: https://www.nvidia.com/Download/index.aspx
+
+# Restart WSL2 (from PowerShell)
+wsl --shutdown
+wsl
+
+# Verify GPU
+nvidia-smi
+```
+
+### Model Download Fails
+
+```bash
+# Check disk space in the container
+docker exec ollama df -h /root/.ollama
+
+# Check WSL2 disk space
+df -h /mnt/e
+
+# Retry the pull (interrupted downloads resume where they left off)
+docker exec -it ollama ollama pull llama3.2
+```
+
+### Out of Memory Errors
+
+```bash
+# Check GPU memory
+nvidia-smi
+
+# Use a smaller model, or shrink the context window from an interactive session
+docker exec -it ollama ollama run llama3.2
+# then, inside the session:
+#   /set parameter num_ctx 2048
+```
+
+### WSL2 Disk Space Issues
+
+```powershell
+# Compact WSL2 virtual disk (from PowerShell as Admin; requires the Hyper-V PowerShell module)
+wsl --shutdown
+Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full
+```
+
+### Docker Desktop Integration Issues
+
+1. Open Docker Desktop
+2. Go to **Settings β†’ Resources β†’ WSL Integration**
+3. Enable integration with **Ubuntu-24.04**
+4. Click **Apply & Restart**
+
+### Permission Denied on Volume
+
+```bash
+# From WSL2
+sudo chmod -R 755 /mnt/e/volumes/ollama/data
+```
+
+## ⚑ Performance Tips
+
+### 1. WSL2 Memory Configuration
+
+Create/edit `.wslconfig` in your Windows user directory (`C:\Users\YourName\.wslconfig`):
+
+```ini
+[wsl2]
+memory=16GB
+processors=8
+swap=8GB
+```
+
+Apply changes:
+```powershell
+wsl --shutdown
+wsl
+```
+
+### 2. GPU Memory Optimization
+
+```yaml
+# In docker-compose.yml - pin Ollama to a single GPU if you have several
+environment:
+  - CUDA_VISIBLE_DEVICES=0
+```
+
+### 3. Concurrent Requests
+
+```yaml
+# In docker-compose.yml
+environment:
+  - OLLAMA_MAX_LOADED_MODELS=3
+  - OLLAMA_NUM_PARALLEL=4
+```
+
+### 4. Context Window
+
+`ollama run` has no context-size flag; set `num_ctx` from inside an interactive session:
+
+```bash
+docker exec -it ollama ollama run llama3.2
+# then, inside the session:
+#   /set parameter num_ctx 2048   (reduce for faster responses)
+#   /set parameter num_ctx 8192   (increase for longer conversations)
+```
+
+### 5. Model Quantization
+
+Use quantized models for better performance. Quantized builds are published as per-model tags; check the tag list on [ollama.com/library](https://ollama.com/library), for example:
+
+```bash
+# 4-bit quantization (faster, less accurate)
+docker exec -it ollama ollama pull llama3.2:3b-instruct-q4_K_M
+
+# 8-bit quantization (balanced)
+docker exec -it ollama ollama pull llama3.2:3b-instruct-q8_0
+```
+
+### 6. Store Models on SSD
+
+For best performance, ensure `E:\volumes` is on an SSD, not an HDD.
+
+## πŸ“Š Monitoring
+
+### Check Resource Usage
+
+```bash
+# Container stats
+docker stats ollama
+
+# GPU utilization (from WSL2 or Windows)
+nvidia-smi
+
+# Continuous monitoring
+watch -n 1 nvidia-smi
+```
+
+### Model Status
+
+```bash
+# Show running models
+docker exec ollama ollama ps
+
+# Model information
+docker exec ollama ollama show llama3.2
+```
+
+### WSL2 Resource Usage
+
+```powershell
+# From Windows PowerShell
+wsl --list --verbose
+```
+
+## πŸ›‘ Stopping and Cleanup
+
+```bash
+# Stop service
+docker compose down
+
+# Stop and remove volumes
+docker compose down -v
+
+# Remove all models
+docker exec ollama sh -c "rm -rf /root/.ollama/models/*"
+
+# Shutdown WSL2 (from Windows PowerShell)
+wsl --shutdown
+```
+
+## πŸ”— Useful Links
+
+- [Ollama Official Documentation](https://github.com/ollama/ollama)
+- [Ollama Model Library](https://ollama.com/library)
+- [API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
+- [WSL2 GPU Documentation](https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
+- [Docker Desktop WSL2 Backend](https://docs.docker.com/desktop/wsl/)
+
+## 🎯 Quick Reference
+
+### Common Commands
+
+```bash
+# Start Ollama
+docker compose up -d
+
+# Pull a model
+docker exec -it ollama ollama pull llama3.2
+
+# Run interactive chat
+docker exec -it ollama ollama run llama3.2
+
+# List models
+docker exec -it ollama ollama list
+
+# Check GPU
+nvidia-smi
+
+# Stop Ollama
+docker compose down
+```
+
+## πŸ“ Notes for WSL2 Users
+
+- **Path Conversion**: Windows `E:\folder` = WSL2 `/mnt/e/folder`
+- **Performance**: Models stored on Windows drives (`/mnt/e`) work, but are slower than the WSL2 ext4 filesystem because of drvfs I/O overhead
+- **GPU Passthrough**: Handled automatically by Docker Desktop
+- **Networking**: `localhost` works from both Windows and WSL2
+- **Memory**: Configure WSL2 memory in `.wslconfig` for large models
+
+---
+
+**Need help?** Open an issue or check the [Ollama Discord](https://discord.gg/ollama)
\ No newline at end of file
diff --git a/docker-compose.yml b/docker-compose.yml
new file mode 100644
index 0000000..e00b04a
--- /dev/null
+++ b/docker-compose.yml
@@ -0,0 +1,51 @@
+services:
+  ollama:
+    image: ollama/ollama:latest
+    container_name: ollama
+    restart: unless-stopped
+    ports:
+      - "11434:11434"
+    volumes:
+      - /mnt/e/volumes/ollama/data:/root/.ollama
+    environment:
+      - OLLAMA_HOST=0.0.0.0:11434
+      # Optional: Set GPU device if you have multiple GPUs
+      # - NVIDIA_VISIBLE_DEVICES=0
+    command: serve
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    networks:
+      - app-network
+    healthcheck:
+      test: ["CMD", "ollama", "list"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+  faster-whisper:
+    image: lscr.io/linuxserver/faster-whisper:gpu-legacy
+    container_name: faster-whisper
+    gpus: all
+    environment:
+      - PUID=1000
+      - PGID=1000
+      - TZ=Europe/Paris
+      - WHISPER_MODEL=turbo  # good compromise for an RTX 4060 Ti
+      - WHISPER_LANG=fr
+      - WHISPER_BEAM=5  # accuracy vs speed trade-off
+    volumes:
+      - /mnt/e/volumes/faster-whisper/audio:/app
+      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
+    ports:
+      - 10300:10300
+    restart: unless-stopped
+
+networks:
+  app-network:
+    driver: bridge
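+
+# Sanity checks after `docker compose up -d` (ports and container names assume
+# the services defined above):
+#   curl http://localhost:11434                  # Ollama answers "Ollama is running"
+#   docker exec -it ollama ollama list           # list installed models
+#   docker exec faster-whisper nvidia-smi        # confirm the container sees the GPU
+#   docker logs faster-whisper --tail 20         # watch the Whisper model download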