# Faster Whisper - Audio Transcription Service

Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).

## 📋 Prerequisites

- Windows with WSL2 (Ubuntu 24.04)
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with drivers installed on Windows
- NVIDIA Container Toolkit configured in WSL2
- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)

### WSL2 GPU Setup

Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:

```bash
# Check GPU availability in WSL2
nvidia-smi

# If not available, install the NVIDIA Container Toolkit in WSL2.
# Note: the legacy nvidia-docker repository (apt-key based) is deprecated
# and apt-key is no longer available on Ubuntu 24.04; use the current
# nvidia-container-toolkit repository instead:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# With Docker Desktop, restart Docker Desktop from Windows;
# `sudo systemctl restart docker` only applies to a native Docker engine.
```

## 🚀 Quick Start

```bash
# Start the service
docker compose up -d faster-whisper

# Check logs
docker logs faster-whisper -f

# Stop the service
docker compose down
```

## ⚙️ Configuration

### Environment Variables

| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | 1000 | User ID for file permissions |
| `PGID` | 1000 | Group ID for file permissions |
| `TZ` | Europe/Paris | Timezone |
| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
| `WHISPER_LANG` | fr | Transcription language |
| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs. speed tradeoff) |

### Available Models

| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
| `base` | ~142 MB | ~1 GB | Fast | Medium |
| `small` | ~466 MB | ~2 GB | Medium | Good |
| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |

> **Note:** The `turbo` model is an excellent compromise for an RTX 4060 Ti (8 GB VRAM).

### Volumes

- `/mnt/e/volumes/faster-whisper/audio` → `/app`: directory of audio files to transcribe
- `/mnt/e/volumes/faster-whisper/models` → `/root/.cache/whisper`: downloaded models cache

> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to the `E:\` drive on Windows.

## 🎯 Usage

### REST API

The service exposes a REST API on port **10300**.

#### Transcribe an audio file

```bash
# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\

# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"

# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
```

#### Check service status

```bash
curl http://localhost:10300/health
```

### Web Interface

Access the web interface at `http://localhost:10300`. The interface is reachable from both Windows and WSL2.

## 🔧 Administration

### Check GPU Usage

```bash
# From the WSL2 host
nvidia-smi

# From inside the container
docker exec faster-whisper nvidia-smi

# Monitor GPU usage in real time
watch -n 1 nvidia-smi
```

### Update the Image

```bash
docker compose pull faster-whisper
docker compose up -d faster-whisper
```

### Change Model

1. Edit `WHISPER_MODEL` in docker-compose.yml
2. Restart the container:

```bash
docker compose up -d faster-whisper
```

The new model is downloaded automatically on first startup.

### Performance Optimization

#### Adjust Beam Search

- `WHISPER_BEAM=1`: maximum speed, reduced accuracy
- `WHISPER_BEAM=5`: good compromise (default)
- `WHISPER_BEAM=10`: maximum accuracy, slower

#### Monitor Memory Usage

```bash
docker stats faster-whisper
```

### Clean Old Models

Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).
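For reference, the settings described above can be sketched as a `docker-compose.yml` service definition. This is a sketch, not the project's actual compose file: the image name is assumed from the linuxserver documentation linked below, and the GPU reservation block is one common way to expose an NVIDIA GPU to Compose; adapt paths and values to your setup.

```yaml
# Sketch only - image tag, port mapping, and GPU block are assumptions.
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:latest  # assumed; see linked docs
    container_name: faster-whisper
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Paris
      - WHISPER_MODEL=turbo
      - WHISPER_LANG=fr
      - WHISPER_BEAM=5
    ports:
      - "10300:10300"
    volumes:
      - /mnt/e/volumes/faster-whisper/audio:/app
      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```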
```bash
# From WSL2 - list downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/

# Delete an unused model (replace <model-directory> with the model's folder name;
# do NOT remove the whole models/ directory)
rm -rf /mnt/e/volumes/faster-whisper/models/<model-directory>
```

```powershell
# From Windows PowerShell - list downloaded models
Get-ChildItem E:\volumes\faster-whisper\models\

# Delete an unused model (replace <model-directory> with the model's folder name)
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-directory>
```

## 📊 Monitoring

### Real-time Logs

```bash
docker logs faster-whisper -f --tail 100
```

### Check Container Status

```bash
docker ps | grep faster-whisper
```

### Restart on Issues

```bash
docker restart faster-whisper
```

## 🐛 Troubleshooting

### Container Won't Start

1. Verify the NVIDIA Container Toolkit works in WSL2:

   ```bash
   docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
   ```

2. Check permissions on the volumes:

   ```bash
   ls -la /mnt/e/volumes/faster-whisper/
   ```

3. Ensure Docker Desktop WSL2 integration is enabled:
   - Open Docker Desktop → Settings → Resources → WSL Integration
   - Enable integration with Ubuntu-24.04

### "Out of Memory" Error

- Switch to a smaller model (e.g., from `turbo` to `small`)
- Reduce `WHISPER_BEAM` to 3 or 1
- Close other GPU-intensive applications on Windows
- Check GPU memory usage: `nvidia-smi`

### Poor Transcription Quality

- Switch to a larger model (e.g., from `small` to `turbo`)
- Increase `WHISPER_BEAM` to 7 or 10
- Check the audio quality of the source file
- Verify the correct language is set in `WHISPER_LANG`

### WSL2-Specific Issues

#### GPU Not Detected

```bash
# Check the Windows GPU driver version (from PowerShell)
nvidia-smi

# Update the WSL2 kernel (from PowerShell)
wsl --update

# Restart WSL2 (from PowerShell)
wsl --shutdown
# Then reopen Ubuntu
```

#### Volume Access Issues

```bash
# Check whether the drive is mounted in WSL2
ls /mnt/e/

# If not mounted, edit /etc/wsl.conf
sudo nano /etc/wsl.conf

# Add these lines:
[automount]
enabled = true
options = "metadata,uid=1000,gid=1000"

# Then restart WSL2 (from PowerShell)
wsl --shutdown
```

## 📁 File Structure

```
Windows: E:\volumes\faster-whisper\
WSL2:    /mnt/e/volumes/faster-whisper/
├── audio/    # Audio files to transcribe
└── models/   # Whisper models cache
```

## 🪟 Windows Integration

### Access Files from Windows Explorer

- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
- Or directly to `E:\volumes\faster-whisper\`

### Copy Files to Transcribe

From Windows:

```powershell
Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
```

From WSL2:

```bash
cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
```

## 🔗 Useful Links

- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)

## 📝 Notes

- The service restarts automatically unless manually stopped (`restart: unless-stopped`)
- On first startup, the model is downloaded (this may take a few minutes)
- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
- The service runs in WSL2 but is accessible from Windows
- GPU computation is performed on the Windows NVIDIA GPU
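## 🧰 Batch Transcription Helper

The copy-and-curl steps above can be combined into a small script that sends every audio file in the shared volume to the `/transcribe` endpoint. This is a sketch assuming the REST API described earlier; `out_name` is a hypothetical helper (not part of the service) that derives a transcript path next to each audio file.

```bash
#!/usr/bin/env bash
# Sketch: batch-transcribe audio files in the shared volume via the
# REST endpoint described above (assumes the service listens on :10300).

AUDIO_DIR="/mnt/e/volumes/faster-whisper/audio"

# out_name: hypothetical helper mapping e.g. audio.mp3 -> audio.txt
out_name() { printf '%s.txt' "${1%.*}"; }

for f in "$AUDIO_DIR"/*.mp3 "$AUDIO_DIR"/*.wav "$AUDIO_DIR"/*.flac; do
  [ -e "$f" ] || continue   # skip unmatched glob patterns
  echo "Transcribing: $f"
  curl -s -X POST http://localhost:10300/transcribe \
    -F "file=@$f" -o "$(out_name "$f")"
done
```

Saving each transcript with `out_name` keeps results alongside the source files, so they are visible from both WSL2 and `E:\volumes\faster-whisper\audio\` on Windows.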