# Faster Whisper - Audio Transcription Service

Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).

## 📋 Prerequisites

- Windows with WSL2 (Ubuntu 24.04)
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with drivers installed on Windows
- NVIDIA Container Toolkit configured in WSL2
- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)

### WSL2 GPU Setup

Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:

```bash
# Check GPU availability in WSL2
nvidia-smi

# If not available, install the NVIDIA Container Toolkit in WSL2.
# Note: the old nvidia-docker repository and apt-key method are deprecated
# (apt-key no longer exists on Ubuntu 24.04); use the current
# libnvidia-container repository instead.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# With Docker Desktop's WSL2 backend, restart Docker Desktop from Windows
# instead of running `systemctl restart docker` (the daemon is not managed
# by systemd inside the distro)
```

## 🚀 Quick Start

```bash
# Start the service
docker compose up -d faster-whisper

# Check logs
docker logs faster-whisper -f

# Stop the service
docker compose down
```

## ⚙️ Configuration

### Environment Variables

| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | 1000 | User ID for file permissions |
| `PGID` | 1000 | Group ID for file permissions |
| `TZ` | Europe/Paris | Timezone |
| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
| `WHISPER_LANG` | fr | Transcription language |
| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs speed tradeoff) |
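
These variables are set in the service definition in `docker-compose.yml`. A minimal sketch of what that service entry might look like — the image tag, port mapping, and GPU reservation block here are assumptions based on this document, not a canonical file:

```yaml
# Sketch only: values taken from the tables and volume list in this README.
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu   # assumed GPU image tag
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Paris
      - WHISPER_MODEL=turbo
      - WHISPER_LANG=fr
      - WHISPER_BEAM=5
    volumes:
      - /mnt/e/volumes/faster-whisper/audio:/app
      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
    ports:
      - "10300:10300"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
```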

### Available Models

| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
| `base` | ~142 MB | ~1 GB | Fast | Medium |
| `small` | ~466 MB | ~2 GB | Medium | Good |
| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |

> **Note:** The `turbo` model is an excellent compromise for RTX 4060 Ti (8 GB VRAM).
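
As a rule of thumb, the VRAM column above can be turned into a small helper that suggests the largest model likely to fit in a given amount of free VRAM. This is a sketch with thresholds taken straight from the table, not a utility shipped with the image:

```bash
# Suggest the largest Whisper model that fits in the given free VRAM (GiB),
# based on the VRAM requirements listed in the table above.
suggest_model() {
  local vram_gib=$1
  if   [ "$vram_gib" -ge 10 ]; then echo "large"
  elif [ "$vram_gib" -ge 6 ];  then echo "turbo"
  elif [ "$vram_gib" -ge 5 ];  then echo "medium"
  elif [ "$vram_gib" -ge 2 ];  then echo "small"
  else                              echo "tiny"
  fi
}

suggest_model 8   # e.g. an RTX 4060 Ti with 8 GB -> "turbo"
```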

### Volumes

- `/mnt/e/volumes/faster-whisper/audio` → `/app`: directory of audio files to transcribe
- `/mnt/e/volumes/faster-whisper/models` → `/root/.cache/whisper`: downloaded models cache

> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to the `E:\` drive on Windows.

## 🎯 Usage

### REST API

The service exposes a REST API on port **10300**.

#### Transcribe an audio file

```bash
# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\

# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"

# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
```
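
Assuming the `/transcribe` endpoint behaves as shown above, a batch helper could loop over the shared audio directory. This is a sketch, not part of the image; `CURL_CMD` is overridable so the loop can be dry-run:

```bash
# Batch sketch: POST every audio file in the shared directory to the
# /transcribe endpoint described above, saving each response next to
# its source file.
AUDIO_DIR="${AUDIO_DIR:-/mnt/e/volumes/faster-whisper/audio}"
ENDPOINT="${ENDPOINT:-http://localhost:10300/transcribe}"
CURL_CMD="${CURL_CMD:-curl}"   # override (e.g. CURL_CMD=echo) for a dry run

transcribe_all() {
  local f
  for f in "$AUDIO_DIR"/*.mp3 "$AUDIO_DIR"/*.wav; do
    [ -e "$f" ] || continue   # skip unmatched glob patterns
    "$CURL_CMD" -sf -X POST "$ENDPOINT" -F "file=@$f" \
      -o "${f%.*}.txt" || echo "failed: $f" >&2
  done
}
```

Run `transcribe_all` from WSL2 once the container is up; whatever the endpoint returns for each file would land alongside it as a `.txt` file.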

#### Check service status

```bash
curl http://localhost:10300/health
```

### Web Interface

Access the web interface: `http://localhost:10300`

The interface is accessible from both Windows and WSL2.

## 🔧 Administration

### Check GPU Usage

```bash
# From WSL2 host
nvidia-smi

# From inside the container
docker exec faster-whisper nvidia-smi

# Monitor GPU in real-time
watch -n 1 nvidia-smi
```

### Update the Image

```bash
docker compose pull faster-whisper
docker compose up -d faster-whisper
```

### Change Model

1. Edit `WHISPER_MODEL` in `docker-compose.yml`
2. Restart the container:

```bash
docker compose up -d faster-whisper
```

The new model will be downloaded automatically on first startup.

### Performance Optimization

#### Adjust Beam Search

- `WHISPER_BEAM=1`: Maximum speed, reduced accuracy
- `WHISPER_BEAM=5`: Good compromise (default)
- `WHISPER_BEAM=10`: Maximum accuracy, slower

#### Monitor Memory Usage

```bash
docker stats faster-whisper
```

### Clean Old Models

Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).

```bash
# From WSL2 - list downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/

# Delete an unused model
rm -rf /mnt/e/volumes/faster-whisper/models/<model-name>
```

```powershell
# From Windows PowerShell
Get-ChildItem E:\volumes\faster-whisper\models\

# Delete an unused model
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-name>
```

## 📊 Monitoring

### Real-time Logs

```bash
docker logs faster-whisper -f --tail 100
```

### Check Container Status

```bash
docker ps | grep faster-whisper
```

### Restart on Issues

```bash
docker restart faster-whisper
```

## 🐛 Troubleshooting

### Container Won't Start

1. Verify the NVIDIA Container Toolkit is installed in WSL2:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

2. Check permissions on the volumes:

```bash
ls -la /mnt/e/volumes/faster-whisper/
```

3. Ensure Docker Desktop WSL2 integration is enabled:
   - Open Docker Desktop → Settings → Resources → WSL Integration
   - Enable integration with Ubuntu-24.04

### "Out of Memory" Error

- Switch to a smaller model (e.g., from `turbo` to `small`)
- Reduce `WHISPER_BEAM` to 3 or 1
- Close other GPU-intensive applications on Windows
- Check GPU memory usage with `nvidia-smi`

### Poor Transcription Quality

- Switch to a larger model (e.g., from `small` to `turbo`)
- Increase `WHISPER_BEAM` to 7 or 10
- Check the audio quality of the source file
- Verify the correct language is set in `WHISPER_LANG`

### WSL2-Specific Issues

#### GPU Not Detected

```powershell
# From Windows PowerShell:

# Check the Windows GPU driver version
nvidia-smi

# Update the WSL2 kernel
wsl --update

# Restart WSL2, then reopen Ubuntu
wsl --shutdown
```

#### Volume Access Issues

```bash
# Check if the drive is mounted in WSL2
ls /mnt/e/

# If not mounted, edit /etc/wsl.conf
sudo nano /etc/wsl.conf
```

Add these lines to `/etc/wsl.conf`:

```ini
[automount]
enabled = true
options = "metadata,uid=1000,gid=1000"
```

Then restart WSL2 from Windows and reopen Ubuntu:

```powershell
wsl --shutdown
```

## 📁 File Structure

```
Windows: E:\volumes\faster-whisper\
WSL2:    /mnt/e/volumes/faster-whisper/
├── audio/     # Audio files to transcribe
└── models/    # Whisper models cache
```
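
If the layout does not exist yet, it can be created in one step. `make_layout` is a hypothetical helper, not part of the image; the path is the one used throughout this document:

```bash
# Create the expected volume layout under a given base directory.
make_layout() {
  mkdir -p "$1/audio" "$1/models"
}

# On the WSL2 host:
#   make_layout /mnt/e/volumes/faster-whisper
```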

## 🪟 Windows Integration

### Access Files from Windows Explorer

- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
- Or directly to `E:\volumes\faster-whisper\`

### Copy Files to Transcribe

From Windows:

```powershell
Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
```

From WSL2:

```bash
cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
```
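
The two path styles map onto each other mechanically. As an illustration, here is a minimal converter for the simple drive-letter case — on a real WSL2 system the built-in `wslpath` utility does this properly; `win_to_wsl` is a hypothetical helper:

```bash
# Sketch of the E:\...  ->  /mnt/e/... mapping used throughout this document.
win_to_wsl() {
  local p=$1
  local drive=${p%%:*}    # drive letter, e.g. "E"
  local rest=${p#*:}      # remainder, e.g. "\volumes\..."
  rest=${rest//\\//}      # backslashes -> forward slashes
  printf '/mnt/%s%s\n' "$(printf '%s' "$drive" | tr 'A-Z' 'a-z')" "$rest"
}

win_to_wsl 'E:\volumes\faster-whisper\audio'   # prints /mnt/e/volumes/faster-whisper/audio
```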

## 🔗 Useful Links

- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)

## 📝 Notes

- Service automatically restarts unless manually stopped (`restart: unless-stopped`)
- On first startup, the model will be downloaded (may take a few minutes)
- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
- The service runs in WSL2 but is accessible from Windows
- GPU computations are performed on the Windows NVIDIA GPU