# Faster Whisper - Audio Transcription Service
Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).
## 📋 Prerequisites
- Windows with WSL2 (Ubuntu 24.04)
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with drivers installed on Windows
- NVIDIA Container Toolkit configured in WSL2
- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)
### WSL2 GPU Setup
Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:
```bash
# Check GPU availability in WSL2
nvidia-smi
# If not available, install the NVIDIA Container Toolkit in WSL2
# (keyring-based repository; the old apt-key/nvidia-docker method is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# With Docker Desktop, restart Docker Desktop from Windows instead;
# with a native Docker engine inside WSL2:
sudo systemctl restart docker
```
## 🚀 Quick Start
```bash
# Start the service
docker compose up -d faster-whisper
# Check logs
docker logs faster-whisper -f
# Stop the service
docker compose down
```
## ⚙️ Configuration
### Environment Variables
| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | 1000 | User ID for file permissions |
| `PGID` | 1000 | Group ID for file permissions |
| `TZ` | Europe/Paris | Timezone |
| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
| `WHISPER_LANG` | fr | Transcription language |
| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs speed tradeoff) |
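For reference, here is a `docker run` equivalent of this configuration, as a sketch only (the compose file remains the source of truth; the image tag is an assumption based on the LinuxServer docs linked below, and the port and volume mappings come from the sections that follow):
```bash
# Sketch: docker run equivalent of the compose service
# (image tag assumed; verify against the LinuxServer docs)
docker run -d --name faster-whisper \
  --gpus all \
  -e PUID=1000 -e PGID=1000 -e TZ=Europe/Paris \
  -e WHISPER_MODEL=turbo -e WHISPER_LANG=fr -e WHISPER_BEAM=5 \
  -p 10300:10300 \
  -v /mnt/e/volumes/faster-whisper/audio:/app \
  -v /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper \
  --restart unless-stopped \
  lscr.io/linuxserver/faster-whisper:gpu
```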
### Available Models
| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
| `base` | ~142 MB | ~1 GB | Fast | Medium |
| `small` | ~466 MB | ~2 GB | Medium | Good |
| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |
> **Note:** The `turbo` model is an excellent compromise for RTX 4060 Ti (8 GB VRAM).
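Before changing models, it can help to check how much VRAM is actually free:
```bash
# Total vs. free VRAM on the GPU, visible from WSL2
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```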
### Volumes
- `/mnt/e/volumes/faster-whisper/audio` → `/app`: directory of audio files to transcribe
- `/mnt/e/volumes/faster-whisper/models` → `/root/.cache/whisper`: downloaded models cache
> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to `E:\` drive on Windows.
## 🎯 Usage
### REST API
The service exposes a REST API on port **10300**.
#### Transcribe an audio file
```bash
# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\
# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"
# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
```
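To keep only the transcript text, a minimal sketch with `jq`, assuming the endpoint returns JSON with a `text` field (the response schema is not documented here; adjust as needed):
```bash
# Extract just the transcript (assumes a JSON response with a "text" field)
curl -s -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3" | jq -r '.text' > transcript.txt
```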
#### Check service status
```bash
curl http://localhost:10300/health
```
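When scripting against the service right after `docker compose up -d`, a simple wait loop helps (assuming `/health` returns HTTP 200 once the service is ready):
```bash
# Block until the service answers on /health
until curl -fsS http://localhost:10300/health > /dev/null; do
  echo "Waiting for faster-whisper..."
  sleep 2
done
echo "Service is up"
```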
### Web Interface
The web interface is available at `http://localhost:10300`, from both Windows and WSL2.
## 🔧 Administration
### Check GPU Usage
```bash
# From WSL2 host
nvidia-smi
# From inside the container
docker exec faster-whisper nvidia-smi
# Monitor GPU in real-time
watch -n 1 nvidia-smi
```
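To keep a record of GPU usage during a transcription run, `nvidia-smi` can also log to CSV:
```bash
# Sample utilization and memory every 5 seconds into a CSV file
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
  --format=csv -l 5 >> gpu-usage.csv
```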
### Update the Image
```bash
docker compose pull faster-whisper
docker compose up -d faster-whisper
```
### Change Model
1. Edit `WHISPER_MODEL` in `docker-compose.yml`
2. Restart the container:
```bash
docker compose up -d faster-whisper
```
The new model is downloaded automatically the next time the container starts.
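As a shortcut, both steps can be combined (a sketch, assuming `WHISPER_MODEL=...` appears as a single `environment` entry in `docker-compose.yml`):
```bash
# Switch to the small model and recreate the container
sed -i 's/WHISPER_MODEL=.*/WHISPER_MODEL=small/' docker-compose.yml
docker compose up -d faster-whisper
```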
### Performance Optimization
#### Adjust Beam Search
- `WHISPER_BEAM=1`: Maximum speed, reduced accuracy
- `WHISPER_BEAM=5`: Good compromise (default)
- `WHISPER_BEAM=10`: Maximum accuracy, slower
#### Monitor Memory Usage
```bash
docker stats faster-whisper
```
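For a one-shot reading instead of the live view (handy in scripts):
```bash
# Single snapshot of CPU and memory usage
docker stats faster-whisper --no-stream \
  --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```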
### Clean Old Models
Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).
```bash
# From WSL2 - List downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/
# Delete an unused model
rm -rf /mnt/e/volumes/faster-whisper/models/<model-name>
```
```powershell
# From Windows PowerShell
Get-ChildItem E:\volumes\faster-whisper\models\
# Delete an unused model
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-name>
```
## 📊 Monitoring
### Real-time Logs
```bash
docker logs faster-whisper -f --tail 100
```
### Check Container Status
```bash
docker ps | grep faster-whisper
```
### Restart on Issues
```bash
docker restart faster-whisper
```
## 🐛 Troubleshooting
### Container Won't Start
1. Verify NVIDIA Container Toolkit is installed in WSL2:
```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
2. Check permissions on volumes:
```bash
ls -la /mnt/e/volumes/faster-whisper/
```
3. Ensure Docker Desktop WSL2 integration is enabled:
- Open Docker Desktop → Settings → Resources → WSL Integration
- Enable integration with Ubuntu-24.04
### "Out of Memory" Error
- Switch to a smaller model (e.g., from `turbo` to `small`)
- Reduce `WHISPER_BEAM` to 3 or 1
- Close other GPU-intensive applications on Windows
- Check GPU memory usage with `nvidia-smi`
### Poor Transcription Quality
- Switch to a larger model (e.g., from `small` to `turbo`)
- Increase `WHISPER_BEAM` to 7 or 10
- Check the audio quality of the source file
- Verify the correct language is set in `WHISPER_LANG`
### WSL2 Specific Issues
#### GPU Not Detected
```powershell
# Check the Windows GPU driver version
nvidia-smi
# Update the WSL2 kernel
wsl --update
# Restart WSL2, then reopen Ubuntu
wsl --shutdown
```
#### Volume Access Issues
```bash
# Check if the drive is mounted in WSL2
ls /mnt/e/
# If not mounted, add this to /etc/wsl.conf:
#   [automount]
#   enabled = true
#   options = "metadata,uid=1000,gid=1000"
sudo nano /etc/wsl.conf
# Then restart WSL2 (wsl.exe is callable from inside WSL2)
wsl.exe --shutdown
```
## 📁 File Structure
```
Windows: E:\volumes\faster-whisper\
WSL2: /mnt/e/volumes/faster-whisper/
├── audio/ # Audio files to transcribe
└── models/ # Whisper models cache
```
## 🪟 Windows Integration
### Access Files from Windows Explorer
- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
- Or directly to `E:\volumes\faster-whisper\`
### Copy Files to Transcribe
From Windows:
```powershell
Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
```
From WSL2:
```bash
cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
```
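To transcribe everything in the audio directory in one pass, a batch sketch from WSL2 (it assumes the `/transcribe` endpoint shown above and a JSON response; extend the glob for other formats):
```bash
# POST every audio file to the API and save each result next to its source
for f in /mnt/e/volumes/faster-whisper/audio/*.{mp3,wav,m4a,flac,ogg}; do
  [ -e "$f" ] || continue   # skip patterns that matched nothing
  echo "Transcribing $f"
  curl -s -X POST http://localhost:10300/transcribe \
    -F "file=@$f" -o "${f%.*}.json"
done
```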
## 🔗 Useful Links
- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
## 📝 Notes
- Service automatically restarts unless manually stopped (`restart: unless-stopped`)
- On first startup, the model will be downloaded (may take a few minutes)
- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
- The service runs in WSL2 but is accessible from Windows
- GPU computations are performed on the Windows NVIDIA GPU