# Faster Whisper - Audio Transcription Service

Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).

## 📋 Prerequisites

- Windows with WSL2 (Ubuntu 24.04)
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with drivers installed on Windows
- NVIDIA Container Toolkit configured in WSL2
- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)

### WSL2 GPU Setup

Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:

```bash
# Check GPU availability in WSL2
nvidia-smi

# If not available, install the NVIDIA Container Toolkit in WSL2.
# Note: the old nvidia-docker repository and apt-key method are deprecated
# (apt-key no longer exists on Ubuntu 24.04); use the current
# libnvidia-container repository instead.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# With Docker Desktop's WSL2 backend, restart Docker Desktop from Windows
# instead of running `systemctl restart docker` (the daemon is not managed
# by systemd inside the distro)
```

## 🚀 Quick Start

```bash
# Start the service
docker compose up -d faster-whisper

# Check logs
docker logs faster-whisper -f

# Stop the service
docker compose down
```

## ⚙️ Configuration

### Environment Variables

| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | 1000 | User ID for file permissions |
| `PGID` | 1000 | Group ID for file permissions |
| `TZ` | Europe/Paris | Timezone |
| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
| `WHISPER_LANG` | fr | Transcription language |
| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs speed tradeoff) |
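
These variables are set in the service definition in `docker-compose.yml`. A minimal sketch of what that service entry might look like — the image tag, port mapping, and GPU reservation block here are assumptions based on this document, not a canonical file:

```yaml
# Sketch only: values taken from the tables and volume list in this README.
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu   # assumed GPU image tag
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Paris
      - WHISPER_MODEL=turbo
      - WHISPER_LANG=fr
      - WHISPER_BEAM=5
    volumes:
      - /mnt/e/volumes/faster-whisper/audio:/app
      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
    ports:
      - "10300:10300"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
```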

### Available Models

| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
| `base` | ~142 MB | ~1 GB | Fast | Medium |
| `small` | ~466 MB | ~2 GB | Medium | Good |
| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |

> **Note:** The `turbo` model is an excellent compromise for RTX 4060 Ti (8 GB VRAM).
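
As a rule of thumb, the VRAM column above can be turned into a small helper that suggests the largest model likely to fit in a given amount of free VRAM. This is a sketch with thresholds taken straight from the table, not a utility shipped with the image:

```bash
# Suggest the largest Whisper model that fits in the given free VRAM (GiB),
# based on the VRAM requirements listed in the table above.
suggest_model() {
  local vram_gib=$1
  if   [ "$vram_gib" -ge 10 ]; then echo "large"
  elif [ "$vram_gib" -ge 6 ];  then echo "turbo"
  elif [ "$vram_gib" -ge 5 ];  then echo "medium"
  elif [ "$vram_gib" -ge 2 ];  then echo "small"
  else                              echo "tiny"
  fi
}

suggest_model 8   # e.g. an RTX 4060 Ti with 8 GB -> "turbo"
```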

### Volumes

- `/mnt/e/volumes/faster-whisper/audio` → `/app`: directory of audio files to transcribe
- `/mnt/e/volumes/faster-whisper/models` → `/root/.cache/whisper`: downloaded models cache

> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to the `E:\` drive on Windows.

## 🎯 Usage

### REST API

The service exposes a REST API on port **10300**.

#### Transcribe an audio file

```bash
# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\

# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"

# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
```
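
Assuming the `/transcribe` endpoint behaves as shown above, a batch helper could loop over the shared audio directory. This is a sketch, not part of the image; `CURL_CMD` is overridable so the loop can be dry-run:

```bash
# Batch sketch: POST every audio file in the shared directory to the
# /transcribe endpoint described above, saving each response next to
# its source file.
AUDIO_DIR="${AUDIO_DIR:-/mnt/e/volumes/faster-whisper/audio}"
ENDPOINT="${ENDPOINT:-http://localhost:10300/transcribe}"
CURL_CMD="${CURL_CMD:-curl}"   # override (e.g. CURL_CMD=echo) for a dry run

transcribe_all() {
  local f
  for f in "$AUDIO_DIR"/*.mp3 "$AUDIO_DIR"/*.wav; do
    [ -e "$f" ] || continue   # skip unmatched glob patterns
    "$CURL_CMD" -sf -X POST "$ENDPOINT" -F "file=@$f" \
      -o "${f%.*}.txt" || echo "failed: $f" >&2
  done
}
```

Run `transcribe_all` from WSL2 once the container is up; whatever the endpoint returns for each file would land alongside it as a `.txt` file.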

#### Check service status

```bash
curl http://localhost:10300/health
```

### Web Interface

Access the web interface: `http://localhost:10300`

The interface is accessible from both Windows and WSL2.

## 🔧 Administration

### Check GPU Usage

```bash
# From WSL2 host
nvidia-smi

# From inside the container
docker exec faster-whisper nvidia-smi

# Monitor GPU in real-time
watch -n 1 nvidia-smi
```

### Update the Image

```bash
docker compose pull faster-whisper
docker compose up -d faster-whisper
```

### Change Model

1. Edit `WHISPER_MODEL` in `docker-compose.yml`
2. Restart the container:

```bash
docker compose up -d faster-whisper
```

The new model will be downloaded automatically on first startup.

### Performance Optimization

#### Adjust Beam Search

- `WHISPER_BEAM=1`: Maximum speed, reduced accuracy
- `WHISPER_BEAM=5`: Good compromise (default)
- `WHISPER_BEAM=10`: Maximum accuracy, slower

#### Monitor Memory Usage

```bash
docker stats faster-whisper
```

### Clean Old Models

Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).

```bash
# From WSL2 - list downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/

# Delete an unused model
rm -rf /mnt/e/volumes/faster-whisper/models/<model-name>
```

```powershell
# From Windows PowerShell
Get-ChildItem E:\volumes\faster-whisper\models\

# Delete an unused model
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-name>
```

## 📊 Monitoring

### Real-time Logs

```bash
docker logs faster-whisper -f --tail 100
```

### Check Container Status

```bash
docker ps | grep faster-whisper
```

### Restart on Issues

```bash
docker restart faster-whisper
```

## 🐛 Troubleshooting

### Container Won't Start

1. Verify the NVIDIA Container Toolkit is installed in WSL2:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

2. Check permissions on the volumes:

```bash
ls -la /mnt/e/volumes/faster-whisper/
```

3. Ensure Docker Desktop WSL2 integration is enabled:
   - Open Docker Desktop → Settings → Resources → WSL Integration
   - Enable integration with Ubuntu-24.04

### "Out of Memory" Error

- Switch to a smaller model (e.g., from `turbo` to `small`)
- Reduce `WHISPER_BEAM` to 3 or 1
- Close other GPU-intensive applications on Windows
- Check GPU memory usage with `nvidia-smi`

### Poor Transcription Quality

- Switch to a larger model (e.g., from `small` to `turbo`)
- Increase `WHISPER_BEAM` to 7 or 10
- Check the audio quality of the source file
- Verify the correct language is set in `WHISPER_LANG`

### WSL2-Specific Issues

#### GPU Not Detected

```powershell
# From Windows PowerShell:

# Check the Windows GPU driver version
nvidia-smi

# Update the WSL2 kernel
wsl --update

# Restart WSL2, then reopen Ubuntu
wsl --shutdown
```

#### Volume Access Issues

```bash
# Check if the drive is mounted in WSL2
ls /mnt/e/

# If not mounted, edit /etc/wsl.conf
sudo nano /etc/wsl.conf
```

Add these lines to `/etc/wsl.conf`:

```ini
[automount]
enabled = true
options = "metadata,uid=1000,gid=1000"
```

Then restart WSL2 from Windows and reopen Ubuntu:

```powershell
wsl --shutdown
```

## 📁 File Structure

```
Windows: E:\volumes\faster-whisper\
WSL2:    /mnt/e/volumes/faster-whisper/
├── audio/     # Audio files to transcribe
└── models/    # Whisper models cache
```
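
If the layout does not exist yet, it can be created in one step. `make_layout` is a hypothetical helper, not part of the image; the path is the one used throughout this document:

```bash
# Create the expected volume layout under a given base directory.
make_layout() {
  mkdir -p "$1/audio" "$1/models"
}

# On the WSL2 host:
#   make_layout /mnt/e/volumes/faster-whisper
```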

## 🪟 Windows Integration

### Access Files from Windows Explorer

- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
- Or directly to `E:\volumes\faster-whisper\`

### Copy Files to Transcribe

From Windows:

```powershell
Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
```

From WSL2:

```bash
cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
```
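
The two path styles map onto each other mechanically. As an illustration, here is a minimal converter for the simple drive-letter case — on a real WSL2 system the built-in `wslpath` utility does this properly; `win_to_wsl` is a hypothetical helper:

```bash
# Sketch of the E:\...  ->  /mnt/e/... mapping used throughout this document.
win_to_wsl() {
  local p=$1
  local drive=${p%%:*}    # drive letter, e.g. "E"
  local rest=${p#*:}      # remainder, e.g. "\volumes\..."
  rest=${rest//\\//}      # backslashes -> forward slashes
  printf '/mnt/%s%s\n' "$(printf '%s' "$drive" | tr 'A-Z' 'a-z')" "$rest"
}

win_to_wsl 'E:\volumes\faster-whisper\audio'   # prints /mnt/e/volumes/faster-whisper/audio
```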

## 🔗 Useful Links

- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)

## 📝 Notes

- Service automatically restarts unless manually stopped (`restart: unless-stopped`)
- On first startup, the model will be downloaded (may take a few minutes)
- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
- The service runs in WSL2 but is accessible from Windows
- GPU computations are performed on the Windows NVIDIA GPU