Faster Whisper - Audio Transcription Service

Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).

📋 Prerequisites

  • Windows with WSL2 (Ubuntu 24.04)
  • Docker Desktop for Windows with WSL2 backend
  • NVIDIA GPU with drivers installed on Windows
  • NVIDIA Container Toolkit configured in WSL2
  • Access to mounted volumes (/mnt/e/volumes/faster-whisper/)

WSL2 GPU Setup

Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:

# Check GPU availability in WSL2
nvidia-smi

# If not available, install the NVIDIA Container Toolkit in WSL2
# (the old nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart the Docker daemon (with the Docker Desktop backend, restart Docker Desktop from Windows instead)
sudo systemctl restart docker
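
To confirm Docker can actually reach the GPU from a container, run a quick smoke test (any CUDA base image works; the tag below is one example):

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi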

🚀 Quick Start

# Start the service
docker compose up -d faster-whisper

# Check logs
docker logs faster-whisper -f

# Stop the service
docker compose down

⚙️ Configuration

Environment Variables

Variable       Value         Description
PUID           1000          User ID for file permissions
PGID           1000          Group ID for file permissions
TZ             Europe/Paris  Timezone
WHISPER_MODEL  turbo         Model to use (tiny, base, small, medium, large, turbo)
WHISPER_LANG   fr            Transcription language
WHISPER_BEAM   5             Beam search size (1-10, accuracy vs. speed tradeoff)

Available Models

Model   Size     VRAM    Speed      Accuracy
tiny    ~75 MB   ~1 GB   Very fast  Low
base    ~142 MB  ~1 GB   Fast       Medium
small   ~466 MB  ~2 GB   Medium     Good
medium  ~1.5 GB  ~5 GB   Slow       Very good
large   ~2.9 GB  ~10 GB  Very slow  Excellent
turbo   ~809 MB  ~6 GB   Fast       Excellent

Note: The turbo model is an excellent compromise for an RTX 4060 Ti (8 GB VRAM).
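
Before switching models, you can check how much VRAM is actually free; these are standard nvidia-smi query flags:

nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv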

Volumes

  • /mnt/e/volumes/faster-whisper/audio → /app : Audio files directory to transcribe
  • /mnt/e/volumes/faster-whisper/models → /root/.cache/whisper : Downloaded models cache

Windows Note: The path /mnt/e/ in WSL2 corresponds to the E:\ drive on Windows.
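
For reference, here is a minimal sketch of the corresponding docker-compose.yml service. The image name is an assumption (the linuxserver GPU build is one common choice), so match it to the image actually deployed:

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu   # assumed image, adjust to your deployment
    container_name: faster-whisper
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Paris
      - WHISPER_MODEL=turbo
      - WHISPER_LANG=fr
      - WHISPER_BEAM=5
    volumes:
      - /mnt/e/volumes/faster-whisper/audio:/app
      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
    ports:
      - "10300:10300"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]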

🎯 Usage

REST API

The service exposes a REST API on port 10300.

Transcribe an audio file

# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\

# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"

# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
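
If the endpoint returns JSON (an assumption here, as is the .text field below; check the actual response once), the transcript can be extracted with jq:

curl -s -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3" | jq -r '.text'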

Check service status

curl http://localhost:10300/health

Web Interface

Access the web interface: http://localhost:10300

The interface is accessible from both Windows and WSL2.

🔧 Administration

Check GPU Usage

# From WSL2 host
nvidia-smi

# From inside the container
docker exec faster-whisper nvidia-smi

# Monitor GPU in real-time
watch -n 1 nvidia-smi

Update the Image

docker compose pull faster-whisper
docker compose up -d faster-whisper
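
Optionally, reclaim the disk space held by the superseded image afterwards:

docker image prune -f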

Change Model

  1. Edit WHISPER_MODEL in docker-compose.yml
  2. Restart the container:
    docker compose up -d faster-whisper

The new model is downloaded automatically the first time the container starts with it.
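
The edit-and-restart cycle can also be scripted. A rough sketch, assuming WHISPER_MODEL=... occurs exactly once in docker-compose.yml:

sed -i 's/WHISPER_MODEL=.*/WHISPER_MODEL=small/' docker-compose.yml
docker compose up -d faster-whisper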

Performance Optimization

  • WHISPER_BEAM=1: Maximum speed, reduced accuracy
  • WHISPER_BEAM=5: Good compromise (default)
  • WHISPER_BEAM=10: Maximum accuracy, slower
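
To quantify the tradeoff on your own hardware, here is a rough benchmark loop. It is only a sketch: it assumes the /transcribe endpoint above, a local sample.mp3, that WHISPER_BEAM occurs exactly once in docker-compose.yml, and that the variable is only read at container startup (hence the recreate on each pass):

for beam in 1 5 10; do
  sed -i "s/WHISPER_BEAM=.*/WHISPER_BEAM=$beam/" docker-compose.yml
  docker compose up -d faster-whisper
  sleep 15   # crude wait for the model to finish loading
  echo "beam=$beam:"
  time curl -s -X POST http://localhost:10300/transcribe -F "file=@sample.mp3" > /dev/null
done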

Monitor Memory Usage

docker stats faster-whisper

Clean Old Models

Models are stored in /mnt/e/volumes/faster-whisper/models/ (WSL2) or E:\volumes\faster-whisper\models\ (Windows).

# From WSL2 - List downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/

# Delete an unused model
rm -rf /mnt/e/volumes/faster-whisper/models/<model-name>

# From Windows PowerShell
Get-ChildItem E:\volumes\faster-whisper\models\

# Delete an unused model
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-name>

📊 Monitoring

Real-time Logs

docker logs faster-whisper -f --tail 100

Check Container Status

docker ps | grep faster-whisper

Restart on Issues

docker restart faster-whisper
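
For unattended recovery, a crude watchdog can poll the /health endpoint and restart the container on failure. A sketch, assuming /health returns a non-2xx status when the service is unhealthy:

while true; do
  if ! curl -sf http://localhost:10300/health > /dev/null; then
    echo "$(date) health check failed, restarting faster-whisper"
    docker restart faster-whisper
  fi
  sleep 60
done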

🐛 Troubleshooting

Container Won't Start

  1. Verify NVIDIA Container Toolkit is installed in WSL2:

    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
    
  2. Check permissions on volumes:

    ls -la /mnt/e/volumes/faster-whisper/
    
  3. Ensure Docker Desktop WSL2 integration is enabled:

    • Open Docker Desktop → Settings → Resources → WSL Integration
    • Enable integration with Ubuntu-24.04

"Out of Memory" Error

  • Switch to a smaller model (e.g., from turbo to small)
  • Reduce WHISPER_BEAM to 3 or 1
  • Close other GPU-intensive applications on Windows
  • Check GPU memory usage with nvidia-smi (see the query below)
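
For a continuously refreshing view of GPU memory, the standard nvidia-smi query flags work well:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1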

Poor Transcription Quality

  • Switch to a larger model (e.g., from small to turbo)
  • Increase WHISPER_BEAM to 7 or 10
  • Check the audio quality of the source file (see the ffmpeg example below)
  • Verify the correct language is set in WHISPER_LANG
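
Whisper resamples input to 16 kHz mono internally, so pre-converting a problematic file with ffmpeg (assumed to be installed) is a quick way to rule out decoding issues:

ffmpeg -i input.mp3 -ar 16000 -ac 1 clean.wav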

WSL2 Specific Issues

GPU Not Detected

# Check Windows GPU driver version (from PowerShell)
nvidia-smi

# Update WSL2 kernel
wsl --update

# Restart WSL2
wsl --shutdown
# Then reopen Ubuntu

Volume Access Issues

# Check if drive is mounted in WSL2
ls /mnt/e/

# If not mounted, add to /etc/wsl.conf
sudo nano /etc/wsl.conf

# Add these lines:
[automount]
enabled = true
options = "metadata,uid=1000,gid=1000"

# Restart WSL2
wsl --shutdown

📁 File Structure

Windows: E:\volumes\faster-whisper\
WSL2:    /mnt/e/volumes/faster-whisper/
├── audio/          # Audio files to transcribe
└── models/         # Whisper models cache

🪟 Windows Integration

Access Files from Windows Explorer

  • Navigate to \\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\
  • Or directly to E:\volumes\faster-whisper\

Copy Files to Transcribe

From Windows:

Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"

From WSL2:

cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/

📝 Notes

  • Service automatically restarts unless manually stopped (restart: unless-stopped)
  • On first startup, the model will be downloaded (may take a few minutes)
  • Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
  • The service runs in WSL2 but is accessible from Windows
  • GPU computations are performed on the Windows NVIDIA GPU