Added whisper

Readme_FasterWhisper.md (new file)
# Faster Whisper - Audio Transcription Service

Audio transcription service using Faster Whisper with GPU acceleration (NVIDIA).

## 📋 Prerequisites

- Windows with WSL2 (Ubuntu 24.04)
- Docker Desktop for Windows with WSL2 backend
- NVIDIA GPU with drivers installed on Windows
- NVIDIA Container Toolkit configured in WSL2
- Access to mounted volumes (`/mnt/e/volumes/faster-whisper/`)

### WSL2 GPU Setup

Ensure your WSL2 Ubuntu has access to the NVIDIA GPU:

```bash
# Check GPU availability in WSL2
nvidia-smi

# If not available, install NVIDIA Container Toolkit in WSL2
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
## 🚀 Quick Start

```bash
# Start the service
docker compose up -d faster-whisper

# Check logs
docker logs faster-whisper -f

# Stop the service
docker compose down
```

## ⚙️ Configuration

### Environment Variables

| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | 1000 | User ID for file permissions |
| `PGID` | 1000 | Group ID for file permissions |
| `TZ` | Europe/Paris | Timezone |
| `WHISPER_MODEL` | turbo | Model to use (tiny, base, small, medium, large, turbo) |
| `WHISPER_LANG` | fr | Transcription language |
| `WHISPER_BEAM` | 5 | Beam search size (1-10, accuracy vs speed tradeoff) |
### Available Models

| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| `tiny` | ~75 MB | ~1 GB | Very fast | Low |
| `base` | ~142 MB | ~1 GB | Fast | Medium |
| `small` | ~466 MB | ~2 GB | Medium | Good |
| `medium` | ~1.5 GB | ~5 GB | Slow | Very good |
| `large` | ~2.9 GB | ~10 GB | Very slow | Excellent |
| `turbo` | ~809 MB | ~6 GB | Fast | Excellent |

> **Note:** The `turbo` model is an excellent compromise for an RTX 4060 Ti (8 GB VRAM).

### Volumes

- `/mnt/e/volumes/faster-whisper/audio` → `/app`: directory containing the audio files to transcribe
- `/mnt/e/volumes/faster-whisper/models` → `/root/.cache/whisper`: cache for downloaded models

> **Windows Note:** The path `/mnt/e/` in WSL2 corresponds to the `E:\` drive on Windows.
## 🎯 Usage

### REST API

The service exposes a REST API on port **10300**.

#### Transcribe an audio file

```bash
# Place the file in /mnt/e/volumes/faster-whisper/audio/
# Or on Windows: E:\volumes\faster-whisper\audio\

# From WSL2:
curl -X POST http://localhost:10300/transcribe \
  -F "file=@audio.mp3"

# From Windows PowerShell:
curl.exe -X POST http://localhost:10300/transcribe -F "file=@audio.mp3"
```

#### Check service status

```bash
curl http://localhost:10300/health
```
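For scripted use, the same endpoints can be called from Python. This is a minimal sketch assuming the `/transcribe` endpoint accepts a multipart `file` field and `/health` returns a simple status, exactly as in the curl examples above; the base URL and helper names are illustrative.

```python
import requests

BASE_URL = "http://localhost:10300"  # service address assumed from the examples above


def transcribe(path: str) -> str:
    """Upload an audio file to the /transcribe endpoint and return the raw response body."""
    with open(path, "rb") as audio:
        response = requests.post(f"{BASE_URL}/transcribe", files={"file": audio})
    response.raise_for_status()
    return response.text


def is_healthy() -> bool:
    """Check the /health endpoint shown above."""
    try:
        return requests.get(f"{BASE_URL}/health", timeout=5).ok
    except requests.RequestException:
        return False


if __name__ == "__main__":
    if is_healthy():
        print(transcribe("audio.mp3"))
```

Since the response format is not documented above, the sketch returns the raw body; adapt the parsing once you have seen an actual response.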
### Web Interface

Access the web interface: `http://localhost:10300`

The interface is accessible from both Windows and WSL2.

## 🔧 Administration

### Check GPU Usage

```bash
# From WSL2 host
nvidia-smi

# From inside the container
docker exec faster-whisper nvidia-smi

# Monitor GPU in real-time
watch -n 1 nvidia-smi
```

### Update the Image

```bash
docker compose pull faster-whisper
docker compose up -d faster-whisper
```
### Change Model

1. Edit `WHISPER_MODEL` in `docker-compose.yml`
2. Restart the container:
   ```bash
   docker compose up -d faster-whisper
   ```

The new model is downloaded automatically at the next startup.

### Performance Optimization

#### Adjust Beam Search

- `WHISPER_BEAM=1`: Maximum speed, reduced accuracy
- `WHISPER_BEAM=5`: Good compromise (default)
- `WHISPER_BEAM=10`: Maximum accuracy, slower

#### Monitor Memory Usage

```bash
docker stats faster-whisper
```
### Clean Old Models

Models are stored in `/mnt/e/volumes/faster-whisper/models/` (WSL2) or `E:\volumes\faster-whisper\models\` (Windows).

```bash
# From WSL2 - List downloaded models
ls -lh /mnt/e/volumes/faster-whisper/models/

# Delete an unused model
rm -rf /mnt/e/volumes/faster-whisper/models/<model-name>
```

```powershell
# From Windows PowerShell
Get-ChildItem E:\volumes\faster-whisper\models\

# Delete an unused model
Remove-Item -Recurse E:\volumes\faster-whisper\models\<model-name>
```

## 📊 Monitoring

### Real-time Logs

```bash
docker logs faster-whisper -f --tail 100
```

### Check Container Status

```bash
docker ps | grep faster-whisper
```

### Restart on Issues

```bash
docker restart faster-whisper
```
## 🐛 Troubleshooting

### Container Won't Start

1. Verify the NVIDIA Container Toolkit is installed in WSL2:
   ```bash
   docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
   ```
2. Check permissions on the volumes:
   ```bash
   ls -la /mnt/e/volumes/faster-whisper/
   ```
3. Ensure Docker Desktop WSL2 integration is enabled:
   - Open Docker Desktop → Settings → Resources → WSL Integration
   - Enable integration with Ubuntu-24.04

### "Out of Memory" Error

- Use a smaller model (e.g., switch from `turbo` to `small`)
- Reduce `WHISPER_BEAM` to 3 or 1
- Close other GPU-intensive applications on Windows
- Check GPU memory usage: `nvidia-smi`

### Poor Transcription Quality

- Use a larger model (e.g., switch from `small` to `turbo`)
- Increase `WHISPER_BEAM` to 7 or 10
- Check the audio quality of the source file
- Verify the correct language is set in `WHISPER_LANG`
### WSL2 Specific Issues

#### GPU Not Detected

```bash
# Check Windows GPU driver version (from PowerShell)
nvidia-smi

# Update WSL2 kernel
wsl --update

# Restart WSL2
wsl --shutdown
# Then reopen Ubuntu
```

#### Volume Access Issues

```bash
# Check if drive is mounted in WSL2
ls /mnt/e/

# If not mounted, add to /etc/wsl.conf
sudo nano /etc/wsl.conf

# Add these lines:
[automount]
enabled = true
options = "metadata,uid=1000,gid=1000"

# Restart WSL2
wsl --shutdown
```

## 📁 File Structure

```
Windows: E:\volumes\faster-whisper\
WSL2:    /mnt/e/volumes/faster-whisper/
├── audio/    # Audio files to transcribe
└── models/   # Whisper models cache
```
## 🪟 Windows Integration

### Access Files from Windows Explorer

- Navigate to `\\wsl$\Ubuntu-24.04\mnt\e\volumes\faster-whisper\`
- Or directly to `E:\volumes\faster-whisper\`

### Copy Files to Transcribe

From Windows:
```powershell
Copy-Item "C:\path\to\audio.mp3" -Destination "E:\volumes\faster-whisper\audio\"
```

From WSL2:
```bash
cp /mnt/c/path/to/audio.mp3 /mnt/e/volumes/faster-whisper/audio/
```

## 🔗 Useful Links

- [LinuxServer Docker Image Documentation](https://docs.linuxserver.io/images/docker-faster-whisper)
- [Faster Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Whisper Documentation](https://github.com/openai/whisper)
- [WSL2 GPU Support](https://docs.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)

## 📝 Notes

- Service automatically restarts unless manually stopped (`restart: unless-stopped`)
- On first startup, the model will be downloaded (may take a few minutes)
- Supported audio formats: MP3, WAV, M4A, FLAC, OGG, etc.
- The service runs in WSL2 but is accessible from Windows
- GPU computations are performed on the Windows NVIDIA GPU

---

Readme_Ollama.md (new file)

# Ollama Docker Setup 🦙 (WSL2 + Windows 11)

Complete guide for running Ollama with Docker Compose and GPU acceleration on WSL2.

## 📋 Table of Contents

- [Prerequisites](#prerequisites)
- [WSL2 Setup](#wsl2-setup)
- [Installation](#installation)
- [Starting Ollama](#starting-ollama)
- [Model Management](#model-management)
- [Usage Examples](#usage-examples)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Performance Tips](#performance-tips)

## 🔧 Prerequisites

### Required Software

- **Windows 11** with WSL2 enabled
- **Ubuntu 24.04** on WSL2
- **Docker Desktop for Windows** with WSL2 backend
- **NVIDIA GPU** with CUDA support (RTX series recommended)
- **NVIDIA Driver** for Windows (latest version)

### System Requirements

- Windows 11 Build 22000 or higher
- 16 GB RAM minimum (32 GB recommended for larger models)
- 50 GB+ free disk space for models
- NVIDIA GPU with 8 GB+ VRAM
## 🪟 WSL2 Setup

### 1. Enable WSL2 (if not already done)

```powershell
# Run in PowerShell as Administrator
wsl --install
wsl --set-default-version 2

# Install Ubuntu 24.04
wsl --install -d Ubuntu-24.04

# Verify WSL2 is active
wsl --list --verbose
```

### 2. Install Docker Desktop for Windows

1. Download from [Docker Desktop](https://www.docker.com/products/docker-desktop)
2. Install and enable the **WSL2 backend** in settings
3. Enable integration with the Ubuntu-24.04 distro in Settings → Resources → WSL Integration

### 3. Verify GPU Support in WSL2

```bash
# Open WSL2 Ubuntu terminal
wsl

# Check NVIDIA driver
nvidia-smi

# You should see your GPU listed
```

**Important**: You do NOT need to install the NVIDIA Container Toolkit in WSL2. Docker Desktop handles GPU passthrough automatically.

### 4. Test Docker GPU Access

```bash
# In WSL2 terminal
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

If this works, you're ready to go! 🎉
## 🚀 Installation

### 1. Create Project Structure in WSL2

```bash
# Open WSL2 terminal
wsl

# Create project directory
mkdir -p ~/ollama-docker
cd ~/ollama-docker
```

### 2. Create `docker-compose.yml`

Use the provided `docker-compose.yml` file with the WSL2 path:
- Windows path: `E:\volumes\ollama\data`
- WSL2 path: `/mnt/e/volumes/ollama/data`

### 3. Create Volume Directory

```bash
# From WSL2 terminal
sudo mkdir -p /mnt/e/volumes/ollama/data
```

```powershell
# Or from Windows PowerShell
mkdir E:\volumes\ollama\data
```
## ▶️ Starting Ollama

```bash
# Navigate to project directory
cd ~/ollama-docker

# Start the service
docker compose up -d

# Check logs
docker compose logs -f ollama

# Verify service is running
curl http://localhost:11434
```

Expected response: `Ollama is running`

### Access from Windows

Ollama is accessible from both WSL2 and Windows:
- **WSL2**: `http://localhost:11434`
- **Windows**: `http://localhost:11434`
## 📦 Model Management

### List Available Models

```bash
# Inside container
docker exec -it ollama ollama list

# Or from WSL2 (if ollama CLI installed)
ollama list
```

### Pull/Download Models

```bash
# Pull a model
docker exec -it ollama ollama pull llama3.2

# Popular models
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull codellama
docker exec -it ollama ollama pull phi3
docker exec -it ollama ollama pull llama3.2:70b
```

### Model Sizes Reference

| Model | Parameters | Size | RAM Required | VRAM Required |
|-------|-----------|------|--------------|---------------|
| `phi3` | 3.8B | ~2.3 GB | 8 GB | 4 GB |
| `llama3.2` | 8B | ~4.7 GB | 8 GB | 6 GB |
| `mistral` | 7B | ~4.1 GB | 8 GB | 6 GB |
| `llama3.2:70b` | 70B | ~40 GB | 64 GB | 48 GB |
| `codellama` | 7B | ~3.8 GB | 8 GB | 6 GB |

### Remove/Unload Models

```bash
# Remove a model from disk
docker exec -it ollama ollama rm llama3.2

# Stop a running model (unload from memory)
docker exec -it ollama ollama stop llama3.2

# Show running models
docker exec -it ollama ollama ps
```

### Copy Models Between Systems

```bash
# Export a model's Modelfile
docker exec ollama ollama show llama3.2 --modelfile > Modelfile

# Import on another system
cat Modelfile | docker exec -i ollama ollama create my-model -f -
```
## 💡 Usage Examples

### Interactive Chat

```bash
# Start interactive session
docker exec -it ollama ollama run llama3.2

# Chat with specific model
docker exec -it ollama ollama run mistral "Explain quantum computing"
```

### Using the API

#### Generate Completion

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

#### Chat Completion

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Hello! Can you help me with Python?"
    }
  ],
  "stream": false
}'
```
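To keep a multi-turn conversation going, resend the accumulated `messages` array on each call, mirroring the payload shown above. Below is a minimal Python sketch under that assumption; the `chat` helper and the example turns are illustrative, not part of the setup.

```python
import requests


def chat(messages, model="llama3.2"):
    """Send the running conversation to /api/chat and return the assistant reply."""
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
    )
    response.raise_for_status()
    return response.json()["message"]["content"]


# Usage: append each reply so the model keeps the conversation context
history = [{"role": "user", "content": "Hello! Can you help me with Python?"}]
reply = chat(history)
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "Show me a list comprehension example."})
print(chat(history))
```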
#### Streaming Response

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about programming",
  "stream": true
}'
```
### Python Example (from Windows or WSL2)

```python
import requests


def chat_with_ollama(prompt, model="llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }

    response = requests.post(url, json=payload)
    return response.json()["response"]


# Usage
result = chat_with_ollama("What is Docker?")
print(result)
```
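With `"stream": true` (as in the Streaming Response example above), `/api/generate` returns one JSON object per line. Here is a minimal sketch of consuming that stream from Python; it assumes the same endpoint and payload shape as the example above and uses standard `requests` streaming.

```python
import json
import requests


def stream_from_ollama(prompt, model="llama3.2"):
    """Print tokens as they arrive from /api/generate with stream=true."""
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}

    with requests.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break


# Usage
stream_from_ollama("Write a haiku about programming")
```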
### JavaScript Example (from Windows or WSL2)

```javascript
async function chatWithOllama(prompt, model = "llama3.2") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: model,
      prompt: prompt,
      stream: false
    })
  });

  const data = await response.json();
  return data.response;
}

// Usage
chatWithOllama("Explain REST APIs").then(console.log);
```
## 🔌 API Reference

### Main Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/generate` | POST | Generate text completion |
| `/api/chat` | POST | Chat completion with conversation history |
| `/api/tags` | GET | List available models |
| `/api/pull` | POST | Download a model |
| `/api/push` | POST | Upload a custom model |
| `/api/embeddings` | POST | Generate embeddings |

### Generate Parameters

```json
{
  "model": "llama3.2",
  "prompt": "Your prompt here",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 128,
    "stop": ["\n"]
  }
}
```
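The `/api/tags` and `/api/embeddings` endpoints from the table are not demonstrated elsewhere in this guide. A small sketch, assuming the request shapes from the API documentation linked below (`/api/tags` is a plain GET, `/api/embeddings` takes `model` and `prompt`); adjust if your Ollama version differs.

```python
import requests

BASE_URL = "http://localhost:11434"


def list_models():
    """GET /api/tags returns the models available locally."""
    response = requests.get(f"{BASE_URL}/api/tags")
    response.raise_for_status()
    return [m["name"] for m in response.json().get("models", [])]


def embed(text, model="llama3.2"):
    """POST /api/embeddings returns an embedding vector for the prompt."""
    response = requests.post(
        f"{BASE_URL}/api/embeddings",
        json={"model": model, "prompt": text},
    )
    response.raise_for_status()
    return response.json()["embedding"]


# Usage
print(list_models())
print(len(embed("Docker and WSL2")))
```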
## 🐛 Troubleshooting

### Container Won't Start

```bash
# Check logs
docker compose logs ollama

# Common issues:
# 1. GPU not accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# 2. Port already in use
netstat -ano | findstr :11434   # From Windows PowerShell
ss -tulpn | grep 11434          # From WSL2
```

### GPU Not Detected in WSL2

```powershell
# Update NVIDIA driver (from Windows)
# Download latest driver from: https://www.nvidia.com/Download/index.aspx

# Restart WSL2 (from PowerShell)
wsl --shutdown
wsl

# Verify GPU
nvidia-smi
```

### Model Download Fails

```bash
# Check disk space
docker exec ollama df -h /root/.ollama

# Check WSL2 disk space
df -h /mnt/e

# Retry with verbose logging
docker exec -it ollama ollama pull llama3.2 --verbose
```
### Out of Memory Errors

```bash
# Check GPU memory
nvidia-smi

# Use smaller model or reduce context
docker exec ollama ollama run llama3.2 --num-ctx 2048
```

### WSL2 Disk Space Issues

```powershell
# Compact WSL2 virtual disk (from PowerShell as Admin)
wsl --shutdown
Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited.Ubuntu24.04LTS_*\LocalState\ext4.vhdx" -Mode Full
```

### Docker Desktop Integration Issues

1. Open Docker Desktop
2. Go to **Settings → Resources → WSL Integration**
3. Enable integration with **Ubuntu-24.04**
4. Click **Apply & Restart**

### Permission Denied on Volume

```bash
# From WSL2
sudo chmod -R 755 /mnt/e/volumes/ollama/data
```
## ⚡ Performance Tips

### 1. WSL2 Memory Configuration

Create or edit `.wslconfig` in your Windows user directory (`C:\Users\YourName\.wslconfig`):

```ini
[wsl2]
memory=16GB
processors=8
swap=8GB
```

Apply changes:
```powershell
wsl --shutdown
wsl
```

### 2. GPU Memory Optimization

```yaml
# In docker-compose.yml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - OLLAMA_NUM_GPU=1
```

### 3. Concurrent Requests

```yaml
# In docker-compose.yml
environment:
  - OLLAMA_MAX_LOADED_MODELS=3
  - OLLAMA_NUM_PARALLEL=4
```

### 4. Context Window

```bash
# Reduce for faster responses
docker exec ollama ollama run llama3.2 --num-ctx 2048

# Increase for longer conversations
docker exec ollama ollama run llama3.2 --num-ctx 8192
```

### 5. Model Quantization

Use quantized models for better performance:
```bash
# 4-bit quantization (faster, less accurate)
docker exec ollama ollama pull llama3.2:q4_0

# 8-bit quantization (balanced)
docker exec ollama ollama pull llama3.2:q8_0
```

### 6. Store Models on SSD

For best performance, ensure `E:\volumes` is on an SSD, not an HDD.
## 📊 Monitoring

### Check Resource Usage

```bash
# Container stats
docker stats ollama

# GPU utilization (from WSL2 or Windows)
nvidia-smi

# Continuous monitoring
watch -n 1 nvidia-smi
```

### Model Status

```bash
# Show running models
docker exec ollama ollama ps

# Model information
docker exec ollama ollama show llama3.2
```

### WSL2 Resource Usage

```powershell
# From Windows PowerShell
wsl --list --verbose
```
## 🛑 Stopping and Cleanup

```bash
# Stop service
docker compose down

# Stop and remove volumes
docker compose down -v

# Remove all models
docker exec ollama sh -c "rm -rf /root/.ollama/models/*"

# Shut down WSL2 (from Windows PowerShell)
wsl --shutdown
```

## 🔗 Useful Links

- [Ollama Official Documentation](https://github.com/ollama/ollama)
- [Ollama Model Library](https://ollama.com/library)
- [API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [WSL2 GPU Documentation](https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute)
- [Docker Desktop WSL2 Backend](https://docs.docker.com/desktop/wsl/)

## 🎯 Quick Reference

### Common Commands

```bash
# Start Ollama
docker compose up -d

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Run interactive chat
docker exec -it ollama ollama run llama3.2

# List models
docker exec -it ollama ollama list

# Check GPU
nvidia-smi

# Stop Ollama
docker compose down
```

## 📝 Notes for WSL2 Users

- **Path Conversion**: Windows `E:\folder` = WSL2 `/mnt/e/folder`
- **Performance**: Models stored on Windows drives are accessible but slightly slower
- **GPU Passthrough**: Handled automatically by Docker Desktop
- **Networking**: `localhost` works from both Windows and WSL2
- **Memory**: Configure WSL2 memory in `.wslconfig` for large models

---

**Need help?** Open an issue or check the [Ollama Discord](https://discord.gg/ollama)

---

docker-compose.yml (new file)

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/e/volumes/ollama/data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      # Optional: set the GPU device if you have multiple GPUs
      # - NVIDIA_VISIBLE_DEVICES=0
    command: serve
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    networks:
      - app-network
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu-legacy
    container_name: faster-whisper
    gpus: all
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Paris
      - WHISPER_MODEL=turbo  # good compromise for an RTX 4060 Ti
      - WHISPER_LANG=fr
      - WHISPER_BEAM=5       # accuracy vs. speed tradeoff
    volumes:
      - /mnt/e/volumes/faster-whisper/audio:/app
      - /mnt/e/volumes/faster-whisper/models:/root/.cache/whisper
    ports:
      - "10300:10300"
    restart: unless-stopped

networks:
  app-network:
    driver: bridge