Fixed docker config. Added services

This commit is contained in:
2025-09-17 22:45:33 +02:00
parent da63f1b75b
commit df86a3d998
12 changed files with 513 additions and 41 deletions

View File

@@ -2,11 +2,14 @@
## Overview
MyDocManager is a real-time document processing application that automatically detects files in a monitored directory, processes them asynchronously, and stores the results in a database. The application uses a modern microservices architecture with Redis for task queuing and MongoDB for data persistence.
MyDocManager is a real-time document processing application that automatically detects files in a monitored directory,
processes them asynchronously, and stores the results in a database. The application uses a modern microservices
architecture with Redis for task queuing and MongoDB for data persistence.
## Architecture
### Technology Stack
- **Backend API**: FastAPI (Python 3.12)
- **Task Processing**: Celery with Redis broker
- **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
@@ -16,6 +19,7 @@ MyDocManager is a real-time document processing application that automatically d
- **File Monitoring**: Python watchdog library
### Services Architecture
┌─────────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Frontend │ │ file- │ │ Redis │ │ Worker │ │ MongoDB │
│ (React) │◄──►│ processor │───►│ (Broker) │◄──►│ (Celery) │───►│ (Results) │
@@ -24,13 +28,13 @@ MyDocManager is a real-time document processing application that automatically d
└─────────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
### Docker Services
1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
2. **worker**: Celery workers for document processing (OCR, text extraction)
3. **redis**: Message broker for Celery tasks
4. **mongodb**: Final database for processing results
5. **frontend**: React interface for monitoring and file access
## Data Flow
1. **File Detection**: Watchdog monitors target directory in real-time
@@ -42,11 +46,13 @@ MyDocManager is a real-time document processing application that automatically d
## Document Processing Capabilities
### Supported File Types
- **PDF**: Direct text extraction + OCR for scanned documents
- **Word Documents**: .docx text extraction
- **Images**: OCR text recognition (JPG, PNG, etc.)
### Processing Libraries
- **EasyOCR**: Modern OCR engine (80+ languages, deep learning-based)
- **PyMuPDF**: PDF text extraction and manipulation
- **python-docx**: Word document processing
@@ -55,12 +61,15 @@ MyDocManager is a real-time document processing application that automatically d
## Development Environment
### Container-Based Development
The application is designed for container-based development with hot-reload capabilities:
- Source code mounted as volumes for real-time updates
- All services orchestrated via Docker Compose
- Development and production parity
### Key Features
- **Real-time Processing**: Immediate file detection and processing
- **Horizontal Scaling**: Multiple workers can be added easily
- **Fault Tolerance**: Celery provides automatic retry mechanisms
@@ -68,6 +77,7 @@ The application is designed for container-based development with hot-reload capa
- **Hot Reload**: Development changes reflected instantly in containers
### Docker Services
1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
2. **worker**: Celery workers for document processing (OCR, text extraction)
3. **redis**: Message broker for Celery tasks
@@ -138,6 +148,7 @@ MyDocManager/
## Authentication & User Management
### Security Features
- **JWT Authentication**: Stateless authentication with 24-hour token expiration
- **Password Security**: bcrypt hashing with automatic salting
- **Role-Based Access**: Admin and User roles with granular permissions
@@ -145,16 +156,19 @@ MyDocManager/
- **Auto Admin Creation**: Default admin user created on first startup
### User Roles
- **Admin**: Full access to user management (create, read, update, delete users)
- **User**: Limited access (view own profile, access document processing features)
### Authentication Flow
1. **Login**: User provides credentials → Server validates → Returns JWT token
2. **API Access**: Client includes JWT in Authorization header
3. **Token Validation**: Server verifies token signature and expiration
4. **Role Check**: Server validates user permissions for requested resource
### User Management APIs
```
POST /auth/login # Generate JWT token
GET /users # List all users (admin only)
@@ -164,7 +178,6 @@ DELETE /users/{user_id} # Delete user (admin only)
GET /users/me # Get current user profile (authenticated users)
```
## Docker Commands Reference
### Initial Setup & Build
@@ -248,9 +261,9 @@ docker-compose up --scale worker=3
### Hot-Reload Configuration
- **file-processor**: Hot-reload enabled via `--reload` flag
- Code changes in `src/file-processor/app/` automatically restart FastAPI
- Code changes in `src/file-processor/app/` automatically restart FastAPI
- **worker**: No hot-reload (manual restart required for stability)
- Code changes in `src/worker/tasks/` require: `docker-compose restart worker`
- Code changes in `src/worker/tasks/` require: `docker-compose restart worker`
### Useful Service URLs
@@ -274,41 +287,48 @@ curl -X POST http://localhost:8000/test-task \
# Monitor Celery tasks
docker-compose logs -f worker
```
## Default Admin User
On first startup, the application automatically creates a default admin user:
- **Username**: `admin`
- **Password**: `admin`
- **Role**: `admin`
- **Email**: `admin@mydocmanager.local`
**⚠️ Important**: Change the default admin password immediately after first login in production environments.
**⚠️ Important**: Change the default admin password immediately after first login in production environments.
## Key Implementation Notes
### Python Standards
- **Style**: PEP 8 compliance
- **Documentation**: Google/NumPy docstring format
- **Naming**: snake_case for variables and functions
- **Testing**: pytest with test_i_can_xxx / test_i_cannot_xxx patterns
### Security Best Practices
- **Password Storage**: Never store plain text passwords, always use bcrypt hashing
- **JWT Secrets**: Use strong, randomly generated secret keys in production
- **Token Expiration**: 24-hour expiration with secure signature validation
- **Role Validation**: Server-side role checking for all protected endpoints
### Dependencies Management
- **Package Manager**: pip (standard)
- **External Dependencies**: Listed in each service's requirements.txt
- **Standard Library First**: Prefer standard library when possible
### Testing Strategy
- All code must be testable
- Unit tests for each authentication and user management function
- Integration tests for complete authentication flow
- Tests validated before implementation
### Critical Architecture Decisions Made
1. **JWT Authentication**: Simple token-based auth with 24-hour expiration
2. **Role-Based Access**: Admin/User roles for granular permissions
3. **bcrypt Password Hashing**: Industry-standard password security
@@ -320,31 +340,24 @@ On first startup, the application automatically creates a default admin user:
9. **Container Development**: Hot-reload setup required for development workflow
### Development Process Requirements
1. **Collaborative Validation**: All options must be explained before coding
2. **Test-First Approach**: Test cases defined and validated before implementation
3. **Incremental Development**: Start simple, extend functionality progressively
4. **Error Handling**: Clear problem explanation required before proposing fixes
### Next Implementation Steps
1. ✅ Create docker-compose.yml with all services
2.Define user management and authentication architecture
3. Implement user models and authentication services
4. Create protected API routes for user management
5. Add automatic admin user creation
1.Create docker-compose.yml with all services => Done
2. ✅ Define user management and authentication architecture => Done
3. ✅ Implement user models and authentication services =>
1. models/user.py => Done
2. models/auth.py => Done
3. database/repositories/user_repository.py => Done
4. Add automatic admin user creation if it does not exists
5. Create protected API routes for user management
6. Implement basic FastAPI service structure
7. Add watchdog file monitoring
8. Create Celery task structure
9. Implement document processing tasks
10. Build React monitoring interface with authentication
### prochaines étapes
MongoDB CRUD
Nous devons absolument mocker MongoDB pour les tests unitaires avec pytest-mock
Fichiers à créer:
* app/models/auht.py => déjà fait
* app/models/user.py => déjà fait
* app/database/connection.py
* Utilise les settings pour l'URL MongoDB. Il faut créer un fichier de configuration (app/config/settings.py)
* Fonction get_database() + gestion des erreurs
* Configuration via variables d'environnement
* app/database/repositories/user_repository.py