Fixed docker config. Added services
@@ -2,11 +2,14 @@
## Overview

MyDocManager is a real-time document processing application that automatically detects files in a monitored directory,
processes them asynchronously, and stores the results in a database. The application uses a modern microservices
architecture with Redis for task queuing and MongoDB for data persistence.

## Architecture

### Technology Stack

- **Backend API**: FastAPI (Python 3.12)
- **Task Processing**: Celery with Redis broker
- **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
@@ -16,6 +19,7 @@ MyDocManager is a real-time document processing application that automatically d
- **File Monitoring**: Python watchdog library
### Services Architecture

┌─────────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Frontend     │    │    file-    │    │    Redis    │    │   Worker    │    │   MongoDB   │
│     (React)     │◄──►│  processor  │───►│  (Broker)   │◄──►│  (Celery)   │───►│  (Results)  │
@@ -24,13 +28,13 @@ MyDocManager is a real-time document processing application that automatically d
└─────────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
### Docker Services

1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
2. **worker**: Celery workers for document processing (OCR, text extraction)
3. **redis**: Message broker for Celery tasks
4. **mongodb**: Final database for processing results
5. **frontend**: React interface for monitoring and file access
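The path a file takes through the services above (file-processor dispatches, Redis brokers, workers process, MongoDB stores) can be modelled in miniature with the standard library. This is only an analogy, not the project's code: a `queue.Queue` stands in for the Redis broker, threads stand in for Celery workers, and a dict stands in for the MongoDB results collection. All names below are hypothetical.

```python
import queue
import threading

# In-process stand-ins (hypothetical): broker ~ Redis, results ~ MongoDB collection.
broker: "queue.Queue[str | None]" = queue.Queue()
results: dict[str, dict] = {}
results_lock = threading.Lock()


def dispatch(path: str) -> None:
    """file-processor role: enqueue a task for a newly detected file."""
    broker.put(path)


def process(path: str) -> dict:
    """Placeholder for the real OCR / text-extraction step."""
    return {"path": path, "status": "done"}


def worker_loop() -> None:
    """worker role: consume tasks until a None sentinel arrives."""
    while True:
        path = broker.get()
        if path is None:  # sentinel: stop this worker
            broker.task_done()
            break
        result = process(path)
        with results_lock:
            results[path] = result  # "store in MongoDB"
        broker.task_done()


# Three workers, analogous to `docker-compose up --scale worker=3`.
workers = [threading.Thread(target=worker_loop) for _ in range(3)]
for w in workers:
    w.start()

for name in ["a.pdf", "b.docx", "c.png"]:
    dispatch(name)

for _ in workers:
    broker.put(None)  # one sentinel per worker, queued after all real tasks
for w in workers:
    w.join()

print(sorted(results))  # ['a.pdf', 'b.docx', 'c.png']
```

Because the broker decouples producer and consumers, adding workers changes throughput without touching the dispatch side, which is the property the Docker setup relies on for horizontal scaling.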
## Data Flow

1. **File Detection**: Watchdog monitors the target directory in real time
@@ -42,11 +46,13 @@ MyDocManager is a real-time document processing application that automatically d
## Document Processing Capabilities

### Supported File Types

- **PDF**: Direct text extraction + OCR for scanned documents
- **Word Documents**: `.docx` text extraction
- **Images**: OCR text recognition (JPG, PNG, etc.)
### Processing Libraries

- **EasyOCR**: Modern OCR engine (80+ languages, deep learning-based)
- **PyMuPDF**: PDF text extraction and manipulation
- **python-docx**: Word document processing
@@ -55,12 +61,15 @@ MyDocManager is a real-time document processing application that automatically d
## Development Environment

### Container-Based Development

The application is designed for container-based development with hot-reload capabilities:

- Source code mounted as volumes for real-time updates
- All services orchestrated via Docker Compose
- Development and production parity

### Key Features

- **Real-time Processing**: Immediate file detection and processing
- **Horizontal Scaling**: Additional workers can be started easily (e.g. `docker-compose up --scale worker=3`)
- **Fault Tolerance**: Celery provides automatic retry mechanisms
@@ -68,6 +77,7 @@ The application is designed for container-based development with hot-reload capa
- **Hot Reload**: Code changes to the file-processor are reflected instantly in its container (the worker still requires a manual restart)
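Celery's retry support (for example `self.retry()` inside a bound task, or the `autoretry_for` and `retry_backoff` task options) re-runs a failed task after a delay. The standard-library sketch below only illustrates the underlying idea of bounded retries with exponential backoff; it is not Celery's implementation, and `run_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on exception after delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a flaky task that fails twice, then succeeds.
calls = {"n": 0}


def flaky_ocr() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "extracted text"


delays: list[float] = []  # record delays instead of actually sleeping
print(run_with_retries(flaky_ocr, sleep=delays.append))  # extracted text
print(delays)  # [0.1, 0.2]
```

Injecting `sleep` keeps the backoff schedule testable without real waiting; a real deployment would also cap the maximum delay.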
@@ -138,6 +148,7 @@ MyDocManager/
## Authentication & User Management

### Security Features

- **JWT Authentication**: Stateless authentication with 24-hour token expiration
- **Password Security**: bcrypt hashing with automatic salting
- **Role-Based Access**: Admin and User roles with granular permissions
@@ -145,16 +156,19 @@ MyDocManager/
- **Auto Admin Creation**: Default admin user created on first startup
### User Roles

- **Admin**: Full access to user management (create, read, update, delete users)
- **User**: Limited access (view own profile, access document processing features)

### Authentication Flow

1. **Login**: User provides credentials → Server validates → Returns JWT token
2. **API Access**: Client includes the JWT in the `Authorization` header
3. **Token Validation**: Server verifies the token signature and expiration
4. **Role Check**: Server validates the user's permissions for the requested resource
### User Management APIs

```
POST   /auth/login      # Generate JWT token
GET    /users           # List all users (admin only)
@@ -164,7 +178,6 @@ DELETE /users/{user_id} # Delete user (admin only)
GET    /users/me        # Get current user profile (authenticated users)
```
## Docker Commands Reference

### Initial Setup & Build
@@ -248,9 +261,9 @@ docker-compose up --scale worker=3

### Hot-Reload Configuration

- **file-processor**: Hot-reload enabled via `--reload` flag
    - Code changes in `src/file-processor/app/` automatically restart FastAPI
- **worker**: No hot-reload (manual restart required for stability)
    - Code changes in `src/worker/tasks/` require: `docker-compose restart worker`

### Useful Service URLs

@@ -274,41 +287,48 @@ curl -X POST http://localhost:8000/test-task \
# Monitor Celery tasks
docker-compose logs -f worker
```
## Default Admin User

On first startup, the application automatically creates a default admin user:

- **Username**: `admin`
- **Password**: `admin`
- **Role**: `admin`
- **Email**: `admin@mydocmanager.local`

**⚠️ Important**: Change the default admin password immediately after first login in production environments.
## Key Implementation Notes

### Python Standards

- **Style**: PEP 8 compliance
- **Documentation**: Google/NumPy docstring format
- **Naming**: snake_case for variables and functions
- **Testing**: pytest, with test names following the `test_i_can_xxx` / `test_i_cannot_xxx` patterns
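The naming convention makes each test read as a statement of expected behaviour. A short sketch of tests following it, written as plain `assert`-based functions that pytest collects automatically from a `test_*.py` file; `is_supported_file` is a hypothetical function used only to illustrate the pattern.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpg", ".jpeg", ".png"}


def is_supported_file(filename: str) -> bool:
    """Hypothetical function under test."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def test_i_can_detect_a_supported_pdf():
    assert is_supported_file("invoice.pdf")


def test_i_can_detect_an_uppercase_extension():
    assert is_supported_file("SCAN.PNG")


def test_i_cannot_process_a_text_file():
    assert not is_supported_file("notes.txt")


def test_i_cannot_process_a_file_without_extension():
    assert not is_supported_file("README")
```

Pairing each `test_i_can_…` with a matching `test_i_cannot_…` keeps both the accepted and the rejected cases explicit.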
### Security Best Practices

- **Password Storage**: Never store plain-text passwords; always hash them with bcrypt
- **JWT Secrets**: Use strong, randomly generated secret keys in production
- **Token Expiration**: 24-hour expiration with secure signature validation
- **Role Validation**: Server-side role checking for all protected endpoints
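With the `bcrypt` package, hashing and checking are `bcrypt.hashpw(password, bcrypt.gensalt())` and `bcrypt.checkpw(password, hashed)` on bytes. To stay runnable without third-party packages, the sketch below swaps in the standard library's PBKDF2-HMAC-SHA256 to show the same pattern: a random per-password salt, a deliberately slow hash, and a constant-time comparison. It is an illustration, not a replacement for the project's bcrypt choice. The iteration count and the `$`-separated storage format are assumptions, and `secrets.token_urlsafe` shows one way to generate a strong JWT secret.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: PBKDF2-HMAC-SHA256 work factor


def hash_password(password: str) -> str:
    """Salted, slow hash; the random salt makes equal passwords hash differently."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iteration count, then compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


# A strong random JWT signing secret, e.g. generated once and supplied via an environment variable.
jwt_secret = secrets.token_urlsafe(64)

stored = hash_password("s3cret")
print(verify_password("s3cret", stored))  # True
print(verify_password("wrong", stored))   # False
```

Storing the algorithm name and iteration count alongside the hash allows the work factor to be raised later without invalidating existing passwords.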
### Dependency Management

- **Package Manager**: pip (standard)
- **External Dependencies**: Listed in each service's `requirements.txt`
- **Standard Library First**: Prefer the standard library when possible

### Testing Strategy

- All code must be testable
- Unit tests for each authentication and user management function
- Integration tests for the complete authentication flow
- Tests are validated before implementation

### Critical Architecture Decisions Made

1. **JWT Authentication**: Simple token-based auth with 24-hour expiration
2. **Role-Based Access**: Admin/User roles for granular permissions
3. **bcrypt Password Hashing**: Industry-standard password security
@@ -320,31 +340,24 @@ On first startup, the application automatically creates a default admin user:
9. **Container Development**: Hot-reload setup required for development workflow
### Development Process Requirements

1. **Collaborative Validation**: All options must be explained before coding
2. **Test-First Approach**: Test cases are defined and validated before implementation
3. **Incremental Development**: Start simple, then extend functionality progressively
4. **Error Handling**: A clear explanation of the problem is required before proposing fixes

### Next Implementation Steps
1. ✅ Create docker-compose.yml with all services => Done
2. ✅ Define user management and authentication architecture => Done
3. ✅ Implement user models and authentication services:
    1. models/user.py => Done
    2. models/auth.py => Done
    3. database/repositories/user_repository.py => Done
4. Add automatic admin user creation if it does not exist
5. Create protected API routes for user management
6. Implement basic FastAPI service structure
7. Add watchdog file monitoring
8. Create Celery task structure
9. Implement document processing tasks
10. Build React monitoring interface with authentication
### Next Steps

MongoDB CRUD

MongoDB must be mocked in unit tests, using pytest-mock.

Files to create:

* app/models/auth.py => already done
* app/models/user.py => already done
* app/database/connection.py
    * Uses the settings for the MongoDB URL. A configuration file must be created (app/config/settings.py)
    * get_database() function + error handling
    * Configuration via environment variables
* app/database/repositories/user_repository.py