Fixed docker config. Added services

2025-09-17 22:45:33 +02:00
parent da63f1b75b
commit df86a3d998
12 changed files with 513 additions and 41 deletions


@@ -2,11 +2,14 @@
## Overview
MyDocManager is a real-time document processing application that automatically detects files in a monitored directory, processes them asynchronously, and stores the results in a database. The application uses a modern microservices architecture with Redis for task queuing and MongoDB for data persistence.
MyDocManager is a real-time document processing application that automatically detects files in a monitored directory,
processes them asynchronously, and stores the results in a database. The application uses a modern microservices
architecture with Redis for task queuing and MongoDB for data persistence.
## Architecture
### Technology Stack
- **Backend API**: FastAPI (Python 3.12)
- **Task Processing**: Celery with Redis broker
- **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
@@ -16,6 +19,7 @@ MyDocManager is a real-time document processing application that automatically d
- **File Monitoring**: Python watchdog library
### Services Architecture
┌─────────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Frontend │ │ file- │ │ Redis │ │ Worker │ │ MongoDB │
│ (React) │◄──►│ processor │───►│ (Broker) │◄──►│ (Celery) │───►│ (Results) │
@@ -24,13 +28,13 @@ MyDocManager is a real-time document processing application that automatically d
└─────────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
### Docker Services
1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
2. **worker**: Celery workers for document processing (OCR, text extraction)
3. **redis**: Message broker for Celery tasks
4. **mongodb**: Final database for processing results
5. **frontend**: React interface for monitoring and file access
## Data Flow
1. **File Detection**: Watchdog monitors target directory in real-time
@@ -42,11 +46,13 @@ MyDocManager is a real-time document processing application that automatically d
## Document Processing Capabilities
### Supported File Types
- **PDF**: Direct text extraction + OCR for scanned documents
- **Word Documents**: .docx text extraction
- **Images**: OCR text recognition (JPG, PNG, etc.)
### Processing Libraries
- **EasyOCR**: Modern OCR engine (80+ languages, deep learning-based)
- **PyMuPDF**: PDF text extraction and manipulation
- **python-docx**: Word document processing
@@ -55,12 +61,15 @@ MyDocManager is a real-time document processing application that automatically d
## Development Environment
### Container-Based Development
The application is designed for container-based development with hot-reload capabilities:
- Source code mounted as volumes for real-time updates
- All services orchestrated via Docker Compose
- Development and production parity
### Key Features
- **Real-time Processing**: Immediate file detection and processing
- **Horizontal Scaling**: Multiple workers can be added easily
- **Fault Tolerance**: Celery provides automatic retry mechanisms
@@ -68,6 +77,7 @@ The application is designed for container-based development with hot-reload capa
- **Hot Reload**: Development changes reflected instantly in containers
### Docker Services
1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
2. **worker**: Celery workers for document processing (OCR, text extraction)
3. **redis**: Message broker for Celery tasks
@@ -138,6 +148,7 @@ MyDocManager/
## Authentication & User Management
### Security Features
- **JWT Authentication**: Stateless authentication with 24-hour token expiration
- **Password Security**: bcrypt hashing with automatic salting
- **Role-Based Access**: Admin and User roles with granular permissions
@@ -145,16 +156,19 @@ MyDocManager/
- **Auto Admin Creation**: Default admin user created on first startup
### User Roles
- **Admin**: Full access to user management (create, read, update, delete users)
- **User**: Limited access (view own profile, access document processing features)
### Authentication Flow
1. **Login**: User provides credentials → Server validates → Returns JWT token
2. **API Access**: Client includes JWT in Authorization header
3. **Token Validation**: Server verifies token signature and expiration
4. **Role Check**: Server validates user permissions for requested resource
### User Management APIs
```
POST /auth/login # Generate JWT token
GET /users # List all users (admin only)
@@ -164,7 +178,6 @@ DELETE /users/{user_id} # Delete user (admin only)
GET /users/me # Get current user profile (authenticated users)
```
## Docker Commands Reference
### Initial Setup & Build
@@ -274,41 +287,48 @@ curl -X POST http://localhost:8000/test-task \
# Monitor Celery tasks
docker-compose logs -f worker
```
## Default Admin User
On first startup, the application automatically creates a default admin user:
- **Username**: `admin`
- **Password**: `admin`
- **Role**: `admin`
- **Email**: `admin@mydocmanager.local`
**⚠️ Important**: Change the default admin password immediately after first login in production environments.
**⚠️ Important**: Change the default admin password immediately after first login in production environments.
## Key Implementation Notes
### Python Standards
- **Style**: PEP 8 compliance
- **Documentation**: Google/NumPy docstring format
- **Naming**: snake_case for variables and functions
- **Testing**: pytest with test_i_can_xxx / test_i_cannot_xxx patterns
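
As an illustration of the `test_i_can_xxx` / `test_i_cannot_xxx` naming pattern, here is a minimal sketch. `normalize_username` is a hypothetical helper, not a function from this repository; with pytest installed, the second test would normally use `pytest.raises(ValueError)` instead of the manual `try`/`except`.

```python
def normalize_username(username: str) -> str:
    """Hypothetical helper: strip whitespace and reject empty usernames."""
    cleaned = username.strip()
    if not cleaned:
        raise ValueError("Username cannot be empty")
    return cleaned


def test_i_can_normalize_a_padded_username():
    # "I can": the happy path succeeds.
    assert normalize_username("  alice ") == "alice"


def test_i_cannot_use_an_empty_username():
    # "I cannot": the invalid input is rejected with an explicit error.
    try:
        normalize_username("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty username")
```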
### Security Best Practices
- **Password Storage**: Never store plain text passwords, always use bcrypt hashing
- **JWT Secrets**: Use strong, randomly generated secret keys in production
- **Token Expiration**: 24-hour expiration with secure signature validation
- **Role Validation**: Server-side role checking for all protected endpoints
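
One way to follow the "strong, randomly generated secret keys" guideline is sketched below. The environment variable name `JWT_SECRET_KEY` and the 32-character minimum are assumptions made for the example, not settings defined by this project.

```python
import os
import secrets


def generate_jwt_secret() -> str:
    """Generate a URL-safe random secret from 64 bytes of OS randomness."""
    return secrets.token_urlsafe(64)


def load_jwt_secret(env_var: str = "JWT_SECRET_KEY") -> str:
    """Read the secret from the environment and refuse missing or short values."""
    secret = os.environ.get(env_var, "")
    if len(secret) < 32:  # assumption: illustrative minimum length
        raise RuntimeError(f"{env_var} must be set to at least 32 characters")
    return secret
```

A secret produced once with `generate_jwt_secret()` could be stored in the deployment's environment and read at startup with `load_jwt_secret()`, so that a missing secret fails fast instead of silently falling back to a default.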
### Dependencies Management
- **Package Manager**: pip (standard)
- **External Dependencies**: Listed in each service's requirements.txt
- **Standard Library First**: Prefer standard library when possible
### Testing Strategy
- All code must be testable
- Unit tests for each authentication and user management function
- Integration tests for complete authentication flow
- Tests validated before implementation
### Critical Architecture Decisions Made
1. **JWT Authentication**: Simple token-based auth with 24-hour expiration
2. **Role-Based Access**: Admin/User roles for granular permissions
3. **bcrypt Password Hashing**: Industry-standard password security
@@ -320,31 +340,24 @@ On first startup, the application automatically creates a default admin user:
9. **Container Development**: Hot-reload setup required for development workflow
### Development Process Requirements
1. **Collaborative Validation**: All options must be explained before coding
2. **Test-First Approach**: Test cases defined and validated before implementation
3. **Incremental Development**: Start simple, extend functionality progressively
4. **Error Handling**: Clear problem explanation required before proposing fixes
### Next Implementation Steps
1. ✅ Create docker-compose.yml with all services
2. Define user management and authentication architecture
3. Implement user models and authentication services
4. Create protected API routes for user management
5. Add automatic admin user creation
1. ✅ Create docker-compose.yml with all services => Done
2. ✅ Define user management and authentication architecture => Done
3. ✅ Implement user models and authentication services =>
1. models/user.py => Done
2. models/auth.py => Done
3. database/repositories/user_repository.py => Done
4. Add automatic admin user creation if it does not exist
5. Create protected API routes for user management
6. Implement basic FastAPI service structure
7. Add watchdog file monitoring
8. Create Celery task structure
9. Implement document processing tasks
10. Build React monitoring interface with authentication
### Next steps
MongoDB CRUD
We absolutely must mock MongoDB for the unit tests, using pytest-mock
Files to create:
* app/models/auth.py => already done
* app/models/user.py => already done
* app/database/connection.py
  * Uses the settings for the MongoDB URL. A configuration file must be created (app/config/settings.py)
  * get_database() function + error handling
  * Configuration via environment variables
* app/database/repositories/user_repository.py
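
A possible shape for the connection module and its mocked test is sketched below. It is an assumption-heavy illustration, not the contents of `app/database/connection.py`: the `MONGODB_DATABASE` variable name, the fallback values, and the `client_factory` parameter (standing in for `pymongo.MongoClient`) are all hypothetical. The test uses `unittest.mock.MagicMock`, the same mock objects that pytest-mock's `mocker` fixture hands out.

```python
import os
from unittest.mock import MagicMock


def get_mongodb_url() -> str:
    """Read the connection string from the environment (MONGODB_URL is set in docker-compose.yml)."""
    return os.environ.get("MONGODB_URL", "mongodb://localhost:27017")  # fallback is an assumption


def get_mongodb_database_name() -> str:
    """Read the database name; the variable name MONGODB_DATABASE is an assumption."""
    return os.environ.get("MONGODB_DATABASE", "mydocmanager")


def get_database(client_factory):
    """Connect, ping the server, and return the database handle.

    client_factory stands in for pymongo.MongoClient so tests can inject a mock.
    """
    try:
        client = client_factory(get_mongodb_url())
        client.admin.command("ping")  # fail fast if the server is unreachable
    except Exception as exc:
        raise ConnectionError(f"Cannot connect to MongoDB: {exc}") from exc
    return client[get_mongodb_database_name()]


def test_i_can_get_database_without_a_real_server():
    fake_client = MagicMock()
    factory = MagicMock(return_value=fake_client)

    get_database(factory)

    factory.assert_called_once_with(get_mongodb_url())
    fake_client.admin.command.assert_called_once_with("ping")


def test_i_cannot_get_database_when_connection_fails():
    factory = MagicMock(side_effect=RuntimeError("server down"))
    try:
        get_database(factory)
    except ConnectionError:
        return
    raise AssertionError("expected ConnectionError")
```

Injecting the factory keeps the unit tests independent of a running MongoDB container.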


@@ -1,5 +1,3 @@
version: '3.8'
services:
# Redis - Message broker for Celery
redis:
@@ -36,15 +34,16 @@ services:
environment:
- REDIS_URL=redis://redis:6379/0
- MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin
- PYTHONPATH=/app
volumes:
- ./src/file-processor/app:/app
- ./src/file-processor:/app
- ./volumes/watched_files:/watched_files
depends_on:
- redis
- mongodb
networks:
- mydocmanager-network
command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Worker - Celery workers for document processing
worker:
@@ -55,6 +54,7 @@ services:
environment:
- REDIS_URL=redis://redis:6379/0
- MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin
- PYTHONPATH=/app
volumes:
- ./src/worker/tasks:/app
- ./volumes/watched_files:/watched_files


@@ -8,10 +8,12 @@ COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app/ .
COPY . .
ENV PYTHONPATH=/app
# Expose port
EXPOSE 8000
# Command will be overridden by docker-compose
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]


@@ -11,7 +11,7 @@ from pymongo import MongoClient
from pymongo.database import Database
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
from config.settings import get_mongodb_url, get_mongodb_database_name
from app.config.settings import get_mongodb_url, get_mongodb_database_name
# Global variables for singleton pattern
_client: Optional[MongoClient] = None


@@ -13,7 +13,7 @@ from pymongo.errors import DuplicateKeyError
from pymongo.collection import Collection
from app.models.user import UserCreate, UserInDB, UserUpdate
from utils.security import hash_password
from app.utils.security import hash_password
class UserRepository:


@@ -4,19 +4,74 @@ FastAPI application for MyDocManager file processor service.
This service provides API endpoints for health checks and task dispatching.
"""
import logging
import os
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
import redis
from celery import Celery
from database.connection import test_database_connection
from app.database.connection import test_database_connection, get_database
from app.database.repositories.user_repository import UserRepository
from app.models.user import UserCreate
from app.services.init_service import InitializationService
from app.services.user_service import UserService
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@asynccontextmanager
async def lifespan(app: FastAPI):
    """
    Application lifespan manager for startup and shutdown tasks.

    Handles initialization tasks that need to run when the application starts,
    including admin user creation and other setup procedures.
    """
    # Startup tasks
    logger.info("Starting MyDocManager application...")

    try:
        # Initialize database connection
        database = get_database()

        # Initialize repositories and services
        user_repository = UserRepository(database)
        user_service = UserService(user_repository)
        init_service = InitializationService(user_service)

        # Run initialization tasks
        initialization_result = init_service.initialize_application()

        if initialization_result["initialization_success"]:
            logger.info("Application startup completed successfully")
            if initialization_result["admin_user_created"]:
                logger.info("Default admin user was created during startup")
        else:
            logger.error("Application startup completed with errors:")
            for error in initialization_result["errors"]:
                logger.error(f" - {error}")

    except Exception as e:
        logger.error(f"Critical error during application startup: {str(e)}")
        # You might want to decide if the app should continue or exit here
        # For now, we log the error but continue

    yield  # Application is running

    # Shutdown tasks (if needed)
    logger.info("Shutting down MyDocManager application...")


# Initialize FastAPI app
app = FastAPI(
title="MyDocManager File Processor",
description="File processing and task dispatch service",
version="1.0.0"
version="1.0.0",
lifespan=lifespan
)
# Environment variables
@@ -44,6 +99,27 @@ class TestTaskRequest(BaseModel):
message: str
def get_user_service() -> UserService:
    """
    Dependency to get user service instance.

    This should be properly implemented with database connection management
    in your actual application.
    """
    database = get_database()
    user_repository = UserRepository(database)
    return UserService(user_repository)


# Your API routes would use the service like this:
@app.post("/api/users")
async def create_user(
        user_data: UserCreate,
        user_service: UserService = Depends(get_user_service)
):
    return user_service.create_user(user_data)


@app.get("/health")
async def health_check():
"""
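
The startup behaviour of the `lifespan` handler in this file (log the failure, keep serving, still run shutdown) can be observed without FastAPI. The sketch below is a standalone stand-in: `make_lifespan`, `run_once`, and the `events` list are hypothetical names used only to make the control flow visible.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

logger = logging.getLogger("lifespan_sketch")


def make_lifespan(startup, events):
    """Build a lifespan context manager around a startup callable."""

    @asynccontextmanager
    async def lifespan(app):
        try:
            startup()
            events.append("startup ok")
        except Exception as exc:
            # Same policy as the handler in this commit: log and continue.
            logger.error("Critical error during application startup: %s", exc)
            events.append("startup failed")
        yield  # the application would serve requests here
        events.append("shutdown")

    return lifespan


async def run_once(lifespan):
    """Enter and leave the lifespan once, as a server does around its run."""
    async with lifespan(app=None):
        pass
```

Because the exception is swallowed before `yield`, a failed initialization leaves the API running without a default admin user; whether that is acceptable is the open question flagged in the handler's own comment.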


@@ -100,14 +100,19 @@ def validate_username_not_empty(username: str) -> str:
return username.strip()
class UserCreate(BaseModel):
class UserCreateNoValidation(BaseModel):
"""Model for creating a new user."""
username: str
email: EmailStr
email: str
password: str
role: UserRole = UserRole.USER
class UserCreate(UserCreateNoValidation):
"""Model for creating a new user, with email address validation."""
email: EmailStr
@field_validator('username')
@classmethod
def validate_username(cls, v):


@@ -0,0 +1,58 @@
"""
Authentication service for password hashing and verification.

This module provides authentication-related functionality including
password hashing, verification, and JWT token management.
"""

from app.utils.security import hash_password, verify_password


class AuthService:
    """
    Service class for authentication operations.

    Handles password hashing, verification, and other authentication
    related operations with proper security practices.
    """

    @staticmethod
    def hash_user_password(password: str) -> str:
        """
        Hash a plaintext password for secure storage.

        Args:
            password (str): Plaintext password to hash

        Returns:
            str: Hashed password safe for database storage

        Example:
            >>> auth = AuthService()
            >>> hashed = auth.hash_user_password("mypassword123")
            >>> len(hashed) > 0
            True
        """
        return hash_password(password)

    @staticmethod
    def verify_user_password(password: str, hashed_password: str) -> bool:
        """
        Verify a password against its hash.

        Args:
            password (str): Plaintext password to verify
            hashed_password (str): Stored hashed password

        Returns:
            bool: True if password matches hash, False otherwise

        Example:
            >>> auth = AuthService()
            >>> hashed = auth.hash_user_password("mypassword123")
            >>> auth.verify_user_password("mypassword123", hashed)
            True
            >>> auth.verify_user_password("wrongpassword", hashed)
            False
        """
        return verify_password(password, hashed_password)
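
`app.utils.security` is not part of this commit, so how `hash_password` and `verify_password` are implemented is not shown here; the README states that the project uses bcrypt. As a rough, standard-library-only stand-in for the same contract (a salted one-way hash plus a constant-time check), the sketch below swaps in PBKDF2-HMAC-SHA256. The iteration count and the `salt$digest` storage format are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: illustrative work factor, not a recommendation


def hash_password(password: str) -> str:
    """Return 'salt$digest' in hex, using a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, hashed_password: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    try:
        salt_hex, digest_hex = hashed_password.split("$")
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
    except ValueError:  # not in the 'salt$digest' format
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

Because the salt is random, hashing the same password twice yields two different strings, which is why verification has to go through `verify_password` and not a string comparison of hashes.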


@@ -0,0 +1,134 @@
"""
Initialization service for application startup tasks.

This module handles application initialization tasks including
creating default admin user if none exists.
"""

import logging
from typing import Optional

from app.models.user import UserCreate, UserInDB, UserCreateNoValidation
from app.models.auth import UserRole
from app.services.user_service import UserService

logger = logging.getLogger(__name__)


class InitializationService:
    """
    Service for handling application initialization tasks.

    This service manages startup operations like ensuring required
    users exist and system is properly configured.
    """

    def __init__(self, user_service: UserService):
        """
        Initialize service with user service dependency.

        Args:
            user_service (UserService): Service for user operations
        """
        self.user_service = user_service

    def ensure_admin_user_exists(self) -> Optional[UserInDB]:
        """
        Ensure default admin user exists in the system.

        Creates a default admin user if no admin user exists in the system.
        Uses default credentials that should be changed after first login.

        Returns:
            UserInDB or None: The newly created admin user, or None if an admin already exists

        Raises:
            Exception: If admin user creation fails
        """
        logger.info("Checking if admin user exists...")

        # Check if any admin user already exists
        if self._admin_user_exists():
            logger.info("Admin user already exists, skipping creation")
            return None

        logger.info("No admin user found, creating default admin user...")

        try:
            # Create default admin user
            admin_data = UserCreateNoValidation(
                username="admin",
                email="admin@mydocmanager.local",
                password="admin",  # Should be changed after first login
                role=UserRole.ADMIN
            )

            created_user = self.user_service.create_user(admin_data)
            logger.info(f"Default admin user created successfully with ID: {created_user.id}")
            logger.warning(
                "Default admin user created with username 'admin' and password 'admin'. "
                "Please change these credentials immediately for security!"
            )

            return created_user

        except Exception as e:
            logger.error(f"Failed to create default admin user: {str(e)}")
            # Chain the original exception so its traceback is preserved
            raise Exception(f"Admin user creation failed: {str(e)}") from e

    def _admin_user_exists(self) -> bool:
        """
        Check if any admin user exists in the system.

        Returns:
            bool: True if at least one admin user exists, False otherwise
        """
        try:
            # Get all users and check if any have admin role
            users = self.user_service.list_users(limit=1000)  # Reasonable limit for admin check

            for user in users:
                if user.role == UserRole.ADMIN and user.is_active:
                    return True

            return False

        except Exception as e:
            logger.error(f"Error checking for admin users: {str(e)}")
            # In case of error, assume admin exists to avoid creating duplicates
            return True

    def initialize_application(self) -> dict:
        """
        Perform all application initialization tasks.

        This method runs all necessary initialization procedures including
        admin user creation and any other startup requirements.

        Returns:
            dict: Summary of initialization tasks performed
        """
        logger.info("Starting application initialization...")

        initialization_summary = {
            "admin_user_created": False,
            "initialization_success": False,
            "errors": []
        }

        try:
            # Ensure admin user exists
            created_admin = self.ensure_admin_user_exists()
            if created_admin:
                initialization_summary["admin_user_created"] = True

            initialization_summary["initialization_success"] = True
            logger.info("Application initialization completed successfully")

        except Exception as e:
            error_msg = f"Application initialization failed: {str(e)}"
            logger.error(error_msg)
            initialization_summary["errors"].append(error_msg)

        return initialization_summary
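
The key property of `ensure_admin_user_exists` is idempotence: running it on every startup creates the admin at most once. The sketch below reproduces that rule against in-memory stand-ins so it can run without MongoDB. `FakeUser` and `FakeUserService` are hypothetical test doubles, and roles are plain strings here instead of `UserRole` members.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeUser:
    username: str
    role: str
    is_active: bool = True


@dataclass
class FakeUserService:
    """In-memory stand-in exposing only what the check needs."""
    users: List[FakeUser] = field(default_factory=list)

    def list_users(self, limit: int = 100) -> List[FakeUser]:
        return self.users[:limit]

    def create_user(self, username: str, role: str) -> FakeUser:
        user = FakeUser(username, role)
        self.users.append(user)
        return user


def ensure_admin_user_exists(service: FakeUserService) -> Optional[FakeUser]:
    """Create the default admin only when no active admin exists."""
    users = service.list_users(limit=1000)
    if any(u.role == "admin" and u.is_active for u in users):
        return None  # nothing to do on this startup
    return service.create_user("admin", "admin")
```

Calling the function twice on the same service returns the created user the first time and `None` the second. Note that, as in the service above, an inactive admin does not count, so a deployment whose only admin has been deactivated would get a fresh default admin on the next start.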


@@ -0,0 +1,181 @@
"""
User service for business logic operations.

This module provides user-related business logic including user creation,
retrieval, updates, and authentication operations with proper error handling.
"""

from typing import Optional, List

from pymongo.errors import DuplicateKeyError

from app.models.user import UserCreate, UserInDB, UserUpdate, UserResponse, UserCreateNoValidation
from app.models.auth import UserRole
from app.database.repositories.user_repository import UserRepository
from app.services.auth_service import AuthService


class UserService:
    """
    Service class for user business logic operations.

    This class handles user-related operations including creation,
    authentication, and data management with proper validation.
    """

    def __init__(self, user_repository: UserRepository):
        """
        Initialize user service with repository dependency.

        Args:
            user_repository (UserRepository): Repository for user data operations
        """
        self.user_repository = user_repository
        self.auth_service = AuthService()

    def create_user(self, user_data: UserCreate | UserCreateNoValidation) -> UserInDB:
        """
        Create a new user with business logic validation.

        Args:
            user_data (UserCreate | UserCreateNoValidation): User creation data

        Returns:
            UserInDB: Created user with database information

        Raises:
            ValueError: If user already exists or validation fails
        """
        # Check if user already exists
        if self.user_repository.user_exists(user_data.username):
            raise ValueError(f"User with username '{user_data.username}' already exists")

        # Check if email already exists
        existing_user = self.user_repository.find_user_by_email(user_data.email)
        if existing_user:
            raise ValueError(f"User with email '{user_data.email}' already exists")

        try:
            return self.user_repository.create_user(user_data)
        except DuplicateKeyError:
            raise ValueError(f"User with username '{user_data.username}' already exists")

    def get_user_by_username(self, username: str) -> Optional[UserInDB]:
        """
        Retrieve user by username.

        Args:
            username (str): Username to search for

        Returns:
            UserInDB or None: User if found, None otherwise
        """
        return self.user_repository.find_user_by_username(username)

    def get_user_by_id(self, user_id: str) -> Optional[UserInDB]:
        """
        Retrieve user by ID.

        Args:
            user_id (str): User ID to search for

        Returns:
            UserInDB or None: User if found, None otherwise
        """
        return self.user_repository.find_user_by_id(user_id)

    def authenticate_user(self, username: str, password: str) -> Optional[UserInDB]:
        """
        Authenticate user with username and password.

        Args:
            username (str): Username for authentication
            password (str): Password for authentication

        Returns:
            UserInDB or None: Authenticated user if valid, None otherwise
        """
        user = self.user_repository.find_user_by_username(username)
        if not user:
            return None

        if not user.is_active:
            return None

        if not self.auth_service.verify_user_password(password, user.hashed_password):
            return None

        return user

    def update_user(self, user_id: str, user_update: UserUpdate) -> Optional[UserInDB]:
        """
        Update user information.

        Args:
            user_id (str): User ID to update
            user_update (UserUpdate): Updated user data

        Returns:
            UserInDB or None: Updated user if successful, None otherwise

        Raises:
            ValueError: If username or email already exists for different user
        """
        # Validate username uniqueness if being updated
        if user_update.username is not None:
            existing_user = self.user_repository.find_user_by_username(user_update.username)
            if existing_user and str(existing_user.id) != user_id:
                raise ValueError(f"Username '{user_update.username}' is already taken")

        # Validate email uniqueness if being updated
        if user_update.email is not None:
            existing_user = self.user_repository.find_user_by_email(user_update.email)
            if existing_user and str(existing_user.id) != user_id:
                raise ValueError(f"Email '{user_update.email}' is already taken")

        return self.user_repository.update_user(user_id, user_update)

    def delete_user(self, user_id: str) -> bool:
        """
        Delete user from system.

        Args:
            user_id (str): User ID to delete

        Returns:
            bool: True if user was deleted, False otherwise
        """
        return self.user_repository.delete_user(user_id)

    def list_users(self, skip: int = 0, limit: int = 100) -> List[UserInDB]:
        """
        List users with pagination.

        Args:
            skip (int): Number of users to skip (default: 0)
            limit (int): Maximum number of users to return (default: 100)

        Returns:
            List[UserInDB]: List of users
        """
        return self.user_repository.list_users(skip=skip, limit=limit)

    def count_users(self) -> int:
        """
        Count total number of users.

        Returns:
            int: Total number of users in system
        """
        return self.user_repository.count_users()

    def user_exists(self, username: str) -> bool:
        """
        Check if user exists by username.

        Args:
            username (str): Username to check

        Returns:
            bool: True if user exists, False otherwise
        """
        return self.user_repository.user_exists(username)


@@ -1,6 +1,9 @@
fastapi==0.116.1
uvicorn==0.35.0
bcrypt==4.3.0
celery==5.5.3
redis==6.4.0
email-validator==2.3.0
fastapi==0.116.1
httptools==0.6.4
pymongo==4.15.0
pydantic==2.11.9
redis==6.4.0
uvicorn==0.35.0