diff --git a/Readme.md b/Readme.md index cb2dc79..31be15e 100644 --- a/Readme.md +++ b/Readme.md @@ -2,11 +2,14 @@ ## Overview -MyDocManager is a real-time document processing application that automatically detects files in a monitored directory, processes them asynchronously, and stores the results in a database. The application uses a modern microservices architecture with Redis for task queuing and MongoDB for data persistence. +MyDocManager is a real-time document processing application that automatically detects files in a monitored directory, +processes them asynchronously, and stores the results in a database. The application uses a modern microservices +architecture with Redis for task queuing and MongoDB for data persistence. ## Architecture ### Technology Stack + - **Backend API**: FastAPI (Python 3.12) - **Task Processing**: Celery with Redis broker - **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber @@ -16,6 +19,7 @@ MyDocManager is a real-time document processing application that automatically d - **File Monitoring**: Python watchdog library ### Services Architecture + ┌─────────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Frontend │ │ file- │ │ Redis │ │ Worker │ │ MongoDB │ │ (React) │◄──►│ processor │───►│ (Broker) │◄──►│ (Celery) │───►│ (Results) │ @@ -24,13 +28,13 @@ MyDocManager is a real-time document processing application that automatically d └─────────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ ### Docker Services + 1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch 2. **worker**: Celery workers for document processing (OCR, text extraction) 3. **redis**: Message broker for Celery tasks 4. **mongodb**: Final database for processing results 5. **frontend**: React interface for monitoring and file access - ## Data Flow 1. 
**File Detection**: Watchdog monitors target directory in real-time @@ -42,11 +46,13 @@ MyDocManager is a real-time document processing application that automatically d ## Document Processing Capabilities ### Supported File Types + - **PDF**: Direct text extraction + OCR for scanned documents - **Word Documents**: .docx text extraction - **Images**: OCR text recognition (JPG, PNG, etc.) ### Processing Libraries + - **EasyOCR**: Modern OCR engine (80+ languages, deep learning-based) - **PyMuPDF**: PDF text extraction and manipulation - **python-docx**: Word document processing @@ -55,12 +61,15 @@ MyDocManager is a real-time document processing application that automatically d ## Development Environment ### Container-Based Development + The application is designed for container-based development with hot-reload capabilities: + - Source code mounted as volumes for real-time updates - All services orchestrated via Docker Compose - Development and production parity ### Key Features + - **Real-time Processing**: Immediate file detection and processing - **Horizontal Scaling**: Multiple workers can be added easily - **Fault Tolerance**: Celery provides automatic retry mechanisms @@ -68,6 +77,7 @@ The application is designed for container-based development with hot-reload capa - **Hot Reload**: Development changes reflected instantly in containers ### Docker Services + 1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch 2. **worker**: Celery workers for document processing (OCR, text extraction) 3. 
**redis**: Message broker for Celery tasks @@ -138,6 +148,7 @@ MyDocManager/ ## Authentication & User Management ### Security Features + - **JWT Authentication**: Stateless authentication with 24-hour token expiration - **Password Security**: bcrypt hashing with automatic salting - **Role-Based Access**: Admin and User roles with granular permissions @@ -145,16 +156,19 @@ MyDocManager/ - **Auto Admin Creation**: Default admin user created on first startup ### User Roles + - **Admin**: Full access to user management (create, read, update, delete users) - **User**: Limited access (view own profile, access document processing features) ### Authentication Flow + 1. **Login**: User provides credentials → Server validates → Returns JWT token 2. **API Access**: Client includes JWT in Authorization header 3. **Token Validation**: Server verifies token signature and expiration 4. **Role Check**: Server validates user permissions for requested resource ### User Management APIs + ``` POST /auth/login # Generate JWT token GET /users # List all users (admin only) @@ -164,7 +178,6 @@ DELETE /users/{user_id} # Delete user (admin only) GET /users/me # Get current user profile (authenticated users) ``` - ## Docker Commands Reference ### Initial Setup & Build @@ -248,9 +261,9 @@ docker-compose up --scale worker=3 ### Hot-Reload Configuration - **file-processor**: Hot-reload enabled via `--reload` flag - - Code changes in `src/file-processor/app/` automatically restart FastAPI + - Code changes in `src/file-processor/app/` automatically restart FastAPI - **worker**: No hot-reload (manual restart required for stability) - - Code changes in `src/worker/tasks/` require: `docker-compose restart worker` + - Code changes in `src/worker/tasks/` require: `docker-compose restart worker` ### Useful Service URLs @@ -274,41 +287,48 @@ curl -X POST http://localhost:8000/test-task \ # Monitor Celery tasks docker-compose logs -f worker ``` + ## Default Admin User On first startup, the application 
automatically creates a default admin user: + - **Username**: `admin` - **Password**: `admin` - **Role**: `admin` - **Email**: `admin@mydocmanager.local` -**⚠️ Important**: Change the default admin password immediately after first login in production environments. + **⚠️ Important**: Change the default admin password immediately after first login in production environments. ## Key Implementation Notes ### Python Standards + - **Style**: PEP 8 compliance - **Documentation**: Google/NumPy docstring format - **Naming**: snake_case for variables and functions - **Testing**: pytest with test_i_can_xxx / test_i_cannot_xxx patterns ### Security Best Practices + - **Password Storage**: Never store plain text passwords, always use bcrypt hashing - **JWT Secrets**: Use strong, randomly generated secret keys in production - **Token Expiration**: 24-hour expiration with secure signature validation - **Role Validation**: Server-side role checking for all protected endpoints ### Dependencies Management + - **Package Manager**: pip (standard) - **External Dependencies**: Listed in each service's requirements.txt - **Standard Library First**: Prefer standard library when possible ### Testing Strategy + - All code must be testable - Unit tests for each authentication and user management function - Integration tests for complete authentication flow - Tests validated before implementation ### Critical Architecture Decisions Made + 1. **JWT Authentication**: Simple token-based auth with 24-hour expiration 2. **Role-Based Access**: Admin/User roles for granular permissions 3. **bcrypt Password Hashing**: Industry-standard password security @@ -320,31 +340,24 @@ On first startup, the application automatically creates a default admin user: 9. **Container Development**: Hot-reload setup required for development workflow ### Development Process Requirements + 1. **Collaborative Validation**: All options must be explained before coding 2. 
**Test-First Approach**: Test cases defined and validated before implementation 3. **Incremental Development**: Start simple, extend functionality progressively 4. **Error Handling**: Clear problem explanation required before proposing fixes ### Next Implementation Steps -1. ✅ Create docker-compose.yml with all services -2. ✅ Define user management and authentication architecture -3. Implement user models and authentication services -4. Create protected API routes for user management -5. Add automatic admin user creation + +1. ✅ Create docker-compose.yml with all services => Done +2. ✅ Define user management and authentication architecture => Done +3. ✅ Implement user models and authentication services => + 1. models/user.py => Done + 2. models/auth.py => Done + 3. database/repositories/user_repository.py => Done +4. Add automatic admin user creation if it does not exist +5. Create protected API routes for user management 6. Implement basic FastAPI service structure 7. Add watchdog file monitoring 8. Create Celery task structure 9. Implement document processing tasks 10. Build React monitoring interface with authentication - -### Next steps -MongoDB CRUD -MongoDB must be mocked in the pytest unit tests (with pytest-mock) -Files to create: -* app/models/auth.py => already done -* app/models/user.py => already done -* app/database/connection.py - * Uses the settings for the MongoDB URL.
A configuration file (app/config/settings.py) must be created - * get_database() function + error handling - * Configuration via environment variables -* app/database/repositories/user_repository.py \ No newline at end of file diff --git a/docker-compose.yml b/docker-compose.yml index cbc28bc..57b85e6 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,5 +1,3 @@ -version: '3.8' - services: # Redis - Message broker for Celery redis: @@ -36,15 +34,16 @@ services: environment: - REDIS_URL=redis://redis:6379/0 - MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin + - PYTHONPATH=/app volumes: - - ./src/file-processor/app:/app + - ./src/file-processor:/app - ./volumes/watched_files:/watched_files depends_on: - redis - mongodb networks: - mydocmanager-network - command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload + command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload # Worker - Celery workers for document processing worker: @@ -55,6 +54,7 @@ services: environment: - REDIS_URL=redis://redis:6379/0 - MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin + - PYTHONPATH=/app volumes: - ./src/worker/tasks:/app - ./volumes/watched_files:/watched_files diff --git a/src/file-processor/Dockerfile b/src/file-processor/Dockerfile index 86d00c5..62477fd 100644 --- a/src/file-processor/Dockerfile +++ b/src/file-processor/Dockerfile @@ -8,10 +8,12 @@ COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy application code -COPY app/ . +COPY . .
+ +ENV PYTHONPATH=/app # Expose port EXPOSE 8000 # Command will be overridden by docker-compose -CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] \ No newline at end of file +CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] \ No newline at end of file diff --git a/src/file-processor/app/database/connection.py b/src/file-processor/app/database/connection.py index 38fd462..bba8f82 100644 --- a/src/file-processor/app/database/connection.py +++ b/src/file-processor/app/database/connection.py @@ -11,7 +11,7 @@ from pymongo import MongoClient from pymongo.database import Database from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError -from config.settings import get_mongodb_url, get_mongodb_database_name +from app.config.settings import get_mongodb_url, get_mongodb_database_name # Global variables for singleton pattern _client: Optional[MongoClient] = None diff --git a/src/file-processor/app/database/repositories/user_repository.py b/src/file-processor/app/database/repositories/user_repository.py index 6398b3a..c227476 100644 --- a/src/file-processor/app/database/repositories/user_repository.py +++ b/src/file-processor/app/database/repositories/user_repository.py @@ -13,7 +13,7 @@ from pymongo.errors import DuplicateKeyError from pymongo.collection import Collection from app.models.user import UserCreate, UserInDB, UserUpdate -from utils.security import hash_password +from app.utils.security import hash_password class UserRepository: diff --git a/src/file-processor/app/main.py b/src/file-processor/app/main.py index 94afad8..f4e493f 100644 --- a/src/file-processor/app/main.py +++ b/src/file-processor/app/main.py @@ -4,19 +4,74 @@ FastAPI application for MyDocManager file processor service. This service provides API endpoints for health checks and task dispatching. 
""" +import logging import os -from fastapi import FastAPI, HTTPException +from contextlib import asynccontextmanager +from fastapi import FastAPI, HTTPException, Depends from pydantic import BaseModel import redis from celery import Celery -from database.connection import test_database_connection +from app.database.connection import test_database_connection, get_database +from app.database.repositories.user_repository import UserRepository +from app.models.user import UserCreate +from app.services.init_service import InitializationService +from app.services.user_service import UserService + +# Configure logging +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +@asynccontextmanager +async def lifespan(app: FastAPI): + """ + Application lifespan manager for startup and shutdown tasks. + + Handles initialization tasks that need to run when the application starts, + including admin user creation and other setup procedures. + """ + # Startup tasks + logger.info("Starting MyDocManager application...") + + try: + # Initialize database connection + database = get_database() + + # Initialize repositories and services + user_repository = UserRepository(database) + user_service = UserService(user_repository) + init_service = InitializationService(user_service) + + # Run initialization tasks + initialization_result = init_service.initialize_application() + + if initialization_result["initialization_success"]: + logger.info("Application startup completed successfully") + if initialization_result["admin_user_created"]: + logger.info("Default admin user was created during startup") + else: + logger.error("Application startup completed with errors:") + for error in initialization_result["errors"]: + logger.error(f" - {error}") + + except Exception as e: + logger.error(f"Critical error during application startup: {str(e)}") + # You might want to decide if the app should continue or exit here + # For now, we log the error but continue + + yield # 
Application is running + + # Shutdown tasks (if needed) + logger.info("Shutting down MyDocManager application...") + # Initialize FastAPI app app = FastAPI( title="MyDocManager File Processor", description="File processing and task dispatch service", - version="1.0.0" + version="1.0.0", + lifespan=lifespan ) # Environment variables @@ -44,6 +99,27 @@ class TestTaskRequest(BaseModel): message: str +def get_user_service() -> UserService: + """ + FastAPI dependency that builds a UserService instance. + + The repository and service are created per request on top of the + database handle returned by get_database(). + """ + database = get_database() + user_repository = UserRepository(database) + return UserService(user_repository) + + +# TODO: restrict to admins and return UserResponse; as written, any client can create users of any role and the response includes hashed_password +@app.post("/api/users") +async def create_user( + user_data: UserCreate, + user_service: UserService = Depends(get_user_service) +): + return user_service.create_user(user_data) + + @app.get("/health") async def health_check(): """ @@ -125,4 +201,4 @@ async def root(): "service": "MyDocManager File Processor", "version": "1.0.0", "status": "running" - } \ No newline at end of file + } diff --git a/src/file-processor/app/models/user.py b/src/file-processor/app/models/user.py index 39b9fd0..c11a068 100644 --- a/src/file-processor/app/models/user.py +++ b/src/file-processor/app/models/user.py @@ -100,13 +100,18 @@ def validate_username_not_empty(username: str) -> str: return username.strip() -class UserCreate(BaseModel): +class UserCreateNoValidation(BaseModel): """Model for creating a new user.""" username: str - email: EmailStr + email: str password: str role: UserRole = UserRole.USER + + +class UserCreate(UserCreateNoValidation): + """Model for creating a new user from API input (validated email and fields).""" + email: EmailStr @field_validator('username') @classmethod diff --git a/src/file-processor/app/services/__init__.py b/src/file-processor/app/services/__init__.py new file mode 100644 index 0000000..e69de29 diff --git
a/src/file-processor/app/services/auth_service.py b/src/file-processor/app/services/auth_service.py new file mode 100644 index 0000000..a7037d3 --- /dev/null +++ b/src/file-processor/app/services/auth_service.py @@ -0,0 +1,58 @@ +""" +Authentication service for password hashing and verification. + +This module wraps the password hashing and verification helpers from +app.utils.security; JWT token management is not handled here. +""" + +from app.utils.security import hash_password, verify_password + + +class AuthService: + """ + Service class for authentication operations. + + Handles password hashing, verification, and other authentication + related operations with proper security practices. + """ + + @staticmethod + def hash_user_password(password: str) -> str: + """ + Hash a plaintext password for secure storage. + + Args: + password (str): Plaintext password to hash + + Returns: + str: Hashed password safe for database storage + + Example: + >>> auth = AuthService() + >>> hashed = auth.hash_user_password("mypassword123") + >>> len(hashed) > 0 + True + """ + return hash_password(password) + + @staticmethod + def verify_user_password(password: str, hashed_password: str) -> bool: + """ + Verify a password against its hash.
+ + Args: + password (str): Plaintext password to verify + hashed_password (str): Stored hashed password + + Returns: + bool: True if password matches hash, False otherwise + + Example: + >>> auth = AuthService() + >>> hashed = auth.hash_user_password("mypassword123") + >>> auth.verify_user_password("mypassword123", hashed) + True + >>> auth.verify_user_password("wrongpassword", hashed) + False + """ + return verify_password(password, hashed_password) \ No newline at end of file diff --git a/src/file-processor/app/services/init_service.py b/src/file-processor/app/services/init_service.py new file mode 100644 index 0000000..fd3464f --- /dev/null +++ b/src/file-processor/app/services/init_service.py @@ -0,0 +1,134 @@ +""" +Initialization service for application startup tasks. + +This module handles application initialization tasks including +creating default admin user if none exists. +""" + +import logging +from typing import Optional + +from app.models.user import UserCreate, UserInDB, UserCreateNoValidation +from app.models.auth import UserRole +from app.services.user_service import UserService + +logger = logging.getLogger(__name__) + + +class InitializationService: + """ + Service for handling application initialization tasks. + + This service manages startup operations like ensuring required + users exist and system is properly configured. + """ + + def __init__(self, user_service: UserService): + """ + Initialize service with user service dependency. + + Args: + user_service (UserService): Service for user operations + """ + self.user_service = user_service + + + def ensure_admin_user_exists(self) -> Optional[UserInDB]: + """ + Ensure default admin user exists in the system. + + Creates a default admin user if no admin user exists in the system. + Uses default credentials that should be changed after first login. 
+ + Returns: + UserInDB or None: Created admin user if created, None if already exists + + Raises: + Exception: If admin user creation fails + """ + logger.info("Checking if admin user exists...") + + # Check if any admin user already exists + if self._admin_user_exists(): + logger.info("Admin user already exists, skipping creation") + return None + + logger.info("No admin user found, creating default admin user...") + + try: + # Create default admin user + admin_data = UserCreateNoValidation( + username="admin", + email="admin@mydocmanager.local", + password="admin", # Should be changed after first login + role=UserRole.ADMIN + ) + + created_user = self.user_service.create_user(admin_data) + logger.info(f"Default admin user created successfully with ID: {created_user.id}") + logger.warning( + "Default admin user created with username 'admin' and password 'admin'. " + "Please change these credentials immediately for security!" + ) + + return created_user + + except Exception as e: + logger.error(f"Failed to create default admin user: {str(e)}") + raise Exception(f"Admin user creation failed: {str(e)}") + + def _admin_user_exists(self) -> bool: + """ + Check if any admin user exists in the system. + + Returns: + bool: True if at least one admin user exists, False otherwise + """ + try: + # Get all users and check if any have admin role + users = self.user_service.list_users(limit=1000) # Reasonable limit for admin check + + for user in users: + if user.role == UserRole.ADMIN and user.is_active: + return True + + return False + + except Exception as e: + logger.error(f"Error checking for admin users: {str(e)}") + # In case of error, assume admin exists to avoid creating duplicates + return True + + def initialize_application(self) -> dict: + """ + Perform all application initialization tasks. + + This method runs all necessary initialization procedures including + admin user creation and any other startup requirements. 
+ + Returns: + dict: Summary of initialization tasks performed + """ + logger.info("Starting application initialization...") + + initialization_summary = { + "admin_user_created": False, + "initialization_success": False, + "errors": [] + } + + try: + # Ensure admin user exists + created_admin = self.ensure_admin_user_exists() + if created_admin: + initialization_summary["admin_user_created"] = True + + initialization_summary["initialization_success"] = True + logger.info("Application initialization completed successfully") + + except Exception as e: + error_msg = f"Application initialization failed: {str(e)}" + logger.error(error_msg) + initialization_summary["errors"].append(error_msg) + + return initialization_summary \ No newline at end of file diff --git a/src/file-processor/app/services/user_service.py b/src/file-processor/app/services/user_service.py new file mode 100644 index 0000000..de9fcef --- /dev/null +++ b/src/file-processor/app/services/user_service.py @@ -0,0 +1,181 @@ +""" +User service for business logic operations. + +This module provides user-related business logic including user creation, +retrieval, updates, and authentication operations with proper error handling. +""" + +from typing import Optional, List +from pymongo.errors import DuplicateKeyError + +from app.models.user import UserCreate, UserInDB, UserUpdate, UserResponse, UserCreateNoValidation +from app.models.auth import UserRole +from app.database.repositories.user_repository import UserRepository +from app.services.auth_service import AuthService + + +class UserService: + """ + Service class for user business logic operations. + + This class handles user-related operations including creation, + authentication, and data management with proper validation. + """ + + def __init__(self, user_repository: UserRepository): + """ + Initialize user service with repository dependency. 
+ + Args: + user_repository (UserRepository): Repository for user data operations + """ + self.user_repository = user_repository + self.auth_service = AuthService() + + def create_user(self, user_data: UserCreate | UserCreateNoValidation) -> UserInDB: + """ + Create a new user with business logic validation. + + Args: + user_data (UserCreate | UserCreateNoValidation): User creation data + + Returns: + UserInDB: Created user with database information + + Raises: + ValueError: If the username or email is already taken + """ + # Check if user already exists + if self.user_repository.user_exists(user_data.username): + raise ValueError(f"User with username '{user_data.username}' already exists") + + # Check if email already exists + existing_user = self.user_repository.find_user_by_email(user_data.email) + if existing_user: + raise ValueError(f"User with email '{user_data.email}' already exists") + + try: + return self.user_repository.create_user(user_data) + except DuplicateKeyError: + raise ValueError(f"User with username '{user_data.username}' or email '{user_data.email}' already exists") + + def get_user_by_username(self, username: str) -> Optional[UserInDB]: + """ + Retrieve user by username. + + Args: + username (str): Username to search for + + Returns: + UserInDB or None: User if found, None otherwise + """ + return self.user_repository.find_user_by_username(username) + + def get_user_by_id(self, user_id: str) -> Optional[UserInDB]: + """ + Retrieve user by ID. + + Args: + user_id (str): User ID to search for + + Returns: + UserInDB or None: User if found, None otherwise + """ + return self.user_repository.find_user_by_id(user_id) + + def authenticate_user(self, username: str, password: str) -> Optional[UserInDB]: + """ + Authenticate user with username and password.
+ + Args: + username (str): Username for authentication + password (str): Password for authentication + + Returns: + UserInDB or None: Authenticated user if valid, None otherwise + """ + user = self.user_repository.find_user_by_username(username) + if not user: + return None + + if not user.is_active: + return None + + if not self.auth_service.verify_user_password(password, user.hashed_password): + return None + + return user + + def update_user(self, user_id: str, user_update: UserUpdate) -> Optional[UserInDB]: + """ + Update user information. + + Args: + user_id (str): User ID to update + user_update (UserUpdate): Updated user data + + Returns: + UserInDB or None: Updated user if successful, None otherwise + + Raises: + ValueError: If username or email already exists for different user + """ + # Validate username uniqueness if being updated + if user_update.username is not None: + existing_user = self.user_repository.find_user_by_username(user_update.username) + if existing_user and str(existing_user.id) != user_id: + raise ValueError(f"Username '{user_update.username}' is already taken") + + # Validate email uniqueness if being updated + if user_update.email is not None: + existing_user = self.user_repository.find_user_by_email(user_update.email) + if existing_user and str(existing_user.id) != user_id: + raise ValueError(f"Email '{user_update.email}' is already taken") + + return self.user_repository.update_user(user_id, user_update) + + def delete_user(self, user_id: str) -> bool: + """ + Delete user from system. + + Args: + user_id (str): User ID to delete + + Returns: + bool: True if user was deleted, False otherwise + """ + return self.user_repository.delete_user(user_id) + + def list_users(self, skip: int = 0, limit: int = 100) -> List[UserInDB]: + """ + List users with pagination. 
+ + Args: + skip (int): Number of users to skip (default: 0) + limit (int): Maximum number of users to return (default: 100) + + Returns: + List[UserInDB]: List of users + """ + return self.user_repository.list_users(skip=skip, limit=limit) + + def count_users(self) -> int: + """ + Count total number of users. + + Returns: + int: Total number of users in system + """ + return self.user_repository.count_users() + + def user_exists(self, username: str) -> bool: + """ + Check if user exists by username. + + Args: + username (str): Username to check + + Returns: + bool: True if user exists, False otherwise + """ + return self.user_repository.user_exists(username) diff --git a/src/file-processor/requirements.txt b/src/file-processor/requirements.txt index 768ae61..0e686f9 100644 --- a/src/file-processor/requirements.txt +++ b/src/file-processor/requirements.txt @@ -1,6 +1,9 @@ -fastapi==0.116.1 -uvicorn==0.35.0 +bcrypt==4.3.0 celery==5.5.3 -redis==6.4.0 +email-validator==2.3.0 +fastapi==0.116.1 +httptools==0.6.4 pymongo==4.15.0 -pydantic==2.11.9 \ No newline at end of file +pydantic==2.11.9 +redis==6.4.0 +uvicorn==0.35.0
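A minimal sketch of how the admin bootstrap in `InitializationService` could be unit-tested without MongoDB, following the README's test_i_can_xxx / test_i_cannot_xxx naming. `FakeUser`, `FakeUserService` and the standalone `ensure_admin_user_exists` are hypothetical stand-ins written for illustration: they copy the control flow of the diff's `_admin_user_exists` / `ensure_admin_user_exists` (with plain `"admin"` strings in place of `UserRole.ADMIN`), not the real classes. In the project itself the same idea would more likely be expressed with `unittest.mock` or pytest-mock replacing `UserService`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeUser:
    """Hypothetical stand-in for UserInDB with only the fields the admin check reads."""
    username: str
    role: str
    is_active: bool = True
    id: int = 0


@dataclass
class FakeUserService:
    """Hypothetical in-memory replacement for UserService (no MongoDB involved)."""
    users: List[FakeUser] = field(default_factory=list)

    def list_users(self, skip: int = 0, limit: int = 100) -> List[FakeUser]:
        return self.users[skip:skip + limit]

    def create_user(self, username: str, role: str) -> FakeUser:
        # Mirrors UserService.create_user: usernames must be unique
        if any(u.username == username for u in self.users):
            raise ValueError(f"User with username '{username}' already exists")
        user = FakeUser(username=username, role=role, id=len(self.users) + 1)
        self.users.append(user)
        return user


def ensure_admin_user_exists(service: FakeUserService) -> Optional[FakeUser]:
    """Same decision as InitializationService: only active admins count."""
    has_active_admin = any(
        u.role == "admin" and u.is_active for u in service.list_users(limit=1000)
    )
    if has_active_admin:
        return None
    return service.create_user(username="admin", role="admin")


def test_i_can_create_admin_when_none_exists():
    service = FakeUserService()
    created = ensure_admin_user_exists(service)
    assert created is not None
    assert created.username == "admin"
    assert len(service.users) == 1


def test_i_cannot_create_admin_when_active_admin_exists():
    service = FakeUserService([FakeUser(username="root", role="admin")])
    assert ensure_admin_user_exists(service) is None
    assert len(service.users) == 1


def test_i_cannot_create_admin_when_inactive_admin_owns_the_name():
    # Inactive admins are ignored, so creation is attempted and hits the
    # username uniqueness check
    service = FakeUserService([FakeUser(username="admin", role="admin", is_active=False)])
    try:
        ensure_admin_user_exists(service)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for duplicate username")
```

The third test documents an edge case in the diff's logic: `_admin_user_exists` ignores inactive admins, but `UserService.create_user` rejects a username that is already taken (assuming `user_exists` also counts inactive users). If the only admin is an inactive account named `admin`, startup would therefore report an initialization error instead of creating a new admin.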
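The `lifespan` hook added to `app/main.py` logs startup failures and lets the application start anyway. The sketch below reproduces that policy with only the standard library so it runs without FastAPI or MongoDB: a plain dict stands in for the `FastAPI` instance, and the `"initializer"` key is a hypothetical placeholder for `InitializationService.initialize_application`. The `try`/`finally` around `yield` is an addition that is not present in `main.py`.

```python
import asyncio
import logging
from contextlib import asynccontextmanager
from typing import AsyncIterator

logger = logging.getLogger("lifespan_sketch")


@asynccontextmanager
async def lifespan(app: dict) -> AsyncIterator[None]:
    """Startup/shutdown hook shaped like the one in app/main.py.

    `app` is a plain dict standing in for the FastAPI instance; the
    hypothetical "initializer" entry returns an initialization summary.
    """
    try:
        summary = app["initializer"]()
        app["initialized"] = bool(summary.get("initialization_success"))
    except Exception as exc:
        # Same policy as main.py: log the failure and keep starting up
        logger.error("Critical error during application startup: %s", exc)
        app["initialized"] = False
    try:
        yield  # the application serves requests here
    finally:
        # try/finally makes the shutdown step run even if the body raises
        app["shut_down"] = True


async def serve(app: dict) -> None:
    async with lifespan(app):
        app["served"] = True


def failing_initializer() -> dict:
    raise RuntimeError("MongoDB unreachable")


app_state = {"initializer": failing_initializer}
asyncio.run(serve(app_state))
print(app_state["initialized"], app_state["served"], app_state["shut_down"])  # prints: False True True
```

Swallowing the exception means the service comes up even when the database is unreachable; re-raising it in the `except` block would make the startup phase fail instead. That is the trade-off behind the "log the error but continue" comment in `main.py`.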