Implemented default pipeline

Adding document service
Fixed unit tests
2025-09-26 22:08:39 +02:00 · 2025-09-19 22:59:41 +02:00 · 2025-09-19 21:06:09 +02:00 · 2025-09-18 22:53:51 +02:00 · 2025-09-17 22:45:33 +02:00 · 2025-09-17 21:24:03 +02:00
57 changed files with 6198 additions and 1128 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,5 @@
+volumes
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[codz]
--- a/Readme.md
+++ b/Readme.md
@@ -2,20 +2,24 @@

 ## Overview

-MyDocManager is a real-time document processing application that automatically detects files in a monitored directory, processes them asynchronously, and stores the results in a database. The application uses a modern microservices architecture with Redis for task queuing and MongoDB for data persistence.
+MyDocManager is a real-time document processing application that automatically detects files in a monitored directory,
+processes them asynchronously, and stores the results in a database. The application uses a modern microservices
+architecture with Redis for task queuing and MongoDB for data persistence.

 ## Architecture

 ### Technology Stack
+
 - **Backend API**: FastAPI (Python 3.12)
 - **Task Processing**: Celery with Redis broker
 - **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
- **Database**: MongoDB
+- **Database**: MongoDB (pymongo)
 - **Frontend**: React
 - **Containerization**: Docker & Docker Compose
 - **File Monitoring**: Python watchdog library

 ### Services Architecture
+
    ┌─────────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Frontend      │    │ file-       │    │    Redis    │    │   Worker    │    │  MongoDB    │
    │   (React)       │◄──►│ processor   │───►│  (Broker)   │◄──►│  (Celery)   │───►│ (Results)   │
@@ -24,13 +28,13 @@ MyDocManager is a real-time document processing application that automatically d
    └─────────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

 ### Docker Services
+
 1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
 2. **worker**: Celery workers for document processing (OCR, text extraction)
 3. **redis**: Message broker for Celery tasks
 4. **mongodb**: Final database for processing results
 5. **frontend**: React interface for monitoring and file access

-
 ## Data Flow

 1. **File Detection**: Watchdog monitors target directory in real-time
@@ -42,11 +46,13 @@ MyDocManager is a real-time document processing application that automatically d
 ## Document Processing Capabilities

 ### Supported File Types
+
 - **PDF**: Direct text extraction + OCR for scanned documents
 - **Word Documents**: .docx text extraction
 - **Images**: OCR text recognition (JPG, PNG, etc.)

 ### Processing Libraries
+
 - **EasyOCR**: Modern OCR engine (80+ languages, deep learning-based)
 - **PyMuPDF**: PDF text extraction and manipulation
 - **python-docx**: Word document processing
@@ -55,12 +61,15 @@ MyDocManager is a real-time document processing application that automatically d
 ## Development Environment

 ### Container-Based Development
+
 The application is designed for container-based development with hot-reload capabilities:
+
 - Source code mounted as volumes for real-time updates
 - All services orchestrated via Docker Compose
 - Development and production parity

 ### Key Features
+
 - **Real-time Processing**: Immediate file detection and processing
 - **Horizontal Scaling**: Multiple workers can be added easily
 - **Fault Tolerance**: Celery provides automatic retry mechanisms
@@ -68,6 +77,7 @@ The application is designed for container-based development with hot-reload capa
 - **Hot Reload**: Development changes reflected instantly in containers

 ### Docker Services
+
 1. **file-processor**: FastAPI + real-time file monitoring + Celery task dispatch
 2. **worker**: Celery workers for document processing (OCR, text extraction)
 3. **redis**: Message broker for Celery tasks
@@ -85,25 +95,32 @@ MyDocManager/
 │   │   ├── requirements.txt
 │   │   ├── app/
 │   │   │   ├── main.py
-│   │   │   ├── file_watcher.py
-│   │   │   ├── celery_app.py
+│   │   │   ├── file_watcher.py             # FileWatcher class with observer thread
+│   │   │   ├── celery_app.py               # Celery configuration
 │   │   │   ├── config/
 │   │   │   │   ├── __init__.py
 │   │   │   │   └── settings.py              # JWT, MongoDB config
 │   │   │   ├── models/
 │   │   │   │   ├── __init__.py
 │   │   │   │   ├── user.py                  # User Pydantic models
-│   │   │   │   └── auth.py                  # Auth Pydantic models
+│   │   │   │   ├── auth.py                  # Auth Pydantic models
+│   │   │   │   ├── document.py              # Document Pydantic models
+│   │   │   │   ├── job.py                   # Job Processing Pydantic models
+│   │   │   │   └── types.py                 # PyObjectId and other useful types
 │   │   │   ├── database/
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── connection.py            # MongoDB connection
+│   │   │   │   ├── connection.py            # MongoDB connection (pymongo)
 │   │   │   │   └── repositories/
 │   │   │   │       ├── __init__.py
-│   │   │   │       └── user_repository.py   # User CRUD operations
+│   │   │   │       ├── user_repository.py      # User CRUD operations (synchronous)
+│   │   │   │       ├── document_repository.py  # Document CRUD operations (synchronous)
+│   │   │   │       └── job_repository.py       # Job CRUD operations (synchronous)
 │   │   │   ├── services/
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── auth_service.py          # JWT & password logic
-│   │   │   │   ├── user_service.py          # User business logic
+│   │   │   │   ├── auth_service.py          # JWT & password logic (synchronous)
+│   │   │   │   ├── user_service.py          # User business logic (synchronous)
+│   │   │   │   ├── document_service.py      # Document business logic (synchronous)
+│   │   │   │   ├── job_service.py           # Job processing logic (synchronous)
 │   │   │   │   └── init_service.py          # Admin creation at startup
 │   │   │   ├── api/
 │   │   │   │   ├── __init__.py
@@ -115,7 +132,7 @@ MyDocManager/
 │   │   │   └── utils/
 │   │   │       ├── __init__.py
 │   │   │       ├── security.py             # Password utilities
-│   │   │       └── exceptions.py           # Custom exceptions
+│   │   │       └── document_matching.py    # Fuzzy matching algorithms
 │   ├── worker/
 │   │   ├── Dockerfile
 │   │   ├── requirements.txt
@@ -123,7 +140,13 @@ MyDocManager/
 │   └── frontend/
 │       ├── Dockerfile
 │       ├── package.json
+│       ├── index.html
 │       └── src/
+│           ├── assets/
+│           ├── App.css
+│           ├── App.jsx
+│           ├── main.css
+│           └── main.jsx
 ├── tests/
 │   ├── file-processor/
 │   │   ├── test_auth/
@@ -138,6 +161,7 @@ MyDocManager/
 ## Authentication & User Management

 ### Security Features
+
 - **JWT Authentication**: Stateless authentication with 24-hour token expiration
 - **Password Security**: bcrypt hashing with automatic salting
 - **Role-Based Access**: Admin and User roles with granular permissions
@@ -145,16 +169,19 @@ MyDocManager/
 - **Auto Admin Creation**: Default admin user created on first startup

 ### User Roles
+
 - **Admin**: Full access to user management (create, read, update, delete users)
 - **User**: Limited access (view own profile, access document processing features)

 ### Authentication Flow
+
 1. **Login**: User provides credentials → Server validates → Returns JWT token
 2. **API Access**: Client includes JWT in Authorization header
 3. **Token Validation**: Server verifies token signature and expiration
 4. **Role Check**: Server validates user permissions for requested resource

 ### User Management APIs
+
 ```
 POST /auth/login              # Generate JWT token
 GET  /users                   # List all users (admin only)
@@ -164,10 +191,323 @@ DELETE /users/{user_id}       # Delete user (admin only)
 GET  /users/me                # Get current user profile (authenticated users)
 ```

+### Useful Service URLs

-## Docker Commands Reference
+- **FastAPI API**: http://localhost:8000
+- **FastAPI Docs**: http://localhost:8000/docs
+- **Health Check**: http://localhost:8000/health
+- **Redis**: localhost:6379
+- **MongoDB**: localhost:27017

-### Initial Setup & Build
+### Testing Commands
+
+```bash
+# Test FastAPI health
+curl http://localhost:8000/health
+
+# Test Celery task dispatch
+curl -X POST http://localhost:8000/test-task \
+  -H "Content-Type: application/json" \
+  -d '{"message": "Hello from test!"}'
+
+# Monitor Celery tasks
+docker-compose logs -f worker
+```
+
+## Default Admin User
+
+On first startup, the application automatically creates a default admin user:
+
+- **Username**: `admin`
+- **Password**: `admin`
+- **Role**: `admin`
+- **Email**: `admin@mydocmanager.local`
+  **⚠️ Important**: Change the default admin password immediately after first login in production environments.
+
+## File Processing Architecture
+
+### Document Processing Flow
+
+1. **File Detection**: Watchdog monitors `/volumes/watched_files/` directory in real-time
+2. **Task Creation**: File watcher creates Celery task for each detected file
+3. **Document Processing**: Celery worker processes the document and extracts content
+4. **Database Storage**: Processed data stored in MongoDB collections
+
+### MongoDB Collections Design
+
+#### Files Collection
+
+Stores file metadata and extracted content using Pydantic models:
+
+```python
+class FileDocument(BaseModel):
+  """
+  Model for file documents stored in the 'files' collection.
+
+  Represents a file detected in the watched directory with its
+  metadata and extracted content.
+  """
+  
+  id: Optional[PyObjectId] = Field(default=None, alias="_id")
+  filename: str = Field(..., description="Original filename")
+  filepath: str = Field(..., description="Full path to the file")
+  file_type: FileType = Field(..., description="Type of the file")
+  extraction_method: Optional[ExtractionMethod] = Field(default=None, description="Method used to extract content")
+  metadata: Dict[str, Any] = Field(default_factory=dict, description="File-specific metadata")
+  detected_at: Optional[datetime] = Field(default=None, description="Timestamp when file was detected")
+  file_hash: Optional[str] = Field(default=None, description="SHA256 hash of file content")
+  encoding: str = Field(default="utf-8", description="Character encoding for text files")
+  file_size: int = Field(..., ge=0, description="File size in bytes")
+  mime_type: str = Field(..., description="MIME type detected")
+  
+  @field_validator('filepath')
+  @classmethod
+  def validate_filepath(cls, v: str) -> str:
+    """Validate filepath format."""
+    if not v.strip():
+      raise ValueError("Filepath cannot be empty")
+    return v.strip()
+  
+  @field_validator('filename')
+  @classmethod
+  def validate_filename(cls, v: str) -> str:
+    """Validate filename format."""
+    if not v.strip():
+      raise ValueError("Filename cannot be empty")
+    return v.strip()
+```
+
+#### Processing Jobs Collection
+
+Tracks processing status and lifecycle:
+
+```python
+class ProcessingJob(BaseModel):
+  """
+  Model for processing jobs stored in the 'processing_jobs' collection.
+
+  Tracks the lifecycle and status of document processing tasks.
+  """
+  
+  id: Optional[PyObjectId] = Field(default=None, alias="_id")
+  file_id: PyObjectId = Field(..., description="Reference to file document")
+  status: ProcessingStatus = Field(default=ProcessingStatus.PENDING, description="Current processing status")
+  task_id: Optional[str] = Field(default=None, description="Celery task UUID")
+  created_at: Optional[datetime] = Field(default=None, description="Timestamp when job was created")
+  started_at: Optional[datetime] = Field(default=None, description="Timestamp when processing started")
+  completed_at: Optional[datetime] = Field(default=None, description="Timestamp when processing completed")
+  error_message: Optional[str] = Field(default=None, description="Error message if processing failed")
+  
+  @field_validator('error_message')
+  @classmethod
+  def validate_error_message(cls, v: Optional[str]) -> Optional[str]:
+    """Clean up error message."""
+    if v is not None:
+      return v.strip() if v.strip() else None
+    return v
+```
+
+### Supported File Types (Initial Implementation)
+
+- **Text Files** (`.txt`): Direct content reading
+- **PDF Documents** (`.pdf`): Text extraction via PyMuPDF/pdfplumber
+- **Word Documents** (`.docx`): Content extraction via python-docx
+
+### File Processing Architecture Decisions
+
+#### Watchdog Implementation
+
+- **Choice**: Dedicated observer thread
+- **Rationale**: Standard approach, clean separation of concerns
+- **Implementation**: Watchdog observer runs in separate thread from FastAPI
+
+#### Task Dispatch Strategy
+
+- **Choice**: Direct Celery task creation from file watcher
+- **Rationale**: Minimal latency, straightforward flow
+- **Implementation**: File detected → Immediate Celery task dispatch
+
+#### Data Storage Strategy
+
+- **Choice**: Separate collections for files and processing status
+- **Rationale**: Clean separation of file data vs processing lifecycle
+- **Benefits**:
+    - Better query performance
+    - Clear data model boundaries
+    - Easy processing status tracking
+
+#### Content Storage Location
+
+- **Choice**: Store files in the file system, using the SHA256 hash as filename
+- **Rationale**: MongoDB is not designed to hold large files, so keeping them on disk performs better and leaves them
+  easy to access directly.
+
+#### Repository and Services Implementation
+
+- **Choice**: Synchronous implementation using pymongo
+- **Rationale**: Full compatibility with Celery workers and simplified workflow
+- **Implementation**: All repositories and services operate synchronously for seamless integration
+
+### Implementation Status
+
+1. ✅ Pydantic models for MongoDB collections
+2. ✅ Repository layer for data access (files + processing_jobs + users + documents) - synchronous
+3. ✅ Service layer for business logic (auth, user, document, job) - synchronous
+4. ✅ Celery tasks for document processing
+5. ✅ Watchdog file monitoring implementation
+6. ✅ FastAPI integration and startup coordination
+
+## Job Management Layer
+
+### Repository Pattern Implementation
+
+The job management system follows the repository pattern for clean separation between data access and business logic.
+
+#### JobRepository
+
+Handles direct MongoDB operations for processing jobs using synchronous pymongo:
+
+**CRUD Operations:**
+- `create_job()` - Create new processing job with automatic `created_at` timestamp
+- `get_job_by_id()` - Retrieve job by ObjectId
+- `update_job_status()` - Update job status with automatic timestamp management
+- `delete_job()` - Remove job from database
+- `get_jobs_by_file_id()` - Get all jobs for specific file
+- `get_jobs_by_status()` - Get jobs filtered by processing status
+
+**Automatic Timestamp Management:**
+- `created_at`: Set automatically during job creation
+- `started_at`: Set automatically when status changes to PROCESSING  
+- `completed_at`: Set automatically when status changes to COMPLETED or FAILED
+
+#### JobService
+
+Provides synchronous business logic layer with strict status transition validation:
+
+**Status Transition Methods:**
+- `mark_job_as_started()` - PENDING → PROCESSING
+- `mark_job_as_completed()` - PROCESSING → COMPLETED
+- `mark_job_as_failed()` - PROCESSING → FAILED
+
+**Validation Rules:**
+- Strict status transitions (invalid transitions raise exceptions)
+- Job existence verification before any operation
+- Automatic timestamp management through repository layer
+
+#### Custom Exceptions
+
+**InvalidStatusTransitionError**: Raised for invalid status transitions  
+**JobRepositoryError**: Raised for MongoDB operation failures
+
+#### Valid Status Transitions
+
+```
+PENDING → PROCESSING    (via mark_job_as_started)
+PROCESSING → COMPLETED  (via mark_job_as_completed)
+PROCESSING → FAILED     (via mark_job_as_failed)
+```
+
+All other transitions are forbidden and will raise `InvalidStatusTransitionError`.
+
+### File Structure
+
+```
+src/file-processor/app/
+├── database/repositories/
+│   ├── job_repository.py           # JobRepository class (synchronous)
+│   ├── user_repository.py          # UserRepository class (synchronous)
+│   ├── document_repository.py      # DocumentRepository class (synchronous)
+│   └── file_repository.py          # FileRepository class (synchronous)
+├── services/  
+│   ├── job_service.py              # JobService class (synchronous)
+│   ├── auth_service.py             # AuthService class (synchronous)
+│   ├── user_service.py             # UserService class (synchronous)
+│   └── document_service.py         # DocumentService class (synchronous)
+└── exceptions/
+    └── job_exceptions.py           # Custom exceptions
+```
+
+### Processing Pipeline Features
+
+- **Duplicate Detection**: SHA256 hashing prevents reprocessing same files
+- **Error Handling**: Failed processing tracked with error messages
+- **Status Tracking**: Real-time processing status via `processing_jobs` collection
+- **Extensible Metadata**: Flexible metadata storage per file type
+- **Multiple Extraction Methods**: Support for direct text, OCR, and hybrid approaches
+- **Synchronous Operations**: All database operations use pymongo for Celery compatibility
+
+## Key Implementation Notes
+
+### Python Standards
+
+- **Style**: PEP 8 compliance
+- **Documentation**: Google/NumPy docstring format
+- **Naming**: snake_case for variables and functions
+- **Testing**: pytest with test_i_can_xxx / test_i_cannot_xxx patterns
+
+### Security Best Practices
+
+- **Password Storage**: Never store plain text passwords, always use bcrypt hashing
+- **JWT Secrets**: Use strong, randomly generated secret keys in production
+- **Token Expiration**: 24-hour expiration with secure signature validation
+- **Role Validation**: Server-side role checking for all protected endpoints
+
+### Dependencies Management
+
+- **Package Manager**: pip (standard)
+- **External Dependencies**: Listed in each service's requirements.txt
+- **Standard Library First**: Prefer standard library when possible
+- **Database Driver**: pymongo for synchronous MongoDB operations
+
+### Testing Strategy
+
+- All code must be testable
+- Unit tests for each authentication and user management function
+- Integration tests for complete authentication flow
+- Tests validated before implementation
+
+### Critical Architecture Decisions Made
+
+1. **JWT Authentication**: Simple token-based auth with 24-hour expiration
+2. **Role-Based Access**: Admin/User roles for granular permissions
+3. **bcrypt Password Hashing**: Industry-standard password security
+4. **MongoDB User Storage**: Centralized user management in main database
+5. **Auto Admin Creation**: Automatic setup for first-time deployment
+6. **Single FastAPI Service**: Handles both API and file watching with authentication
+7. **Celery with Redis**: Chosen over other async patterns for scalability
+8. **EasyOCR Preferred**: Selected over Tesseract for modern OCR needs
+9. **Container Development**: Hot-reload setup required for development workflow
+10. **Dedicated Watchdog Observer**: Thread-based file monitoring for reliability
+11. **Separate MongoDB Collections**: Files and processing jobs stored separately
+12. **Content in Files Collection**: Extracted content stored with file metadata
+13. **Direct Task Dispatch**: File watcher directly creates Celery tasks
+14. **SHA256 Duplicate Detection**: Prevents reprocessing identical files
+15. **Synchronous Implementation**: All repositories and services use pymongo for Celery compatibility
+
+### Development Process Requirements
+
+1. **Collaborative Validation**: All options must be explained before coding
+2. **Test-First Approach**: Test cases defined and validated before implementation
+3. **Incremental Development**: Start simple, extend functionality progressively
+4. **Error Handling**: Clear problem explanation required before proposing fixes
+
+### Next Implementation Steps
+
+1. **TODO**: Complete file processing pipeline =>
+    1. ✅ Create Pydantic models for files and processing_jobs collections
+    2. ✅ Implement repository layer for file and processing job data access (synchronous)
+    3. ✅ Implement service layer for business logic (synchronous)
+    4. ✅ Create Celery tasks for document processing (.txt, .pdf, .docx)
+    5. ✅ Implement Watchdog file monitoring with dedicated observer
+    6. ✅ Integrate file watcher with FastAPI startup
+2. Create protected API routes for user management
+3. Build React monitoring interface with authentication
+
+## Annexes
+
+### Docker Commands Reference
+
+#### Initial Setup & Build

 ```bash
 # Build and start all services (first time)
@@ -181,7 +521,7 @@ docker-compose build file-processor
 docker-compose build worker
 ```

-### Development Workflow
+#### Development Workflow

 ```bash
 # Start all services
@@ -203,7 +543,7 @@ docker-compose restart redis
 docker-compose restart mongodb
 ```

-### Monitoring & Debugging
+#### Monitoring & Debugging

 ```bash
 # View logs of all services
@@ -228,7 +568,7 @@ docker-compose exec worker bash
 docker-compose exec mongodb mongosh
 ```

-### Service Management
+#### Service Management

 ```bash
 # Start only specific services
@@ -248,103 +588,6 @@ docker-compose up --scale worker=3
 ### Hot-Reload Configuration

 - **file-processor**: Hot-reload enabled via `--reload` flag
-  - Code changes in `src/file-processor/app/` automatically restart FastAPI
+    - Code changes in `src/file-processor/app/` automatically restart FastAPI
 - **worker**: No hot-reload (manual restart required for stability)
-  - Code changes in `src/worker/tasks/` require: `docker-compose restart worker`
-
-### Useful Service URLs
-
- **FastAPI API**: http://localhost:8000
- **FastAPI Docs**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health
- **Redis**: localhost:6379
- **MongoDB**: localhost:27017
-
-### Testing Commands
-
-```bash
-# Test FastAPI health
-curl http://localhost:8000/health
-
-# Test Celery task dispatch
-curl -X POST http://localhost:8000/test-task \
-  -H "Content-Type: application/json" \
-  -d '{"message": "Hello from test!"}'
-
-# Monitor Celery tasks
-docker-compose logs -f worker
-```
-## Default Admin User
-
-On first startup, the application automatically creates a default admin user:
- **Username**: `admin`
- **Password**: `admin`
- **Role**: `admin`
- **Email**: `admin@mydocmanager.local`
-**⚠️ Important**: Change the default admin password immediately after first login in production environments.
-
-## Key Implementation Notes
-
-### Python Standards
- **Style**: PEP 8 compliance
- **Documentation**: Google/NumPy docstring format
- **Naming**: snake_case for variables and functions
- **Testing**: pytest with test_i_can_xxx / test_i_cannot_xxx patterns
-
-### Security Best Practices
- **Password Storage**: Never store plain text passwords, always use bcrypt hashing
- **JWT Secrets**: Use strong, randomly generated secret keys in production
- **Token Expiration**: 24-hour expiration with secure signature validation
- **Role Validation**: Server-side role checking for all protected endpoints
-
-### Dependencies Management
- **Package Manager**: pip (standard)
- **External Dependencies**: Listed in each service's requirements.txt
- **Standard Library First**: Prefer standard library when possible
-
-### Testing Strategy
- All code must be testable
- Unit tests for each authentication and user management function
- Integration tests for complete authentication flow
- Tests validated before implementation
-
-### Critical Architecture Decisions Made
-1. **JWT Authentication**: Simple token-based auth with 24-hour expiration
-2. **Role-Based Access**: Admin/User roles for granular permissions
-3. **bcrypt Password Hashing**: Industry-standard password security
-4. **MongoDB User Storage**: Centralized user management in main database
-5. **Auto Admin Creation**: Automatic setup for first-time deployment
-6. **Single FastAPI Service**: Handles both API and file watching with authentication
-7. **Celery with Redis**: Chosen over other async patterns for scalability
-8. **EasyOCR Preferred**: Selected over Tesseract for modern OCR needs
-9. **Container Development**: Hot-reload setup required for development workflow
-
-### Development Process Requirements
-1. **Collaborative Validation**: All options must be explained before coding
-2. **Test-First Approach**: Test cases defined and validated before implementation
-3. **Incremental Development**: Start simple, extend functionality progressively
-4. **Error Handling**: Clear problem explanation required before proposing fixes
-
-### Next Implementation Steps
-1. ✅ Create docker-compose.yml with all services
-2. ✅ Define user management and authentication architecture
-3. Implement user models and authentication services
-4. Create protected API routes for user management
-5. Add automatic admin user creation
-6. Implement basic FastAPI service structure
-7. Add watchdog file monitoring
-8. Create Celery task structure
-9. Implement document processing tasks
-10. Build React monitoring interface with authentication
-
-### Next steps
-MongoDB CRUD
-We absolutely must mock MongoDB for the unit tests with pytest-mock
-Files to create: 
-* app/models/auht.py => already done
-* app/models/user.py => already done
-* app/database/connection.py
-  * Uses the settings for the MongoDB URL. A configuration file must be created (app/config/settings.py)
-  * get_database() function + error handling
-  * Configuration via environment variables
-* app/database/repositories/user_repository.py
+    - Code changes in `src/worker/tasks/` require: `docker-compose restart worker`
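
Review note on the README's "Valid Status Transitions" section: the rule that only three transitions are allowed can be expressed as a small lookup table. The sketch below is an illustration, not the code in `job_service.py`. `ProcessingStatus` and `InvalidStatusTransitionError` are names taken from the README, while the enum's string values, the `ALLOWED_TRANSITIONS` table, and the `validate_transition` helper are assumptions made for this example.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    # String values are assumed for illustration; the real enum may differ.
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class InvalidStatusTransitionError(Exception):
    """Raised when a job is asked to move to a status it cannot reach."""

    def __init__(self, current: ProcessingStatus, target: ProcessingStatus):
        super().__init__(f"Invalid status transition: {current.value} -> {target.value}")
        self.current = current
        self.target = target


# Hypothetical table: each status maps to the statuses it may move to.
# COMPLETED and FAILED are terminal, so their sets are empty.
ALLOWED_TRANSITIONS = {
    ProcessingStatus.PENDING: {ProcessingStatus.PROCESSING},
    ProcessingStatus.PROCESSING: {ProcessingStatus.COMPLETED, ProcessingStatus.FAILED},
    ProcessingStatus.COMPLETED: set(),
    ProcessingStatus.FAILED: set(),
}


def validate_transition(current: ProcessingStatus, target: ProcessingStatus) -> None:
    """Raise InvalidStatusTransitionError unless current -> target is allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidStatusTransitionError(current, target)
```

A service method such as `mark_job_as_started()` could call a check like this before asking the repository to write the new status, so that an invalid request never reaches MongoDB.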
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,5 +1,3 @@
-version: '3.8'
-
 services:
  # Redis - Message broker for Celery
  redis:
@@ -21,7 +19,7 @@ services:
      MONGO_INITDB_ROOT_PASSWORD: password123
      MONGO_INITDB_DATABASE: mydocmanager
    volumes:
-      - mongodb-data:/data/db
+      - ./volumes/db:/data/db
    networks:
      - mydocmanager-network

@@ -36,15 +34,18 @@ services:
    environment:
      - REDIS_URL=redis://redis:6379/0
      - MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin
+      - PYTHONPATH=/app:/tasks  # Added /tasks to Python path
    volumes:
-      - ./src/file-processor/app:/app
+      - ./src/file-processor:/app
+      - ./src/worker/tasks:/app/tasks          # <- Added: shared access to worker tasks
      - ./volumes/watched_files:/watched_files
+      - ./volumes/objects:/objects
    depends_on:
      - redis
      - mongodb
    networks:
      - mydocmanager-network
-    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  # Worker - Celery workers for document processing
  worker:
@@ -55,15 +56,31 @@ services:
    environment:
      - REDIS_URL=redis://redis:6379/0
      - MONGODB_URL=mongodb://admin:password123@mongodb:27017/mydocmanager?authSource=admin
+      - PYTHONPATH=/app
    volumes:
-      - ./src/worker/tasks:/app
+      - ./src/worker:/app
+      - ./src/file-processor/app:/app/app     # <- Added: shared access file-processor app
      - ./volumes/watched_files:/watched_files
    depends_on:
      - redis
      - mongodb
    networks:
      - mydocmanager-network
-    command: celery -A main worker --loglevel=info
+    command: celery -A tasks.main worker --loglevel=info
+
+  # Frontend - React application with Vite
+  frontend:
+    build:
+      context: ./src/frontend
+      dockerfile: Dockerfile
+    container_name: mydocmanager-frontend
+    ports:
+      - "5173:5173"
+    volumes:
+      - ./src/frontend:/app
+      - /app/node_modules  # Anonymous volume to prevent node_modules override
+    networks:
+      - mydocmanager-network

 volumes:
  mongodb-data:
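
Review note on the new `./volumes/objects:/objects` mount: the README describes storing files on disk under their SHA256 hash and using that hash for duplicate detection. The sketch below shows one way this could work, assuming `/objects` is the directory those hash-named files go to. The function names (`sha256_of_file`, `store_if_new`) and the flat directory layout are hypothetical and not taken from the commit.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_if_new(source: Path, objects_root: Path) -> tuple[Path, bool]:
    """Copy source into objects_root under its hash.

    Returns the target path and True if the file was copied,
    or False if identical content was already stored.
    """
    target = objects_root / sha256_of_file(source)
    if target.exists():
        return target, False  # Same content already stored: a duplicate.
    objects_root.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target, True
```

Because the stored name depends only on content, two watched files with different names but identical bytes map to the same object, which is what makes the duplicate check a simple existence test.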
--- a/pytest.ini
+++ b/pytest.ini
@@ -0,0 +1,7 @@
+[pytest]
+asyncio_mode = auto
+testpaths = tests
+python_files = test_*.py
+python_classes = Test*
+python_functions = test_*
+pythonpath = src/file-processor
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,33 +1,57 @@
 amqp==5.3.1
 annotated-types==0.7.0
 anyio==4.10.0
+asgiref==3.9.1
 bcrypt==4.3.0
 billiard==4.2.1
-bson==0.5.10
 celery==5.5.3
+certifi==2025.8.3
+cffi==2.0.0
 click==8.2.1
 click-didyoumean==0.3.1
 click-plugins==1.1.1.2
 click-repl==0.3.0
+cryptography==46.0.1
 dnspython==2.8.0
+ecdsa==0.19.1
 email-validator==2.3.0
 fastapi==0.116.1
 h11==0.16.0
+hiredis==3.2.1
+httpcore==1.0.9
 httptools==0.6.4
+httpx==0.28.1
 idna==3.10
+importlib_metadata==8.7.0
 iniconfig==2.1.0
+izulu==0.50.0
 kombu==5.5.4
+mongomock==4.3.0
+mongomock-motor==0.0.36
+motor==3.7.1
 packaging==25.0
+pipdeptree==2.28.0
 pluggy==1.6.0
 prompt_toolkit==3.0.52
+pyasn1==0.6.1
+pycparser==2.23
+pycron==3.2.0
 pydantic==2.11.9
 pydantic_core==2.33.2
 Pygments==2.19.2
-pymongo==4.15.0
+PyJWT==2.10.1
+pymongo==4.15.1
 pytest==8.4.2
+pytest-asyncio==1.2.0
+pytest-mock==3.15.1
 python-dateutil==2.9.0.post0
 python-dotenv==1.1.1
+python-magic==0.4.27
+pytz==2025.2
 PyYAML==6.0.2
+redis==6.4.0
+rsa==4.9.1
+sentinels==1.1.1
 six==1.17.0
 sniffio==1.3.1
 starlette==0.47.3
@@ -37,6 +61,8 @@ tzdata==2025.2
 uvicorn==0.35.0
 uvloop==0.21.0
 vine==5.1.0
+watchdog==6.0.0
 watchfiles==1.1.0
 wcwidth==0.2.13
 websockets==15.0.1
+zipp==3.23.0
--- a/src/file-processor/Dockerfile
+++ b/src/file-processor/Dockerfile
@@ -3,15 +3,23 @@ FROM python:3.12-slim
 # Set working directory
 WORKDIR /app

+# Install libmagic
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libmagic1 \
+    file \
+ && rm -rf /var/lib/apt/lists/*
+
 # Copy requirements and install dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

 # Copy application code
-COPY app/ .
+COPY . .
+
+ENV PYTHONPATH=/app

 # Expose port
 EXPOSE 8000

 # Command will be overridden by docker-compose
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/src/file-processor/app/api/__init__.py
+++ b/src/file-processor/app/api/__init__.py
--- a/src/file-processor/app/api/dependencies.py
+++ b/src/file-processor/app/api/dependencies.py
@@ -0,0 +1,100 @@
+import jwt
+from fastapi import Depends, HTTPException
+from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+from jwt import InvalidTokenError
+from starlette import status
+
+from app.config import settings
+from app.database.connection import get_database
+from app.models.auth import UserRole
+from app.models.user import UserInDB
+from app.services.auth_service import AuthService
+from app.services.user_service import UserService
+
+security = HTTPBearer()
+
+
+def get_auth_service() -> AuthService:
+  """Dependency to get AuthService instance."""
+  return AuthService()
+
+
+def get_user_service() -> UserService:
+  """Dependency to get UserService instance."""
+  database = get_database()
+  return UserService(database)
+
+
+def get_current_user(
+    credentials: HTTPAuthorizationCredentials = Depends(security),
+    user_service: UserService = Depends(get_user_service)
+) -> UserInDB:
+  """
+  Dependency to get current authenticated user from JWT token.
+
+  Args:
+      credentials: HTTP Bearer credentials
+      user_service: User service instance
+
+  Returns:
+      UserInDB: Current authenticated user
+
+  Raises:
+      HTTPException: If token is invalid or user not found
+  """
+  try:
+    payload = jwt.decode(
+      credentials.credentials,
+      settings.get_jwt_secret_key(),
+      algorithms=[settings.get_jwt_algorithm()]
+    )
+    username: str = payload.get("sub")
+    if username is None:
+      raise HTTPException(
+        status_code=status.HTTP_401_UNAUTHORIZED,
+        detail="Could not validate credentials",
+        headers={"WWW-Authenticate": "Bearer"},
+      )
+  except InvalidTokenError:
+    raise HTTPException(
+      status_code=status.HTTP_401_UNAUTHORIZED,
+      detail="Could not validate credentials",
+      headers={"WWW-Authenticate": "Bearer"},
+    )
+  
+  user = user_service.get_user_by_username(username)
+  if user is None:
+    raise HTTPException(
+      status_code=status.HTTP_401_UNAUTHORIZED,
+      detail="Could not validate credentials",
+      headers={"WWW-Authenticate": "Bearer"},
+    )
+  
+  if not user.is_active:
+    raise HTTPException(
+      status_code=status.HTTP_400_BAD_REQUEST,
+      detail="Inactive user"
+    )
+  
+  return user
+
+
+def get_admin_user(current_user: UserInDB = Depends(get_current_user)) -> UserInDB:
+  """
+  Dependency to ensure current user has admin role.
+
+  Args:
+      current_user: Current authenticated user
+
+  Returns:
+      UserInDB: Current user if admin
+
+  Raises:
+      HTTPException: If user is not admin
+  """
+  if current_user.role != UserRole.ADMIN:
+    raise HTTPException(
+      status_code=status.HTTP_403_FORBIDDEN,
+      detail="Not enough permissions"
+    )
+  return current_user
--- a/src/file-processor/app/api/routes/__init__.py
+++ b/src/file-processor/app/api/routes/__init__.py
--- a/src/file-processor/app/api/routes/auth.py
+++ b/src/file-processor/app/api/routes/auth.py
@@ -0,0 +1,80 @@
+from fastapi import APIRouter, Depends, HTTPException, status
+from fastapi.security import OAuth2PasswordRequestForm
+
+from app.api.dependencies import get_auth_service, get_current_user, get_user_service
+from app.models.auth import LoginResponse, UserResponse
+from app.models.user import UserInDB
+from app.services.auth_service import AuthService
+from app.services.user_service import UserService
+
+router = APIRouter(tags=["authentication"])
+
+
+@router.post("/login", response_model=LoginResponse)
+def login(
+    form_data: OAuth2PasswordRequestForm = Depends(),
+    auth_service: AuthService = Depends(get_auth_service),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  Authenticate user and return JWT token.
+
+  Args:
+      form_data: OAuth2 password form data
+      auth_service: Auth service instance
+      user_service: User service instance
+
+  Returns:
+      LoginResponse: JWT token and user info
+
+  Raises:
+      HTTPException: If authentication fails
+  """
+  incorrect_username_or_pwd = HTTPException(
+    status_code=status.HTTP_401_UNAUTHORIZED,
+    detail="Incorrect username or password",
+    headers={"WWW-Authenticate": "Bearer"},
+  )
+  
+  user = user_service.get_user_by_username(form_data.username)
+  if (not user or
+      not user.is_active or
+      not auth_service.verify_user_password(form_data.password, user.hashed_password)):
+    raise incorrect_username_or_pwd
+  
+  access_token = auth_service.create_access_token(data={"sub": user.username})
+  
+  return LoginResponse(
+    access_token=access_token,
+    user=UserResponse(
+      _id=user.id,
+      username=user.username,
+      email=user.email,
+      role=user.role,
+      is_active=user.is_active,
+      created_at=user.created_at,
+      updated_at=user.updated_at
+    )
+  )
+
+
+@router.get("/me", response_model=UserResponse)
+def get_current_user_profile(current_user: UserInDB = Depends(get_current_user)):
+  """
+  Get current user profile.
+
+  Args:
+      current_user: Current authenticated user
+
+  Returns:
+      UserResponse: Current user profile without sensitive data
+  """
+  return UserResponse(
+    _id=current_user.id,
+    username=current_user.username,
+    email=current_user.email,
+    role=current_user.role,
+    is_active=current_user.is_active,
+    created_at=current_user.created_at,
+    updated_at=current_user.updated_at
+  )
--- a/src/file-processor/app/api/routes/users.py
+++ b/src/file-processor/app/api/routes/users.py
@@ -0,0 +1,172 @@
+from fastapi import APIRouter, Depends, HTTPException
+from starlette import status
+
+from app.api.dependencies import get_admin_user, get_user_service
+from app.models.auth import UserResponse, MessageResponse
+from app.models.types import PyObjectId
+from app.models.user import UserInDB, UserCreate, UserUpdate
+from app.services.user_service import UserService
+
+router = APIRouter(tags=["users"])
+
+
+@router.get("", response_model=list[UserResponse])
+def list_users(
+    admin_user: UserInDB = Depends(get_admin_user),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  List all users (admin only).
+
+  Args:
+      admin_user: Current admin user
+      user_service: User service instance
+
+  Returns:
+      List[UserResponse]: List of all users without sensitive data
+  """
+  return user_service.list_users()
+
+
+@router.get("/{user_id}", response_model=UserResponse)
+def get_user_by_id(
+    user_id: PyObjectId,
+    admin_user: UserInDB = Depends(get_admin_user),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  Get specific user by ID (admin only).
+
+  Args:
+      user_id: User ID to retrieve
+      admin_user: Current admin user
+      user_service: User service instance
+
+  Returns:
+      UserResponse: User information without sensitive data
+
+  Raises:
+      HTTPException: If user not found
+  """
+  user = user_service.get_user_by_id(str(user_id))
+  if not user:
+    raise HTTPException(
+      status_code=status.HTTP_404_NOT_FOUND,
+      detail="User not found"
+    )
+  
+  return user
+
+
+@router.post("", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
+def create_user(
+    user_data: UserCreate,
+    admin_user: UserInDB = Depends(get_admin_user),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  Create new user (admin only).
+
+  Args:
+      user_data: User creation data
+      admin_user: Current admin user
+      user_service: User service instance
+
+  Returns:
+      UserResponse: Created user information without sensitive data
+
+  Raises:
+      HTTPException: If user creation fails
+  """
+  try:
+    user = user_service.create_user(user_data)
+    return UserResponse(
+      _id=user.id,
+      username=user.username,
+      email=user.email,
+      role=user.role,
+      is_active=user.is_active,
+      created_at=user.created_at,
+      updated_at=user.updated_at
+    )
+  except ValueError as e:
+    raise HTTPException(
+      status_code=status.HTTP_400_BAD_REQUEST,
+      detail=str(e)
+    )
+
+
+@router.put("/{user_id}", response_model=UserResponse)
+def update_user(
+    user_id: PyObjectId,
+    user_data: UserUpdate,
+    admin_user: UserInDB = Depends(get_admin_user),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  Update existing user (admin only).
+
+  Args:
+      user_id: User ID to update
+      user_data: User update data
+      admin_user: Current admin user
+      user_service: User service instance
+
+  Returns:
+      UserResponse: Updated user information without sensitive data
+
+  Raises:
+      HTTPException: If user not found or update fails
+  """
+  try:
+    user = user_service.update_user(str(user_id), user_data)
+    if not user:
+      raise HTTPException(
+        status_code=status.HTTP_404_NOT_FOUND,
+        detail="User not found"
+      )
+    
+    return UserResponse(
+      _id=user.id,
+      username=user.username,
+      email=user.email,
+      role=user.role,
+      is_active=user.is_active,
+      created_at=user.created_at,
+      updated_at=user.updated_at
+    )
+  except ValueError as e:
+    raise HTTPException(
+      status_code=status.HTTP_400_BAD_REQUEST,
+      detail=str(e)
+    )
+
+
+@router.delete("/{user_id}", response_model=MessageResponse)
+def delete_user(
+    user_id: PyObjectId,
+    admin_user: UserInDB = Depends(get_admin_user),
+    user_service: UserService = Depends(get_user_service)
+):
+  """
+  Delete user by ID (admin only).
+
+  Args:
+      user_id: User ID to delete
+      admin_user: Current admin user
+      user_service: User service instance
+
+  Returns:
+      MessageResponse: Success message
+
+  Raises:
+      HTTPException: If user not found or deletion fails
+  """
+  success = user_service.delete_user(str(user_id))
+  if not success:
+    raise HTTPException(
+      status_code=status.HTTP_404_NOT_FOUND,
+      detail="User not found"
+    )
+  
+  return MessageResponse(message="User successfully deleted")
--- a/src/file-processor/app/config/settings.py
+++ b/src/file-processor/app/config/settings.py
@@ -6,7 +6,6 @@ using simple os.getenv() approach without external validation libraries.
 """

 import os
-from typing import Optional


 def get_mongodb_url() -> str:
@@ -31,6 +30,26 @@ def get_mongodb_database_name() -> str:
  return os.getenv("MONGODB_DATABASE", "mydocmanager")


+def get_redis_url() -> str:
+  return os.getenv("REDIS_URL", "redis://localhost:6379/0")
+
+
+# def get_redis_host() -> str:
+#   redis_url = get_redis_url()
+#   if redis_url.startswith("redis://"):
+#     return redis_url.split("redis://")[1].split("/")[0]
+#   else:
+#     return redis_url
+#
+#
+# def get_redis_port() -> int:
+#   redis_url = get_redis_url()
+#   if redis_url.startswith("redis://"):
+#     return int(redis_url.split("redis://")[1].split("/")[0].split(":")[1])
+#   else:
+#     return int(redis_url.split(":")[1])
+
+
 def get_jwt_secret_key() -> str:
  """
  Get JWT secret key from environment variables.
@@ -82,4 +101,19 @@ def is_development_environment() -> bool:
  Returns:
      bool: True if development environment
  """
-  return os.getenv("ENVIRONMENT", "development").lower() == "development"
+  return os.getenv("ENVIRONMENT", "development").lower() == "development"
+
+
+def get_objects_folder() -> str:
+  """
+  Get objects folder path from environment variables.
+
+  Returns:
+      str: Objects folder path
+  """
+  return os.getenv("OBJECTS_FOLDER", "/objects")
+
+
+def watch_directory() -> str:
+  """Directory to monitor for new files"""
+  return os.getenv("WATCH_DIRECTORY", "/watched_files")
--- a/src/file-processor/app/database/connection.py
+++ b/src/file-processor/app/database/connection.py
@@ -7,11 +7,12 @@ The application will terminate if MongoDB is not accessible at startup.

 import sys
 from typing import Optional
+
 from pymongo import MongoClient
 from pymongo.database import Database
 from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError

-from ..config.settings import get_mongodb_url, get_mongodb_database_name
+from app.config.settings import get_mongodb_url, get_mongodb_database_name

 # Global variables for singleton pattern
 _client: Optional[MongoClient] = None
@@ -107,6 +108,15 @@ def get_mongodb_client() -> Optional[MongoClient]:
  return _client


+def get_extra_args(session):
+  # Build kwargs only if session is provided
+  kwargs = {}
+  if session is not None:
+    kwargs["session"] = session
+  
+  return kwargs
+
+
 def test_database_connection() -> bool:
  """
  Test if database connection is working.
@@ -122,4 +132,4 @@ def test_database_connection() -> bool:
    db.command('ping')
    return True
  except Exception:
-    return False
+    return False
--- a/src/file-processor/app/database/repositories/document_repository.py
+++ b/src/file-processor/app/database/repositories/document_repository.py
@@ -0,0 +1,261 @@
+"""
+File repository for database operations on FileDocument collection.
+
+This module provides data access operations for file documents stored
+in MongoDB with proper error handling and type safety.
+"""
+
+from typing import Optional, List
+
+from bson import ObjectId
+from pymongo.collection import Collection
+from pymongo.database import Database
+from pymongo.errors import DuplicateKeyError, PyMongoError
+
+from app.database.connection import get_extra_args
+from app.models.document import FileDocument
+from app.utils.document_matching import fuzzy_matching, subsequence_matching
+
+
+class MatchMethodBase:
+  pass
+
+
+class SubsequenceMatching(MatchMethodBase):
+  pass
+
+
+class FuzzyMatching(MatchMethodBase):
+  def __init__(self, threshold: float = 0.6):
+    self.threshold = threshold
+
+
+class FileDocumentRepository:
+  """
+  Repository class for file document database operations.
+  
+  This class handles all database operations for FileDocument objects
+  with proper error handling and data validation.
+  """
+  
+  def __init__(self, database: Database):
+    """Initialize file repository with database connection."""
+    self.db = database
+    self.collection: Collection = self.db.documents
+  
+  def initialize(self):
+    """
+    Initialize repository by ensuring required indexes exist.
+
+    Should be called after repository instantiation to setup database indexes.
+    """
+    self._ensure_indexes()
+    return self
+  
+  def _ensure_indexes(self):
+    """
+    Ensure required database indexes exist.
+
+    No indexes are defined for the documents collection yet.
+    """
+    pass
+  
+  def create_document(self, file_data: FileDocument, session=None) -> FileDocument:
+    """
+    Create a new file document in database.
+    
+    Args:
+        file_data (FileDocument): File document data to create
+        session (ClientSession, optional): MongoDB session
+        
+    Returns:
+        FileDocument: Created document with database ID
+        
+    Raises:
+        ValueError: If file creation fails due to validation
+        DuplicateKeyError: If a document with same hash already exists
+    """
+    try:
+      file_dict = file_data.model_dump(by_alias=True, exclude_unset=True)
+      if "_id" in file_dict and file_dict["_id"] is None:
+        del file_dict["_id"]
+      
+      result = self.collection.insert_one(file_dict, **get_extra_args(session))
+      file_data.id = result.inserted_id
+      return file_data
+    
+    except DuplicateKeyError as e:
+      raise DuplicateKeyError(f"File with same file path already exists: {e}") from e
+    except PyMongoError as e:
+      raise ValueError(f"Failed to create file document: {e}") from e
+  
+  def find_document_by_id(self, file_id: str) -> Optional[FileDocument]:
+    """
+    Find file document by ID.
+    
+    Args:
+        file_id (str): File document ID to search for
+        
+    Returns:
+        FileDocument or None: File document if found, None otherwise
+    """
+    try:
+      if not ObjectId.is_valid(file_id):
+        return None
+      
+      file_doc = self.collection.find_one({"_id": ObjectId(file_id)})
+      if file_doc:
+        return FileDocument(**file_doc)
+      return None
+    
+    except PyMongoError:
+      return None
+  
+  def find_document_by_hash(self, file_hash: str) -> Optional[FileDocument]:
+    """
+    Find file document by file hash to detect duplicates.
+    
+    Args:
+        file_hash (str): SHA256 hash of file content
+        
+    Returns:
+        FileDocument or None: File document if found, None otherwise
+    """
+    try:
+      file_doc = self.collection.find_one({"file_hash": file_hash})
+      if file_doc:
+        return FileDocument(**file_doc)
+      return None
+    
+    except PyMongoError:
+      return None
+  
+  def find_document_by_filepath(self, filepath: str) -> Optional[FileDocument]:
+    """
+    Find file document by exact filepath.
+    
+    Args:
+        filepath (str): Full path to the file
+        
+    Returns:
+        FileDocument or None: File document if found, None otherwise
+    """
+    try:
+      file_doc = self.collection.find_one({"filepath": filepath})
+      if file_doc:
+        return FileDocument(**file_doc)
+      return None
+    
+    except PyMongoError:
+      return None
+  
+  def find_document_by_name(self, filename: str, matching_method: Optional[MatchMethodBase] = None) -> List[FileDocument]:
+    """
+    Find file documents by filename using fuzzy or subsequence matching.
+    
+    Args:
+        filename (str): Filename to search for
+        matching_method (MatchMethodBase, optional): Matching strategy; FuzzyMatching applies its threshold, anything else uses subsequence matching
+        
+    Returns:
+        List[FileDocument]: List of matching files sorted by similarity score
+    """
+    try:
+      # Get all files from database
+      cursor = self.collection.find({})
+      all_documents = [FileDocument(**file_doc) for file_doc in cursor]
+      
+      if isinstance(matching_method, FuzzyMatching):
+        return fuzzy_matching(filename, all_documents, matching_method.threshold)
+      
+      return subsequence_matching(filename, all_documents)
+    
+    except PyMongoError:
+      return []
+  
+  def list_documents(self, skip: int = 0, limit: int = 100) -> List[FileDocument]:
+    """
+    List file documents with pagination.
+    
+    Args:
+        skip (int): Number of documents to skip (default: 0)
+        limit (int): Maximum number of documents to return (default: 100)
+        
+    Returns:
+        List[FileDocument]: List of file documents
+    """
+    try:
+      cursor = self.collection.find({}).skip(skip).limit(limit).sort("detected_at", -1)
+      return [FileDocument(**doc) for doc in cursor]
+    
+    except PyMongoError:
+      return []
+  
+  def count_documents(self) -> int:
+    """
+    Count total number of file documents.
+    
+    Returns:
+        int: Total number of file documents in collection
+    """
+    try:
+      return self.collection.count_documents({})
+    except PyMongoError:
+      return 0
+  
+  def update_document(self, file_id: str, update_data: dict, session=None) -> Optional[FileDocument]:
+    """
+    Update file document with new data.
+    
+    Args:
+        file_id (str): File document ID to update
+        update_data (dict): Fields to update
+        session (ClientSession, optional): MongoDB session
+        
+    Returns:
+        FileDocument or None: Updated file document if successful, None otherwise
+    """
+    try:
+      if not ObjectId.is_valid(file_id):
+        return None
+      
+      # Remove None values from update data
+      clean_update_data = {k: v for k, v in update_data.items() if v is not None}
+      
+      if not clean_update_data:
+        return self.find_document_by_id(file_id)
+      
+      result = self.collection.find_one_and_update(
+        {"_id": ObjectId(file_id)},
+        {"$set": clean_update_data},
+        return_document=True,
+        **get_extra_args(session)
+      )
+      
+      if result:
+        return FileDocument(**result)
+      return None
+    
+    except PyMongoError:
+      return None
+  
+  def delete_document(self, file_id: str, session=None) -> bool:
+    """
+    Delete file document from database.
+    
+    Args:
+        file_id (str): File document ID to delete
+        session (ClientSession, optional): MongoDB session
+        
+    Returns:
+        bool: True if file was deleted, False otherwise
+    """
+    try:
+      if not ObjectId.is_valid(file_id):
+        return False
+      
+      result = self.collection.delete_one({"_id": ObjectId(file_id)}, **get_extra_args(session))
+      return result.deleted_count > 0
+    
+    except PyMongoError:
+      return False
--- a/src/file-processor/app/database/repositories/job_repository.py
+++ b/src/file-processor/app/database/repositories/job_repository.py
@@ -0,0 +1,230 @@
+"""
+Repository for managing processing jobs in MongoDB.
+
+This module provides data access layer for ProcessingJob operations
+with automatic timestamp management and error handling.
+"""
+
+from datetime import datetime
+from typing import List, Optional
+
+from pymongo.collection import Collection
+from pymongo.database import Database
+from pymongo.errors import PyMongoError
+
+from app.exceptions.job_exceptions import JobRepositoryError
+from app.models.job import ProcessingJob, ProcessingStatus
+from app.models.types import PyObjectId
+
+
+class JobRepository:
+  """
+  Repository for processing job data access operations.
+
+  Provides CRUD operations for ProcessingJob documents with automatic
+  timestamp management and proper error handling.
+  """
+  
+  def __init__(self, database: Database):
+    """Initialize repository with MongoDB collection reference."""
+    self.db = database
+    self.collection: Collection = self.db.processing_jobs
+  
+  def _ensure_indexes(self):
+    """
+    Ensure required database indexes exist.
+
+    Creates unique index on document_id field to prevent duplicates.
+    """
+    try:
+      self.collection.create_index("document_id", unique=True)
+    except PyMongoError:
+      # Index might already exist, ignore error
+      pass
+  
+  def initialize(self):
+    """
+    Initialize repository by ensuring required indexes exist.
+
+    Should be called after repository instantiation to setup database indexes.
+    """
+    self._ensure_indexes()
+    return self
+  
+  def create_job(self, document_id: PyObjectId, task_id: Optional[str] = None) -> ProcessingJob:
+    """
+    Create a new processing job.
+
+    Args:
+        document_id: Reference to the file document
+        task_id: Optional Celery task UUID
+
+    Returns:
+        The created ProcessingJob
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    try:
+      job_data = {
+          "document_id": document_id,
+          "status": ProcessingStatus.PENDING,
+          "task_id": task_id,
+          "created_at": datetime.now(),
+          "started_at": None,
+          "completed_at": None,
+          "error_message": None
+      }
+      
+      result = self.collection.insert_one(job_data)
+      job_data["_id"] = result.inserted_id
+      
+      return ProcessingJob(**job_data)
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("create_job", e)
+  
+  def find_job_by_id(self, job_id: PyObjectId) -> Optional[ProcessingJob]:
+    """
+    Retrieve a job by its ID.
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        The ProcessingJob document, or None if the job doesn't exist
+
+    Raises:
+        JobRepositoryError: If database operation fails
+            (a missing job yields None rather than an exception)
+    """
+    try:
+      job_data = self.collection.find_one({"_id": job_id})
+      if job_data:
+        return ProcessingJob(**job_data)
+      
+      return None
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("find_job_by_id", e)
+  
+  def update_job_status(
+      self,
+      job_id: PyObjectId,
+      status: ProcessingStatus,
+      error_message: Optional[str] = None
+  ) -> Optional[ProcessingJob]:
+    """
+    Update job status with automatic timestamp management.
+
+    Args:
+        job_id: The job ObjectId
+        status: New processing status
+        error_message: Optional error message for failed jobs
+
+    Returns:
+        The updated ProcessingJob, or None if the job doesn't exist
+
+    Raises:
+        JobRepositoryError: If database operation fails
+            (a missing job yields None rather than an exception)
+    """
+    try:
+      # Prepare update data
+      update_data = {"status": status}
+      
+      # Set appropriate timestamp based on status
+      current_time = datetime.now()
+      if status == ProcessingStatus.PROCESSING:
+        update_data["started_at"] = current_time
+      elif status in (ProcessingStatus.COMPLETED, ProcessingStatus.FAILED):
+        update_data["completed_at"] = current_time
+      
+      # Add error message if provided
+      if error_message is not None:
+        update_data["error_message"] = error_message
+      
+      result = self.collection.find_one_and_update(
+        {"_id": job_id},
+        {"$set": update_data},
+        return_document=True
+      )
+      
+      if result:
+        return ProcessingJob(**result)
+      
+      return None
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("update_job_status", e)
+  
+  def delete_job(self, job_id: PyObjectId) -> bool:
+    """
+    Delete a job from the database.
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        True if job was deleted, False if not found
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    try:
+      result = self.collection.delete_one({"_id": job_id})
+      
+      return result.deleted_count > 0
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("delete_job", e)
+  
+  def find_jobs_by_document_id(self, document_id: PyObjectId) -> List[ProcessingJob]:
+    """
+    Retrieve all jobs for a specific file.
+
+    Args:
+        document_id: The file ObjectId
+
+    Returns:
+        List of ProcessingJob documents
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    try:
+      cursor = self.collection.find({"document_id": document_id})
+      
+      jobs = []
+      for job_data in cursor:
+        jobs.append(ProcessingJob(**job_data))
+      
+      return jobs
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("find_jobs_by_document_id", e)
+  
+  def get_jobs_by_status(self, status: ProcessingStatus) -> List[ProcessingJob]:
+    """
+    Retrieve all jobs with a specific status.
+
+    Args:
+        status: The processing status to filter by
+
+    Returns:
+        List of ProcessingJob documents
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    try:
+      cursor = self.collection.find({"status": status})
+      
+      jobs = []
+      for job_data in cursor:
+        jobs.append(ProcessingJob(**job_data))
+      
+      return jobs
+    
+    except PyMongoError as e:
+      raise JobRepositoryError("get_jobs_by_status", e)
--- a/src/file-processor/app/database/repositories/user_repository.py
+++ b/src/file-processor/app/database/repositories/user_repository.py
@@ -2,17 +2,19 @@
 User repository for MongoDB operations.

 This module implements the repository pattern for user CRUD operations
-with dependency injection of the database connection.
+with dependency injection of the synchronous pymongo database connection.
 """

-from typing import Optional, List, Dict, Any
 from datetime import datetime
+from typing import Optional, List
+
 from bson import ObjectId
-from pymongo.database import Database
-from pymongo.errors import DuplicateKeyError
 from pymongo.collection import Collection
+from pymongo.database import Database
+from pymongo.errors import DuplicateKeyError, PyMongoError

 from app.models.user import UserCreate, UserInDB, UserUpdate
+from app.utils.security import hash_password


 class UserRepository:
@@ -20,7 +22,7 @@ class UserRepository:
  Repository class for user CRUD operations in MongoDB.

  This class handles all database operations related to users,
-  following the repository pattern with dependency injection.
+  following the repository pattern with dependency injection (synchronous pymongo).
  """
  
  def __init__(self, database: Database):
@@ -28,13 +30,19 @@ class UserRepository:
    Initialize repository with database dependency.

    Args:
-        database (Database): MongoDB database instance
+        database (Database): pymongo database instance
    """
    self.db = database
    self.collection: Collection = database.users
-    
-    # Create unique index on username for duplicate prevention
+  
+  def initialize(self):
+    """
+    Initialize repository by ensuring required indexes exist.
+
+    Should be called after repository instantiation to setup database indexes.
+    """
    self._ensure_indexes()
+    return self
  
  def _ensure_indexes(self):
    """
@@ -44,7 +52,7 @@ class UserRepository:
    """
    try:
      self.collection.create_index("username", unique=True)
-    except Exception:
+    except PyMongoError:
      # Index might already exist, ignore error
      pass
  
@@ -60,23 +68,26 @@ class UserRepository:

    Raises:
        DuplicateKeyError: If username already exists
+        ValueError: If user creation fails due to validation
    """
    user_dict = {
        "username": user_data.username,
        "email": user_data.email,
-        "hashed_password": user_data.hashed_password,
+        "hashed_password": hash_password(user_data.password),
        "role": user_data.role,
-        "is_active": user_data.is_active,
-        "created_at": datetime.utcnow(),
-        "updated_at": datetime.utcnow()
+        "is_active": True,
+        "created_at": datetime.now(),
+        "updated_at": datetime.now()
    }
    
    try:
      result = self.collection.insert_one(user_dict)
      user_dict["_id"] = result.inserted_id
      return UserInDB(**user_dict)
-    except DuplicateKeyError:
-      raise DuplicateKeyError(f"User with username '{user_data.username}' already exists")
+    except DuplicateKeyError as e:
+      raise DuplicateKeyError(f"User with username '{user_data.username}' already exists: {e}")
+    except PyMongoError as e:
+      raise ValueError(f"Failed to create user: {e}")
  
  def find_user_by_username(self, username: str) -> Optional[UserInDB]:
    """
@@ -88,10 +99,13 @@ class UserRepository:
    Returns:
        UserInDB or None: User if found, None otherwise
    """
-    user_doc = self.collection.find_one({"username": username})
-    if user_doc:
-      return UserInDB(**user_doc)
-    return None
+    try:
+      user_doc = self.collection.find_one({"username": username})
+      if user_doc:
+        return UserInDB(**user_doc)
+      return None
+    except PyMongoError:
+      return None
  
  def find_user_by_id(self, user_id: str) -> Optional[UserInDB]:
    """
@@ -104,14 +118,15 @@ class UserRepository:
        UserInDB or None: User if found, None otherwise
    """
    try:
-      object_id = ObjectId(user_id)
-      user_doc = self.collection.find_one({"_id": object_id})
+      if not ObjectId.is_valid(user_id):
+        return None
+      
+      user_doc = self.collection.find_one({"_id": ObjectId(user_id)})
      if user_doc:
        return UserInDB(**user_doc)
-    except Exception:
-      # Invalid ObjectId format
-      pass
-    return None
+      return None
+    except PyMongoError:
+      return None
  
  def find_user_by_email(self, email: str) -> Optional[UserInDB]:
    """
@@ -123,10 +138,13 @@ class UserRepository:
    Returns:
        UserInDB or None: User if found, None otherwise
    """
-    user_doc = self.collection.find_one({"email": email})
-    if user_doc:
-      return UserInDB(**user_doc)
-    return None
+    try:
+      user_doc = self.collection.find_one({"email": email})
+      if user_doc:
+        return UserInDB(**user_doc)
+      return None
+    except PyMongoError:
+      return None
  
  def update_user(self, user_id: str, user_update: UserUpdate) -> Optional[UserInDB]:
    """
@@ -140,32 +158,41 @@ class UserRepository:
        UserInDB or None: Updated user if found, None otherwise
    """
    try:
-      object_id = ObjectId(user_id)
+      if not ObjectId.is_valid(user_id):
+        return None
      
      # Build update document with only provided fields
-      update_data = {"updated_at": datetime.utcnow()}
+      update_data = {"updated_at": datetime.now()}
      
+      if user_update.username is not None:
+        update_data["username"] = user_update.username
      if user_update.email is not None:
        update_data["email"] = user_update.email
-      if user_update.hashed_password is not None:
-        update_data["hashed_password"] = user_update.hashed_password
+      if user_update.password is not None:
+        update_data["hashed_password"] = hash_password(user_update.password)
      if user_update.role is not None:
        update_data["role"] = user_update.role
      if user_update.is_active is not None:
        update_data["is_active"] = user_update.is_active
      
-      result = self.collection.update_one(
-        {"_id": object_id},
-        {"$set": update_data}
+      # Remove None values from update data
+      clean_update_data = {k: v for k, v in update_data.items() if v is not None}
+      
+      if not clean_update_data:
+        return self.find_user_by_id(user_id)
+      
+      result = self.collection.find_one_and_update(
+        {"_id": ObjectId(user_id)},
+        {"$set": clean_update_data},
+        return_document=True
      )
      
-      if result.matched_count > 0:
-        return self.find_user_by_id(user_id)
+      if result:
+        return UserInDB(**result)
+      return None
    
-    except Exception:
-      # Invalid ObjectId format or other errors
-      pass
-    return None
+    except PyMongoError:
+      return None
  
  def delete_user(self, user_id: str) -> bool:
    """
@@ -178,11 +205,12 @@ class UserRepository:
        bool: True if user was deleted, False otherwise
    """
    try:
-      object_id = ObjectId(user_id)
-      result = self.collection.delete_one({"_id": object_id})
+      if not ObjectId.is_valid(user_id):
+        return False
+      
+      result = self.collection.delete_one({"_id": ObjectId(user_id)})
      return result.deleted_count > 0
-    except Exception:
-      # Invalid ObjectId format
+    except PyMongoError:
      return False
  
  def list_users(self, skip: int = 0, limit: int = 100) -> List[UserInDB]:
@@ -196,8 +224,12 @@ class UserRepository:
    Returns:
        List[UserInDB]: List of users
    """
-    cursor = self.collection.find().skip(skip).limit(limit)
-    return [UserInDB(**user_doc) for user_doc in cursor]
+    try:
+      cursor = self.collection.find({}).skip(skip).limit(limit).sort("created_at", -1)
+      user_docs = cursor.to_list(length=limit)
+      return [UserInDB(**user_doc) for user_doc in user_docs]
+    except PyMongoError:
+      return []
  
  def count_users(self) -> int:
    """
@@ -206,7 +238,10 @@ class UserRepository:
    Returns:
        int: Total number of users in database
    """
-    return self.collection.count_documents({})
+    try:
+      return self.collection.count_documents({})
+    except PyMongoError:
+      return 0
  
  def user_exists(self, username: str) -> bool:
    """
@@ -218,4 +253,8 @@ class UserRepository:
    Returns:
        bool: True if user exists, False otherwise
    """
-    return self.collection.count_documents({"username": username}) > 0
+    try:
+      count = self.collection.count_documents({"username": username})
+      return count > 0
+    except PyMongoError:
+      return False
--- a/src/file-processor/app/exceptions/__init__.py
+++ b/src/file-processor/app/exceptions/__init__.py
--- a/src/file-processor/app/exceptions/job_exceptions.py
+++ b/src/file-processor/app/exceptions/job_exceptions.py
@@ -0,0 +1,38 @@
+"""
+Custom exceptions for job management operations.
+
+This module defines specific exceptions for job processing lifecycle
+and repository operations to provide clear error handling.
+"""
+
+from app.models.job import ProcessingStatus
+
+
+class InvalidStatusTransitionError(Exception):
+  """
+  Raised when an invalid status transition is attempted.
+
+  This exception indicates that an attempt was made to change a job's
+  status to an invalid target status given the current status.
+  """
+  
+  def __init__(self, current_status: ProcessingStatus, target_status: ProcessingStatus):
+    self.current_status = current_status
+    self.target_status = target_status
+    super().__init__(
+      f"Invalid status transition from '{current_status}' to '{target_status}'"
+    )
+
+
+class JobRepositoryError(Exception):
+  """
+  Raised when a MongoDB operation fails in the job repository.
+
+  This exception wraps database-related errors that occur during
+  job repository operations.
+  """
+  
+  def __init__(self, operation: str, original_error: Exception):
+    self.operation = operation
+    self.original_error = original_error
+    super().__init__(f"Repository operation '{operation}' failed: {str(original_error)}")
--- a/src/file-processor/app/file_watcher.py
+++ b/src/file-processor/app/file_watcher.py
@@ -0,0 +1,243 @@
+"""
+File watcher implementation with Watchdog observer and ProcessingJob management.
+
+This module provides real-time file monitoring for document processing.
+When a file is created in the watched directory, it:
+1. Creates a document record via DocumentService
+2. Dispatches a Celery task for processing
+3. Creates a ProcessingJob to track the task lifecycle
+"""
+
+import logging
+import threading
+from pathlib import Path
+from typing import Optional
+
+from watchdog.events import FileSystemEventHandler, FileCreatedEvent
+from watchdog.observers import Observer
+
+from app.services.document_service import DocumentService
+from app.services.job_service import JobService
+
+logger = logging.getLogger(__name__)
+
+
+class DocumentFileEventHandler(FileSystemEventHandler):
+  """
+  Event handler for document file creation events.
+  
+  Processes newly created files by creating document records,
+  dispatching Celery tasks, and managing processing jobs.
+  """
+  
+  SUPPORTED_EXTENSIONS = {'.txt', '.pdf', '.docx'}
+  
+  def __init__(self, document_service: DocumentService, job_service: JobService):
+    """
+    Initialize the event handler.
+    
+    Args:
+        document_service: Service for document management
+        job_service: Service for processing job management
+    """
+    super().__init__()
+    self.document_service = document_service
+    self.job_service = job_service
+  
+  def on_created(self, event: FileCreatedEvent) -> None:
+    """
+    Handle file creation events.
+    
+    Args:
+        event: File system event containing file path information
+    """
+    if event.is_directory:
+      return
+    
+    filepath = event.src_path
+    file_extension = Path(filepath).suffix.lower()
+    
+    if file_extension not in self.SUPPORTED_EXTENSIONS:
+      logger.info(f"Ignoring unsupported file type: {filepath}")
+      return
+    
+    logger.info(f"Processing new file: {filepath}")
+    
+    try:
+      # Local import, resolved only when a supported file is created
+      from tasks.document_processing import process_document
+      
+      task_result = process_document.delay(filepath)
+      task_id = task_result.id
+      logger.info(f"Dispatched Celery task with ID: {task_id}")
+    
+    except Exception as e:
+      logger.error(f"Failed to process file {filepath}: {str(e)}")
+      # Note: We don't re-raise the exception to keep the watcher running
+
+
+class FileWatcher:
+  """
+  File system watcher for automatic document processing.
+  
+  Monitors a directory for new files and triggers processing pipeline
+  using a dedicated observer thread.
+  """
+  
+  def __init__(
+      self,
+      watch_directory: str,
+      document_service: DocumentService,
+      job_service: JobService,
+      recursive: bool = True
+  ):
+    """
+    Initialize the file watcher.
+    
+    Args:
+        watch_directory: Directory path to monitor
+        document_service: Service for document management
+        job_service: Service for processing job management
+        recursive: Whether to watch subdirectories recursively
+    """
+    self.watch_directory = Path(watch_directory)
+    self.recursive = recursive
+    self.observer: Optional[Observer] = None
+    self._observer_thread: Optional[threading.Thread] = None
+    self._stop_event = threading.Event()
+    
+    # Validate watch directory
+    if not self.watch_directory.exists():
+      raise ValueError(f"Watch directory does not exist: {watch_directory}")
+    
+    if not self.watch_directory.is_dir():
+      raise ValueError(f"Watch path is not a directory: {watch_directory}")
+    
+    # Create event handler
+    self.event_handler = DocumentFileEventHandler(
+      document_service=document_service,
+      job_service=job_service
+    )
+    
+    logger.info(f"FileWatcher initialized for directory: {self.watch_directory}")
+  
+  def start(self) -> None:
+    """
+    Start the file watcher in a separate thread.
+    
+    Raises:
+        RuntimeError: If the watcher is already running
+    """
+    if self.is_running():
+      raise RuntimeError("FileWatcher is already running")
+    
+    self.observer = Observer()
+    self.observer.schedule(
+      self.event_handler,
+      str(self.watch_directory),
+      recursive=self.recursive
+    )
+    
+    # Start observer in separate thread
+    self._observer_thread = threading.Thread(
+      target=self._run_observer,
+      name="FileWatcher-Observer"
+    )
+    self._stop_event.clear()
+    self._observer_thread.start()
+    
+    logger.info("FileWatcher started successfully")
+  
+  def stop(self, timeout: float = 5.0) -> None:
+    """
+    Stop the file watcher gracefully.
+    
+    Args:
+        timeout: Maximum time to wait for graceful shutdown
+    """
+    if not self.is_running():
+      logger.warning("FileWatcher is not running")
+      return
+    
+    logger.info("Stopping FileWatcher...")
+    
+    # Signal stop and wait for observer thread
+    self._stop_event.set()
+    
+    if self.observer:
+      self.observer.stop()
+    
+    if self._observer_thread and self._observer_thread.is_alive():
+      self._observer_thread.join(timeout=timeout)
+      
+      if self._observer_thread.is_alive():
+        logger.warning("FileWatcher thread did not stop gracefully within timeout")
+      else:
+        logger.info("FileWatcher stopped gracefully")
+    
+    # Clean up
+    self.observer = None
+    self._observer_thread = None
+  
+  def is_running(self) -> bool:
+    """
+    Check if the file watcher is currently running.
+    
+    Returns:
+        True if the watcher is running, False otherwise
+    """
+    return (
+        self.observer is not None
+        and self._observer_thread is not None
+        and self._observer_thread.is_alive()
+    )
+  
+  def _run_observer(self) -> None:
+    """
+    Internal method to run the observer in a separate thread.
+    
+    This method should not be called directly.
+    """
+    if not self.observer:
+      logger.error("Observer not initialized")
+      return
+    
+    try:
+      self.observer.start()
+      logger.info("Observer thread started")
+      
+      # Keep the observer running until stop is requested
+      while not self._stop_event.is_set():
+        self._stop_event.wait(timeout=1.0)
+      
+      logger.info("Observer thread stopping...")
+    
+    except Exception as e:
+      logger.error(f"Observer thread error: {str(e)}")
+    finally:
+      if self.observer:
+        self.observer.join()
+        logger.info("Observer thread stopped")
+
+
+def create_file_watcher(
+    watch_directory: str,
+    document_service: DocumentService,
+    job_service: JobService
+) -> FileWatcher:
+  """
+  Factory function to create a FileWatcher instance.
+  
+  Args:
+      watch_directory: Directory path to monitor
+      document_service: Service for document management
+      job_service: Service for processing job management
+      
+  Returns:
+      Configured FileWatcher instance
+  """
+  return FileWatcher(
+    watch_directory=watch_directory,
+    document_service=document_service,
+    job_service=job_service
+  )
--- a/src/file-processor/app/main.py
+++ b/src/file-processor/app/main.py
@@ -1,120 +1,169 @@
 """
-FastAPI application for MyDocManager file processor service.
+FastAPI application with integrated FileWatcher for document processing.

-This service provides API endpoints for health checks and task dispatching.
+This module provides the main FastAPI application with:
+- JWT authentication
+- User management APIs
+- Real-time file monitoring via FileWatcher
+- Document processing via Celery tasks
 """

-import os
-from fastapi import FastAPI, HTTPException
-from pydantic import BaseModel
-import redis
-from celery import Celery
+import logging
+from contextlib import asynccontextmanager
+from typing import AsyncGenerator

-# Initialize FastAPI app
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+
+from app.api.routes.auth import router as auth_router
+from app.api.routes.users import router as users_router
+from app.config import settings
+from app.database.connection import get_database
+from app.file_watcher import create_file_watcher, FileWatcher
+from app.services.document_service import DocumentService
+from app.services.init_service import InitializationService
+from app.services.job_service import JobService
+from app.services.user_service import UserService
+
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Global file watcher instance
+file_watcher: FileWatcher | None = None
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+  """
+  FastAPI lifespan context manager.
+  
+  Handles application startup and shutdown events including:
+  - Database connection
+  - Default admin user creation
+  - FileWatcher startup/shutdown
+  """
+  global file_watcher
+  
+  # Startup
+  logger.info("Starting MyDocManager application...")
+  
+  try:
+    # Initialize database connection
+    database = get_database()
+    logger.info("Database connection established")
+    
+    document_service = DocumentService(database=database, objects_folder=settings.get_objects_folder())
+    job_service = JobService(database=database)
+    user_service = UserService(database=database)
+    logger.info("Services created")
+    
+    # Create default admin user
+    init_service = InitializationService(user_service)
+    init_service.initialize_application()
+    logger.info("Default admin user initialization completed")
+    
+    # Create and start file watcher
+    file_watcher = create_file_watcher(
+      watch_directory=settings.watch_directory(),
+      document_service=document_service,
+      job_service=job_service
+    )
+    file_watcher.start()
+    logger.info(f"FileWatcher started for directory: {settings.watch_directory()}")
+    
+    logger.info("Application startup completed successfully")
+    
+    yield
+  
+  except Exception as e:
+    logger.error(f"Application startup failed: {str(e)}")
+    raise
+  
+  finally:
+    # Shutdown
+    logger.info("Shutting down MyDocManager application...")
+    
+    if file_watcher and file_watcher.is_running():
+      file_watcher.stop()
+      logger.info("FileWatcher stopped")
+    
+    logger.info("Application shutdown completed")
+
+
+# Create FastAPI application
 app = FastAPI(
-  title="MyDocManager File Processor",
-  description="File processing and task dispatch service",
-  version="1.0.0"
+  title="MyDocManager",
+  description="Real-time document processing application with authentication",
+  version="0.1.0",
+  lifespan=lifespan
 )

-# Environment variables
-REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")
-MONGODB_URL = os.getenv("MONGODB_URL", "mongodb://localhost:27017")
-
-# Initialize Redis client
-try:
-  redis_client = redis.from_url(REDIS_URL)
-except Exception as e:
-  redis_client = None
-  print(f"Warning: Could not connect to Redis: {e}")
-
-# Initialize Celery
-celery_app = Celery(
-  "file_processor",
-  broker=REDIS_URL,
-  backend=REDIS_URL
+# Configure CORS
+app.add_middleware(
+  CORSMiddleware,
+  allow_origins=["http://localhost:5173"],  # React frontend
+  allow_credentials=True,
+  allow_methods=["*"],
+  allow_headers=["*"],
 )

-
-# Pydantic models
-class TestTaskRequest(BaseModel):
-  """Request model for test task."""
-  message: str
+# Include routers
+app.include_router(auth_router, prefix="/auth", tags=["Authentication"])
+app.include_router(users_router, prefix="/users", tags=["User Management"])
+# app.include_router(documents_router, prefix="/documents", tags=["Documents"])
+# app.include_router(jobs_router, prefix="/jobs", tags=["Processing Jobs"])


@app.get("/health")
 async def health_check():
  """
  Health check endpoint.
-
+  
  Returns:
-      dict: Service health status with dependencies
+      Dictionary containing application health status
  """
-  health_status = {
+  return {
      "status": "healthy",
-      "service": "file-processor",
-      "dependencies": {
-          "redis": "unknown",
-          "mongodb": "unknown"
-      },
+      "service": "MyDocManager",
+      "version": app.version,
+      "file_watcher_running": file_watcher.is_running() if file_watcher else False
  }
-  
-  # Check Redis connection
-  if redis_client:
-    try:
-      redis_client.ping()
-      health_status["dependencies"]["redis"] = "connected"
-    except Exception:
-      health_status["dependencies"]["redis"] = "disconnected"
-      health_status["status"] = "degraded"
-  
-  return health_status
-
-
-@app.post("/test-task")
-async def dispatch_test_task(request: TestTaskRequest):
-  """
-  Dispatch a test task to Celery worker.
-
-  Args:
-      request: Test task request containing message
-
-  Returns:
-      dict: Task dispatch information
-
-  Raises:
-      HTTPException: If task dispatch fails
-  """
-  try:
-    # Send task to worker
-    task = celery_app.send_task(
-      "main.test_task",
-      args=[request.message]
-    )
-    
-    return {
-        "status": "dispatched",
-        "task_id": task.id,
-        "message": f"Test task dispatched with message: {request.message}"
-    }
-  
-  except Exception as e:
-    raise HTTPException(
-      status_code=500,
-      detail=f"Failed to dispatch task: {str(e)}"
-    )


@app.get("/")
 async def root():
  """
-  Root endpoint.
-
+  Root endpoint with basic application information.
+  
  Returns:
-      dict: Basic service information
+      Dictionary containing welcome message and available endpoints
  """
  return {
-      "service": "MyDocManager File Processor",
-      "version": "1.0.0",
-      "status": "running"
-  }
+      "message": "Welcome to MyDocManager",
+      "description": "Real-time document processing application",
+      "docs": "/docs",
+      "health": "/health"
+  }
+
+
+@app.get("/watcher/status")
+async def watcher_status():
+  """
+  Get file watcher status.
+  
+  Returns:
+      Dictionary containing file watcher status information
+  """
+  if not file_watcher:
+    return {
+        "status": "not_initialized",
+        "running": False
+    }
+  
+  return {
+      "status": "initialized",
+      "running": file_watcher.is_running(),
+      "watch_directory": str(file_watcher.watch_directory),
+      "recursive": file_watcher.recursive
+  }
--- a/src/file-processor/app/models/auth.py
+++ b/src/file-processor/app/models/auth.py
@@ -3,12 +3,45 @@ Authentication models and enums for user management.

 Contains user roles enumeration and authentication-related Pydantic models.
 """
-
+from datetime import datetime
 from enum import Enum

+from pydantic import BaseModel, Field
+
+from app.models.types import PyObjectId
+

 class UserRole(str, Enum):
  """User roles enumeration with string values."""
  
  USER = "user"
-  ADMIN = "admin"
+  ADMIN = "admin"
+
+
+class UserResponse(BaseModel):
+  """Model for user data in API responses (excludes hashed_password)."""
+  
+  id: PyObjectId = Field(alias="_id")
+  username: str
+  email: str
+  role: UserRole
+  is_active: bool
+  created_at: datetime
+  updated_at: datetime
+  
+  model_config = {
+      "populate_by_name": True,
+      "arbitrary_types_allowed": True,
+  }
+
+
+class LoginResponse(BaseModel):
+  """Response model for successful login."""
+  access_token: str
+  token_type: str = "bearer"
+  user: UserResponse
+
+
+class MessageResponse(BaseModel):
+  """Generic message response."""
+  message: str
--- a/src/file-processor/app/models/document.py
+++ b/src/file-processor/app/models/document.py
@@ -0,0 +1,70 @@
+"""
+Pydantic models for document processing collections.
+
+This module defines the data models for file documents and processing jobs
+stored in MongoDB collections.
+"""
+
+from datetime import datetime
+from enum import Enum
+from typing import Any, Dict, Optional
+
+from bson import ObjectId
+from pydantic import BaseModel, Field, field_validator
+
+from app.models.types import PyObjectId
+
+
+class FileType(str, Enum):
+  """Supported file types for document processing."""
+  
+  TXT = "txt"
+  PDF = "pdf"
+  DOCX = "docx"
+  JPG = "jpg"
+  PNG = "png"
+
+
+class ExtractionMethod(str, Enum):
+  """Methods used to extract content from documents."""
+  
+  DIRECT_TEXT = "direct_text"
+  OCR = "ocr"
+  HYBRID = "hybrid"
+
+
+class FileDocument(BaseModel):
+  """
+  Model for file documents stored in the 'files' collection.
+
+  Represents a file detected in the watched directory with its
+  metadata and extracted content.
+  """
+  
+  id: Optional[PyObjectId] = Field(default=None, alias="_id")
+  filename: str = Field(..., description="Original filename")
+  filepath: str = Field(..., description="Full path to the file")
+  file_type: FileType = Field(..., description="Type of the file")
+  extraction_method: Optional[ExtractionMethod] = Field(default=None, description="Method used to extract content")
+  metadata: Dict[str, Any] = Field(default_factory=dict, description="File-specific metadata")
+  detected_at: Optional[datetime] = Field(default=None, description="Timestamp when file was detected")
+  file_hash: Optional[str] = Field(default=None, description="SHA256 hash of file content")
+  encoding: str = Field(default="utf-8", description="Character encoding for text files")
+  file_size: int = Field(..., ge=0, description="File size in bytes")
+  mime_type: str = Field(..., description="MIME type detected")
+  
+  @field_validator('filepath')
+  @classmethod
+  def validate_filepath(cls, v: str) -> str:
+    """Validate filepath format."""
+    if not v.strip():
+      raise ValueError("Filepath cannot be empty")
+    return v.strip()
+  
+  @field_validator('filename')
+  @classmethod
+  def validate_filename(cls, v: str) -> str:
+    """Validate filename format."""
+    if not v.strip():
+      raise ValueError("Filename cannot be empty")
+    return v.strip()
--- a/src/file-processor/app/models/job.py
+++ b/src/file-processor/app/models/job.py
@@ -0,0 +1,42 @@
+from datetime import datetime
+from enum import Enum
+from typing import Optional
+
+from bson import ObjectId
+from pydantic import BaseModel, Field, field_validator
+
+from app.models.types import PyObjectId
+
+
+class ProcessingStatus(str, Enum):
+  """Status values for processing jobs."""
+  
+  PENDING = "pending"
+  PROCESSING = "processing"
+  COMPLETED = "completed"
+  FAILED = "failed"
+
+
+class ProcessingJob(BaseModel):
+  """
+  Model for processing jobs stored in the 'processing_jobs' collection.
+
+  Tracks the lifecycle and status of document processing tasks.
+  """
+  
+  id: Optional[PyObjectId] = Field(default=None, alias="_id")
+  document_id: PyObjectId = Field(..., description="Reference to file document")
+  status: ProcessingStatus = Field(default=ProcessingStatus.PENDING, description="Current processing status")
+  task_id: Optional[str] = Field(default=None, description="Celery task UUID")
+  created_at: Optional[datetime] = Field(default=None, description="Timestamp when job was created")
+  started_at: Optional[datetime] = Field(default=None, description="Timestamp when processing started")
+  completed_at: Optional[datetime] = Field(default=None, description="Timestamp when processing completed")
+  error_message: Optional[str] = Field(default=None, description="Error message if processing failed")
+  
+  @field_validator('error_message')
+  @classmethod
+  def validate_error_message(cls, v: Optional[str]) -> Optional[str]:
+    """Clean up error message."""
+    if v is not None:
+      return v.strip() if v.strip() else None
+    return v
--- a/src/file-processor/app/models/types.py
+++ b/src/file-processor/app/models/types.py
@@ -0,0 +1,32 @@
+from typing import Any
+
+from bson import ObjectId
+from pydantic_core import core_schema
+
+
+class PyObjectId(ObjectId):
+  """Custom ObjectId type for Pydantic v2 compatibility."""
+  
+  @classmethod
+  def __get_pydantic_core_schema__(
+      cls, source_type: Any, handler
+  ) -> core_schema.CoreSchema:
+    return core_schema.json_or_python_schema(
+      json_schema=core_schema.str_schema(),
+      python_schema=core_schema.union_schema([
+          core_schema.is_instance_schema(ObjectId),
+          core_schema.chain_schema([
+              core_schema.str_schema(),
+              core_schema.no_info_plain_validator_function(cls.validate),
+          ])
+      ]),
+      serialization=core_schema.plain_serializer_function_ser_schema(
+        lambda x: str(x)
+      ),
+    )
+  
+  @classmethod
+  def validate(cls, v):
+    if not ObjectId.is_valid(v):
+      raise ValueError("Invalid ObjectId")
+    return ObjectId(v)
--- a/src/file-processor/app/models/user.py
+++ b/src/file-processor/app/models/user.py
@@ -7,173 +7,134 @@ and API responses with proper validation and type safety.

 import re
 from datetime import datetime
-from typing import Optional, Any
+from typing import Optional
+
 from bson import ObjectId
 from pydantic import BaseModel, Field, field_validator, EmailStr
-from pydantic_core import core_schema

 from app.models.auth import UserRole
-
-
-class PyObjectId(ObjectId):
-    """Custom ObjectId type for Pydantic v2 compatibility."""
-    
-    @classmethod
-    def __get_pydantic_core_schema__(
-        cls, source_type: Any, handler
-    ) -> core_schema.CoreSchema:
-        return core_schema.json_or_python_schema(
-            json_schema=core_schema.str_schema(),
-            python_schema=core_schema.union_schema([
-                core_schema.is_instance_schema(ObjectId),
-                core_schema.chain_schema([
-                    core_schema.str_schema(),
-                    core_schema.no_info_plain_validator_function(cls.validate),
-                ])
-            ]),
-            serialization=core_schema.plain_serializer_function_ser_schema(
-                lambda x: str(x)
-            ),
-        )
-    
-    @classmethod
-    def validate(cls, v):
-        if not ObjectId.is_valid(v):
-            raise ValueError("Invalid ObjectId")
-        return ObjectId(v)
+from app.models.types import PyObjectId


 def validate_password_strength(password: str) -> str:
-    """
-    Validate password meets security requirements.
-    
-    Requirements:
-    - At least 8 characters long
-    - Contains at least one uppercase letter
-    - Contains at least one lowercase letter
-    - Contains at least one digit
-    - Contains at least one special character
-    
-    Args:
-        password: The password string to validate
-        
-    Returns:
-        str: The validated password
-        
-    Raises:
-        ValueError: If password doesn't meet requirements
-    """
-    if len(password) < 8:
-        raise ValueError("Password must be at least 8 characters long")
-    
-    if not re.search(r'[A-Z]', password):
-        raise ValueError("Password must contain at least one uppercase letter")
-    
-    if not re.search(r'[a-z]', password):
-        raise ValueError("Password must contain at least one lowercase letter")
-    
-    if not re.search(r'\d', password):
-        raise ValueError("Password must contain at least one digit")
-    
-    if not re.search(r'[!@#$%^&*()_+\-=\[\]{};:"\\|,.<>\/?]', password):
-        raise ValueError("Password must contain at least one special character")
-    
-    return password
+  """
+  Validate password meets security requirements.
+  
+  Requirements:
+  - At least 8 characters long
+  - Contains at least one uppercase letter
+  - Contains at least one lowercase letter
+  - Contains at least one digit
+  - Contains at least one special character
+  
+  Args:
+      password: The password string to validate
+      
+  Returns:
+      str: The validated password
+      
+  Raises:
+      ValueError: If password doesn't meet requirements
+  """
+  if len(password) < 8:
+    raise ValueError("Password must be at least 8 characters long")
+  
+  if not re.search(r'[A-Z]', password):
+    raise ValueError("Password must contain at least one uppercase letter")
+  
+  if not re.search(r'[a-z]', password):
+    raise ValueError("Password must contain at least one lowercase letter")
+  
+  if not re.search(r'\d', password):
+    raise ValueError("Password must contain at least one digit")
+  
+  if not re.search(r'[!@#$%^&*()_+\-=\[\]{};:"\\|,.<>\/?]', password):
+    raise ValueError("Password must contain at least one special character")
+  
+  return password


 def validate_username_not_empty(username: str) -> str:
-    """
-    Validate username is not empty or whitespace only.
-    
-    Args:
-        username: The username string to validate
-        
-    Returns:
-        str: The validated username
-        
-    Raises:
-        ValueError: If username is empty or whitespace only
-    """
-    if not username or not username.strip():
-        raise ValueError("Username cannot be empty or whitespace only")
-    
-    return username.strip()
+  """
+  Validate username is not empty or whitespace only.
+  
+  Args:
+      username: The username string to validate
+      
+  Returns:
+      str: The validated username
+      
+  Raises:
+      ValueError: If username is empty or whitespace only
+  """
+  if not username or not username.strip():
+    raise ValueError("Username cannot be empty or whitespace only")
+  
+  return username.strip()


-class UserCreate(BaseModel):
-    """Model for creating a new user."""
-    
-    username: str
-    email: EmailStr
-    password: str
-    role: UserRole = UserRole.USER
-    
-    @field_validator('username')
-    @classmethod
-    def validate_username(cls, v):
-        return validate_username_not_empty(v)
-    
-    @field_validator('password')
-    @classmethod
-    def validate_password(cls, v):
-        return validate_password_strength(v)
+class UserCreateNoValidation(BaseModel):
+  """Model for creating a new user without field validation."""
+  
+  username: str
+  email: str
+  password: str
+  role: UserRole = UserRole.USER
+
+
+class UserCreate(UserCreateNoValidation):
+  """Model for creating a new user with validated username, email, and password."""
+  email: EmailStr
+  
+  @field_validator('username')
+  @classmethod
+  def validate_username(cls, v):
+    return validate_username_not_empty(v)
+  
+  @field_validator('password')
+  @classmethod
+  def validate_password(cls, v):
+    return validate_password_strength(v)


 class UserUpdate(BaseModel):
-    """Model for updating an existing user."""
-    
-    username: Optional[str] = None
-    email: Optional[EmailStr] = None
-    password: Optional[str] = None
-    role: Optional[UserRole] = None
-    
-    @field_validator('username')
-    @classmethod
-    def validate_username(cls, v):
-        if v is not None:
-            return validate_username_not_empty(v)
-        return v
-    
-    @field_validator('password')
-    @classmethod
-    def validate_password(cls, v):
-        if v is not None:
-            return validate_password_strength(v)
-        return v
+  """Model for updating an existing user."""
+  
+  username: Optional[str] = None
+  email: Optional[EmailStr] = None
+  password: Optional[str] = None
+  role: Optional[UserRole] = None
+  is_active: Optional[bool] = None
+  
+  @field_validator('username')
+  @classmethod
+  def validate_username(cls, v):
+    if v is not None:
+      return validate_username_not_empty(v)
+    return v
+  
+  @field_validator('password')
+  @classmethod
+  def validate_password(cls, v):
+    if v is not None:
+      return validate_password_strength(v)
+    return v


 class UserInDB(BaseModel):
-    """Model for user data stored in database."""
-    
-    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
-    username: str
-    email: str
-    password_hash: str
-    role: UserRole
-    is_active: bool = True
-    created_at: datetime
-    updated_at: datetime
-    
-    model_config = {
-        "populate_by_name": True,
-        "arbitrary_types_allowed": True,
-        "json_encoders": {ObjectId: str}
-    }
-
-
-class UserResponse(BaseModel):
-    """Model for user data in API responses (excludes password_hash)."""
-    
-    id: PyObjectId = Field(alias="_id")
-    username: str
-    email: str
-    role: UserRole
-    is_active: bool
-    created_at: datetime
-    updated_at: datetime
-    
-    model_config = {
-        "populate_by_name": True,
-        "arbitrary_types_allowed": True,
-        "json_encoders": {ObjectId: str}
-    }
+  """Model for user data stored in database."""
+  
+  id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
+  username: str
+  email: str
+  hashed_password: str
+  role: UserRole
+  is_active: bool = True
+  created_at: datetime
+  updated_at: datetime
+  
+  model_config = {
+      "populate_by_name": True,
+      "arbitrary_types_allowed": True,
+      "json_encoders": {ObjectId: str}
+  }
--- a/src/file-processor/app/services/__init__.py
+++ b/src/file-processor/app/services/__init__.py
--- a/src/file-processor/app/services/auth_service.py
+++ b/src/file-processor/app/services/auth_service.py
@@ -0,0 +1,84 @@
+"""
+Authentication service for password hashing and verification.
+
+This module provides authentication-related functionality including
+password hashing, verification, and JWT token management.
+"""
+from datetime import datetime, timedelta
+
+import jwt
+
+from app.config import settings
+from app.utils.security import hash_password, verify_password
+
+
+class AuthService:
+  """
+  Service class for authentication operations.
+
+  Handles password hashing, verification, and other authentication
+  related operations with proper security practices.
+  """
+  
+  @staticmethod
+  def hash_user_password(password: str) -> str:
+    """
+    Hash a plaintext password for secure storage.
+
+    Args:
+        password (str): Plaintext password to hash
+
+    Returns:
+        str: Hashed password safe for database storage
+
+    Example:
+        >>> auth = AuthService()
+        >>> hashed = auth.hash_user_password("mypassword123")
+        >>> len(hashed) > 0
+        True
+    """
+    return hash_password(password)
+  
+  @staticmethod
+  def verify_user_password(password: str, hashed_password: str) -> bool:
+    """
+    Verify a password against its hash.
+
+    Args:
+        password (str): Plaintext password to verify
+        hashed_password (str): Stored hashed password
+
+    Returns:
+        bool: True if password matches hash, False otherwise
+
+    Example:
+        >>> auth = AuthService()
+        >>> hashed = auth.hash_user_password("mypassword123")
+        >>> auth.verify_user_password("mypassword123", hashed)
+        True
+        >>> auth.verify_user_password("wrongpassword", hashed)
+        False
+    """
+    return verify_password(password, hashed_password)
+  
+  @staticmethod
+  def create_access_token(data: dict) -> str:
+    """
+    Create a JWT access token.
+
+    Args:
+        data (dict): Payload data to include in the token.
+
+    Returns:
+        str: Encoded JWT token.
+    """
+    # Copy data to avoid modifying the original dict
+    to_encode = data.copy()
+    
+    # Add expiration time (timezone-aware: PyJWT treats naive datetimes as UTC)
+    expire = datetime.now().astimezone() + timedelta(hours=settings.get_jwt_expire_hours())
+    to_encode.update({"exp": expire})
+    
+    # Encode JWT
+    encoded_jwt = jwt.encode(to_encode, settings.get_jwt_secret_key(), algorithm=settings.get_jwt_algorithm())
+    return encoded_jwt
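
A minimal sketch of the token shape `create_access_token` is meant to produce, written with the standard library only so the `exp` handling is visible. It is not the PyJWT code path used in the diff: `encode_hs256`, `decode_payload`, and the `secret`, `expire_hours`, and `now` parameters are hypothetical names for illustration, and HS256 is assumed as the algorithm. The expiry is computed from a timezone-aware UTC time and stored as epoch seconds, which is what a JWT `exp` claim carries.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    """Reverse of _b64url: restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_hs256(payload: dict, secret: str) -> str:
    """Build a compact HS256 JWT: header.payload.signature (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def create_access_token(
    data: dict, secret: str, expire_hours: int, now: Optional[datetime] = None
) -> str:
    """Copy the payload, add a UTC 'exp' claim in epoch seconds, and sign it."""
    now = now or datetime.now(timezone.utc)
    to_encode = dict(data)  # the caller's dict is left untouched
    to_encode["exp"] = int((now + timedelta(hours=expire_hours)).timestamp())
    return encode_hs256(to_encode, secret)


def decode_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (inspection only)."""
    return json.loads(_b64url_decode(token.split(".")[1]))
```

Passing `now` explicitly makes the expiry deterministic in tests. In real code the token should be verified with the signing library before any claim is trusted.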
--- a/src/file-processor/app/services/document_service.py
+++ b/src/file-processor/app/services/document_service.py
@@ -0,0 +1,334 @@
+"""
+Document service for orchestrated file and content management.
+
+This service coordinates FileDocument metadata stored in MongoDB with raw
+file content stored on disk in a hash-addressed objects folder.
+"""
+
+import hashlib
+import os
+from datetime import datetime
+from pathlib import Path
+from typing import List, Optional, Dict, Any
+
+import magic
+from pymongo.errors import PyMongoError
+
+from app.config.settings import get_objects_folder
+from app.database.repositories.document_repository import FileDocumentRepository
+from app.models.document import (
+  FileDocument,
+  FileType,
+)
+from app.models.types import PyObjectId
+
+
+class DocumentService:
+  """
+  Service for orchestrated document and content management.
+
+  Provides high-level operations that keep file documents and their stored
+  content in step, deduplicating content by its SHA256 hash.
+  """
+  
+  def __init__(self, database, objects_folder: Optional[str] = None):
+    """
+    Initialize the document service with repository dependencies.
+    
+    Args:
+        database: Database instance
+        objects_folder: Folder for hash-addressed file content (default: get_objects_folder())
+    """
+    
+    self.db = database
+    self.document_repository = FileDocumentRepository(self.db)
+    self.objects_folder = objects_folder or get_objects_folder()
+  
+  def initialize(self):
+    self.document_repository.initialize()
+    return self
+  
+  @staticmethod
+  def _calculate_file_hash(file_bytes: bytes) -> str:
+    """
+    Calculate SHA256 hash of file content.
+
+    Args:
+        file_bytes: Raw file content as bytes
+
+    Returns:
+        Hexadecimal SHA256 hash string
+    """
+    return hashlib.sha256(file_bytes).hexdigest()
+  
+  @staticmethod
+  def _detect_file_type(file_path: str) -> FileType:
+    """
+    Detect file type from file extension.
+
+    Args:
+        file_path: Path to the file
+
+    Returns:
+        FileType enum value
+
+    Raises:
+        ValueError: If file type is not supported
+    """
+    extension = Path(file_path).suffix.lower().lstrip('.')
+    
+    try:
+      return FileType(extension)
+    except ValueError:
+      raise ValueError(f"Unsupported file type: {extension}")
+  
+  @staticmethod
+  def _detect_mime_type(file_bytes: bytes) -> str:
+    """
+    Detect MIME type from file content.
+
+    Args:
+        file_bytes: Raw file content as bytes
+
+    Returns:
+        MIME type string
+    """
+    return magic.from_buffer(file_bytes, mime=True)
+  
+  @staticmethod
+  def _read_file_bytes(file_path: str | Path) -> bytes:
+    """
+    Read file content as bytes (synchronous, blocking read).
+
+    Args:
+        file_path (str | Path): Path of the file to read
+
+    Returns:
+        bytes: Content of the file
+
+    Raises:
+        FileNotFoundError: If the file does not exist
+        OSError: If any I/O error occurs
+    """
+    path = Path(file_path)
+    
+    if not path.exists():
+      raise FileNotFoundError(f"File not found: {file_path}")
+    
+    return path.read_bytes()
+  
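+  def _get_document_path(self, file_hash):
+    """
+    Build the on-disk path for a content hash, sharded by its first 24 characters.
+    :param file_hash: Hexadecimal SHA256 hash of the file content
+    :return: Path of the stored content inside the objects folder
+    """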
+    return os.path.join(self.objects_folder, file_hash[:24], file_hash)
+  
+  def save_content_if_needed(self, file_hash, content: bytes):
+    """Write content to the objects folder unless this hash is already stored."""
+    target_path = self._get_document_path(file_hash)
+    if os.path.exists(target_path):
+      return
+    
+    os.makedirs(os.path.dirname(target_path), exist_ok=True)
+    
+    with open(target_path, "wb") as f:
+      f.write(content)
+  
+  def create_document(
+      self,
+      file_path: str,
+      file_bytes: bytes | None = None,
+      encoding: str = "utf-8"
+  ) -> FileDocument:
+    """
+    Create a new document with automatic content deduplication.
+
+    The raw content is written to the objects folder under its SHA256 hash,
+    and a FileDocument record is created for the file. If content with the
+    same hash is already stored, only the new FileDocument is created.
+
+    Args:
+        file_path: Full path to the file
+        file_bytes: Raw file content as bytes (read from file_path when None)
+        encoding: Character encoding for text content
+
+    Returns:
+        Created FileDocument instance
+
+    Raises:
+        ValueError: If file type is not supported
+        PyMongoError: If database operation fails
+    """
+    # Calculate automatic attributes
+    file_bytes = file_bytes if file_bytes is not None else self._read_file_bytes(file_path)
+    file_hash = self._calculate_file_hash(file_bytes)
+    file_type = self._detect_file_type(file_path)
+    mime_type = self._detect_mime_type(file_bytes)
+    file_size = len(file_bytes)
+    filename = Path(file_path).name
+    detected_at = datetime.now()
+    
+    try:
+      self.save_content_if_needed(file_hash, file_bytes)
+      
+      # Create FileDocument
+      file_data = FileDocument(
+        filename=filename,
+        filepath=file_path,
+        file_type=file_type,
+        extraction_method=None,  # Will be set by processing workers
+        metadata={},  # Empty for now
+        detected_at=detected_at,
+        file_hash=file_hash,
+        encoding=encoding,
+        file_size=file_size,
+        mime_type=mime_type
+      )
+      
+      created_file = self.document_repository.create_document(file_data)
+      
+      return created_file
+    
+    except Exception as e:
+      # Wrap any storage or database failure, keeping the original cause
+      raise PyMongoError(f"Failed to create document: {str(e)}") from e
+  
+  def get_document_by_id(self, document_id: PyObjectId) -> Optional[FileDocument]:
+    """
+    Retrieve a document by its ID.
+
+    Args:
+        document_id: Document ObjectId
+
+    Returns:
+        FileDocument if found, None otherwise
+    """
+    return self.document_repository.find_document_by_id(str(document_id))
+  
+  def get_document_by_hash(self, file_hash: str) -> Optional[FileDocument]:
+    """
+    Retrieve a document by its file hash.
+
+    Args:
+        file_hash: SHA256 hash of file content
+
+    Returns:
+        FileDocument if found, None otherwise
+    """
+    return self.document_repository.find_document_by_hash(file_hash)
+  
+  def get_document_by_filepath(self, filepath: str) -> Optional[FileDocument]:
+    """
+    Retrieve a document by its file path.
+
+    Args:
+        filepath: Full path to the file
+
+    Returns:
+        FileDocument if found, None otherwise
+    """
+    return self.document_repository.find_document_by_filepath(filepath)
+  
+  def get_document_content_by_hash(self, file_hash):
+    """Return the stored bytes for a content hash, or None if no file exists."""
+    target_path = self._get_document_path(file_hash)
+    if not os.path.exists(target_path):
+      return None
+    
+    with open(target_path, "rb") as f:
+      return f.read()
+  
+  def list_documents(
+      self,
+      skip: int = 0,
+      limit: int = 100
+  ) -> List[FileDocument]:
+    """
+    List documents with pagination.
+
+    Args:
+        skip: Number of documents to skip
+        limit: Maximum number of documents to return
+
+    Returns:
+        List of FileDocument instances
+    """
+    return self.document_repository.list_documents(skip=skip, limit=limit)
+  
+  def count_documents(self) -> int:
+    """
+    Get total number of documents.
+
+    Returns:
+        Total document count
+    """
+    return self.document_repository.count_documents()
+  
+  def update_document(
+      self,
+      document_id: PyObjectId,
+      update_data: Dict[str, Any]
+  ) -> Optional[FileDocument]:
+    """
+    Update document metadata.
+
+    Args:
+        document_id: Document ObjectId
+        update_data: Dictionary with fields to update
+
+    Returns:
+        Updated FileDocument if found, None otherwise
+    """
+    if "file_bytes" in update_data:
+      file_hash = self._calculate_file_hash(update_data["file_bytes"])
+      update_data["file_hash"] = file_hash
+      self.save_content_if_needed(file_hash, update_data["file_bytes"])
+    
+    return self.document_repository.update_document(document_id, update_data)
+  
+  def delete_document(self, document_id: PyObjectId) -> bool:
+    """
+    Delete a document and its orphaned content.
+
+    This method removes the FileDocument and checks whether any other
+    document still references the same file hash. If none does, the stored
+    content file is removed from the objects folder as well.
+
+    Args:
+        document_id: Document ObjectId
+
+    Returns:
+        True if document was deleted, False otherwise
+
+    Raises:
+        PyMongoError: If database operation fails
+    """
+    # Note: no transaction is opened; the steps below run sequentially
+    
+    try:
+      # Get document to find its hash
+      document = self.document_repository.find_document_by_id(document_id)
+      if not document:
+        return False
+      
+      # Delete the document
+      deleted = self.document_repository.delete_document(document_id)
+      if not deleted:
+        return False
+      
+      # Check if content is orphaned
+      remaining_files = self.document_repository.find_document_by_hash(document.file_hash)
+      
+      # If no other files reference this content, delete it
+      if not remaining_files:
+        try:
+          os.remove(self._get_document_path(document.file_hash))
+        except OSError:
+          pass  # Best effort: the record is already deleted, so a missing or locked file is ignored
+      
+      return True
+    
+    except Exception as e:
+      # Wrap any failure as PyMongoError for callers (nothing is rolled back here)
+      raise PyMongoError(f"Failed to delete document: {str(e)}")
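
The hash-addressed storage used by `save_content_if_needed` and `_get_document_path` can be sketched in isolation as follows. This is an illustration under stated assumptions, not the service itself: the functions are free-standing, `content_path` and `demo` are hypothetical names, and a temporary directory stands in for the configured objects folder. The path layout (first 24 hash characters as the directory, full hash as the file name) follows the diff.

```python
import hashlib
import os
import tempfile


def content_path(objects_folder: str, file_hash: str) -> str:
    """Mirror the layout in the diff: <objects>/<first 24 hash chars>/<full hash>."""
    return os.path.join(objects_folder, file_hash[:24], file_hash)


def save_content_if_needed(objects_folder: str, content: bytes) -> tuple[str, bool]:
    """Store content under its SHA256 hash and return (hash, written)."""
    file_hash = hashlib.sha256(content).hexdigest()
    target = content_path(objects_folder, file_hash)
    if os.path.exists(target):
        return file_hash, False  # identical bytes are already stored
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(content)
    return file_hash, True


def demo() -> dict:
    """Store the same bytes twice and a different payload once."""
    with tempfile.TemporaryDirectory() as objects:
        first_hash, first_written = save_content_if_needed(objects, b"hello")
        second_hash, second_written = save_content_if_needed(objects, b"hello")
        _, other_written = save_content_if_needed(objects, b"world")
        stored_files = sum(len(files) for _, _, files in os.walk(objects))
    return {
        "same_hash": first_hash == second_hash,
        "first_written": first_written,
        "second_written": second_written,
        "other_written": other_written,
        "stored_files": stored_files,
    }
```

Two documents with identical bytes share one file on disk, which is why `delete_document` has to check for remaining references to the hash before removing content.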
--- a/src/file-processor/app/services/init_service.py
+++ b/src/file-processor/app/services/init_service.py
@@ -0,0 +1,152 @@
+"""
+Initialization service for application startup tasks.
+
+This module handles application initialization tasks including
+creating default admin user if none exists.
+"""
+
+import logging
+from typing import Optional
+
+from app.models.auth import UserRole
+from app.models.user import UserInDB, UserCreateNoValidation
+from app.services.user_service import UserService
+
+logger = logging.getLogger(__name__)
+
+
+class InitializationService:
+  """
+  Service for handling application initialization tasks.
+
+  This service manages startup operations like ensuring required
+  users exist and system is properly configured.
+  """
+  
+  def __init__(self, user_service: UserService):
+    """
+    Initialize service with user service dependency.
+
+    Args:
+        user_service (UserService): Service for user operations
+    """
+    self.user_service = user_service
+  
+  def ensure_admin_user_exists(self) -> Optional[UserInDB]:
+    """
+    Ensure default admin user exists in the system.
+
+    Creates a default admin user if no admin user exists in the system.
+    Uses default credentials that should be changed after first login.
+
+    Returns:
+        UserInDB or None: Created admin user if created, None if already exists
+
+    Raises:
+        Exception: If admin user creation fails
+    """
+    logger.info("Checking if admin user exists...")
+    
+    # Check if any admin user already exists
+    if self._admin_user_exists():
+      logger.info("Admin user already exists, skipping creation")
+      return None
+    
+    logger.info("No admin user found, creating default admin user...")
+    
+    try:
+      # Create default admin user
+      admin_data = UserCreateNoValidation(
+        username="admin",
+        email="admin@mydocmanager.local",
+        password="admin",  # Should be changed after first login
+        role=UserRole.ADMIN
+      )
+      
+      created_user = self.user_service.create_user(admin_data)
+      logger.info(f"Default admin user created successfully with ID: {created_user.id}")
+      logger.warning(
+        "Default admin user created with username 'admin' and password 'admin'. "
+        "Please change these credentials immediately for security!"
+      )
+      
+      return created_user
+    
+    except Exception as e:
+      logger.error(f"Failed to create default admin user: {str(e)}")
+      raise Exception(f"Admin user creation failed: {str(e)}") from e
+  
+  def _admin_user_exists(self) -> bool:
+    """
+    Check if any admin user exists in the system.
+
+    Returns:
+        bool: True if at least one admin user exists, False otherwise
+    """
+    try:
+      # Get all users and check if any have admin role
+      users = self.user_service.list_users(limit=1000)  # Reasonable limit for admin check
+      
+      for user in users:
+        if user.role == UserRole.ADMIN and user.is_active:
+          return True
+      
+      return False
+    
+    except Exception as e:
+      logger.error(f"Error checking for admin users: {str(e)}")
+      # In case of error, assume admin exists to avoid creating duplicates
+      return True
+  
+  def initialize_application(self) -> dict:
+    """
+    Perform all application initialization tasks.
+
+    This method runs all necessary initialization procedures including
+    admin user creation and any other startup requirements.
+
+    Returns:
+        dict: Summary of initialization tasks performed
+    """
+    logger.info("Starting application initialization...")
+    
+    initialization_summary = {
+        "admin_user_created": False,
+        "initialization_success": False,
+        "errors": []
+    }
+    
+    try:
+      # Ensure admin user exists
+      created_admin = self.ensure_admin_user_exists()
+      if created_admin:
+        initialization_summary["admin_user_created"] = True
+      
+      initialization_summary["initialization_success"] = True
+      logger.info("Application initialization completed successfully")
+    
+    except Exception as e:
+      error_msg = f"Application initialization failed: {str(e)}"
+      logger.error(error_msg)
+      initialization_summary["errors"].append(error_msg)
+    
+    self.log_initialization_result(initialization_summary)
+    
+    return initialization_summary
+  
+  @staticmethod
+  def log_initialization_result(summary: dict) -> None:
+    """
+    Log the result of the initialization process.
+
+    Args:
+        summary (dict): Summary of initialization tasks performed
+    """
+    if summary["initialization_success"]:
+      logger.info("Application startup completed successfully")
+      if summary["admin_user_created"]:
+        logger.info("Default admin user was created during startup")
+    else:
+      logger.error("Application startup completed with errors:")
+      for error in summary["errors"]:
+        logger.error(f"  - {error}")
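
The startup flow in `InitializationService` is idempotent: a second run finds the admin and creates nothing. A minimal sketch of that behaviour, assuming an in-memory stand-in for the user service (`FakeUser` and `FakeUserService` are hypothetical, and roles are plain strings instead of `UserRole`):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeUser:
    """Hypothetical stand-in for UserInDB with only the fields the check needs."""
    username: str
    role: str
    is_active: bool = True


@dataclass
class FakeUserService:
    """Hypothetical in-memory stand-in for UserService."""
    users: list = field(default_factory=list)

    def list_users(self, limit: int = 100) -> list:
        return self.users[:limit]

    def create_user(self, username: str, role: str) -> FakeUser:
        user = FakeUser(username=username, role=role)
        self.users.append(user)
        return user


def ensure_admin_user_exists(service: FakeUserService) -> Optional[FakeUser]:
    """Create a default admin only when no active admin is present."""
    if any(u.role == "admin" and u.is_active for u in service.list_users(limit=1000)):
        return None
    return service.create_user("admin", "admin")


def initialize_application(service: FakeUserService) -> dict:
    """Run the startup tasks and report what happened, without raising."""
    summary = {"admin_user_created": False, "initialization_success": False, "errors": []}
    try:
        summary["admin_user_created"] = ensure_admin_user_exists(service) is not None
        summary["initialization_success"] = True
    except Exception as exc:
        summary["errors"].append(f"Application initialization failed: {exc}")
    return summary
```

Because errors are collected in the summary rather than raised, the caller decides whether a failed initialization should stop startup.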
--- a/src/file-processor/app/services/job_service.py
+++ b/src/file-processor/app/services/job_service.py
@@ -0,0 +1,182 @@
+"""
+Service layer for job processing business logic.
+
+This module provides high-level operations for managing processing jobs
+with strict status transition validation and business rules enforcement.
+"""
+
+from typing import Optional
+
+from app.database.repositories.job_repository import JobRepository
+from app.exceptions.job_exceptions import InvalidStatusTransitionError
+from app.models.job import ProcessingJob, ProcessingStatus
+from app.models.types import PyObjectId
+
+
+class JobService:
+  """
+  Service for processing job business logic operations.
+
+  Provides high-level job management with strict status transition
+  validation and business rule enforcement.
+  """
+  
+  def __init__(self, database):
+    """
+    Initialize service with job repository.
+
+    Args:
+        database: Database instance used to create the JobRepository
+    """
+    self.db = database
+    self.repository = JobRepository(database)
+  
+  def initialize(self):
+    self.repository.initialize()
+    return self
+  
+  def create_job(self, document_id: PyObjectId, task_id: Optional[str] = None) -> ProcessingJob:
+    """
+    Create a new processing job.
+
+    Args:
+        document_id: Reference to the file document
+        task_id: Optional Celery task UUID
+
+    Returns:
+        The created ProcessingJob
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    return self.repository.create_job(document_id, task_id)
+  
+  def get_job_by_id(self, job_id: PyObjectId) -> ProcessingJob:
+    """
+    Retrieve a job by its ID.
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        The ProcessingJob document
+
+    Raises:
+        JobNotFoundError: If job doesn't exist
+        JobRepositoryError: If database operation fails
+    """
+    return self.repository.find_job_by_id(job_id)
+  
+  def mark_job_as_started(self, job_id: PyObjectId) -> ProcessingJob:
+    """
+    Mark a job as started (PENDING → PROCESSING).
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        The updated ProcessingJob
+
+    Raises:
+        JobNotFoundError: If job doesn't exist
+        InvalidStatusTransitionError: If job is not in PENDING status
+        JobRepositoryError: If database operation fails
+    """
+    # Get current job to validate transition
+    current_job = self.repository.find_job_by_id(job_id)
+    
+    # Validate status transition
+    if current_job.status != ProcessingStatus.PENDING:
+      raise InvalidStatusTransitionError(current_job.status, ProcessingStatus.PROCESSING)
+    
+    # Update status
+    return self.repository.update_job_status(job_id, ProcessingStatus.PROCESSING)
+  
+  def mark_job_as_completed(self, job_id: PyObjectId) -> ProcessingJob:
+    """
+    Mark a job as completed (PROCESSING → COMPLETED).
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        The updated ProcessingJob
+
+    Raises:
+        JobNotFoundError: If job doesn't exist
+        InvalidStatusTransitionError: If job is not in PROCESSING status
+        JobRepositoryError: If database operation fails
+    """
+    # Get current job to validate transition
+    current_job = self.repository.find_job_by_id(job_id)
+    
+    # Validate status transition
+    if current_job.status != ProcessingStatus.PROCESSING:
+      raise InvalidStatusTransitionError(current_job.status, ProcessingStatus.COMPLETED)
+    
+    # Update status
+    return self.repository.update_job_status(job_id, ProcessingStatus.COMPLETED)
+  
+  def mark_job_as_failed(
+      self,
+      job_id: PyObjectId,
+      error_message: Optional[str] = None
+  ) -> ProcessingJob:
+    """
+    Mark a job as failed (PROCESSING → FAILED).
+
+    Args:
+        job_id: The job ObjectId
+        error_message: Optional error description
+
+    Returns:
+        The updated ProcessingJob
+
+    Raises:
+        JobNotFoundError: If job doesn't exist
+        InvalidStatusTransitionError: If job is not in PROCESSING status
+        JobRepositoryError: If database operation fails
+    """
+    # Get current job to validate transition
+    current_job = self.repository.find_job_by_id(job_id)
+    
+    # Validate status transition
+    if current_job.status != ProcessingStatus.PROCESSING:
+      raise InvalidStatusTransitionError(current_job.status, ProcessingStatus.FAILED)
+    
+    # Update status with error message
+    return self.repository.update_job_status(
+      job_id,
+      ProcessingStatus.FAILED,
+      error_message
+    )
+  
+  def delete_job(self, job_id: PyObjectId) -> bool:
+    """
+    Delete a job from the database.
+
+    Args:
+        job_id: The job ObjectId
+
+    Returns:
+        True if job was deleted, False if not found
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    return self.repository.delete_job(job_id)
+  
+  def get_jobs_by_status(self, status: ProcessingStatus) -> list[ProcessingJob]:
+    """
+    Retrieve all jobs with a specific status.
+
+    Args:
+        status: The processing status to filter by
+
+    Returns:
+        List of ProcessingJob documents
+
+    Raises:
+        JobRepositoryError: If database operation fails
+    """
+    return self.repository.get_jobs_by_status(status)
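
The three `mark_job_as_*` methods each hard-code one allowed transition. The same rules can be expressed as a single lookup table, sketched below. The enum values and the exception's message format are assumptions, since the diff does not show `app.models.job` or `app.exceptions.job_exceptions`. Only the constructor argument order (current status, target status) follows the calls in the diff.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    """Assumed values; the real enum lives in app.models.job."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class InvalidStatusTransitionError(Exception):
    """Raised when a job is moved to a status its current status does not allow."""

    def __init__(self, current: ProcessingStatus, target: ProcessingStatus):
        super().__init__(f"Invalid status transition from '{current.value}' to '{target.value}'")
        self.current = current
        self.target = target


# Transitions enforced by JobService in the diff:
# PENDING -> PROCESSING, PROCESSING -> COMPLETED, PROCESSING -> FAILED.
ALLOWED_TRANSITIONS = {
    ProcessingStatus.PENDING: {ProcessingStatus.PROCESSING},
    ProcessingStatus.PROCESSING: {ProcessingStatus.COMPLETED, ProcessingStatus.FAILED},
    ProcessingStatus.COMPLETED: set(),
    ProcessingStatus.FAILED: set(),
}


def transition(current: ProcessingStatus, target: ProcessingStatus) -> ProcessingStatus:
    """Return the target status if the move is allowed, otherwise raise."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidStatusTransitionError(current, target)
    return target
```

A table like this keeps the rules in one place if more statuses are added later. The explicit per-method checks in the diff are just as valid for three transitions.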
--- a/src/file-processor/app/services/user_service.py
+++ b/src/file-processor/app/services/user_service.py
@@ -0,0 +1,186 @@
+"""
+User service for business logic operations.
+
+This module provides user-related business logic including user creation,
+retrieval, updates, and authentication operations with proper error handling.
+"""
+
+from typing import Optional, List
+
+from pymongo.errors import DuplicateKeyError
+
+from app.database.repositories.user_repository import UserRepository
+from app.models.user import UserCreate, UserInDB, UserUpdate, UserCreateNoValidation
+from app.services.auth_service import AuthService
+
+
+class UserService:
+  """
+  Service class for user business logic operations.
+
+  This class handles user-related operations including creation,
+  authentication, and data management with proper validation.
+  """
+  
+  def __init__(self, database):
+    """
+    Initialize user service with repository dependency.
+
+    Args:
+        database: Database instance used to create the UserRepository
+    """
+    self.db = database
+    self.user_repository = UserRepository(self.db)
+    self.auth_service = AuthService()
+  
+  def initialize(self):
+    self.user_repository.initialize()
+    return self
+  
+  def create_user(self, user_data: UserCreate | UserCreateNoValidation) -> UserInDB:
+    """
+    Create a new user with business logic validation.
+
+    Args:
+        user_data (UserCreate | UserCreateNoValidation): User creation data
+
+    Returns:
+        UserInDB: Created user with database information
+
+    Raises:
+        ValueError: If user already exists or validation fails
+    """
+    # Check if user already exists
+    if self.user_repository.user_exists(user_data.username):
+      raise ValueError(f"User with username '{user_data.username}' already exists")
+    
+    # Check if email already exists
+    existing_user = self.user_repository.find_user_by_email(user_data.email)
+    if existing_user:
+      raise ValueError(f"User with email '{user_data.email}' already exists")
+    
+    try:
+      return self.user_repository.create_user(user_data)
+    except DuplicateKeyError:
+      raise ValueError(f"User with username '{user_data.username}' already exists")
+  
+  def get_user_by_username(self, username: str) -> Optional[UserInDB]:
+    """
+    Retrieve user by username.
+
+    Args:
+        username (str): Username to search for
+
+    Returns:
+        UserInDB or None: User if found, None otherwise
+    """
+    return self.user_repository.find_user_by_username(username)
+  
+  def get_user_by_id(self, user_id: str) -> Optional[UserInDB]:
+    """
+    Retrieve user by ID.
+
+    Args:
+        user_id (str): User ID to search for
+
+    Returns:
+        UserInDB or None: User if found, None otherwise
+    """
+    return self.user_repository.find_user_by_id(user_id)
+  
+  def authenticate_user(self, username: str, password: str) -> Optional[UserInDB]:
+    """
+    Authenticate user with username and password.
+
+    Args:
+        username (str): Username for authentication
+        password (str): Password for authentication
+
+    Returns:
+        UserInDB or None: Authenticated user if valid, None otherwise
+    """
+    user = self.user_repository.find_user_by_username(username)
+    if not user:
+      return None
+    
+    if not user.is_active:
+      return None
+    
+    if not self.auth_service.verify_user_password(password, user.hashed_password):
+      return None
+    
+    return user
+  
+  def update_user(self, user_id: str, user_update: UserUpdate) -> Optional[UserInDB]:
+    """
+    Update user information.
+
+    Args:
+        user_id (str): User ID to update
+        user_update (UserUpdate): Updated user data
+
+    Returns:
+        UserInDB or None: Updated user if successful, None otherwise
+
+    Raises:
+        ValueError: If username or email already exists for different user
+    """
+    # Validate username uniqueness if being updated
+    if user_update.username is not None:
+      existing_user = self.user_repository.find_user_by_username(user_update.username)
+      if existing_user and str(existing_user.id) != user_id:
+        raise ValueError(f"Username '{user_update.username}' is already taken")
+    
+    # Validate email uniqueness if being updated
+    if user_update.email is not None:
+      existing_user = self.user_repository.find_user_by_email(user_update.email)
+      if existing_user and str(existing_user.id) != user_id:
+        raise ValueError(f"Email '{user_update.email}' is already taken")
+    
+    return self.user_repository.update_user(user_id, user_update)
+  
+  def delete_user(self, user_id: str) -> bool:
+    """
+    Delete user from system.
+
+    Args:
+        user_id (str): User ID to delete
+
+    Returns:
+        bool: True if user was deleted, False otherwise
+    """
+    return self.user_repository.delete_user(user_id)
+  
+  def list_users(self, skip: int = 0, limit: int = 100) -> List[UserInDB]:
+    """
+    List users with pagination.
+
+    Args:
+        skip (int): Number of users to skip (default: 0)
+        limit (int): Maximum number of users to return (default: 100)
+
+    Returns:
+        List[UserInDB]: List of users
+    """
+    return self.user_repository.list_users(skip=skip, limit=limit)
+  
+  def count_users(self) -> int:
+    """
+    Count total number of users.
+
+    Returns:
+        int: Total number of users in system
+    """
+    return self.user_repository.count_users()
+  
+  def user_exists(self, username: str) -> bool:
+    """
+    Check if user exists by username.
+
+    Args:
+        username (str): Username to check
+
+    Returns:
+        bool: True if user exists, False otherwise
+    """
+    return self.user_repository.user_exists(username)
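
The order of checks in `authenticate_user` (user exists, user is active, password matches) can be exercised with a small self-contained sketch. The hashing here is a stand-in: PBKDF2 from the standard library replaces the project's `hash_password` and `verify_password`, whose implementation is not shown in this part of the diff. The `User` dataclass and the `users` dict are hypothetical substitutes for `UserInDB` and the repository.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor for the PBKDF2 stand-in


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt_hex$digest_hex' using PBKDF2-HMAC-SHA256 (stand-in hashing)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, hashed: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt_hex, digest_hex = hashed.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


@dataclass
class User:
    """Hypothetical stand-in for UserInDB."""
    username: str
    hashed_password: str
    is_active: bool = True


def authenticate_user(users: dict, username: str, password: str) -> Optional[User]:
    """Return the user only if it exists, is active, and the password matches."""
    user = users.get(username)
    if user is None or not user.is_active:
        return None
    if not verify_password(password, user.hashed_password):
        return None
    return user
```

Every failure path returns `None`, so a caller cannot tell an unknown username from a wrong password.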
--- a/src/file-processor/app/utils/document_matching.py
+++ b/src/file-processor/app/utils/document_matching.py
@@ -0,0 +1,60 @@
+from difflib import SequenceMatcher
+
+from app.models.document import FileDocument
+
+
+def _is_subsequence(query: str, target: str) -> tuple[bool, float]:
+  """
+  Check if query is a subsequence of target (case-insensitive).
+  Returns (match, score).
+  Score is higher when the query letters are closer together in the target.
+  """
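+  query = query.lower()
+  target = target.lower()
+  
+  # An empty query matches nothing; this also avoids indexing an empty list below
+  if not query:
+    return False, 0.0
+  
+  positions = []
+  idx = 0
+  
+  for char in query:
+    idx = target.find(char, idx)
+    if idx == -1:
+      return False, 0.0
+    positions.append(idx)
+    idx += 1
+  
+  # Window spanned by the greedy left-to-right match (not necessarily the smallest)
+  window_size = positions[-1] - positions[0] + 1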
+  
+  # Score: ratio of query length vs window size (compactness)
+  score = len(query) / window_size
+  
+  return True, score
+
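+
+def fuzzy_matching(filename: str, documents: list[FileDocument], similarity_threshold: float = 0.7):
+  """
+  Return documents whose filename similarity to `filename` (case-insensitive
+  SequenceMatcher ratio) meets the threshold, best match first.
+  """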
+  matches = []
+  for file_doc in documents:
+    # Calculate similarity between search term and filename
+    similarity = SequenceMatcher(None, filename.lower(), file_doc.filename.lower()).ratio()
+    
+    if similarity >= similarity_threshold:
+      matches.append((file_doc, similarity))
+  
+  # Sort by similarity score (highest first)
+  matches.sort(key=lambda x: x[1], reverse=True)
+  
+  # Return only the FileDocument objects
+  return [match[0] for match in matches]
+  
+
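+def subsequence_matching(query: str, documents: list[FileDocument]):
+  """
+  Return documents whose filename contains `query` as a subsequence,
+  ordered by compactness score (highest first).
+  """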
+  matches = []
+  for file_doc in documents:
+    matched, score = _is_subsequence(query, file_doc.filename)
+    if matched:
+      matches.append((file_doc, score))
+  
+  # Sort by score (highest first)
+  matches.sort(key=lambda x: x[1], reverse=True)
+  
+  # Return only the FileDocument objects
+  return [match[0] for match in matches]
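
To see how the two matchers rank results, here is a small sketch that works on plain filename strings instead of `FileDocument` objects. The function names `subsequence_score`, `rank_by_subsequence`, and `rank_by_similarity` are hypothetical, and the sketch treats an empty query as "no match", which is an assumption rather than behaviour confirmed by the diff.

```python
from difflib import SequenceMatcher


def subsequence_score(query: str, target: str) -> tuple[bool, float]:
    """Greedy left-to-right subsequence match; score = len(query) / matched window."""
    query, target = query.lower(), target.lower()
    if not query:
        return False, 0.0  # assumption: an empty query matches nothing
    positions = []
    idx = 0
    for char in query:
        idx = target.find(char, idx)
        if idx == -1:
            return False, 0.0
        positions.append(idx)
        idx += 1
    window_size = positions[-1] - positions[0] + 1
    return True, len(query) / window_size


def rank_by_subsequence(query: str, filenames: list[str]) -> list[tuple[str, float]]:
    """Keep filenames that contain the query as a subsequence, most compact first."""
    scored = [(name, subsequence_score(query, name)) for name in filenames]
    matches = [(name, score) for name, (matched, score) in scored if matched]
    return sorted(matches, key=lambda item: item[1], reverse=True)


def rank_by_similarity(query: str, filenames: list[str], threshold: float = 0.7) -> list[str]:
    """Keep filenames whose SequenceMatcher ratio meets the threshold, best first."""
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in filenames
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, ratio in scored if ratio >= threshold]
```

For the query `rpt`, `report_2024.pdf` matches at positions 0, 2, and 5 (window of 6, score 0.5), while `receipt.txt` matches at 0, 5, and 6 (window of 7, score 3/7), so the first ranks higher. A filename with no `r` is dropped.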
--- a/src/file-processor/requirements.txt
+++ b/src/file-processor/requirements.txt
@@ -1,6 +1,14 @@
-fastapi==0.116.1
-uvicorn==0.35.0
+asgiref==3.9.1
+bcrypt==4.3.0
 celery==5.5.3
-redis==6.4.0
+email-validator==2.3.0
+fastapi==0.116.1
+httptools==0.6.4
+motor==3.7.1
+pydantic==2.11.9
+PyJWT==2.10.1
 pymongo==4.15.0
-pydantic==2.11.9
+redis==6.4.0
+uvicorn==0.35.0
+python-magic==0.4.27
+watchdog==6.0.0
--- a/src/frontend/.dockerignore
+++ b/src/frontend/.dockerignore
@@ -0,0 +1,41 @@
+# Dependencies
+node_modules
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+
+# Build outputs
+dist
+build
+
+# Environment files
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+# IDE files
+.vscode
+.idea
+*.swp
+*.swo
+
+# OS generated files
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Git
+.git
+.gitignore
+
+# Docker
+Dockerfile
+.dockerignore
+
+# Logs
+*.log
--- a/src/frontend/Dockerfile
+++ b/src/frontend/Dockerfile
@@ -0,0 +1,20 @@
+# Use Node.js 20 Alpine for lightweight container
+FROM node:20-alpine
+
+# Set working directory
+WORKDIR /app
+
+# Copy package.json and package-lock.json (if available)
+COPY package*.json ./
+
+# Install dependencies
+RUN npm install
+
+# Copy source code
+COPY . .
+
+# Expose Vite default port
+EXPOSE 5173
+
+# Start development server with host 0.0.0.0 to accept external connections
+CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0", "--port", "5173"]
--- a/src/worker/Dockerfile
+++ b/src/worker/Dockerfile
@@ -3,12 +3,18 @@ FROM python:3.12-slim
 # Set working directory
 WORKDIR /app

+# Install libmagic
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libmagic1 \
+    file \
+ && rm -rf /var/lib/apt/lists/*
+
 # Copy requirements and install dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

 # Copy application code
-COPY tasks/ .
+COPY . .

 # Command will be overridden by docker-compose
-CMD ["celery", "-A", "main", "worker", "--loglevel=info"]
+CMD ["celery", "-A", "main", "worker", "--loglevel=info"]
--- a/src/worker/requirements.txt
+++ b/src/worker/requirements.txt
@@ -1,4 +1,13 @@
-
+asgiref==3.9.1
+bcrypt==4.3.0
 celery==5.5.3
+email-validator==2.3.0
+fastapi==0.116.1
+httptools==0.6.4
+motor==3.7.1
+pymongo==4.15.0
+pydantic==2.11.9
 redis==6.4.0
-pymongo==4.15.0
+uvicorn==0.35.0
+python-magic==0.4.27
+watchdog==6.0.0
--- a/src/worker/tasks/document_processing.py
+++ b/src/worker/tasks/document_processing.py
@@ -0,0 +1,85 @@
+"""
+Celery tasks for document processing with ProcessingJob status management.
+
+This module contains Celery tasks that handle document content extraction
+and update processing job statuses throughout the task lifecycle.
+"""
+
+import logging
+from typing import Any, Dict
+
+from app.config import settings
+from app.database.connection import get_database
+from app.services.document_service import DocumentService
+from tasks.main import celery_app
+
+logger = logging.getLogger(__name__)
+
+@celery_app.task(bind=True, autoretry_for=(Exception,), retry_kwargs={'max_retries': 3, 'countdown': 60})
+def process_document(self, filepath: str) -> Dict[str, Any]:
+  """
+  Process a document file and extract its content.
+
+  This task:
+  1. Stores the document in the database and creates a job record for it
+  2. Marks the job as PROCESSING, then as COMPLETED
+  3. Marks the job as FAILED if any step raises
+
+  Args:
+      self : Celery task instance
+      filepath: Full path to the document file to process
+
+  Returns:
+      Dictionary containing processing results
+
+  Raises:
+      Exception: Any processing error (will trigger retry)
+  """
+  task_id = self.request.id
+  logger.info(f"Starting document processing task {task_id} for file: {filepath}")
+  
+  database = get_database()
+  document_service = DocumentService(database=database, objects_folder=settings.get_objects_folder())
+  from app.services.job_service import JobService
+  job_service = JobService(database=database)
+  
+  job = None
+  try:
+    # Step 1: Insert the document in DB
+    document = document_service.create_document(filepath)
+    logger.info(f"Document {document.id} created for task {task_id} with file path: {filepath}")
+    
+    # Step 2: Create a new job record for the document
+    job = job_service.create_job(task_id=task_id, document_id=document.id)
+    
+    # Step 3: Mark job as started
+    job_service.mark_job_as_started(job_id=job.id)
+    logger.info(f"Job {task_id} marked as PROCESSING")
+    
+    # Step 4: Mark job as completed
+    job_service.mark_job_as_completed(job_id=job.id)
+    logger.info(f"Job {task_id} marked as COMPLETED")
+    
+    return {
+        "task_id": task_id,
+        "filepath": filepath,
+        "status": "completed",
+    }
+  
+  except Exception as e:
+    error_message = f"Document processing failed: {str(e)}"
+    logger.error(f"Task {task_id} failed: {error_message}")
+    
+    try:
+      # Mark job as failed
+      if job is not None:
+        job_service.mark_job_as_failed(job_id=job.id, error_message=error_message)
+        logger.info(f"Job {task_id} marked as FAILED")
+      else:
+        logger.error(f"Failed to process {filepath}. error = {str(e)}")
+    except Exception as job_error:
+      logger.error(f"Failed to update job status for task {task_id}: {str(job_error)}")
+    
+    # Re-raise the exception to trigger Celery retry mechanism
+    raise
+  
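The status handling in `process_document` can be sketched without Celery or MongoDB. The example below uses an in-memory stand-in: `Status`, `Job`, `InMemoryJobService` and `run_with_status` are hypothetical names chosen for illustration and are not the project's `JobService` API. It shows the same pattern as the task: track the job only once it exists, mark it failed on error, and re-raise so the caller (Celery, in the real task) can decide whether to retry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    id: int
    task_id: str
    status: Status = Status.PENDING
    error: Optional[str] = None


class InMemoryJobService:
    """Hypothetical stand-in that keeps jobs in a dict instead of a database."""

    def __init__(self) -> None:
        self.jobs: Dict[int, Job] = {}

    def create_job(self, task_id: str) -> Job:
        job = Job(id=len(self.jobs) + 1, task_id=task_id)
        self.jobs[job.id] = job
        return job

    def set_status(self, job_id: int, status: Status, error: Optional[str] = None) -> None:
        self.jobs[job_id].status = status
        self.jobs[job_id].error = error


def run_with_status(service: InMemoryJobService, task_id: str, work: Callable[[], dict]) -> dict:
    job = None
    try:
        job = service.create_job(task_id)
        service.set_status(job.id, Status.PROCESSING)
        result = work()
        service.set_status(job.id, Status.COMPLETED)
        return result
    except Exception as exc:
        # Only update the job if it was created before the failure.
        if job is not None:
            service.set_status(job.id, Status.FAILED, error=str(exc))
        raise  # let the caller handle retries
```

A retried task would go through `create_job` again in this sketch, so each attempt produces its own job record; whether that is desirable depends on how jobs are keyed in the real service.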
--- a/src/worker/tasks/main.py
+++ b/src/worker/tasks/main.py
@@ -3,9 +3,8 @@ Celery worker for MyDocManager document processing tasks.

 This module contains all Celery tasks for processing documents.
 """
-
 import os
-import time
+
 from celery import Celery

 # Environment variables
@@ -13,101 +12,25 @@ REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")
 MONGODB_URL = os.getenv("MONGODB_URL", "mongodb://localhost:27017")

 # Initialize Celery app
-app = Celery(
+celery_app = Celery(
  "mydocmanager_worker",
  broker=REDIS_URL,
-  backend=REDIS_URL
+  backend=REDIS_URL,
 )

+celery_app.autodiscover_tasks(["tasks.document_processing"])
+
 # Celery configuration
-app.conf.update(
+celery_app.conf.update(
  task_serializer="json",
  accept_content=["json"],
  result_serializer="json",
  timezone="UTC",
  enable_utc=True,
  task_track_started=True,
-  task_time_limit=300,  # 5 minutes
-  task_soft_time_limit=240,  # 4 minutes
+  task_time_limit=300,        # 5 minutes
+  task_soft_time_limit=240,   # 4 minutes
 )

-
-@app.task(bind=True)
-def test_task(self, message: str):
-  """
-  Test task for validating worker functionality.
-
-  Args:
-      message: Test message to process
-
-  Returns:
-      dict: Task result with processing information
-  """
-  try:
-    print(f"[WORKER] Starting test task with message: {message}")
-    
-    # Simulate some work
-    for i in range(5):
-      print(f"[WORKER] Processing step {i + 1}/5...")
-      time.sleep(1)
-      
-      # Update task progress
-      self.update_state(
-        state="PROGRESS",
-        meta={
-            "current": i + 1,
-            "total": 5,
-            "message": f"Processing step {i + 1}"
-        }
-      )
-    
-    result = {
-        "status": "completed",
-        "message": f"Successfully processed: {message}",
-        "processed_at": time.time(),
-        "worker_id": self.request.id
-    }
-    
-    print(f"[WORKER] Test task completed successfully: {result}")
-    return result
-  
-  except Exception as exc:
-    print(f"[WORKER] Test task failed: {str(exc)}")
-    raise self.retry(exc=exc, countdown=60, max_retries=3)
-
-
-@app.task(bind=True)
-def process_document_task(self, file_path: str):
-  """
-  Placeholder task for document processing.
-
-  Args:
-      file_path: Path to the document to process
-
-  Returns:
-      dict: Processing result
-  """
-  try:
-    print(f"[WORKER] Starting document processing for: {file_path}")
-    
-    # Placeholder for document processing logic
-    time.sleep(2)  # Simulate processing time
-    
-    result = {
-        "status": "completed",
-        "file_path": file_path,
-        "processed_at": time.time(),
-        "content": f"Placeholder content for {file_path}",
-        "worker_id": self.request.id
-    }
-    
-    print(f"[WORKER] Document processing completed: {file_path}")
-    return result
-  
-  except Exception as exc:
-    print(f"[WORKER] Document processing failed for {file_path}: {str(exc)}")
-    raise self.retry(exc=exc, countdown=60, max_retries=3)
-
-
 if __name__ == "__main__":
-  app.start()
+  celery_app.start()
--- a/tests/api/__init__.py
+++ b/tests/api/__init__.py
--- a/tests/api/test_auth_routes.py
+++ b/tests/api/test_auth_routes.py
@@ -0,0 +1,149 @@
+from datetime import datetime
+from unittest.mock import MagicMock
+
+import pytest
+from fastapi import status, HTTPException
+from fastapi.testclient import TestClient
+from mongomock.mongo_client import MongoClient
+
+from app.api.dependencies import get_auth_service, get_user_service, get_current_user
+from app.main import app  # FastAPI application under test
+from app.models.auth import UserRole
+from app.models.types import PyObjectId
+from app.models.user import UserInDB
+from app.services.auth_service import AuthService
+from app.services.user_service import UserService
+
+
+@pytest.fixture
+def client():
+  return TestClient(app)
+
+
+@pytest.fixture
+def fake_user():
+  return UserInDB(
+    _id=PyObjectId(),
+    username="testuser",
+    email="test@example.com",
+    role=UserRole.USER,
+    is_active=True,
+    hashed_password="hashed-secret",
+    created_at=datetime(2025, 1, 1),
+    updated_at=datetime(2025, 1, 2),
+  )
+
+
+def override_auth_service():
+  mock = MagicMock(spec=AuthService)
+  mock.verify_user_password.return_value = True
+  mock.create_access_token.return_value = "fake-jwt-token"
+  return mock
+
+
+def override_user_service(fake_user):
+  mock = MagicMock(spec=UserService)
+  mock.get_user_by_username.return_value = fake_user
+  return mock
+
+
+def override_get_current_user(fake_user):
+  def _override():
+    return fake_user
+  
+  return _override
+
+
+def override_get_database():
+  def _override():
+    client = MongoClient()
+    db = client.test_database
+    return db
+  
+  return _override
+
+
+# ---------------------- TESTS FOR /auth/login ----------------------
+class TestLogin:
+  def test_i_can_login_with_valid_credentials(self, client, fake_user):
+    auth_service = override_auth_service()
+    user_service = override_user_service(fake_user)
+    
+    client.app.dependency_overrides[get_auth_service] = lambda: auth_service
+    client.app.dependency_overrides[get_user_service] = lambda: user_service
+    
+    response = client.post(
+      "/auth/login",
+      data={"username": "testuser", "password": "secret"},
+    )
+    
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert "access_token" in data
+    assert data["user"]["username"] == "testuser"
+  
+  def test_i_cannot_login_with_invalid_username(self, client):
+    auth_service = override_auth_service()
+    user_service = MagicMock(spec=UserService)
+    user_service.get_user_by_username.return_value = None
+    
+    client.app.dependency_overrides[get_auth_service] = lambda: auth_service
+    client.app.dependency_overrides[get_user_service] = lambda: user_service
+    
+    response = client.post(
+      "/auth/login",
+      data={"username": "unknown", "password": "secret"},
+    )
+    
+    assert response.status_code == status.HTTP_401_UNAUTHORIZED
+  
+  def test_i_cannot_login_with_inactive_user(self, client, fake_user):
+    fake_user.is_active = False
+    auth_service = override_auth_service()
+    user_service = override_user_service(fake_user)
+    client.app.dependency_overrides[get_auth_service] = lambda: auth_service
+    client.app.dependency_overrides[get_user_service] = lambda: user_service
+    
+    response = client.post(
+      "/auth/login",
+      data={"username": "testuser", "password": "secret"},
+    )
+    
+    assert response.status_code == status.HTTP_401_UNAUTHORIZED
+  
+  def test_i_cannot_login_with_wrong_password(self, client, fake_user):
+    auth_service = override_auth_service()
+    auth_service.verify_user_password.return_value = False
+    user_service = override_user_service(fake_user)
+    client.app.dependency_overrides[get_auth_service] = lambda: auth_service
+    client.app.dependency_overrides[get_user_service] = lambda: user_service
+    
+    response = client.post(
+      "/auth/login",
+      data={"username": "testuser", "password": "wrong"},
+    )
+    
+    assert response.status_code == status.HTTP_401_UNAUTHORIZED
+
+
+# ---------------------- TESTS FOR /auth/me ----------------------
+class TestMe:
+  def test_i_can_get_current_user_profile(self, client, fake_user):
+    client.app.dependency_overrides[get_current_user] = override_get_current_user(fake_user)
+    
+    response = client.get("/auth/me")
+    
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert data["username"] == fake_user.username
+    assert data["email"] == fake_user.email
+  
+  def test_i_cannot_get_profile_without_authentication(self, client):
+    def raise_http_exception():
+      raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
+    
+    client.app.dependency_overrides[get_current_user] = raise_http_exception
+    
+    response = client.get("/auth/me")
+    
+    assert response.status_code == status.HTTP_401_UNAUTHORIZED
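The service overrides in these tests rely on `MagicMock(spec=...)`. A short sketch of what `spec` buys, using only the standard library: the `UserService` class below is a hypothetical stand-in with a single method, not the project's real service, and `spec_rejects` is a helper introduced for the example.

```python
from unittest.mock import MagicMock


class UserService:
    """Hypothetical stand-in; only the method names matter for the spec."""

    def get_user_by_username(self, username: str):
        raise NotImplementedError


def spec_rejects(mock: MagicMock, attribute: str) -> bool:
    """Return True if the mock refuses the attribute because it is not on the spec."""
    try:
        getattr(mock, attribute)
    except AttributeError:
        return True
    return False


mock = MagicMock(spec=UserService)
# Configured the same way as the "unknown username" login test.
mock.get_user_by_username.return_value = None

if __name__ == "__main__":
    print(mock.get_user_by_username("unknown"))    # the configured return value
    print(spec_rejects(mock, "get_user_by_name"))  # misspelled method name
```

With a spec in place, a misspelled or renamed service method fails the test immediately instead of silently returning another mock.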
--- a/tests/api/test_users.py
+++ b/tests/api/test_users.py
@@ -0,0 +1,167 @@
+# File: tests/api/test_users.py
+from datetime import datetime
+from unittest.mock import MagicMock
+
+import pytest
+from fastapi import status
+from fastapi.testclient import TestClient
+
+from app.api.dependencies import get_admin_user, get_user_service
+from app.main import app
+from app.models.auth import UserRole
+from app.models.types import PyObjectId
+from app.models.user import UserInDB, UserCreate
+from app.services.user_service import UserService
+
+
+# -----------------------
+# Fixtures
+# -----------------------
+
+@pytest.fixture
+def fake_user_admin():
+  return UserInDB(
+    _id=PyObjectId(),
+    username="admin",
+    email="admin@example.com",
+    role=UserRole.ADMIN,
+    is_active=True,
+    hashed_password="hashed-secret",
+    created_at=datetime(2025, 1, 1),
+    updated_at=datetime(2025, 1, 2),
+  )
+
+
+@pytest.fixture
+def fake_user_response():
+  return UserInDB(
+    _id=PyObjectId(),
+    username="other",
+    email="other@example.com",
+    role=UserRole.USER,
+    is_active=True,
+    hashed_password="hashed-secret-2",
+    created_at=datetime(2025, 1, 1),
+    updated_at=datetime(2025, 1, 2),
+  )
+
+
+@pytest.fixture
+def client(fake_user_admin):
+  # Fake admin dependency
+  def get_admin_user_override():
+    return fake_user_admin
+  
+  # Fake user service
+  user_service_mock = MagicMock(spec=UserService)
+  
+  def get_user_service_override():
+    return user_service_mock
+  
+  client = TestClient(app)
+  client.app.dependency_overrides = {
+      get_admin_user: get_admin_user_override,
+      get_user_service: get_user_service_override
+  }
+  
+  client.user_service_mock = user_service_mock
+  return client
+
+
+# -----------------------
+# Tests
+# -----------------------
+
+class TestListUsers:
+  
+  def test_i_can_list_users(self, client, fake_user_admin, fake_user_response):
+    client.user_service_mock.list_users.return_value = [fake_user_admin, fake_user_response]
+    response = client.get("/users")
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert len(data) == 2
+    assert data[0]["username"] == "admin"
+  
+  def test_i_can_list_users_when_empty(self, client):
+    client.user_service_mock.list_users.return_value = []
+    response = client.get("/users")
+    assert response.status_code == status.HTTP_200_OK
+    assert response.json() == []
+
+
+class TestGetUserById:
+  
+  def test_i_can_get_user_by_id(self, client, fake_user_response):
+    client.user_service_mock.get_user_by_id.return_value = fake_user_response
+    response = client.get(f"/users/{fake_user_response.id}")
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert data["username"] == fake_user_response.username
+  
+  def test_i_cannot_get_user_by_id_not_found(self, client):
+    client.user_service_mock.get_user_by_id.return_value = None
+    response = client.get("/users/64f0c9f4b0d1c8b7b8e1f0a2")
+    assert response.status_code == status.HTTP_404_NOT_FOUND
+    assert response.json()["detail"] == "User not found"
+
+
+class TestCreateUser:
+  
+  def test_i_can_create_user(self, client, fake_user_response):
+    user_data = UserCreate(username="newuser",
+                           email="new@example.com",
+                           password="#Passw0rd!",
+                           role=UserRole.USER)
+    
+    client.user_service_mock.create_user.return_value = fake_user_response
+    response = client.post("/users", json=user_data.model_dump(mode="json"))
+    assert response.status_code == status.HTTP_201_CREATED
+    data = response.json()
+    assert data["username"] == fake_user_response.username
+  
+  def test_i_cannot_create_user_when_service_raises_value_error(self, client):
+    user_data = {"username": "baduser", "email": "bad@example.com", "role": "user", "password": "password"}
+    client.user_service_mock.create_user.side_effect = ValueError("Invalid data")
+    response = client.post("/users", json=user_data)
+    assert response.status_code == status.HTTP_422_UNPROCESSABLE_ENTITY
+
+
+class TestUpdateUser:
+  
+  def test_i_can_update_user(self, client, fake_user_response):
+    user_data = {"username": "updateduser", "email": "updated@example.com"}
+    client.user_service_mock.update_user.return_value = fake_user_response
+    response = client.put(f"/users/{fake_user_response.id}", json=user_data)
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert data["username"] == fake_user_response.username
+  
+  def test_i_cannot_update_user_not_found(self, client):
+    client.user_service_mock.update_user.return_value = None
+    user_data = {"username": "updateduser"}
+    response = client.put("/users/64f0c9f4b0d1c8b7b8e1f0a2", json=user_data)
+    assert response.status_code == status.HTTP_404_NOT_FOUND
+    assert response.json()["detail"] == "User not found"
+  
+  def test_i_cannot_update_user_when_service_raises_value_error(self, client):
+    client.user_service_mock.update_user.side_effect = ValueError("Invalid update")
+    user_data = {"username": "badupdate"}
+    response = client.put("/users/64f0c9f4b0d1c8b7b8e1f0a2", json=user_data)
+    assert response.status_code == status.HTTP_400_BAD_REQUEST
+    assert response.json()["detail"] == "Invalid update"
+
+
+class TestDeleteUser:
+  
+  def test_i_can_delete_user(self, client):
+    client.user_service_mock.delete_user.return_value = True
+    response = client.delete("/users/64f0c9f4b0d1c8b7b8e1f0a1")
+    assert response.status_code == status.HTTP_200_OK
+    data = response.json()
+    assert data["message"] == "User successfully deleted"
+  
+  def test_i_cannot_delete_user_not_found(self, client):
+    client.user_service_mock.delete_user.return_value = False
+    response = client.delete("/users/64f0c9f4b0d1c8b7b8e1f0a2")
+    assert response.status_code == status.HTTP_404_NOT_FOUND
+    assert response.json()["detail"] == "User not found"
--- a/tests/database/__init__.py
+++ b/tests/database/__init__.py
--- a/tests/models/__init__.py
+++ b/tests/models/__init__.py
--- a/tests/models/test_user_models.py
+++ b/tests/models/test_user_models.py
@@ -10,8 +10,8 @@ from pydantic import ValidationError
 from datetime import datetime
 from bson import ObjectId

-from app.models.user import UserCreate, UserUpdate, UserInDB, UserResponse
-from app.models.auth import UserRole
+from app.models.user import UserCreate, UserUpdate, UserInDB
+from app.models.auth import UserRole, UserResponse


 class TestUserCreateModel:
@@ -262,14 +262,14 @@ class TestUserInDBModel:
  def test_i_can_create_user_in_db_model(self):
    """Test creation of valid UserInDB model with all fields."""
    user_id = ObjectId()
-    created_at = datetime.utcnow()
-    updated_at = datetime.utcnow()
+    created_at = datetime.now()
+    updated_at = datetime.now()
    
    user_data = {
        "id": user_id,
        "username": "testuser",
        "email": "test@example.com",
-        "password_hash": "$2b$12$hashedpassword",
+        "hashed_password": "$2b$12$hashedpassword",
        "role": UserRole.USER,
        "is_active": True,
        "created_at": created_at,
@@ -281,28 +281,11 @@ class TestUserInDBModel:
    assert user.id == user_id
    assert user.username == "testuser"
    assert user.email == "test@example.com"
-    assert user.password_hash == "$2b$12$hashedpassword"
+    assert user.hashed_password == "$2b$12$hashedpassword"
    assert user.role == UserRole.USER
    assert user.is_active is True
    assert user.created_at == created_at
    assert user.updated_at == updated_at
-  
-  def test_i_can_create_inactive_user(self):
-    """Test creation of inactive user."""
-    user_data = {
-        "id": ObjectId(),
-        "username": "testuser",
-        "email": "test@example.com",
-        "password_hash": "$2b$12$hashedpassword",
-        "role": UserRole.USER,
-        "is_active": False,
-        "created_at": datetime.utcnow(),
-        "updated_at": datetime.utcnow()
-    }
-    
-    user = UserInDB(**user_data)
-    
-    assert user.is_active is False


 class TestUserResponseModel:
@@ -311,8 +294,8 @@ class TestUserResponseModel:
  def test_i_can_create_user_response_model(self):
    """Test creation of valid UserResponse model without password."""
    user_id = ObjectId()
-    created_at = datetime.utcnow()
-    updated_at = datetime.utcnow()
+    created_at = datetime.now()
+    updated_at = datetime.now()
    
    user_data = {
        "id": user_id,
@@ -350,14 +333,14 @@ class TestUserResponseModel:
  def test_i_can_convert_user_in_db_to_response(self):
    """Test conversion from UserInDB to UserResponse model."""
    user_id = ObjectId()
-    created_at = datetime.utcnow()
-    updated_at = datetime.utcnow()
+    created_at = datetime.now()
+    updated_at = datetime.now()
    
    user_in_db = UserInDB(
      id=user_id,
      username="testuser",
      email="test@example.com",
-      password_hash="$2b$12$hashedpassword",
+      hashed_password="$2b$12$hashedpassword",
      role=UserRole.USER,
      is_active=True,
      created_at=created_at,
@@ -366,7 +349,7 @@ class TestUserResponseModel:
    
    # Convert to response model (excluding password_hash)
    user_response = UserResponse(
-      id=user_in_db.id,
+      _id=user_in_db.id,
      username=user_in_db.username,
      email=user_in_db.email,
      role=user_in_db.role,
--- a/tests/repositories/__init__.py
+++ b/tests/repositories/__init__.py
--- a/tests/repositories/test_document_repository.py
+++ b/tests/repositories/test_document_repository.py
@@ -0,0 +1,611 @@
+"""
+Test suite for FileDocumentRepository.
+
+This module contains comprehensive tests for all FileDocumentRepository methods
+using mongomock for in-memory MongoDB testing.
+"""
+
+from datetime import datetime
+
+import pytest
+from bson import ObjectId
+from mongomock.mongo_client import MongoClient
+from pymongo.errors import PyMongoError
+
+from app.database.repositories.document_repository import (
+  FileDocumentRepository,
+  MatchMethodBase,
+  SubsequenceMatching,
+  FuzzyMatching
+)
+from app.models.document import FileDocument, FileType, ExtractionMethod
+
+
+@pytest.fixture
+def in_memory_repository():
+  """Create an in-memory FileDocumentRepository for testing."""
+  client = MongoClient()
+  db = client.test_database
+  repo = FileDocumentRepository(db)
+  repo.initialize()
+  return repo
+
+
+@pytest.fixture
+def sample_file_document():
+  """Sample FileDocument data for testing."""
+  return FileDocument(
+    filename="sample_document.pdf",
+    filepath="/home/user/documents/sample_document.pdf",
+    file_type=FileType.PDF,
+    extraction_method=ExtractionMethod.OCR,
+    metadata={"pages": 5, "language": "en", "author": "John Doe"},
+    detected_at=datetime.now(),
+    file_hash="a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
+    encoding="utf-8",
+    file_size=1024000,
+    mime_type="application/pdf"
+  )
+
+
+@pytest.fixture
+def sample_update_data():
+  """Sample update data for testing."""
+  return {
+      "extraction_method": ExtractionMethod.HYBRID,
+      "metadata": {"pages": 10, "language": "fr", "updated": True},
+      "file_size": 2048000
+  }
+
+
+@pytest.fixture
+def multiple_sample_files():
+  """Multiple FileDocument objects for list/search testing."""
+  base_time = datetime.now()
+  return [
+      FileDocument(
+        filename="first_doc.txt",
+        filepath="/docs/first_doc.txt",
+        file_type=FileType.TXT,
+        extraction_method=ExtractionMethod.DIRECT_TEXT,
+        metadata={"words": 500},
+        detected_at=base_time,
+        file_hash="hash1" + "0" * 58,
+        encoding="utf-8",
+        file_size=5000,
+        mime_type="text/plain"
+      ),
+      FileDocument(
+        filename="second_document.pdf",
+        filepath="/docs/second_document.pdf",
+        file_type=FileType.PDF,
+        extraction_method=ExtractionMethod.OCR,
+        metadata={"pages": 8},
+        detected_at=base_time,
+        file_hash="hash2" + "0" * 58,
+        encoding="utf-8",
+        file_size=10000,
+        mime_type="application/pdf"
+      ),
+      FileDocument(
+        filename="third_file.docx",
+        filepath="/docs/third_file.docx",
+        file_type=FileType.DOCX,
+        extraction_method=ExtractionMethod.HYBRID,
+        metadata={"paragraphs": 15},
+        detected_at=base_time,
+        file_hash="hash3" + "0" * 58,
+        encoding="utf-8",
+        file_size=15000,
+        mime_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+      )
+  ]
+
+
+class TestFileDocumentRepositoryInitialization:
+  """Tests for repository initialization."""
+  
+  def test_i_can_initialize_repository(self):
+    """Test repository initialization."""
+    # Arrange
+    client = MongoClient()
+    db = client.test_database
+    repo = FileDocumentRepository(db)
+    repo.initialize()
+    
+    # Act & Assert (should not raise any exception)
+    assert repo.db is not None
+    assert repo.collection is not None
+    # TODO : check that the indexes are created
+
+
+class TestFileDocumentRepositoryCreation:
+  """Tests for file document creation functionality."""
+  
+  def test_i_can_create_file_document(self, in_memory_repository, sample_file_document):
+    """Test successful file document creation."""
+    # Act
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Assert
+    assert created_file is not None
+    assert created_file.filename == sample_file_document.filename
+    assert created_file.filepath == sample_file_document.filepath
+    assert created_file.file_type == sample_file_document.file_type
+    assert created_file.extraction_method == sample_file_document.extraction_method
+    assert created_file.metadata == sample_file_document.metadata
+    assert created_file.file_hash == sample_file_document.file_hash
+    assert created_file.file_size == sample_file_document.file_size
+    assert created_file.mime_type == sample_file_document.mime_type
+    assert created_file.id is not None
+    assert isinstance(created_file.id, ObjectId)
+  
+  def test_i_can_create_file_document_without_id(self, in_memory_repository, sample_file_document):
+    """Test creating file document with _id set to None (should be removed)."""
+    # Arrange
+    sample_file_document.id = None
+    
+    # Act
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Assert
+    assert created_file is not None
+    assert created_file.id is not None
+    assert isinstance(created_file.id, ObjectId)
+  
+  def test_i_cannot_create_file_document_with_pymongo_error(self, in_memory_repository,
+                                                            sample_file_document, mocker):
+    """Test handling of PyMongo errors during file document creation."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'insert_one', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(ValueError) as exc_info:
+      in_memory_repository.create_document(sample_file_document)
+    
+    assert "Failed to create file document" in str(exc_info.value)
+
+
+class TestFileDocumentRepositoryFinding:
+  """Tests for file document finding functionality."""
+  
+  def test_i_can_find_document_by_valid_id(self, in_memory_repository, sample_file_document):
+    """Test finding file document by valid ObjectId."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    found_file = in_memory_repository.find_document_by_id(str(created_file.id))
+    
+    # Assert
+    assert found_file is not None
+    assert found_file.id == created_file.id
+    assert found_file.filename == created_file.filename
+    assert found_file.filepath == created_file.filepath
+  
+  def test_i_cannot_find_document_with_invalid_id(self, in_memory_repository):
+    """Test that invalid ObjectId returns None."""
+    # Act
+    found_file = in_memory_repository.find_document_by_id("invalid_id")
+    
+    # Assert
+    assert found_file is None
+  
+  def test_i_cannot_find_document_by_nonexistent_id(self, in_memory_repository):
+    """Test that nonexistent but valid ObjectId returns None."""
+    # Arrange
+    nonexistent_id = str(ObjectId())
+    
+    # Act
+    found_file = in_memory_repository.find_document_by_id(nonexistent_id)
+    
+    # Assert
+    assert found_file is None
+  
+  def test_i_can_find_document_by_file_hash(self, in_memory_repository, sample_file_document):
+    """Test finding file document by file hash."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    found_file = in_memory_repository.find_document_by_hash(sample_file_document.file_hash)
+    
+    # Assert
+    assert found_file is not None
+    assert found_file.file_hash == created_file.file_hash
+    assert found_file.id == created_file.id
+  
+  def test_i_cannot_find_document_with_nonexistent_file_hash(self, in_memory_repository):
+    """Test that nonexistent file hash returns None."""
+    # Act
+    found_file = in_memory_repository.find_document_by_hash("nonexistent_hash")
+    
+    # Assert
+    assert found_file is None
+  
+  def test_i_can_find_document_by_filepath(self, in_memory_repository, sample_file_document):
+    """Test finding file document by filepath."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    found_file = in_memory_repository.find_document_by_filepath(sample_file_document.filepath)
+    
+    # Assert
+    assert found_file is not None
+    assert found_file.filepath == created_file.filepath
+    assert found_file.id == created_file.id
+  
+  def test_i_cannot_find_document_with_nonexistent_filepath(self, in_memory_repository):
+    """Test that nonexistent filepath returns None."""
+    # Act
+    found_file = in_memory_repository.find_document_by_filepath("/nonexistent/path/file.pdf")
+    
+    # Assert
+    assert found_file is None
+  
+  def test_i_cannot_find_document_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during file document finding."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find_one', side_effect=PyMongoError("Database error"))
+    
+    # Act
+    found_file = in_memory_repository.find_document_by_hash("test_hash")
+    
+    # Assert
+    assert found_file is None
+
+
+class TestFileDocumentRepositoryNameMatching:
+  """Tests for file document name matching functionality."""
+  
+  def test_i_can_find_documents_by_name_with_fuzzy_matching(self, in_memory_repository, multiple_sample_files):
+    """Test finding file documents by filename using fuzzy matching."""
+    # Arrange
+    for file_doc in multiple_sample_files:
+      in_memory_repository.create_document(file_doc)
+    
+    # Act
+    fuzzy_method = FuzzyMatching(threshold=0.5)
+    found_files = in_memory_repository.find_document_by_name("document", fuzzy_method)
+    
+    # Assert
+    assert len(found_files) >= 1
+    assert all(isinstance(file_doc, FileDocument) for file_doc in found_files)
+    # Should find files with "document" in the name
+    found_filenames = [f.filename for f in found_files]
+    assert any("document" in fname.lower() for fname in found_filenames)
+  
+  def test_i_can_find_documents_by_name_with_subsequence_matching(self, in_memory_repository,
+                                                                  multiple_sample_files):
+    """Test finding file documents by filename using subsequence matching."""
+    # Arrange
+    for file_doc in multiple_sample_files:
+      in_memory_repository.create_document(file_doc)
+    
+    # Act
+    subsequence_method = SubsequenceMatching()
+    found_files = in_memory_repository.find_document_by_name("doc", subsequence_method)
+    
+    # Assert
+    assert len(found_files) >= 1
+    assert all(isinstance(file_doc, FileDocument) for file_doc in found_files)
+  
+  def test_i_can_find_documents_by_name_with_default_method(self, in_memory_repository, multiple_sample_files):
+    """Test finding file documents by filename with default matching method."""
+    # Arrange
+    for file_doc in multiple_sample_files:
+      in_memory_repository.create_document(file_doc)
+    
+    # Act
+    found_files = in_memory_repository.find_document_by_name("first")
+    
+    # Assert
+    assert isinstance(found_files, list)
+    assert all(isinstance(file_doc, FileDocument) for file_doc in found_files)
+  
+  def test_i_cannot_find_documents_by_name_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during document name matching."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find', side_effect=PyMongoError("Database error"))
+    
+    # Act
+    found_files = in_memory_repository.find_document_by_name("test")
+    
+    # Assert
+    assert found_files == []
+
+
+class TestFileDocumentRepositoryListing:
+  """Tests for file document listing functionality."""
+  
+  def test_i_can_list_documents_with_default_pagination(self, in_memory_repository, multiple_sample_files):
+    """Test listing file documents with default pagination."""
+    # Arrange
+    for file_doc in multiple_sample_files:
+      in_memory_repository.create_document(file_doc)
+    
+    # Act
+    files = in_memory_repository.list_documents()
+    
+    # Assert
+    assert len(files) == len(multiple_sample_files)
+    assert all(isinstance(file_doc, FileDocument) for file_doc in files)
+  
+  def test_i_can_list_documents_with_custom_pagination(self, in_memory_repository, multiple_sample_files):
+    """Test listing file documents with custom pagination."""
+    # Arrange
+    for file_doc in multiple_sample_files:
+      in_memory_repository.create_document(file_doc)
+    
+    # Act
+    files_page1 = in_memory_repository.list_documents(skip=0, limit=2)
+    files_page2 = in_memory_repository.list_documents(skip=2, limit=2)
+    
+    # Assert
+    assert len(files_page1) == 2
+    assert len(files_page2) == 1  # Only 3 total files
+    
+    # Ensure no overlap between pages
+    page1_ids = [file_doc.id for file_doc in files_page1]
+    page2_ids = [file_doc.id for file_doc in files_page2]
+    assert len(set(page1_ids).intersection(set(page2_ids))) == 0
+  
+  def test_i_can_list_documents_sorted_by_detected_at(self, in_memory_repository, sample_file_document):
+    """Test that file documents are sorted by detected_at in descending order."""
+    # Arrange
+    file1 = sample_file_document.model_copy()
+    file1.filepath = "/docs/file1.pdf"
+    file1.filename = "file1.pdf"
+    file1.file_hash = "hash1" + "0" * 58
+    file1.detected_at = datetime(2024, 1, 1, 10, 0, 0)
+    
+    file2 = sample_file_document.model_copy()
+    file2.filepath = "/docs/file2.pdf"
+    file2.filename = "file2.pdf"
+    file2.file_hash = "hash2" + "0" * 58
+    file2.detected_at = datetime(2024, 1, 2, 10, 0, 0)  # Later date
+    
+    created_file1 = in_memory_repository.create_document(file1)
+    created_file2 = in_memory_repository.create_document(file2)
+    
+    # Act
+    files = in_memory_repository.list_documents()
+    
+    # Assert
+    assert len(files) == 2
+    # Most recent (latest detected_at) should be first
+    assert files[0].id == created_file2.id
+    assert files[1].id == created_file1.id
+  
+  def test_i_can_list_empty_documents(self, in_memory_repository):
+    """Test listing file documents from empty collection."""
+    # Act
+    files = in_memory_repository.list_documents()
+    
+    # Assert
+    assert files == []
+  
+  def test_i_cannot_list_documents_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during file document listing."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find', side_effect=PyMongoError("Database error"))
+    
+    # Act
+    files = in_memory_repository.list_documents()
+    
+    # Assert
+    assert files == []
+
+
+class TestFileDocumentRepositoryUpdate:
+  """Tests for file document update functionality."""
+  
+  def test_i_can_update_document_successfully(self, in_memory_repository, sample_file_document,
+                                              sample_update_data):
+    """Test successful file document update."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    updated_file = in_memory_repository.update_document(str(created_file.id), sample_update_data)
+    
+    # Assert
+    assert updated_file is not None
+    assert updated_file.extraction_method == sample_update_data["extraction_method"]
+    assert updated_file.metadata == sample_update_data["metadata"]
+    assert updated_file.file_size == sample_update_data["file_size"]
+    assert updated_file.id == created_file.id
+    assert updated_file.filename == created_file.filename  # Unchanged fields remain
+    assert updated_file.filepath == created_file.filepath
+  
+  def test_i_can_update_document_with_partial_data(self, in_memory_repository, sample_file_document):
+    """Test updating file document with partial data."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    partial_update = {"file_size": 999999}
+    
+    # Act
+    updated_file = in_memory_repository.update_document(str(created_file.id), partial_update)
+    
+    # Assert
+    assert updated_file is not None
+    assert updated_file.file_size == 999999
+    assert updated_file.filename == created_file.filename  # Should remain unchanged
+    assert updated_file.metadata == created_file.metadata  # Should remain unchanged
+  
+  def test_i_can_update_document_filtering_none_values(self, in_memory_repository, sample_file_document):
+    """Test that None values are filtered out from update data."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    update_with_none = {"file_size": 777777, "metadata": None}
+    
+    # Act
+    updated_file = in_memory_repository.update_document(str(created_file.id), update_with_none)
+    
+    # Assert
+    assert updated_file is not None
+    assert updated_file.file_size == 777777
+    assert updated_file.metadata == created_file.metadata  # Should remain unchanged (None filtered out)
+  
+  def test_i_can_update_document_with_empty_data(self, in_memory_repository, sample_file_document):
+    """Test that updating a file document with empty data returns the current document."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    empty_update = {}
+    
+    # Act
+    result = in_memory_repository.update_document(str(created_file.id), empty_update)
+    
+    # Assert
+    assert result is not None
+    assert result.filename == created_file.filename
+    assert result.filepath == created_file.filepath
+    assert result.metadata == created_file.metadata
+  
+  def test_i_cannot_update_document_with_invalid_id(self, in_memory_repository, sample_update_data):
+    """Test that updating with invalid ID returns None."""
+    # Act
+    result = in_memory_repository.update_document("invalid_id", sample_update_data)
+    
+    # Assert
+    assert result is None
+  
+  def test_i_cannot_update_nonexistent_document(self, in_memory_repository, sample_update_data):
+    """Test that updating nonexistent file document returns None."""
+    # Arrange
+    nonexistent_id = str(ObjectId())
+    
+    # Act
+    result = in_memory_repository.update_document(nonexistent_id, sample_update_data)
+    
+    # Assert
+    assert result is None
+  
+  def test_i_cannot_update_document_with_pymongo_error(self, in_memory_repository, sample_file_document,
+                                                       sample_update_data, mocker):
+    """Test handling of PyMongo errors during file document update."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    mocker.patch.object(in_memory_repository.collection, 'find_one_and_update',
+                        side_effect=PyMongoError("Database error"))
+    
+    # Act
+    result = in_memory_repository.update_document(str(created_file.id), sample_update_data)
+    
+    # Assert
+    assert result is None
+
+
+class TestFileDocumentRepositoryDeletion:
+  """Tests for file document deletion functionality."""
+  
+  def test_i_can_delete_existing_document(self, in_memory_repository, sample_file_document):
+    """Test successful file document deletion."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    deletion_result = in_memory_repository.delete_document(str(created_file.id))
+    
+    # Assert
+    assert deletion_result is True
+    
+    # Verify document is actually deleted
+    found_file = in_memory_repository.find_document_by_id(str(created_file.id))
+    assert found_file is None
+  
+  def test_i_cannot_delete_document_with_invalid_id(self, in_memory_repository):
+    """Test that deleting with invalid ID returns False."""
+    # Act
+    result = in_memory_repository.delete_document("invalid_id")
+    
+    # Assert
+    assert result is False
+  
+  def test_i_cannot_delete_nonexistent_document(self, in_memory_repository):
+    """Test that deleting nonexistent file document returns False."""
+    # Arrange
+    nonexistent_id = str(ObjectId())
+    
+    # Act
+    result = in_memory_repository.delete_document(nonexistent_id)
+    
+    # Assert
+    assert result is False
+  
+  def test_i_cannot_delete_document_with_pymongo_error(self, in_memory_repository, sample_file_document, mocker):
+    """Test handling of PyMongo errors during file document deletion."""
+    # Arrange
+    created_file = in_memory_repository.create_document(sample_file_document)
+    mocker.patch.object(in_memory_repository.collection, 'delete_one', side_effect=PyMongoError("Database error"))
+    
+    # Act
+    result = in_memory_repository.delete_document(str(created_file.id))
+    
+    # Assert
+    assert result is False
+
+
+class TestFileDocumentRepositoryUtilities:
+  """Tests for utility methods."""
+  
+  def test_i_can_count_documents(self, in_memory_repository, sample_file_document):
+    """Test counting file documents."""
+    # Arrange
+    initial_count = in_memory_repository.count_documents()
+    in_memory_repository.create_document(sample_file_document)
+    
+    # Act
+    final_count = in_memory_repository.count_documents()
+    
+    # Assert
+    assert final_count == initial_count + 1
+  
+  def test_i_can_count_zero_documents(self, in_memory_repository):
+    """Test counting file documents in empty collection."""
+    # Act
+    count = in_memory_repository.count_documents()
+    
+    # Assert
+    assert count == 0
+  
+  def test_i_cannot_count_documents_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during file document counting."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'count_documents', side_effect=PyMongoError("Database error"))
+    
+    # Act
+    count = in_memory_repository.count_documents()
+    
+    # Assert
+    assert count == 0
+
+
+class TestMatchingMethods:
+  """Tests for matching method classes."""
+  
+  def test_i_can_create_fuzzy_matching_with_default_threshold(self):
+    """Test creating FuzzyMatching with default threshold."""
+    # Act
+    fuzzy = FuzzyMatching()
+    
+    # Assert
+    assert fuzzy.threshold == 0.6
+  
+  def test_i_can_create_fuzzy_matching_with_custom_threshold(self):
+    """Test creating FuzzyMatching with custom threshold."""
+    # Act
+    fuzzy = FuzzyMatching(threshold=0.8)
+    
+    # Assert
+    assert fuzzy.threshold == 0.8
+  
+  def test_i_can_create_subsequence_matching(self):
+    """Test creating SubsequenceMatching."""
+    # Act
+    subsequence = SubsequenceMatching()
+    
+    # Assert
+    assert isinstance(subsequence, MatchMethodBase)
+    assert isinstance(subsequence, SubsequenceMatching)
--- a/tests/repositories/test_job_repository.py
+++ b/tests/repositories/test_job_repository.py
@@ -0,0 +1,496 @@
+"""
+Test suite for JobRepository.
+
+This module contains comprehensive tests for all JobRepository methods
+using mongomock (and mongomock-motor for the initialization test) for in-memory MongoDB testing.
+"""
+
+from datetime import datetime
+
+import pytest
+from bson import ObjectId
+from mongomock.mongo_client import MongoClient
+from mongomock_motor import AsyncMongoMockClient
+from pymongo.errors import PyMongoError
+
+from app.database.repositories.job_repository import JobRepository
+from app.exceptions.job_exceptions import JobRepositoryError
+from app.models.job import ProcessingJob, ProcessingStatus
+from app.models.types import PyObjectId
+
+
+@pytest.fixture
+def in_memory_repository():
+  """Create an in-memory JobRepository for testing."""
+  client = MongoClient()
+  db = client.test_database
+  repo = JobRepository(db)
+  repo.initialize()
+  return repo
+
+
+@pytest.fixture
+def sample_document_id():
+  """Sample document ObjectId for testing."""
+  return PyObjectId()
+
+
+@pytest.fixture
+def sample_task_id():
+  """Sample Celery task ID for testing."""
+  return "celery-task-12345-abcde"
+
+
+@pytest.fixture
+def multiple_sample_jobs():
+  """Multiple ProcessingJob objects for testing."""
+  doc_id_1 = ObjectId()
+  doc_id_2 = ObjectId()
+  base_time = datetime.utcnow()
+  
+  return [
+      ProcessingJob(
+        document_id=doc_id_1,
+        status=ProcessingStatus.PENDING,
+        task_id="task-1",
+        created_at=base_time,
+        started_at=None,
+        completed_at=None,
+        error_message=None
+      ),
+      ProcessingJob(
+        document_id=doc_id_2,
+        status=ProcessingStatus.PROCESSING,
+        task_id="task-2",
+        created_at=base_time,
+        started_at=base_time,
+        completed_at=None,
+        error_message=None
+      ),
+      ProcessingJob(
+        document_id=doc_id_1,
+        status=ProcessingStatus.COMPLETED,
+        task_id="task-3",
+        created_at=base_time,
+        started_at=base_time,
+        completed_at=base_time,
+        error_message=None
+      )
+  ]
+
+
+class TestJobRepositoryInitialization:
+  """Tests for repository initialization."""
+  
+  def test_i_can_initialize_repository(self):
+    """Test repository initialization."""
+    # Arrange
+    client = AsyncMongoMockClient()
+    db = client.test_database
+    repo = JobRepository(db)
+    
+    # Act
+    initialized_repo = repo.initialize()
+    
+    # Assert
+    assert initialized_repo is repo
+    assert repo.db is not None
+    assert repo.collection is not None
+
+
+class TestJobRepositoryCreation:
+  """Tests for job creation functionality."""
+  
+  def test_i_can_create_job_with_task_id(self, in_memory_repository, sample_document_id, sample_task_id):
+    """Test successful job creation with task ID."""
+    # Act
+    created_job = in_memory_repository.create_job(sample_document_id, sample_task_id)
+    
+    # Assert
+    assert created_job is not None
+    assert created_job.document_id == sample_document_id
+    assert created_job.task_id == sample_task_id
+    assert created_job.status == ProcessingStatus.PENDING
+    assert created_job.created_at is not None
+    assert created_job.started_at is None
+    assert created_job.completed_at is None
+    assert created_job.error_message is None
+    assert created_job.id is not None
+    assert isinstance(created_job.id, ObjectId)
+  
+  def test_i_can_create_job_without_task_id(self, in_memory_repository, sample_document_id):
+    """Test successful job creation without task ID."""
+    # Act
+    created_job = in_memory_repository.create_job(sample_document_id)
+    
+    # Assert
+    assert created_job is not None
+    assert created_job.document_id == sample_document_id
+    assert created_job.task_id is None
+    assert created_job.status == ProcessingStatus.PENDING
+    assert created_job.created_at is not None
+    assert created_job.started_at is None
+    assert created_job.completed_at is None
+    assert created_job.error_message is None
+    assert created_job.id is not None
+    assert isinstance(created_job.id, ObjectId)
+  
+  def test_i_cannot_create_duplicate_job_for_document(self, in_memory_repository, sample_document_id,
+                                                      sample_task_id):
+    """Test that creating a job with a duplicate document_id raises JobRepositoryError."""
+    # Arrange
+    in_memory_repository.create_job(sample_document_id, sample_task_id)
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.create_job(sample_document_id, "different-task-id")
+    
+    assert "create_job" in str(exc_info.value)
+  
+  def test_i_cannot_create_job_with_pymongo_error(self, in_memory_repository, sample_document_id, mocker):
+    """Test handling of PyMongo errors during job creation."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'insert_one', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.create_job(sample_document_id)
+    
+    assert "create_job" in str(exc_info.value)
+
+
+class TestJobRepositoryFinding:
+  """Tests for job finding functionality."""
+  
+  def test_i_can_find_job_by_valid_id(self, in_memory_repository, sample_document_id, sample_task_id):
+    """Test finding job by valid ObjectId."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id, sample_task_id)
+    
+    # Act
+    found_job = in_memory_repository.find_job_by_id(created_job.id)
+    
+    # Assert
+    assert found_job is not None
+    assert found_job.id == created_job.id
+    assert found_job.document_id == created_job.document_id
+    assert found_job.task_id == created_job.task_id
+    assert found_job.status == created_job.status
+  
+  def test_i_cannot_find_job_by_nonexistent_id(self, in_memory_repository):
+    """Test that nonexistent ObjectId returns None."""
+    # Arrange
+    nonexistent_id = PyObjectId()
+    
+    # Act
+    found_job = in_memory_repository.find_job_by_id(nonexistent_id)
+    
+    # Assert
+    assert found_job is None
+  
+  def test_i_cannot_find_job_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during job finding."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find_one', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.find_job_by_id(PyObjectId())
+    
+    assert "get_job_by_id" in str(exc_info.value)
+  
+  def test_i_can_find_jobs_by_document_id(self, in_memory_repository, sample_document_id, sample_task_id):
+    """Test finding jobs by document ID."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id, sample_task_id)
+    
+    # Act
+    found_jobs = in_memory_repository.find_jobs_by_document_id(sample_document_id)
+    
+    # Assert
+    assert len(found_jobs) == 1
+    assert found_jobs[0].id == created_job.id
+    assert found_jobs[0].document_id == sample_document_id
+  
+  def test_i_can_find_empty_jobs_list_for_nonexistent_document(self, in_memory_repository):
+    """Test that nonexistent document ID returns empty list."""
+    # Arrange
+    nonexistent_id = ObjectId()
+    
+    # Act
+    found_jobs = in_memory_repository.find_jobs_by_document_id(nonexistent_id)
+    
+    # Assert
+    assert found_jobs == []
+  
+  def test_i_cannot_find_jobs_by_document_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during finding jobs by document ID."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.find_jobs_by_document_id(PyObjectId())
+    
+    assert "get_jobs_by_file_id" in str(exc_info.value)
+  
+  @pytest.mark.parametrize("status", [
+      ProcessingStatus.PENDING,
+      ProcessingStatus.PROCESSING,
+      ProcessingStatus.COMPLETED
+  ])
+  def test_i_can_find_jobs_by_status(self, in_memory_repository, sample_document_id, status):
+    """Test finding jobs by each parametrized status (PENDING, PROCESSING, COMPLETED)."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    in_memory_repository.update_job_status(created_job.id, status)
+    
+    # Act
+    found_jobs = in_memory_repository.get_jobs_by_status(status)
+    
+    # Assert
+    assert len(found_jobs) == 1
+    assert found_jobs[0].id == created_job.id
+    assert found_jobs[0].status == status
+  
+  def test_i_can_find_jobs_by_failed_status(self, in_memory_repository, sample_document_id):
+    """Test finding jobs by FAILED status."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    in_memory_repository.update_job_status(created_job.id, ProcessingStatus.FAILED, "Test error")
+    
+    # Act
+    found_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.FAILED)
+    
+    # Assert
+    assert len(found_jobs) == 1
+    assert found_jobs[0].id == created_job.id
+    assert found_jobs[0].status == ProcessingStatus.FAILED
+    assert found_jobs[0].error_message == "Test error"
+  
+  def test_i_can_find_empty_jobs_list_for_unused_status(self, in_memory_repository):
+    """Test that unused status returns empty list."""
+    # Act
+    found_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.COMPLETED)
+    
+    # Assert
+    assert found_jobs == []
+  
+  def test_i_cannot_find_jobs_by_status_with_pymongo_error(self, in_memory_repository, mocker):
+    """Test handling of PyMongo errors during finding jobs by status."""
+    # Arrange
+    mocker.patch.object(in_memory_repository.collection, 'find', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.get_jobs_by_status(ProcessingStatus.PENDING)
+    
+    assert "get_jobs_by_status" in str(exc_info.value)
+
+
+class TestJobRepositoryStatusUpdate:
+  """Tests for job status update functionality."""
+  
+  def test_i_can_update_job_status_to_processing(self, in_memory_repository, sample_document_id):
+    """Test updating job status to PROCESSING with started_at timestamp."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    
+    # Act
+    updated_job = in_memory_repository.update_job_status(created_job.id, ProcessingStatus.PROCESSING)
+    
+    # Assert
+    assert updated_job is not None
+    assert updated_job.id == created_job.id
+    assert updated_job.status == ProcessingStatus.PROCESSING
+    assert updated_job.started_at is not None
+    assert updated_job.completed_at is None
+    assert updated_job.error_message is None
+  
+  def test_i_can_update_job_status_to_completed(self, in_memory_repository, sample_document_id):
+    """Test updating job status to COMPLETED with completed_at timestamp."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    in_memory_repository.update_job_status(created_job.id, ProcessingStatus.PROCESSING)
+    
+    # Act
+    updated_job = in_memory_repository.update_job_status(created_job.id, ProcessingStatus.COMPLETED)
+    
+    # Assert
+    assert updated_job is not None
+    assert updated_job.id == created_job.id
+    assert updated_job.status == ProcessingStatus.COMPLETED
+    assert updated_job.started_at is not None
+    assert updated_job.completed_at is not None
+    assert updated_job.error_message is None
+  
+  def test_i_can_update_job_status_to_failed_with_error(self, in_memory_repository, sample_document_id):
+    """Test updating job status to FAILED with error message and completed_at timestamp."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    error_message = "Processing failed due to invalid format"
+    
+    # Act
+    updated_job = in_memory_repository.update_job_status(
+      created_job.id, ProcessingStatus.FAILED, error_message
+    )
+    
+    # Assert
+    assert updated_job is not None
+    assert updated_job.id == created_job.id
+    assert updated_job.status == ProcessingStatus.FAILED
+    assert updated_job.completed_at is not None
+    assert updated_job.error_message == error_message
+  
+  def test_i_can_update_job_status_to_failed_without_error(self, in_memory_repository, sample_document_id):
+    """Test updating job status to FAILED without error message."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    
+    # Act
+    updated_job = in_memory_repository.update_job_status(created_job.id, ProcessingStatus.FAILED)
+    
+    # Assert
+    assert updated_job is not None
+    assert updated_job.id == created_job.id
+    assert updated_job.status == ProcessingStatus.FAILED
+    assert updated_job.completed_at is not None
+    assert updated_job.error_message is None
+  
+  def test_i_cannot_update_nonexistent_job_status(self, in_memory_repository):
+    """Test that updating nonexistent job returns None."""
+    # Arrange
+    nonexistent_id = ObjectId()
+    
+    # Act
+    result = in_memory_repository.update_job_status(nonexistent_id, ProcessingStatus.COMPLETED)
+    
+    # Assert
+    assert result is None
+  
+  def test_i_cannot_update_job_status_with_pymongo_error(self, in_memory_repository, sample_document_id, mocker):
+    """Test handling of PyMongo errors during job status update."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    mocker.patch.object(in_memory_repository.collection, 'find_one_and_update',
+                        side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.update_job_status(created_job.id, ProcessingStatus.COMPLETED)
+    
+    assert "update_job_status" in str(exc_info.value)
+
+
+class TestJobRepositoryDeletion:
+  """Tests for job deletion functionality."""
+  
+  def test_i_can_delete_existing_job(self, in_memory_repository, sample_document_id):
+    """Test successful job deletion."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    
+    # Act
+    deletion_result = in_memory_repository.delete_job(created_job.id)
+    
+    # Assert
+    assert deletion_result is True
+    
+    # Verify job is actually deleted
+    found_job = in_memory_repository.find_job_by_id(created_job.id)
+    assert found_job is None
+  
+  def test_i_cannot_delete_nonexistent_job(self, in_memory_repository):
+    """Test that deleting nonexistent job returns False."""
+    # Arrange
+    nonexistent_id = ObjectId()
+    
+    # Act
+    result = in_memory_repository.delete_job(nonexistent_id)
+    
+    # Assert
+    assert result is False
+  
+  def test_i_cannot_delete_job_with_pymongo_error(self, in_memory_repository, sample_document_id, mocker):
+    """Test handling of PyMongo errors during job deletion."""
+    # Arrange
+    created_job = in_memory_repository.create_job(sample_document_id)
+    mocker.patch.object(in_memory_repository.collection, 'delete_one', side_effect=PyMongoError("Database error"))
+    
+    # Act & Assert
+    with pytest.raises(JobRepositoryError) as exc_info:
+      in_memory_repository.delete_job(created_job.id)
+    
+    assert "delete_job" in str(exc_info.value)
+
+
+class TestJobRepositoryComplexScenarios:
+  """Tests for complex job repository scenarios."""
+  
+  def test_i_can_handle_complete_job_lifecycle(self, in_memory_repository, sample_document_id, sample_task_id):
+    """Test complete job lifecycle from creation to completion."""
+    # Create job
+    job = in_memory_repository.create_job(sample_document_id, sample_task_id)
+    assert job.status == ProcessingStatus.PENDING
+    assert job.started_at is None
+    assert job.completed_at is None
+    
+    # Start processing
+    job = in_memory_repository.update_job_status(job.id, ProcessingStatus.PROCESSING)
+    assert job.status == ProcessingStatus.PROCESSING
+    assert job.started_at is not None
+    assert job.completed_at is None
+    
+    # Complete job
+    job = in_memory_repository.update_job_status(job.id, ProcessingStatus.COMPLETED)
+    assert job.status == ProcessingStatus.COMPLETED
+    assert job.started_at is not None
+    assert job.completed_at is not None
+    assert job.error_message is None
+  
+  def test_i_can_handle_job_failure_scenario(self, in_memory_repository, sample_document_id, sample_task_id):
+    """Test job failure scenario with error message."""
+    # Create and start job
+    job = in_memory_repository.create_job(sample_document_id, sample_task_id)
+    job = in_memory_repository.update_job_status(job.id, ProcessingStatus.PROCESSING)
+    
+    # Fail job with error
+    error_msg = "File format not supported"
+    job = in_memory_repository.update_job_status(job.id, ProcessingStatus.FAILED, error_msg)
+    
+    # Assert failure state
+    assert job.status == ProcessingStatus.FAILED
+    assert job.started_at is not None
+    assert job.completed_at is not None
+    assert job.error_message == error_msg
+  
+  def test_i_can_handle_multiple_documents_with_different_statuses(self, in_memory_repository):
+    """Test managing multiple jobs for different documents with various statuses."""
+    # Create jobs for different documents
+    doc1 = PyObjectId()
+    doc2 = PyObjectId()
+    doc3 = PyObjectId()
+    
+    job1 = in_memory_repository.create_job(doc1, "task-1")
+    job2 = in_memory_repository.create_job(doc2, "task-2")
+    job3 = in_memory_repository.create_job(doc3, "task-3")
+    
+    # Update to different statuses
+    in_memory_repository.update_job_status(job1.id, ProcessingStatus.PROCESSING)
+    in_memory_repository.update_job_status(job2.id, ProcessingStatus.COMPLETED)
+    in_memory_repository.update_job_status(job3.id, ProcessingStatus.FAILED, "Error occurred")
+    
+    # Verify status queries
+    pending_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.PENDING)
+    processing_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.PROCESSING)
+    completed_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.COMPLETED)
+    failed_jobs = in_memory_repository.get_jobs_by_status(ProcessingStatus.FAILED)
+    
+    assert len(pending_jobs) == 0
+    assert len(processing_jobs) == 1
+    assert len(completed_jobs) == 1
+    assert len(failed_jobs) == 1
+    
+    assert processing_jobs[0].id == job1.id
+    assert completed_jobs[0].id == job2.id
+    assert failed_jobs[0].id == job3.id
--- a/tests/repositories/test_user_repository.py
+++ b/tests/repositories/test_user_repository.py
@@ -0,0 +1,279 @@
+"""
+Test suite for UserRepository.
+
+This module contains comprehensive tests for all UserRepository methods
+using mongomock for in-memory MongoDB testing.
+"""
+
+import pytest
+from bson import ObjectId
+from mongomock.mongo_client import MongoClient
+from pymongo.errors import DuplicateKeyError
+
+from app.database.repositories.user_repository import UserRepository
+from app.models.user import UserCreate, UserUpdate
+
+
+@pytest.fixture
+def in_memory_repository():
+  """Create an in-memory UserRepository for testing."""
+  client = MongoClient()
+  db = client.test_database
+  repo = UserRepository(db)
+  repo.initialize()
+  return repo
+
+
+@pytest.fixture
+def sample_user_create():
+  """Sample UserCreate data for testing."""
+  return UserCreate(
+    username="testuser",
+    email="test@example.com",
+    password="#TestPassword123",
+    role="user"
+  )
+
+
+@pytest.fixture
+def sample_user_update():
+  """Sample UserUpdate data for testing."""
+  return UserUpdate(
+    username="updateduser",
+    email="updated@example.com",
+    role="admin"
+  )
+
+
+class TestUserRepositoryCreation:
+  """Tests for user creation functionality."""
+  
+  def test_i_can_create_user(self, in_memory_repository, sample_user_create):
+    """Test successful user creation."""
+    # Act
+    created_user = in_memory_repository.create_user(sample_user_create)
+    
+    # Assert
+    assert created_user is not None
+    assert created_user.username == sample_user_create.username
+    assert created_user.email == sample_user_create.email
+    assert created_user.role == sample_user_create.role
+    assert created_user.is_active is True
+    assert created_user.id is not None
+    assert created_user.created_at is not None
+    assert created_user.updated_at is not None
+    assert created_user.hashed_password != sample_user_create.password  # Should be hashed
+  
+  def test_i_cannot_create_user_with_duplicate_username(self, in_memory_repository, sample_user_create):
+    """Test that creating user with duplicate username raises DuplicateKeyError."""
+    # Arrange
+    in_memory_repository.create_user(sample_user_create)
+    
+    # Act & Assert
+    with pytest.raises(DuplicateKeyError) as exc_info:
+      in_memory_repository.create_user(sample_user_create)
+    
+    assert "already exists" in str(exc_info.value)
+
+
+class TestUserRepositoryFinding:
+  """Tests for user finding functionality."""
+  
+  def test_i_can_find_user_by_id(self, in_memory_repository, sample_user_create):
+    """Test finding user by valid ID."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    found_user = in_memory_repository.find_user_by_id(str(created_user.id))
+    
+    # Assert
+    assert found_user is not None
+    assert found_user.id == created_user.id
+    assert found_user.username == created_user.username
+    assert found_user.email == created_user.email
+  
+  def test_i_cannot_find_user_by_invalid_id(self, in_memory_repository):
+    """Test that invalid ObjectId returns None."""
+    # Act
+    found_user = in_memory_repository.find_user_by_id("invalid_id")
+    
+    # Assert
+    assert found_user is None
+  
+  def test_i_cannot_find_user_by_nonexistent_id(self, in_memory_repository):
+    """Test that nonexistent but valid ObjectId returns None."""
+    # Arrange
+    nonexistent_id = str(ObjectId())
+    
+    # Act
+    found_user = in_memory_repository.find_user_by_id(nonexistent_id)
+    
+    # Assert
+    assert found_user is None
+  
+  def test_i_can_find_user_by_username(self, in_memory_repository, sample_user_create):
+    """Test finding user by username."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    found_user = in_memory_repository.find_user_by_username(sample_user_create.username)
+    
+    # Assert
+    assert found_user is not None
+    assert found_user.username == created_user.username
+    assert found_user.id == created_user.id
+  
+  def test_i_cannot_find_user_by_nonexistent_username(self, in_memory_repository):
+    """Test that nonexistent username returns None."""
+    # Act
+    found_user = in_memory_repository.find_user_by_username("nonexistent")
+    
+    # Assert
+    assert found_user is None
+  
+  def test_i_can_find_user_by_email(self, in_memory_repository, sample_user_create):
+    """Test finding user by email."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    found_user = in_memory_repository.find_user_by_email(str(sample_user_create.email))
+    
+    # Assert
+    assert found_user is not None
+    assert found_user.email == created_user.email
+    assert found_user.id == created_user.id
+  
+  def test_i_cannot_find_user_by_nonexistent_email(self, in_memory_repository):
+    """Test that nonexistent email returns None."""
+    # Act
+    found_user = in_memory_repository.find_user_by_email("nonexistent@example.com")
+    
+    # Assert
+    assert found_user is None
+
+
+class TestUserRepositoryUpdate:
+  """Tests for user update functionality."""
+  
+  def test_i_can_update_user(self, in_memory_repository, sample_user_create, sample_user_update):
+    """Test successful user update."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    original_updated_at = created_user.updated_at
+    
+    # Act
+    updated_user = in_memory_repository.update_user(str(created_user.id), sample_user_update)
+    
+    # Assert
+    assert updated_user is not None
+    assert updated_user.username == sample_user_update.username
+    assert updated_user.email == sample_user_update.email
+    assert updated_user.role == sample_user_update.role
+    assert updated_user.id == created_user.id
+  
+  def test_i_cannot_update_user_with_invalid_id(self, in_memory_repository, sample_user_update):
+    """Test that updating with invalid ID returns None."""
+    # Act
+    result = in_memory_repository.update_user("invalid_id", sample_user_update)
+    
+    # Assert
+    assert result is None
+  
+  def test_i_can_update_user_with_partial_data(self, in_memory_repository, sample_user_create):
+    """Test updating user with partial data."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    partial_update = UserUpdate(username="newusername")
+    
+    # Act
+    updated_user = in_memory_repository.update_user(str(created_user.id), partial_update)
+    
+    # Assert
+    assert updated_user is not None
+    assert updated_user.username == "newusername"
+    assert updated_user.email == created_user.email  # Should remain unchanged
+    assert updated_user.role == created_user.role  # Should remain unchanged
+  
+  def test_i_can_update_user_with_empty_data(self, in_memory_repository, sample_user_create):
+    """Test updating user with empty data returns current user."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    empty_update = UserUpdate()
+    
+    # Act
+    result = in_memory_repository.update_user(str(created_user.id), empty_update)
+    
+    # Assert
+    assert result is not None
+    assert result.username == created_user.username
+    assert result.email == created_user.email
+
+
+class TestUserRepositoryDeletion:
+  """Tests for user deletion functionality."""
+  
+  def test_i_can_delete_user(self, in_memory_repository, sample_user_create):
+    """Test successful user deletion."""
+    # Arrange
+    created_user = in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    deletion_result = in_memory_repository.delete_user(str(created_user.id))
+    
+    # Assert
+    assert deletion_result is True
+    
+    # Verify user is actually deleted
+    found_user = in_memory_repository.find_user_by_id(str(created_user.id))
+    assert found_user is None
+  
+  def test_i_cannot_delete_user_with_invalid_id(self, in_memory_repository):
+    """Test that deleting with invalid ID returns False."""
+    # Act
+    result = in_memory_repository.delete_user("invalid_id")
+    
+    # Assert
+    assert result is False
+  
+  def test_i_cannot_delete_nonexistent_user(self, in_memory_repository):
+    """Test that deleting nonexistent user returns False."""
+    # Arrange
+    nonexistent_id = str(ObjectId())
+    
+    # Act
+    result = in_memory_repository.delete_user(nonexistent_id)
+    
+    # Assert
+    assert result is False
+
+
+class TestUserRepositoryUtilities:
+  """Tests for utility methods."""
+  
+  def test_i_can_count_users(self, in_memory_repository, sample_user_create):
+    """Test counting users."""
+    # Arrange
+    initial_count = in_memory_repository.count_users()
+    in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    final_count = in_memory_repository.count_users()
+    
+    # Assert
+    assert final_count == initial_count + 1
+  
+  def test_i_can_check_user_exists(self, in_memory_repository, sample_user_create):
+    """Test checking if user exists."""
+    # Arrange
+    in_memory_repository.create_user(sample_user_create)
+    
+    # Act
+    exists = in_memory_repository.user_exists(sample_user_create.username)
+    not_exists = in_memory_repository.user_exists("nonexistent")
+    
+    # Assert
+    assert exists is True
+    assert not_exists is False
--- a/tests/services/__init__.py
+++ b/tests/services/__init__.py
--- a/tests/services/test_document_service.py
+++ b/tests/services/test_document_service.py
@@ -0,0 +1,570 @@
+"""
+Unit tests for DocumentService using in-memory MongoDB.
+
+Tests the orchestration logic against an in-memory mongomock database
+instead of mocked repositories, so repository queries are exercised too.
+"""
+import os
+from datetime import datetime
+from unittest.mock import patch
+
+import pytest
+import pytest_asyncio
+from bson import ObjectId
+from mongomock.mongo_client import MongoClient
+
+from app.models.document import FileType
+from app.services.document_service import DocumentService
+
+
+@pytest.fixture(autouse=True)
+def cleanup_test_folder():
+  """Clean up test folder."""
+  import shutil
+  shutil.rmtree("test_folder", ignore_errors=True)
+
+
+@pytest.fixture
+def in_memory_database():
+  """Create an in-memory database for testing."""
+  client = MongoClient()
+  return client.test_database
+
+
+@pytest.fixture
+def document_service(in_memory_database):
+  """Create DocumentService with in-memory repositories."""
+  service = DocumentService(in_memory_database, objects_folder="test_folder")
+  return service
+
+
+@pytest.fixture
+def sample_file_bytes():
+  """Sample file content as bytes."""
+  return b"This is a test PDF content"
+
+
+@pytest.fixture
+def sample_text_bytes():
+  """Sample text file content as bytes."""
+  return b"This is a test text file content"
+
+
+@pytest.fixture
+def sample_file_hash():
+  """Expected SHA256 hash for sample file bytes."""
+  import hashlib
+  return hashlib.sha256(b"This is a test PDF content").hexdigest()
+
+
+def validate_file_saved(document_service, file_hash, file_bytes):
+  """Assert the file is stored at <objects_folder>/<hash[:24]>/<hash> with the expected content."""
+  target_file_path = os.path.join(document_service.objects_folder, file_hash[:24], file_hash)
+  assert os.path.exists(target_file_path)
+  
+  with open(target_file_path, "rb") as f:
+    content = f.read()
+  assert content == file_bytes
+
+
+class TestCreateDocument:
+  """Tests for create_document method."""
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  @patch('app.services.document_service.datetime')
+  def test_i_can_create_document_with_new_content(
+      self,
+      mock_datetime,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test creating document when content doesn't exist yet."""
+    # Setup mocks
+    fixed_time = datetime(2025, 1, 1, 10, 30, 0)
+    mock_datetime.now.return_value = fixed_time
+    mock_magic.return_value = "application/pdf"
+    
+    # Execute
+    result = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Verify document creation
+    assert result is not None
+    assert result.filename == "test.pdf"
+    assert result.filepath == "/test/test.pdf"
+    assert result.file_type == FileType.PDF
+    assert result.detected_at == fixed_time
+    assert result.file_hash == document_service._calculate_file_hash(sample_file_bytes)
+    
+    # Verify document created in database
+    doc_in_db = document_service.document_repository.find_document_by_id(result.id)
+    assert doc_in_db is not None
+    assert doc_in_db.id == result.id
+    assert doc_in_db.filename == result.filename
+    assert doc_in_db.filepath == result.filepath
+    assert doc_in_db.file_type == result.file_type
+    assert doc_in_db.detected_at == fixed_time
+    assert doc_in_db.file_hash == result.file_hash
+    
+    # Verify file is saved to disk
+    validate_file_saved(document_service, result.file_hash, sample_file_bytes)
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  @patch('app.services.document_service.datetime')
+  def test_i_can_create_document_with_existing_content(
+      self,
+      mock_datetime,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test creating document when content already exists (deduplication)."""
+    # Setup mocks
+    fixed_time = datetime(2025, 1, 1, 10, 30, 0)
+    mock_datetime.now.return_value = fixed_time
+    mock_magic.return_value = "application/pdf"
+    
+    # Create first document
+    first_doc = document_service.create_document(
+      "/test/first.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Create second document with same content
+    second_doc = document_service.create_document(
+      "/test/second.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Verify both documents exist but share same hash
+    assert first_doc.file_hash == second_doc.file_hash
+    assert first_doc.filename != second_doc.filename
+    assert first_doc.filepath != second_doc.filepath
+  
+  def test_i_cannot_create_document_with_unsupported_file_type(
+      self,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test that unsupported file types raise ValueError."""
+    with pytest.raises(ValueError, match="Unsupported file type"):
+      document_service.create_document(
+        "/test/test.xyz",  # Unsupported extension
+        sample_file_bytes,
+        "utf-8"
+      )
+  
+  def test_i_cannot_create_document_with_empty_file_path(
+      self,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test that empty file path raises ValueError."""
+    with pytest.raises(ValueError):
+      document_service.create_document(
+        "",  # Empty path
+        sample_file_bytes,
+        "utf-8"
+      )
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_create_document_with_empty_bytes(
+      self,
+      mock_magic,
+      document_service
+  ):
+    """Test that a document with empty file bytes is still created and saved to disk."""
+    # Setup
+    mock_magic.return_value = "text/plain"
+    
+    # Execute with empty bytes
+    result = document_service.create_document(
+      "/test/empty.txt",
+      b"",  # Empty bytes
+      "utf-8"
+    )
+    
+    # Verify file is saved to disk
+    validate_file_saved(document_service, result.file_hash, b"")
+
+
+class TestGetMethods:
+  """Tests for document retrieval methods."""
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_get_document_by_id(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test retrieving document by ID."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create a document first
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute
+    result = document_service.get_document_by_id(created_doc.id)
+    
+    # Verify
+    assert result is not None
+    assert result.id == created_doc.id
+    assert result.filename == created_doc.filename
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_get_document_by_hash(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test retrieving document by file hash."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create a document first
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute
+    result = document_service.get_document_by_hash(created_doc.file_hash)
+    
+    # Verify
+    assert result is not None
+    assert result.file_hash == created_doc.file_hash
+    assert result.filename == created_doc.filename
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_get_document_by_filepath(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test retrieving document by file path."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    test_path = "/test/unique_test.pdf"
+    
+    # Create a document first
+    created_doc = document_service.create_document(
+      test_path,
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute
+    result = document_service.get_document_by_filepath(test_path)
+    
+    # Verify
+    assert result is not None
+    assert result.filepath == test_path
+    assert result.id == created_doc.id
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_get_document_content(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test retrieving document content by file hash."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create a document first
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute
+    result = document_service.get_document_content_by_hash(created_doc.file_hash)
+    
+    # Verify
+    assert result == sample_file_bytes
+  
+  def test_i_cannot_get_nonexistent_document_by_id(
+      self,
+      document_service
+  ):
+    """Test that nonexistent document returns None."""
+    # Execute with random ObjectId
+    result = document_service.get_document_by_id(ObjectId())
+    
+    # Verify
+    assert result is None
+  
+  def test_i_cannot_get_nonexistent_document_by_hash(
+      self,
+      document_service
+  ):
+    """Test that nonexistent document hash returns None."""
+    # Execute
+    result = document_service.get_document_by_hash("nonexistent_hash")
+    
+    # Verify
+    assert result is None
+
+
+class TestPaginationAndCounting:
+  """Tests for document listing and counting."""
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_list_documents_with_pagination(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test document listing with pagination parameters."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create multiple documents
+    for i in range(5):
+      document_service.create_document(
+        f"/test/test{i}.pdf",
+        sample_file_bytes + bytes(str(i), 'utf-8'),  # Make each file unique
+        "utf-8"
+      )
+    
+    # Execute with pagination
+    result = document_service.list_documents(skip=1, limit=2)
+    
+    # Verify
+    assert len(result) == 2
+    
+    # Test counting
+    total_count = document_service.count_documents()
+    assert total_count == 5
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_count_documents(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test document counting."""
+    # Setup
+    mock_magic.return_value = "text/plain"
+    
+    # Initially should be 0
+    initial_count = document_service.count_documents()
+    assert initial_count == 0
+    
+    # Create some documents
+    for i in range(3):
+      document_service.create_document(
+        f"/test/test{i}.txt",
+        sample_file_bytes + bytes(str(i), 'utf-8'),
+        "utf-8"
+      )
+    
+    # Execute
+    final_count = document_service.count_documents()
+    
+    # Verify
+    assert final_count == 3
+
+
+class TestUpdateAndDelete:
+  """Tests for document update and deletion operations."""
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_update_document_metadata(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test updating document metadata."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create a document first
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute update
+    update_data = {"metadata": {"page_count": 5}}
+    result = document_service.update_document(created_doc.id, update_data)
+    
+    # Verify
+    assert result is not None
+    assert result.metadata.get("page_count") == 5
+    assert result.filename == created_doc.filename
+    assert result.filepath == created_doc.filepath
+    assert result.file_hash == created_doc.file_hash
+    assert result.file_type == created_doc.file_type
+    assert result.metadata == update_data['metadata']
+  
+  def test_i_can_update_document_content(
+      self,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test that updating file bytes changes the hash and saves the new content."""
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Execute update
+    update_data = {"file_bytes": b"this is an updated file content"}
+    result = document_service.update_document(created_doc.id, update_data)
+    
+    assert result.filename == created_doc.filename
+    assert result.filepath == created_doc.filepath
+    assert result.file_hash != created_doc.file_hash
+    assert result.file_type == created_doc.file_type
+    assert result.metadata == created_doc.metadata
+    
+    # Verify file is saved to disk
+    validate_file_saved(document_service, result.file_hash, b"this is an updated file content")
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_delete_document_and_orphaned_content(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test deleting document with orphaned content cleanup."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create a document
+    created_doc = document_service.create_document(
+      "/test/test.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # Verify content exists
+    validate_file_saved(document_service, created_doc.file_hash, sample_file_bytes)
+    
+    # Execute deletion
+    result = document_service.delete_document(created_doc.id)
+    
+    # Verify document and content are deleted
+    assert result is True
+    
+    deleted_doc = document_service.get_document_by_id(created_doc.id)
+    assert deleted_doc is None
+    
+    # Validate content is deleted (use the full hash; the folder name is its first 24 characters)
+    file_hash = created_doc.file_hash
+    target_file_path = os.path.join(document_service.objects_folder, file_hash[:24], file_hash)
+    assert not os.path.exists(target_file_path)
+  
+  @patch('app.services.document_service.magic.from_buffer')
+  def test_i_can_delete_document_without_affecting_shared_content(
+      self,
+      mock_magic,
+      document_service,
+      sample_file_bytes
+  ):
+    """Test deleting document without removing shared content."""
+    # Setup
+    mock_magic.return_value = "application/pdf"
+    
+    # Create two documents with same content
+    doc1 = document_service.create_document(
+      "/test/test1.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    doc2 = document_service.create_document(
+      "/test/test2.pdf",
+      sample_file_bytes,
+      "utf-8"
+    )
+    
+    # They should share the same hash
+    assert doc1.file_hash == doc2.file_hash
+    
+    # Delete first document
+    result = document_service.delete_document(doc1.id)
+    assert result is True
+    
+    # Verify first document is deleted but content still exists
+    deleted_doc = document_service.get_document_by_id(doc1.id)
+    assert deleted_doc is None
+    
+    remaining_doc = document_service.get_document_by_id(doc2.id)
+    assert remaining_doc is not None
+    
+    validate_file_saved(document_service, doc2.file_hash, sample_file_bytes)
+
+
+class TestHashCalculation:
+  """Tests for file hash calculation utility."""
+  
+  def test_i_can_calculate_consistent_file_hash(self, document_service):
+    """Test that file hash calculation is consistent."""
+    test_bytes = b"Test content for hashing"
+    
+    # Calculate hash multiple times
+    hash1 = document_service._calculate_file_hash(test_bytes)
+    hash2 = document_service._calculate_file_hash(test_bytes)
+    
+    # Should be identical
+    assert hash1 == hash2
+    assert len(hash1) == 64  # SHA256 produces 64-character hex string
+  
+  def test_i_get_different_hashes_for_different_content(self, document_service):
+    """Test that different content produces different hashes."""
+    content1 = b"First content"
+    content2 = b"Second content"
+    
+    hash1 = document_service._calculate_file_hash(content1)
+    hash2 = document_service._calculate_file_hash(content2)
+    
+    assert hash1 != hash2
+
+
+class TestFileTypeDetection:
+  """Tests for file type detection."""
+  
+  def test_i_can_detect_pdf_file_type(self, document_service):
+    """Test PDF file type detection."""
+    file_type = document_service._detect_file_type("/path/to/document.pdf")
+    assert file_type == FileType.PDF
+  
+  def test_i_can_detect_txt_file_type(self, document_service):
+    """Test text file type detection."""
+    file_type = document_service._detect_file_type("/path/to/document.txt")
+    assert file_type == FileType.TXT
+  
+  def test_i_can_detect_docx_file_type(self, document_service):
+    """Test DOCX file type detection."""
+    file_type = document_service._detect_file_type("/path/to/document.docx")
+    assert file_type == FileType.DOCX
+  
+  def test_i_cannot_detect_unsupported_file_type(self, document_service):
+    """Test unsupported file type raises ValueError."""
+    with pytest.raises(ValueError, match="Unsupported file type"):
+      document_service._detect_file_type("/path/to/document.xyz")
--- a/tests/services/test_job_service.py
+++ b/tests/services/test_job_service.py
@@ -0,0 +1,518 @@
+"""
+Unit tests for JobService using in-memory MongoDB.
+
+Tests the business logic against an in-memory mongomock database
+instead of mocked repositories, so repository queries are exercised too.
+"""
+
+import pytest
+from bson import ObjectId
+from mongomock.mongo_client import MongoClient
+
+from app.exceptions.job_exceptions import InvalidStatusTransitionError
+from app.models.job import ProcessingStatus
+from app.models.types import PyObjectId
+from app.services.job_service import JobService
+
+
+@pytest.fixture
+def in_memory_database():
+  """Create an in-memory database for testing."""
+  client = MongoClient()
+  return client.test_database
+
+
+@pytest.fixture
+def job_service(in_memory_database):
+  """Create JobService with in-memory repositories."""
+  service = JobService(in_memory_database).initialize()
+  return service
+
+
+@pytest.fixture
+def sample_document_id():
+  """Sample document ObjectId."""
+  return PyObjectId()
+
+
+@pytest.fixture
+def sample_task_id():
+  """Sample Celery task UUID."""
+  return "550e8400-e29b-41d4-a716-446655440000"
+
+
+class TestCreateJob:
+  """Tests for create_job method."""
+  
+  def test_i_can_create_job_with_task_id(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test creating job with task ID."""
+    # Execute
+    result = job_service.create_job(sample_document_id, sample_task_id)
+    
+    # Verify job creation
+    assert result is not None
+    assert result.document_id == sample_document_id
+    assert result.task_id == sample_task_id
+    assert result.status == ProcessingStatus.PENDING
+    assert result.created_at is not None
+    assert result.started_at is None
+    assert result.error_message is None
+    
+    # Verify job exists in database
+    job_in_db = job_service.get_job_by_id(result.id)
+    assert job_in_db is not None
+    assert job_in_db.id == result.id
+    assert job_in_db.document_id == sample_document_id
+    assert job_in_db.task_id == sample_task_id
+    assert job_in_db.status == ProcessingStatus.PENDING
+  
+  def test_i_can_create_job_without_task_id(
+      self,
+      job_service,
+      sample_document_id
+  ):
+    """Test creating job without task ID."""
+    # Execute
+    result = job_service.create_job(sample_document_id)
+    
+    # Verify job creation
+    assert result is not None
+    assert result.document_id == sample_document_id
+    assert result.task_id is None
+    assert result.status == ProcessingStatus.PENDING
+    assert result.created_at is not None
+    assert result.started_at is None
+    assert result.error_message is None
+
+
+class TestGetJobMethods:
+  """Tests for job retrieval methods."""
+  
+  def test_i_can_get_job_by_id(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test retrieving job by ID."""
+    # Create a job first
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    
+    # Execute
+    result = job_service.get_job_by_id(created_job.id)
+    
+    # Verify
+    assert result is not None
+    assert result.id == created_job.id
+    assert result.document_id == created_job.document_id
+    assert result.task_id == created_job.task_id
+    assert result.status == created_job.status
+  
+  def test_i_can_get_jobs_by_status(
+      self,
+      job_service,
+      sample_document_id
+  ):
+    """Test retrieving jobs by status."""
+    # Create jobs with different statuses
+    pending_job = job_service.create_job(sample_document_id, "pending-task")
+    
+    processing_job = job_service.create_job(ObjectId(), "processing-task")
+    job_service.mark_job_as_started(processing_job.id)
+    
+    completed_job = job_service.create_job(ObjectId(), "completed-task")
+    job_service.mark_job_as_started(completed_job.id)
+    job_service.mark_job_as_completed(completed_job.id)
+    
+    # Execute - get pending jobs
+    pending_results = job_service.get_jobs_by_status(ProcessingStatus.PENDING)
+    
+    # Verify
+    assert len(pending_results) == 1
+    assert pending_results[0].id == pending_job.id
+    assert pending_results[0].status == ProcessingStatus.PENDING
+    
+    # Execute - get processing jobs
+    processing_results = job_service.get_jobs_by_status(ProcessingStatus.PROCESSING)
+    assert len(processing_results) == 1
+    assert processing_results[0].status == ProcessingStatus.PROCESSING
+    
+    # Execute - get completed jobs
+    completed_results = job_service.get_jobs_by_status(ProcessingStatus.COMPLETED)
+    assert len(completed_results) == 1
+    assert completed_results[0].status == ProcessingStatus.COMPLETED
+
+
+class TestUpdateStatus:
+  """Tests for job status transitions (mark_job_as_started, mark_job_as_completed, mark_job_as_failed)."""
+  
+  def test_i_can_mark_pending_job_as_started(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test marking pending job as started (PENDING → PROCESSING)."""
+    # Create a pending job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    assert created_job.status == ProcessingStatus.PENDING
+    
+    # Execute
+    result = job_service.mark_job_as_started(created_job.id)
+    
+    # Verify status transition
+    assert result is not None
+    assert result.id == created_job.id
+    assert result.status == ProcessingStatus.PROCESSING
+    
+    # Verify in database
+    updated_job = job_service.get_job_by_id(created_job.id)
+    assert updated_job.status == ProcessingStatus.PROCESSING
+  
+  def test_i_cannot_mark_processing_job_as_started(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that processing job cannot be marked as started."""
+    # Create and start a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    
+    # Try to start it again
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_started(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.PROCESSING
+    assert exc_info.value.target_status == ProcessingStatus.PROCESSING
+  
+  def test_i_cannot_mark_completed_job_as_started(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that completed job cannot be marked as started."""
+    # Create, start, and complete a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_completed(created_job.id)
+    
+    # Try to start it again
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_started(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.COMPLETED
+    assert exc_info.value.target_status == ProcessingStatus.PROCESSING
+  
+  def test_i_cannot_mark_failed_job_as_started(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that failed job cannot be marked as started."""
+    # Create, start, and fail a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_failed(created_job.id, "Test error")
+    
+    # Try to start it again
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_started(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.FAILED
+    assert exc_info.value.target_status == ProcessingStatus.PROCESSING
+  
+  def test_i_can_mark_processing_job_as_completed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test marking processing job as completed (PROCESSING → COMPLETED)."""
+    # Create and start a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    
+    # Execute
+    result = job_service.mark_job_as_completed(created_job.id)
+    
+    # Verify status transition
+    assert result is not None
+    assert result.id == created_job.id
+    assert result.status == ProcessingStatus.COMPLETED
+    
+    # Verify in database
+    updated_job = job_service.get_job_by_id(created_job.id)
+    assert updated_job.status == ProcessingStatus.COMPLETED
+  
+  def test_i_cannot_mark_pending_job_as_completed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that pending job cannot be marked as completed."""
+    # Create a pending job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    
+    # Try to complete it directly
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_completed(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.PENDING
+    assert exc_info.value.target_status == ProcessingStatus.COMPLETED
+  
+  def test_i_cannot_mark_completed_job_as_completed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that completed job cannot be marked as completed again."""
+    # Create, start, and complete a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_completed(created_job.id)
+    
+    # Try to complete it again
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_completed(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.COMPLETED
+    assert exc_info.value.target_status == ProcessingStatus.COMPLETED
+  
+  def test_i_cannot_mark_failed_job_as_completed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that failed job cannot be marked as completed."""
+    # Create, start, and fail a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_failed(created_job.id, "Test error")
+    
+    # Try to complete it
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_completed(created_job.id)
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.FAILED
+    assert exc_info.value.target_status == ProcessingStatus.COMPLETED
+  
+  def test_i_can_mark_processing_job_as_failed_with_error_message(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test marking processing job as failed with error message."""
+    # Create and start a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    
+    error_message = "Processing failed due to invalid file format"
+    
+    # Execute
+    result = job_service.mark_job_as_failed(created_job.id, error_message)
+    
+    # Verify status transition
+    assert result is not None
+    assert result.id == created_job.id
+    assert result.status == ProcessingStatus.FAILED
+    assert result.error_message == error_message
+    
+    # Verify in database
+    updated_job = job_service.get_job_by_id(created_job.id)
+    assert updated_job.status == ProcessingStatus.FAILED
+    assert updated_job.error_message == error_message
+  
+  def test_i_can_mark_processing_job_as_failed_without_error_message(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test marking processing job as failed without error message."""
+    # Create and start a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    
+    # Execute without error message
+    result = job_service.mark_job_as_failed(created_job.id)
+    
+    # Verify status transition
+    assert result is not None
+    assert result.status == ProcessingStatus.FAILED
+    assert result.error_message is None
+  
+  def test_i_cannot_mark_pending_job_as_failed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that pending job cannot be marked as failed."""
+    # Create a pending job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    
+    # Try to fail it directly
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_failed(created_job.id, "Test error")
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.PENDING
+    assert exc_info.value.target_status == ProcessingStatus.FAILED
+  
+  def test_i_cannot_mark_completed_job_as_failed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that completed job cannot be marked as failed."""
+    # Create, start, and complete a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_completed(created_job.id)
+    
+    # Try to fail it
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_failed(created_job.id, "Test error")
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.COMPLETED
+    assert exc_info.value.target_status == ProcessingStatus.FAILED
+  
+  def test_i_cannot_mark_failed_job_as_failed(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test that failed job cannot be marked as failed again."""
+    # Create, start, and fail a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    job_service.mark_job_as_started(created_job.id)
+    job_service.mark_job_as_failed(created_job.id, "First error")
+    
+    # Try to fail it again
+    with pytest.raises(InvalidStatusTransitionError) as exc_info:
+      job_service.mark_job_as_failed(created_job.id, "Second error")
+    
+    # Verify exception details
+    assert exc_info.value.current_status == ProcessingStatus.FAILED
+    assert exc_info.value.target_status == ProcessingStatus.FAILED
+
+
+class TestDeleteJob:
+  """Tests for delete_job method."""
+  
+  def test_i_can_delete_existing_job(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test deleting an existing job."""
+    # Create a job
+    created_job = job_service.create_job(sample_document_id, sample_task_id)
+    
+    # Verify job exists
+    job_before_delete = job_service.get_job_by_id(created_job.id)
+    assert job_before_delete is not None
+    
+    # Execute deletion
+    result = job_service.delete_job(created_job.id)
+    
+    # Verify deletion
+    assert result is True
+    
+    # Verify job no longer exists
+    deleted_job = job_service.get_job_by_id(created_job.id)
+    assert deleted_job is None
+  
+  def test_i_cannot_delete_nonexistent_job(
+      self,
+      job_service
+  ):
+    """Test deleting a nonexistent job returns False."""
+    # Execute deletion with random ObjectId
+    result = job_service.delete_job(ObjectId())
+    
+    # Verify
+    assert result is False
+
+
+class TestStatusTransitionValidation:
+  """Tests for status transition validation across different scenarios."""
+  
+  def test_valid_job_lifecycle_flow(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test complete valid job lifecycle: PENDING → PROCESSING → COMPLETED."""
+    # Create job (PENDING)
+    job = job_service.create_job(sample_document_id, sample_task_id)
+    assert job.status == ProcessingStatus.PENDING
+    
+    # Start job (PENDING → PROCESSING)
+    started_job = job_service.mark_job_as_started(job.id)
+    assert started_job.status == ProcessingStatus.PROCESSING
+    
+    # Complete job (PROCESSING → COMPLETED)
+    completed_job = job_service.mark_job_as_completed(job.id)
+    assert completed_job.status == ProcessingStatus.COMPLETED
+  
+  def test_valid_job_failure_flow(
+      self,
+      job_service,
+      sample_document_id,
+      sample_task_id
+  ):
+    """Test valid job failure: PENDING → PROCESSING → FAILED."""
+    # Create job (PENDING)
+    job = job_service.create_job(sample_document_id, sample_task_id)
+    assert job.status == ProcessingStatus.PENDING
+    
+    # Start job (PENDING → PROCESSING)
+    started_job = job_service.mark_job_as_started(job.id)
+    assert started_job.status == ProcessingStatus.PROCESSING
+    
+    # Fail job (PROCESSING → FAILED)
+    failed_job = job_service.mark_job_as_failed(job.id, "Test failure")
+    assert failed_job.status == ProcessingStatus.FAILED
+    assert failed_job.error_message == "Test failure"
+  
+  def test_job_operations_with_empty_database(
+      self,
+      job_service
+  ):
+    """Test job operations when database is empty."""
+    # Try to get nonexistent job
+    result = job_service.get_job_by_id(ObjectId())
+    assert result is None
+    
+    # Try to get jobs by status when none exist
+    pending_jobs = job_service.get_jobs_by_status(ProcessingStatus.PENDING)
+    assert pending_jobs == []
+    
+    # Try to delete nonexistent job
+    delete_result = job_service.delete_job(ObjectId())
+    assert delete_result is False
--- a/tests/test_connection.py
+++ b/tests/test_connection.py
@@ -1,194 +0,0 @@
-"""
-Unit tests for MongoDB database connection module.
-
-Tests the database connection functionality with mocking
-to avoid requiring actual MongoDB instance during tests.
-"""
-
-import pytest
-from unittest.mock import Mock, patch, MagicMock
-from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
-
-from app.database.connection import (
-  create_mongodb_client,
-  get_database,
-  close_database_connection,
-  get_mongodb_client,
-  test_database_connection
-)
-
-
-def test_i_can_get_database_connection():
-  """Test successful database connection creation."""
-  mock_client = Mock()
-  mock_database = Mock()
-  mock_client.__getitem__.return_value = mock_database
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://localhost:27017"):
-      with patch('app.database.connection.get_mongodb_database_name', return_value="testdb"):
-        # Reset global variables
-        import app.database.connection
-        app.database.connection._client = None
-        app.database.connection._database = None
-        
-        result = get_database()
-        
-        assert result == mock_database
-        mock_client.admin.command.assert_called_with('ping')
-
-
-def test_i_cannot_connect_to_invalid_mongodb_url():
-  """Test fail-fast behavior with invalid MongoDB URL."""
-  mock_client = Mock()
-  mock_client.admin.command.side_effect = ConnectionFailure("Connection failed")
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://invalid:27017"):
-      with pytest.raises(SystemExit) as exc_info:
-        create_mongodb_client()
-      
-      assert exc_info.value.code == 1
-
-
-def test_i_cannot_connect_with_server_selection_timeout():
-  """Test fail-fast behavior with server selection timeout."""
-  mock_client = Mock()
-  mock_client.admin.command.side_effect = ServerSelectionTimeoutError("Timeout")
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://timeout:27017"):
-      with pytest.raises(SystemExit) as exc_info:
-        create_mongodb_client()
-      
-      assert exc_info.value.code == 1
-
-
-def test_i_cannot_connect_with_unexpected_error():
-  """Test fail-fast behavior with unexpected connection error."""
-  with patch('app.database.connection.MongoClient', side_effect=Exception("Unexpected error")):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://error:27017"):
-      with pytest.raises(SystemExit) as exc_info:
-        create_mongodb_client()
-      
-      assert exc_info.value.code == 1
-
-
-def test_i_can_get_database_singleton():
-  """Test that get_database returns the same instance (singleton pattern)."""
-  mock_client = Mock()
-  mock_database = Mock()
-  mock_client.__getitem__.return_value = mock_database
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://localhost:27017"):
-      with patch('app.database.connection.get_mongodb_database_name', return_value="testdb"):
-        # Reset global variables
-        import app.database.connection
-        app.database.connection._client = None
-        app.database.connection._database = None
-        
-        # First call
-        db1 = get_database()
-        # Second call
-        db2 = get_database()
-        
-        assert db1 is db2
-        # MongoClient should be called only once
-        assert mock_client.admin.command.call_count == 1
-
-
-def test_i_can_close_database_connection():
-  """Test closing database connection."""
-  mock_client = Mock()
-  mock_database = Mock()
-  mock_client.__getitem__.return_value = mock_database
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://localhost:27017"):
-      with patch('app.database.connection.get_mongodb_database_name', return_value="testdb"):
-        # Reset global variables
-        import app.database.connection
-        app.database.connection._client = None
-        app.database.connection._database = None
-        
-        # Create connection
-        get_database()
-        
-        # Close connection
-        close_database_connection()
-        
-        mock_client.close.assert_called_once()
-        assert app.database.connection._client is None
-        assert app.database.connection._database is None
-
-
-def test_i_can_get_mongodb_client():
-  """Test getting raw MongoDB client instance."""
-  mock_client = Mock()
-  mock_database = Mock()
-  mock_client.__getitem__.return_value = mock_database
-  
-  with patch('app.database.connection.MongoClient', return_value=mock_client):
-    with patch('app.database.connection.get_mongodb_url', return_value="mongodb://localhost:27017"):
-      with patch('app.database.connection.get_mongodb_database_name', return_value="testdb"):
-        # Reset global variables
-        import app.database.connection
-        app.database.connection._client = None
-        app.database.connection._database = None
-        
-        # Create connection first
-        get_database()
-        
-        # Get client
-        result = get_mongodb_client()
-        
-        assert result == mock_client
-
-
-def test_i_can_get_none_mongodb_client_when_not_connected():
-  """Test getting MongoDB client returns None when not connected."""
-  # Reset global variables
-  import app.database.connection
-  app.database.connection._client = None
-  app.database.connection._database = None
-  
-  result = get_mongodb_client()
-  assert result is None
-
-
-def test_i_can_test_database_connection_success():
-  """Test database connection health check - success case."""
-  mock_database = Mock()
-  mock_database.command.return_value = True
-  
-  with patch('app.database.connection.get_database', return_value=mock_database):
-    result = test_database_connection()
-    
-    assert result is True
-    mock_database.command.assert_called_with('ping')
-
-
-def test_i_cannot_test_database_connection_failure():
-  """Test database connection health check - failure case."""
-  mock_database = Mock()
-  mock_database.command.side_effect = Exception("Connection error")
-  
-  with patch('app.database.connection.get_database', return_value=mock_database):
-    result = test_database_connection()
-    
-    assert result is False
-
-
-def test_i_can_close_connection_when_no_client():
-  """Test closing connection when no client exists (should not raise error)."""
-  # Reset global variables
-  import app.database.connection
-  app.database.connection._client = None
-  app.database.connection._database = None
-  
-  # Should not raise any exception
-  close_database_connection()
-  
-  assert app.database.connection._client is None
-  assert app.database.connection._database is None
--- a/tests/test_user_repository.py
+++ b/tests/test_user_repository.py
@@ -1,385 +0,0 @@
-"""
-Unit tests for user repository module.
-
-Tests all CRUD operations for users with MongoDB mocking
-to ensure proper database interactions without requiring
-actual MongoDB instance during tests.
-"""
-
-import pytest
-from unittest.mock import Mock, MagicMock
-from datetime import datetime
-from bson import ObjectId
-from pymongo.errors import DuplicateKeyError
-
-from app.database.repositories.user_repository import UserRepository
-from app.models.user import UserCreate, UserUpdate, UserInDB, UserRole
-
-
-@pytest.fixture
-def mock_database():
-  """Create mock database with users collection."""
-  db = Mock()
-  collection = Mock()
-  db.users = collection
-  return db
-
-
-@pytest.fixture
-def user_repository(mock_database):
-  """Create UserRepository instance with mocked database."""
-  return UserRepository(mock_database)
-
-
-@pytest.fixture
-def sample_user_create():
-  """Create sample UserCreate object for testing."""
-  return UserCreate(
-    username="testuser",
-    email="test@example.com",
-    hashed_password="hashed_password_123",
-    role=UserRole.USER,
-    is_active=True
-  )
-
-
-@pytest.fixture
-def sample_user_update():
-  """Create sample UserUpdate object for testing."""
-  return UserUpdate(
-    email="updated@example.com",
-    role=UserRole.ADMIN,
-    is_active=False
-  )
-
-
-def test_i_can_create_user(user_repository, mock_database, sample_user_create):
-  """Test successful user creation."""
-  # Mock successful insertion
-  mock_result = Mock()
-  mock_result.inserted_id = ObjectId()
-  mock_database.users.insert_one.return_value = mock_result
-  
-  result = user_repository.create_user(sample_user_create)
-  
-  assert isinstance(result, UserInDB)
-  assert result.username == sample_user_create.username
-  assert result.email == sample_user_create.email
-  assert result.hashed_password == sample_user_create.hashed_password
-  assert result.role == sample_user_create.role
-  assert result.is_active == sample_user_create.is_active
-  assert result.id is not None
-  assert isinstance(result.created_at, datetime)
-  assert isinstance(result.updated_at, datetime)
-  
-  # Verify insert_one was called with correct data
-  mock_database.users.insert_one.assert_called_once()
-  call_args = mock_database.users.insert_one.call_args[0][0]
-  assert call_args["username"] == sample_user_create.username
-  assert call_args["email"] == sample_user_create.email
-
-
-def test_i_cannot_create_duplicate_username(user_repository, mock_database, sample_user_create):
-  """Test that creating user with duplicate username raises DuplicateKeyError."""
-  # Mock DuplicateKeyError from MongoDB
-  mock_database.users.insert_one.side_effect = DuplicateKeyError("duplicate key error")
-  
-  with pytest.raises(DuplicateKeyError, match="User with username 'testuser' already exists"):
-    user_repository.create_user(sample_user_create)
-
-
-def test_i_can_find_user_by_username(user_repository, mock_database):
-  """Test finding user by username."""
-  # Mock user document from database
-  user_doc = {
-      "_id": ObjectId(),
-      "username": "testuser",
-      "email": "test@example.com",
-      "hashed_password": "hashed_password_123",
-      "role": "user",
-      "is_active": True,
-      "created_at": datetime.utcnow(),
-      "updated_at": datetime.utcnow()
-  }
-  mock_database.users.find_one.return_value = user_doc
-  
-  result = user_repository.find_user_by_username("testuser")
-  
-  assert isinstance(result, UserInDB)
-  assert result.username == "testuser"
-  assert result.email == "test@example.com"
-  
-  mock_database.users.find_one.assert_called_once_with({"username": "testuser"})
-
-
-def test_i_cannot_find_nonexistent_user_by_username(user_repository, mock_database):
-  """Test finding nonexistent user by username returns None."""
-  mock_database.users.find_one.return_value = None
-  
-  result = user_repository.find_user_by_username("nonexistent")
-  
-  assert result is None
-  mock_database.users.find_one.assert_called_once_with({"username": "nonexistent"})
-
-
-def test_i_can_find_user_by_id(user_repository, mock_database):
-  """Test finding user by ID."""
-  user_id = ObjectId()
-  user_doc = {
-      "_id": user_id,
-      "username": "testuser",
-      "email": "test@example.com",
-      "hashed_password": "hashed_password_123",
-      "role": "user",
-      "is_active": True,
-      "created_at": datetime.utcnow(),
-      "updated_at": datetime.utcnow()
-  }
-  mock_database.users.find_one.return_value = user_doc
-  
-  result = user_repository.find_user_by_id(str(user_id))
-  
-  assert isinstance(result, UserInDB)
-  assert result.id == user_id
-  assert result.username == "testuser"
-  
-  mock_database.users.find_one.assert_called_once_with({"_id": user_id})
-
-
-def test_i_cannot_find_user_with_invalid_id(user_repository, mock_database):
-  """Test finding user with invalid ObjectId returns None."""
-  result = user_repository.find_user_by_id("invalid_id")
-  
-  assert result is None
-  # find_one should not be called with invalid ID
-  mock_database.users.find_one.assert_not_called()
-
-
-def test_i_cannot_find_nonexistent_user_by_id(user_repository, mock_database):
-  """Test finding nonexistent user by ID returns None."""
-  user_id = ObjectId()
-  mock_database.users.find_one.return_value = None
-  
-  result = user_repository.find_user_by_id(str(user_id))
-  
-  assert result is None
-  mock_database.users.find_one.assert_called_once_with({"_id": user_id})
-
-
-def test_i_can_find_user_by_email(user_repository, mock_database):
-  """Test finding user by email address."""
-  user_doc = {
-      "_id": ObjectId(),
-      "username": "testuser",
-      "email": "test@example.com",
-      "hashed_password": "hashed_password_123",
-      "role": "user",
-      "is_active": True,
-      "created_at": datetime.utcnow(),
-      "updated_at": datetime.utcnow()
-  }
-  mock_database.users.find_one.return_value = user_doc
-  
-  result = user_repository.find_user_by_email("test@example.com")
-  
-  assert isinstance(result, UserInDB)
-  assert result.email == "test@example.com"
-  
-  mock_database.users.find_one.assert_called_once_with({"email": "test@example.com"})
-
-
-def test_i_can_update_user(user_repository, mock_database, sample_user_update):
-  """Test updating user information."""
-  user_id = ObjectId()
-  
-  # Mock successful update
-  mock_update_result = Mock()
-  mock_update_result.matched_count = 1
-  mock_database.users.update_one.return_value = mock_update_result
-  
-  # Mock find_one for returning updated user
-  updated_user_doc = {
-      "_id": user_id,
-      "username": "testuser",
-      "email": "updated@example.com",
-      "hashed_password": "hashed_password_123",
-      "role": "admin",
-      "is_active": False,
-      "created_at": datetime.utcnow(),
-      "updated_at": datetime.utcnow()
-  }
-  mock_database.users.find_one.return_value = updated_user_doc
-  
-  result = user_repository.update_user(str(user_id), sample_user_update)
-  
-  assert isinstance(result, UserInDB)
-  assert result.email == "updated@example.com"
-  assert result.role == UserRole.ADMIN
-  assert result.is_active is False
-  
-  # Verify update_one was called with correct data
-  mock_database.users.update_one.assert_called_once()
-  call_args = mock_database.users.update_one.call_args
-  assert call_args[0][0] == {"_id": user_id}  # Filter
-  update_data = call_args[0][1]["$set"]  # Update data
-  assert update_data["email"] == "updated@example.com"
-  assert update_data["role"] == UserRole.ADMIN
-  assert update_data["is_active"] is False
-  assert "updated_at" in update_data
-
-
-def test_i_cannot_update_nonexistent_user(user_repository, mock_database, sample_user_update):
-  """Test updating nonexistent user returns None."""
-  user_id = ObjectId()
-  
-  # Mock no match found
-  mock_update_result = Mock()
-  mock_update_result.matched_count = 0
-  mock_database.users.update_one.return_value = mock_update_result
-  
-  result = user_repository.update_user(str(user_id), sample_user_update)
-  
-  assert result is None
-
-
-def test_i_cannot_update_user_with_invalid_id(user_repository, mock_database, sample_user_update):
-  """Test updating user with invalid ID returns None."""
-  result = user_repository.update_user("invalid_id", sample_user_update)
-  
-  assert result is None
-  # update_one should not be called with invalid ID
-  mock_database.users.update_one.assert_not_called()
-
-
-def test_i_can_delete_user(user_repository, mock_database):
-  """Test successful user deletion."""
-  user_id = ObjectId()
-  
-  # Mock successful deletion
-  mock_delete_result = Mock()
-  mock_delete_result.deleted_count = 1
-  mock_database.users.delete_one.return_value = mock_delete_result
-  
-  result = user_repository.delete_user(str(user_id))
-  
-  assert result is True
-  mock_database.users.delete_one.assert_called_once_with({"_id": user_id})
-
-
-def test_i_cannot_delete_nonexistent_user(user_repository, mock_database):
-  """Test deleting nonexistent user returns False."""
-  user_id = ObjectId()
-  
-  # Mock no deletion occurred
-  mock_delete_result = Mock()
-  mock_delete_result.deleted_count = 0
-  mock_database.users.delete_one.return_value = mock_delete_result
-  
-  result = user_repository.delete_user(str(user_id))
-  
-  assert result is False
-
-
-def test_i_cannot_delete_user_with_invalid_id(user_repository, mock_database):
-  """Test deleting user with invalid ID returns False."""
-  result = user_repository.delete_user("invalid_id")
-  
-  assert result is False
-  # delete_one should not be called with invalid ID
-  mock_database.users.delete_one.assert_not_called()
-
-
-def test_i_can_list_users(user_repository, mock_database):
-  """Test listing users with pagination."""
-  # Mock cursor with user documents
-  user_docs = [
-      {
-          "_id": ObjectId(),
-          "username": "user1",
-          "email": "user1@example.com",
-          "hashed_password": "hash1",
-          "role": "user",
-          "is_active": True,
-          "created_at": datetime.utcnow(),
-          "updated_at": datetime.utcnow()
-      },
-      {
-          "_id": ObjectId(),
-          "username": "user2",
-          "email": "user2@example.com",
-          "hashed_password": "hash2",
-          "role": "admin",
-          "is_active": False,
-          "created_at": datetime.utcnow(),
-          "updated_at": datetime.utcnow()
-      }
-  ]
-  
-  mock_cursor = Mock()
-  mock_cursor.__iter__.return_value = iter(user_docs)
-  mock_cursor.skip.return_value = mock_cursor
-  mock_cursor.limit.return_value = mock_cursor
-  mock_database.users.find.return_value = mock_cursor
-  
-  result = user_repository.list_users(skip=10, limit=50)
-  
-  assert len(result) == 2
-  assert all(isinstance(user, UserInDB) for user in result)
-  assert result[0].username == "user1"
-  assert result[1].username == "user2"
-  
-  mock_database.users.find.assert_called_once()
-  mock_cursor.skip.assert_called_once_with(10)
-  mock_cursor.limit.assert_called_once_with(50)
-
-
-def test_i_can_count_users(user_repository, mock_database):
-  """Test counting total users."""
-  mock_database.users.count_documents.return_value = 42
-  
-  result = user_repository.count_users()
-  
-  assert result == 42
-  mock_database.users.count_documents.assert_called_once_with({})
-
-
-def test_i_can_check_user_exists(user_repository, mock_database):
-  """Test checking if user exists by username."""
-  mock_database.users.count_documents.return_value = 1
-  
-  result = user_repository.user_exists("testuser")
-  
-  assert result is True
-  mock_database.users.count_documents.assert_called_once_with({"username": "testuser"})
-
-
-def test_i_can_check_user_does_not_exist(user_repository, mock_database):
-  """Test checking if user does not exist by username."""
-  mock_database.users.count_documents.return_value = 0
-  
-  result = user_repository.user_exists("nonexistent")
-  
-  assert result is False
-  mock_database.users.count_documents.assert_called_once_with({"username": "nonexistent"})
-
-
-def test_i_can_create_indexes_on_initialization(mock_database):
-  """Test that indexes are created when repository is initialized."""
-  # Mock create_index to not raise exception
-  mock_database.users.create_index.return_value = None
-  
-  repository = UserRepository(mock_database)
-  
-  mock_database.users.create_index.assert_called_once_with("username", unique=True)
-
-
-def test_i_can_handle_index_creation_error(mock_database):
-  """Test that index creation errors are handled gracefully."""
-  # Mock create_index to raise exception (index already exists)
-  mock_database.users.create_index.side_effect = Exception("Index already exists")
-  
-  # Should not raise exception
-  repository = UserRepository(mock_database)
-  
-  assert repository is not None
-  mock_database.users.create_index.assert_called_once_with("username", unique=True)
--- a/tests/utils/__init__.py
+++ b/tests/utils/__init__.py
--- a/tests/utils/test_document_matching.py
+++ b/tests/utils/test_document_matching.py
@@ -0,0 +1,89 @@
+import os
+from datetime import datetime
+
+import pytest
+from app.models.document import FileDocument, FileType
+from app.utils.document_matching import fuzzy_matching, subsequence_matching
+
+
+def get_doc(filename: str):
+  """Build a sample FileDocument for testing; file_type is derived from the filename extension."""
+  return FileDocument(
+    filename=filename,
+    filepath=f"/path/to/{filename}",
+    file_hash="a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
+    file_type=FileType(os.path.splitext(filename)[1].lstrip(".") or "txt"),
+    detected_at=datetime.now(),
+    file_size=1024,
+    mime_type="text/plain"
+  )
+
+
+class TestFuzzyMatching:
+  def test_i_can_find_exact_match_with_fuzzy(self):
+    # Exact match should always pass
+    docs = [get_doc(filename="hello.txt")]
+    result = fuzzy_matching("hello.txt", docs)
+    assert len(result) == 1
+    assert result[0].filename == "hello.txt"
+  
+  def test_i_can_find_close_match_with_fuzzy(self):
+    # "helo.txt" should match "hello.txt" with high similarity
+    docs = [get_doc(filename="hello.txt")]
+    result = fuzzy_matching("helo.txt", docs, similarity_threshold=0.7)
+    assert len(result) == 1
+    assert result[0].filename == "hello.txt"
+  
+  def test_i_cannot_find_dissimilar_match_with_fuzzy(self):
+    # "world.txt" should not match "hello.txt"
+    docs = [get_doc(filename="hello.txt")]
+    result = fuzzy_matching("world.txt", docs, similarity_threshold=0.7)
+    assert len(result) == 0
+  
+  def test_i_can_sort_by_similarity_in_fuzzy(self):
+    # "hello.txt" is closer to "helo.txt" than "hllll.txt" is, so it should rank first
+    docs = [
+        get_doc(filename="hello.txt"),
+        get_doc(filename="hllll.txt"),
+    ]
+    result = fuzzy_matching("helo.txt", docs, similarity_threshold=0.5)
+    assert result[0].filename == "hello.txt"
+
+
+class TestSubsequenceMatching:
+  def test_i_can_match_subsequence_simple(self):
+    # "ifb" should match "ilFaitBeau.txt"
+    docs = [get_doc(filename="ilFaitBeau.txt")]
+    result = subsequence_matching("ifb", docs)
+    assert len(result) == 1
+    assert result[0].filename == "ilFaitBeau.txt"
+  
+  def test_i_cannot_match_wrong_order_subsequence(self):
+    # "bfi" should not match "ilFaitBeau.txt" because the characters appear in the wrong order
+    docs = [get_doc(filename="ilFaitBeau.txt")]
+    result = subsequence_matching("bfi", docs)
+    assert len(result) == 0
+  
+  def test_i_can_match_multiple_documents_subsequence(self):
+    # "ifb" should match both filenames, but "ilFaitBeau.txt" has a higher score
+    docs = [
+        get_doc(filename="ilFaitBeau.txt"),
+        get_doc(filename="information_base.txt"),
+    ]
+    result = subsequence_matching("ifb", docs)
+    assert len(result) == 2
+    assert result[0].filename == "ilFaitBeau.txt"
+    assert result[1].filename == "information_base.txt"
+  
+  def test_i_cannot_match_unrelated_subsequence(self):
+    # "xyz" should not match any file
+    docs = [get_doc(filename="ilFaitBeau.txt")]
+    result = subsequence_matching("xyz", docs)
+    assert len(result) == 0
+  
+  def test_i_can_handle_case_insensitivity_in_subsequence(self):
+    # Matching should be case-insensitive
+    docs = [get_doc(filename="HelloWorld.txt")]
+    result = subsequence_matching("hw", docs)
+    assert len(result) == 1
+    assert result[0].filename == "HelloWorld.txt"
--- a/tests/utils/test_security.py
+++ b/tests/utils/test_security.py
Author	SHA1	Message	Date
Kodjo Sossouvi	4de732b0ae	Implemented default pipeline	2025-09-26 22:08:39 +02:00
Kodjo Sossouvi	f1b551d243	Adding document service	2025-09-19 22:59:41 +02:00
Kodjo Sossouvi	e8b306ac4a	Fixed unit tests	2025-09-19 21:06:09 +02:00
Kodjo Sossouvi	c3ea80363f	Working on document repository	2025-09-18 22:53:51 +02:00
Kodjo Sossouvi	df86a3d998	Fixed docker config. Added services	2025-09-17 22:45:33 +02:00
Kodjo Sossouvi	da63f1b75b	Fixed unit tests	2025-09-17 21:24:03 +02:00