Working on API

2025-09-25 22:58:31 +02:00
parent 48f5b009ae
commit 1f7ef200e7
16 changed files with 618 additions and 63 deletions


@@ -13,7 +13,7 @@ architecture with Redis for task queuing and MongoDB for data persistence.
- **Backend API**: FastAPI (Python 3.12)
- **Task Processing**: Celery with Redis broker
- **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
-- **Database**: MongoDB
+- **Database**: MongoDB (pymongo)
- **Frontend**: React
- **Containerization**: Docker & Docker Compose
- **File Monitoring**: Python watchdog library
@@ -109,16 +109,18 @@ MyDocManager/
│ │ │ │ └── types.py # PyObjectId and other useful types
│ │ │ ├── database/
│ │ │ │ ├── __init__.py
-│ │ │ │ ├── connection.py # MongoDB connection
+│ │ │ │ ├── connection.py # MongoDB connection (pymongo)
│ │ │ │ └── repositories/
│ │ │ │ ├── __init__.py
-│ │ │ │ ├── user_repository.py # User CRUD operations
-│ │ │ │ └── document_repository.py # User CRUD operations
+│ │ │ │ ├── user_repository.py # User CRUD operations (synchronous)
+│ │ │ │ ├── document_repository.py # Document CRUD operations (synchronous)
+│ │ │ │ └── job_repository.py # Job CRUD operations (synchronous)
│ │ │ ├── services/
│ │ │ │ ├── __init__.py
-│ │ │ │ ├── auth_service.py # JWT & password logic
-│ │ │ │ ├── user_service.py # User business logic
-│ │ │ │ ├── document_service.py # Document business logic
+│ │ │ │ ├── auth_service.py # JWT & password logic (synchronous)
+│ │ │ │ ├── user_service.py # User business logic (synchronous)
+│ │ │ │ ├── document_service.py # Document business logic (synchronous)
+│ │ │ │ ├── job_service.py # Job processing logic (synchronous)
│ │ │ │ └── init_service.py # Admin creation at startup
│ │ │ ├── api/
│ │ │ │ ├── __init__.py
@@ -334,13 +336,20 @@ class ProcessingJob(BaseModel):
- **Rationale**: MongoDB is not designed for storing large files, and keeping them on disk performs better. Files remain in the file system for easy access.
-### Implementation Order
+#### Repository and Services Implementation
+- **Choice**: Synchronous implementation using pymongo
+- **Rationale**: Full compatibility with Celery workers and a simplified workflow
+- **Implementation**: All repositories and services operate synchronously for seamless integration (see the connection sketch below)
+
+### Implementation Status
1. ✅ Pydantic models for MongoDB collections
-2. UNDER PROGRESS : Repository layer for data access (files + processing_jobs)
-3. TODO : Celery tasks for document processing
-4. TODO : Watchdog file monitoring implementation
-5. TODO : FastAPI integration and startup coordination
+2. Repository layer for data access (files + processing_jobs + users + documents) - synchronous
+3. ✅ Service layer for business logic (auth, user, document, job) - synchronous
+4. ✅ Celery tasks for document processing
+5. ✅ Watchdog file monitoring implementation
+6. ✅ FastAPI integration and startup coordination
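To ground the synchronous choice, a minimal sketch of what a pymongo-based `connection.py` could look like; the module layout, environment variable names, and `get_database()` helper are assumptions, not the project's confirmed API:

```python
# Hypothetical connection.py: one synchronous MongoClient shared by the
# FastAPI app and the Celery workers; names here are illustrative only.
import os

from pymongo import MongoClient
from pymongo.database import Database

_client: MongoClient | None = None


def get_database() -> Database:
    """Lazily create the client and return a handle to the app database."""
    global _client
    if _client is None:
        _client = MongoClient(os.environ.get("MONGO_URL", "mongodb://localhost:27017"))
    return _client[os.environ.get("MONGO_DB", "mydocmanager")]
```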
## Job Management Layer
@@ -350,7 +359,7 @@ The job management system follows the repository pattern for clean separation be
#### JobRepository
-Handles direct MongoDB operations for processing jobs:
+Handles direct MongoDB operations for processing jobs using synchronous pymongo:
**CRUD Operations:**
- `create_job()` - Create new processing job with automatic `created_at` timestamp
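As an illustration of that timestamp behaviour, a hedged sketch of `create_job()`; the schema fields and constructor are assumptions, while the `processing_jobs` collection name comes from this document:

```python
# Hypothetical JobRepository.create_job(); field names are assumptions.
from datetime import datetime, timezone

from bson import ObjectId
from pymongo.database import Database


class JobRepository:
    def __init__(self, db: Database) -> None:
        self._jobs = db["processing_jobs"]

    def create_job(self, file_id: ObjectId) -> ObjectId:
        """Insert a new PENDING job, stamping created_at automatically."""
        document = {
            "file_id": file_id,
            "status": "PENDING",
            "created_at": datetime.now(timezone.utc),
        }
        return self._jobs.insert_one(document).inserted_id
```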
@@ -367,7 +376,7 @@ Handles direct MongoDB operations for processing jobs:
#### JobService
-Provides business logic layer with strict status transition validation:
+Provides synchronous business logic layer with strict status transition validation:
**Status Transition Methods:**
- `mark_job_as_started()` - PENDING → PROCESSING
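A minimal sketch of how such a transition method could enforce PENDING → PROCESSING; the repository helpers `get_job()` and `update_job_status()` are assumed names:

```python
# Hypothetical JobService.mark_job_as_started(); repository method names
# and the import path are assumptions based on the layout shown below.
from app.exceptions.job_exceptions import InvalidStatusTransitionError, JobNotFoundError


class JobService:
    def __init__(self, repository) -> None:
        self._repository = repository

    def mark_job_as_started(self, job_id) -> None:
        job = self._repository.get_job(job_id)  # assumed lookup helper
        if job is None:
            raise JobNotFoundError(job_id)
        if job["status"] != "PENDING":
            # Only PENDING -> PROCESSING is a legal start transition
            raise InvalidStatusTransitionError(job["status"], "PROCESSING")
        self._repository.update_job_status(job_id, "PROCESSING")  # assumed update helper
```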
@@ -381,7 +390,6 @@ Provides business logic layer with strict status transition validation:
#### Custom Exceptions
**JobNotFoundError**: Raised when job ID doesn't exist
**InvalidStatusTransitionError**: Raised for invalid status transitions
**JobRepositoryError**: Raised for MongoDB operation failures
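These three exceptions might be declared along these lines; the constructor signatures and message formats are assumptions:

```python
# Hypothetical job_exceptions.py; messages are illustrative only.
class JobNotFoundError(Exception):
    """Raised when a job ID doesn't exist."""

    def __init__(self, job_id) -> None:
        super().__init__(f"Job not found: {job_id}")


class InvalidStatusTransitionError(Exception):
    """Raised for invalid status transitions."""

    def __init__(self, current: str, requested: str) -> None:
        super().__init__(f"Cannot transition job from {current} to {requested}")


class JobRepositoryError(Exception):
    """Raised for MongoDB operation failures."""
```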
@@ -400,11 +408,17 @@ All other transitions are forbidden and will raise `InvalidStatusTransitionError
```
src/file-processor/app/
├── database/repositories/
-│ └── job_repository.py # JobRepository class
+│ ├── job_repository.py # JobRepository class (synchronous)
+│ ├── user_repository.py # UserRepository class (synchronous)
+│ ├── document_repository.py # DocumentRepository class (synchronous)
+│ └── file_repository.py # FileRepository class (synchronous)
├── services/
-│ └── job_service.py # JobService class
+│ ├── job_service.py # JobService class (synchronous)
+│ ├── auth_service.py # AuthService class (synchronous)
+│ ├── user_service.py # UserService class (synchronous)
+│ └── document_service.py # DocumentService class (synchronous)
└── exceptions/
    └── job_exceptions.py # Custom exceptions
```
### Processing Pipeline Features
@@ -414,6 +428,7 @@ src/file-processor/app/
- **Status Tracking**: Real-time processing status via `processing_jobs` collection
- **Extensible Metadata**: Flexible metadata storage per file type
- **Multiple Extraction Methods**: Support for direct text, OCR, and hybrid approaches
+- **Synchronous Operations**: All database operations use pymongo for Celery compatibility
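As a sketch of how the direct/OCR/hybrid split could work for PDFs, using the stack's own libraries; the `extract_text()` helper and its per-page fallback rule are assumptions:

```python
# Hedged sketch: prefer the PDF text layer, fall back to OCR per page.
import easyocr  # OCR engine from the tech stack
import fitz  # PyMuPDF
import numpy as np


def extract_text(path: str) -> str:
    """Hybrid extraction: direct text layer first, EasyOCR as fallback."""
    reader = easyocr.Reader(["en"])
    parts: list[str] = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text()
            if text.strip():
                parts.append(text)  # direct extraction succeeded
                continue
            pix = page.get_pixmap()  # render the page for the OCR fallback
            img = np.frombuffer(pix.samples, dtype=np.uint8)
            img = img.reshape(pix.height, pix.width, pix.n)
            parts.append("\n".join(reader.readtext(img, detail=0)))
    return "\n".join(parts)
```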
## Key Implementation Notes
@@ -436,6 +451,7 @@ src/file-processor/app/
- **Package Manager**: pip (standard)
- **External Dependencies**: Listed in each service's requirements.txt
- **Standard Library First**: Prefer standard library when possible
+- **Database Driver**: pymongo for synchronous MongoDB operations
### Testing Strategy
@@ -460,6 +476,7 @@ src/file-processor/app/
12. **Content in Files Collection**: Extracted content stored with file metadata
13. **Direct Task Dispatch**: File watcher directly creates Celery tasks
14. **SHA256 Duplicate Detection**: Prevents reprocessing identical files
+15. **Synchronous Implementation**: All repositories and services use pymongo for Celery compatibility
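A compact sketch of how the SHA256 duplicate check could run before dispatching a task; the `sha256` field name on the files collection is an assumption:

```python
# Hedged sketch of SHA256 duplicate detection; streams the file in
# chunks so large documents do not exhaust memory.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(files_collection, path: Path) -> bool:
    # 'sha256' field name is an assumption about the files collection schema
    return files_collection.find_one({"sha256": sha256_of(path)}) is not None
```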
### Development Process Requirements
@@ -470,12 +487,13 @@ src/file-processor/app/
### Next Implementation Steps
-1. **IN PROGRESS**: Implement file processing pipeline =>
-   1. Create Pydantic models for files and processing_jobs collections
-   2. Implement repository layer for file and processing job data access
-   3. Create Celery tasks for document processing (.txt, .pdf, .docx)
-   4. Implement Watchdog file monitoring with dedicated observer
-   5. Integrate file watcher with FastAPI startup
+1. **TODO**: Complete file processing pipeline =>
+   1. Create Pydantic models for files and processing_jobs collections
+   2. Implement repository layer for file and processing job data access (synchronous)
+   3. ✅ Implement service layer for business logic (synchronous)
+   4. ✅ Create Celery tasks for document processing (.txt, .pdf, .docx)
+   5. ✅ Implement Watchdog file monitoring with dedicated observer (sketch after this list)
+   6. ✅ Integrate file watcher with FastAPI startup
2. Create protected API routes for user management
3. Build React monitoring interface with authentication
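A minimal sketch of the dedicated observer dispatching Celery tasks directly, matching the "Direct Task Dispatch" decision above; `process_document`, its import path, and the watched directory are assumptions:

```python
# Hedged sketch of watchdog monitoring with direct Celery dispatch.
# The task import and watch directory are assumptions, not confirmed names.
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

from tasks import process_document  # assumed Celery task


class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            # Hand the new file straight to the worker queue
            process_document.delay(event.src_path)


observer = Observer()
observer.schedule(NewFileHandler(), "/watched", recursive=True)
observer.start()
```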
@@ -566,4 +584,4 @@ docker-compose up --scale worker=3
- **file-processor**: Hot-reload enabled via `--reload` flag
- Code changes in `src/file-processor/app/` automatically restart FastAPI
- **worker**: No hot-reload (manual restart required for stability)
- Code changes in `src/worker/tasks/` require: `docker-compose restart worker`