Working on API
68
Readme.md
@@ -13,7 +13,7 @@ architecture with Redis for task queuing and MongoDB for data persistence.
 - **Backend API**: FastAPI (Python 3.12)
 - **Task Processing**: Celery with Redis broker
 - **Document Processing**: EasyOCR, PyMuPDF, python-docx, pdfplumber
-- **Database**: MongoDB
+- **Database**: MongoDB (pymongo)
 - **Frontend**: React
 - **Containerization**: Docker & Docker Compose
 - **File Monitoring**: Python watchdog library
@@ -109,16 +109,18 @@ MyDocManager/
 │   │   │   │   └── types.py  # PyObjectId and other useful types
 │   │   │   ├── database/
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── connection.py  # MongoDB connection
+│   │   │   │   ├── connection.py  # MongoDB connection (pymongo)
 │   │   │   │   └── repositories/
 │   │   │   │       ├── __init__.py
-│   │   │   │       ├── user_repository.py  # User CRUD operations
-│   │   │   │       └── document_repository.py  # User CRUD operations
+│   │   │   │       ├── user_repository.py  # User CRUD operations (synchronous)
+│   │   │   │       ├── document_repository.py  # Document CRUD operations (synchronous)
+│   │   │   │       └── job_repository.py  # Job CRUD operations (synchronous)
 │   │   │   ├── services/
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── auth_service.py  # JWT & password logic
-│   │   │   │   ├── user_service.py  # User business logic
-│   │   │   │   ├── document_service.py  # Document business logic
+│   │   │   │   ├── auth_service.py  # JWT & password logic (synchronous)
+│   │   │   │   ├── user_service.py  # User business logic (synchronous)
+│   │   │   │   ├── document_service.py  # Document business logic (synchronous)
+│   │   │   │   ├── job_service.py  # Job processing logic (synchronous)
 │   │   │   │   └── init_service.py  # Admin creation at startup
 │   │   │   ├── api/
 │   │   │   │   ├── __init__.py
@@ -334,13 +336,20 @@ class ProcessingJob(BaseModel):
 - **Rationale**: MongoDB is not meant for large files, better performance. Files remain in the file system for easy
   access.

-### Implementation Order
+#### Repository and Services Implementation
+
+- **Choice**: Synchronous implementation using pymongo
+- **Rationale**: Full compatibility with Celery workers and simplified workflow
+- **Implementation**: All repositories and services operate synchronously for seamless integration
+
+### Implementation Status

 1. ✅ Pydantic models for MongoDB collections
-2. UNDER PROGRESS : Repository layer for data access (files + processing_jobs)
-3. TODO : Celery tasks for document processing
-4. TODO : Watchdog file monitoring implementation
-5. TODO : FastAPI integration and startup coordination
+2. ✅ Repository layer for data access (files + processing_jobs + users + documents) - synchronous
+3. ✅ Service layer for business logic (auth, user, document, job) - synchronous
+4. ✅ Celery tasks for document processing
+5. ✅ Watchdog file monitoring implementation
+6. ✅ FastAPI integration and startup coordination

 ## Job Management Layer

@@ -350,7 +359,7 @@ The job management system follows the repository pattern for clean separation be

 #### JobRepository

-Handles direct MongoDB operations for processing jobs:
+Handles direct MongoDB operations for processing jobs using synchronous pymongo:

 **CRUD Operations:**
 - `create_job()` - Create new processing job with automatic `created_at` timestamp
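
To illustrate the synchronous repository style this hunk describes, here is a minimal sketch of a `create_job()` that stamps `created_at` on insert. It is not the project's actual code: the field names other than `created_at`, the `find_job_by_id()` helper, and the `InMemoryCollection` stand-in are assumptions made for illustration. The repository only relies on `insert_one` and `find_one`, which pymongo's `Collection` provides, so a real collection could presumably be passed in the same way.

```python
from datetime import datetime, timezone
from typing import Any, Optional, Protocol


class SyncCollection(Protocol):
    """The subset of pymongo's Collection API this sketch relies on."""

    def insert_one(self, document: dict) -> Any: ...

    def find_one(self, filter: dict) -> Optional[dict]: ...


class JobRepository:
    """Hypothetical synchronous repository; the real class may differ."""

    def __init__(self, collection: SyncCollection) -> None:
        self._collection = collection

    def create_job(self, file_id: str, task_id: Optional[str] = None) -> dict:
        # Assumed document shape: only `created_at` is named in the README.
        job = {
            "file_id": file_id,
            "task_id": task_id,
            "status": "PENDING",
            "created_at": datetime.now(timezone.utc),
        }
        result = self._collection.insert_one(job)
        job["_id"] = result.inserted_id
        return job

    def find_job_by_id(self, job_id: Any) -> Optional[dict]:
        return self._collection.find_one({"_id": job_id})


class _InsertResult:
    """Mimics the `inserted_id` attribute of pymongo's InsertOneResult."""

    def __init__(self, inserted_id: Any) -> None:
        self.inserted_id = inserted_id


class InMemoryCollection:
    """Illustrative stand-in so the sketch runs without a MongoDB server."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def insert_one(self, document: dict) -> _InsertResult:
        doc_id = document.get("_id", len(self._docs) + 1)
        self._docs[doc_id] = dict(document, _id=doc_id)
        return _InsertResult(doc_id)

    def find_one(self, filter: dict) -> Optional[dict]:
        # Supports equality filters only, which is all the sketch needs.
        for doc in self._docs.values():
            if all(doc.get(key) == value for key, value in filter.items()):
                return doc
        return None
```

Because every call blocks until it returns, the same repository can be used from a Celery task and from a FastAPI route without an event loop, which is the compatibility argument the README makes.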
@@ -367,7 +376,7 @@ Handles direct MongoDB operations for processing jobs:

 #### JobService

-Provides business logic layer with strict status transition validation:
+Provides synchronous business logic layer with strict status transition validation:

 **Status Transition Methods:**
 - `mark_job_as_started()` - PENDING → PROCESSING
@@ -381,7 +390,6 @@ Provides business logic layer with strict status transition validation:

 #### Custom Exceptions

-**JobNotFoundError**: Raised when job ID doesn't exist
 **InvalidStatusTransitionError**: Raised for invalid status transitions
 **JobRepositoryError**: Raised for MongoDB operation failures

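
The strict status transition validation mentioned for `JobService` can be sketched as a lookup table plus a guard function. Only the PENDING → PROCESSING transition is stated in the diff; the `COMPLETED` and `FAILED` states, the enum values, and the `validate_transition()` helper are assumptions for illustration, not the project's actual definitions.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    # Assumed status names and values; the README only shows PENDING and PROCESSING.
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class InvalidStatusTransitionError(Exception):
    """Raised when a job is asked to move to a status it cannot reach."""

    def __init__(self, current: ProcessingStatus, target: ProcessingStatus) -> None:
        super().__init__(f"Cannot move job from {current.value} to {target.value}")
        self.current = current
        self.target = target


# Assumed transition table: only PENDING -> PROCESSING is confirmed by the README.
ALLOWED_TRANSITIONS = {
    ProcessingStatus.PENDING: {ProcessingStatus.PROCESSING},
    ProcessingStatus.PROCESSING: {ProcessingStatus.COMPLETED, ProcessingStatus.FAILED},
    ProcessingStatus.COMPLETED: set(),
    ProcessingStatus.FAILED: set(),
}


def validate_transition(current: ProcessingStatus, target: ProcessingStatus) -> None:
    """Raise InvalidStatusTransitionError unless the move is explicitly allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidStatusTransitionError(current, target)
```

Keeping the allowed moves in one table means a method such as `mark_job_as_started()` only has to call the guard before writing the new status.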
@@ -400,11 +408,17 @@ All other transitions are forbidden and will raise `InvalidStatusTransitionError
 ```
 src/file-processor/app/
 ├── database/repositories/
-│   └── job_repository.py  # JobRepository class
+│   ├── job_repository.py  # JobRepository class (synchronous)
+│   ├── user_repository.py  # UserRepository class (synchronous)
+│   ├── document_repository.py  # DocumentRepository class (synchronous)
+│   └── file_repository.py  # FileRepository class (synchronous)
 ├── services/
-│   └── job_service.py  # JobService class
+│   ├── job_service.py  # JobService class (synchronous)
+│   ├── auth_service.py  # AuthService class (synchronous)
+│   ├── user_service.py  # UserService class (synchronous)
+│   └── document_service.py  # DocumentService class (synchronous)
 └── exceptions/
-    └── job_exceptions.py  # Custom exceptions
+    └── job_exceptions.py  # Custom exceptions
 ```

 ### Processing Pipeline Features
@@ -414,6 +428,7 @@ src/file-processor/app/
 - **Status Tracking**: Real-time processing status via `processing_jobs` collection
 - **Extensible Metadata**: Flexible metadata storage per file type
 - **Multiple Extraction Methods**: Support for direct text, OCR, and hybrid approaches
+- **Synchronous Operations**: All database operations use pymongo for Celery compatibility

 ## Key Implementation Notes

@@ -436,6 +451,7 @@ src/file-processor/app/
 - **Package Manager**: pip (standard)
 - **External Dependencies**: Listed in each service's requirements.txt
 - **Standard Library First**: Prefer standard library when possible
+- **Database Driver**: pymongo for synchronous MongoDB operations

 ### Testing Strategy

@@ -460,6 +476,7 @@ src/file-processor/app/
 12. **Content in Files Collection**: Extracted content stored with file metadata
 13. **Direct Task Dispatch**: File watcher directly creates Celery tasks
 14. **SHA256 Duplicate Detection**: Prevents reprocessing identical files
+15. **Synchronous Implementation**: All repositories and services use pymongo for Celery compatibility

 ### Development Process Requirements

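
The SHA256 duplicate detection listed in item 14 could look roughly like the sketch below. The function names and the in-memory `known_hashes` set are hypothetical; in the project the hash would presumably be checked against the files collection rather than a Python set.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large documents are never fully loaded in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path: Path, known_hashes: set[str]) -> bool:
    """Return True if identical content was seen before; otherwise record its hash."""
    file_hash = sha256_of_file(path)
    if file_hash in known_hashes:
        return True
    known_hashes.add(file_hash)
    return False
```

Hashing content rather than comparing file names means a renamed or re-copied file is still recognized as already processed.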
@@ -470,12 +487,13 @@ src/file-processor/app/

 ### Next Implementation Steps

-1. **IN PROGRESS**: Implement file processing pipeline =>
-   1. Create Pydantic models for files and processing_jobs collections
-   2. Implement repository layer for file and processing job data access
-   3. Create Celery tasks for document processing (.txt, .pdf, .docx)
-   4. Implement Watchdog file monitoring with dedicated observer
-   5. Integrate file watcher with FastAPI startup
+1. **TODO**: Complete file processing pipeline =>
+   1. ✅ Create Pydantic models for files and processing_jobs collections
+   2. ✅ Implement repository layer for file and processing job data access (synchronous)
+   3. ✅ Implement service layer for business logic (synchronous)
+   4. ✅ Create Celery tasks for document processing (.txt, .pdf, .docx)
+   5. ✅ Implement Watchdog file monitoring with dedicated observer
+   6. ✅ Integrate file watcher with FastAPI startup
 2. Create protected API routes for user management
 3. Build React monitoring interface with authentication

@@ -566,4 +584,4 @@ docker-compose up --scale worker=3
 - **file-processor**: Hot-reload enabled via `--reload` flag
   - Code changes in `src/file-processor/app/` automatically restart FastAPI
 - **worker**: No hot-reload (manual restart required for stability)
-  - Code changes in `src/worker/tasks/` require: `docker-compose restart worker`
+  - Code changes in `src/worker/tasks/` require: `docker-compose restart worker`