# DbEngine

A lightweight, git-inspired database engine for Python that maintains a complete history of all modifications.

## Overview

DbEngine is a personal implementation of a versioned database engine that stores snapshots of data changes over time. Each modification creates a new immutable snapshot, allowing you to track the complete history of your data.

## Key Features

- **Version Control**: Every change creates a new snapshot with a unique digest (SHA-256 hash)
- **History Tracking**: Access any previous version of your data
- **Multi-tenant Support**: Isolated data storage per tenant
- **Thread-safe**: Built-in locking mechanism for concurrent access
- **Git-inspired Architecture**: Objects are stored in a content-addressable format
- **Efficient Storage**: Identical objects are stored only once

## Architecture

The engine uses a file-based storage system with the following structure:

```
.mytools_db/
├── {tenant_id}/
│   ├── head                    # Points to latest version of each entry
│   └── objects/
│       └── {digest_prefix}/
│           └── {full_digest}   # Actual object data
└── refs/                       # Shared references
```
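
The layout above can be illustrated with a minimal sketch of how an object's path might be derived from its content. This is not DbEngine's actual code: the helper names `compute_digest` and `object_path`, the canonical JSON serialization, and the two-character prefix length are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def compute_digest(obj) -> str:
    """Hash a canonical JSON form of obj (assumed serialization, not DbEngine's)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def object_path(root: str, tenant_id: str, digest: str, prefix_len: int = 2) -> Path:
    """Build {root}/{tenant_id}/objects/{digest_prefix}/{full_digest}.

    prefix_len=2 is a hypothetical choice borrowed from git's object layout.
    """
    return Path(root) / tenant_id / "objects" / digest[:prefix_len] / digest


digest = compute_digest({"name": "John", "age": 30})
path = object_path(".mytools_db", "my_company", digest)
print(path.as_posix())
```

Because the path depends only on the content hash, two writes of the same object resolve to the same file, which is what makes the store content-addressable.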

## Installation

```python
from db_engine import DbEngine

# Initialize with default root
db = DbEngine()

# Or specify custom root directory
db = DbEngine(root="/path/to/database")
```

## Basic Usage

### Initialize Database for a Tenant

```python
tenant_id = "my_company"
db.init(tenant_id)
```

### Save Data

```python
# Save a complete object
user_id = "john_doe"
entry = "users"
data = {"name": "John", "age": 30}

digest = db.save(tenant_id, user_id, entry, data)
```

### Load Data

```python
# Load latest version
data = db.load(tenant_id, entry="users")

# Load specific version by digest
data = db.load(tenant_id, entry="users", digest="abc123...")
```

### Work with Individual Records

```python
# Add or update a single record
db.put(tenant_id, user_id, entry="users", key="john", value={"name": "John", "age": 30})

# Add or update multiple records at once
items = {
    "john": {"name": "John", "age": 30},
    "jane": {"name": "Jane", "age": 25},
}
db.put_many(tenant_id, user_id, entry="users", items=items)

# Get a specific record
user = db.get(tenant_id, entry="users", key="john")

# Get all records
all_users = db.get(tenant_id, entry="users")
```

### Check Existence

```python
if db.exists(tenant_id, entry="users"):
    print("Entry exists")
```

### Access History

```python
# Get history of an entry (returns list of digests)
history = db.history(tenant_id, entry="users", max_items=10)

# Load a previous version (assumes the list is ordered newest first
# and contains at least two digests)
old_data = db.load(tenant_id, entry="users", digest=history[1])
```

## Metadata

Each snapshot automatically includes metadata:

- `__parent__`: Digest of the previous version
- `__user__`: User ID who made the change
- `__date__`: Timestamp of the change (format: `YYYYMMDD HH:MM:SS`)
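
To make the role of these fields concrete, here is a minimal in-memory sketch of how a chain of snapshots could be linked through `__parent__` and traversed. It is an illustration under stated assumptions, not DbEngine's implementation: the `store` dict, the helpers `make_snapshot` and `walk_history`, the short fake digests, and the use of `None` as the parent of the first version are all hypothetical.

```python
from datetime import datetime

# In-memory stand-in for the object store: digest -> snapshot (hypothetical).
store = {}


def make_snapshot(data, user_id, parent):
    """Attach the three metadata fields described above to a copy of data."""
    snapshot = dict(data)
    snapshot["__parent__"] = parent
    snapshot["__user__"] = user_id
    # strftime pattern matching the documented YYYYMMDD HH:MM:SS format.
    snapshot["__date__"] = datetime.now().strftime("%Y%m%d %H:%M:%S")
    return snapshot


def walk_history(digest, max_items=1000):
    """Follow __parent__ links from digest back toward the first version."""
    chain = []
    while digest is not None and len(chain) < max_items:
        chain.append(digest)
        digest = store[digest]["__parent__"]
    return chain


# Fake digests keep the example short; real ones would be SHA-256 hashes.
store["d1"] = make_snapshot({"age": 30}, "john_doe", parent=None)
store["d2"] = make_snapshot({"age": 31}, "john_doe", parent="d1")
store["d3"] = make_snapshot({"age": 32}, "jane_doe", parent="d2")

print(walk_history("d3"))               # ['d3', 'd2', 'd1']
print(walk_history("d3", max_items=2))  # ['d3', 'd2']
```

The `max_items` argument here plays the same role as the parameter of the same name on `history()`: it caps how far back the chain is followed.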

## API Reference

### Core Methods

#### `init(tenant_id: str)`
Initialize database structure for a tenant.

#### `save(tenant_id: str, user_id: str, entry: str, obj: object) -> str`
Save a complete snapshot. Returns the digest of the saved object.

#### `load(tenant_id: str, entry: str, digest: str = None) -> object`
Load a snapshot. If `digest` is `None`, loads the latest version.

#### `put(tenant_id: str, user_id: str, entry: str, key: str, value: object) -> bool`
Add or update a single record. Returns `True` if a new snapshot was created.

#### `put_many(tenant_id: str, user_id: str, entry: str, items: list | dict) -> bool`
Add or update multiple records. Returns `True` if a new snapshot was created.

#### `get(tenant_id: str, entry: str, key: str = None, digest: str = None) -> object`
Retrieve record(s). If `key` is `None`, returns all records as a list.

#### `exists(tenant_id: str, entry: str) -> bool`
Check if an entry exists.

#### `history(tenant_id: str, entry: str, digest: str = None, max_items: int = 1000) -> list`
Get the history chain of digests for an entry.

#### `get_digest(tenant_id: str, entry: str) -> str`
Get the current digest for an entry.

## Usage Patterns

### Pattern 1: Snapshot-based (using `save()`)
Best for saving complete states of complex objects.

```python
config = {"theme": "dark", "language": "en"}
db.save(tenant_id, user_id, "config", config)
```

### Pattern 2: Record-based (using `put()` / `put_many()`)
Best for managing collections of items incrementally.

```python
db.put(tenant_id, user_id, "settings", "theme", "dark")
db.put(tenant_id, user_id, "settings", "language", "en")
```

**Note**: Don't mix these patterns for the same entry, as they use different data structures.

## Thread Safety

DbEngine uses `RLock` internally, making it safe for multi-threaded applications.
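
As a rough illustration of why a reentrant lock suits this kind of design, the sketch below assumes `RLock` refers to Python's `threading.RLock` and uses a hypothetical `Counter` class rather than DbEngine itself. It shows a locked method calling another locked method on the same object, which a plain `threading.Lock` would not allow from the same thread.

```python
import threading


class Counter:
    """Toy class (hypothetical) guarding shared state with a reentrant lock."""

    def __init__(self):
        self._lock = threading.RLock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1

    def increment_twice(self):
        # Holds the lock and calls another locked method. With a plain
        # threading.Lock this nested acquire would deadlock.
        with self._lock:
            self.increment()
            self.increment()


def worker(counter, rounds):
    for _ in range(rounds):
        counter.increment_twice()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 8000
```

Note that a `threading.RLock` only coordinates threads inside one process. Whether DbEngine protects against several processes writing to the same root directory is not stated here.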

## Exceptions

- `DbException`: Raised for database-related errors (missing entries, invalid parameters, etc.)

## Performance Considerations

- Objects are stored as JSON files
- Identical objects (same SHA-256) are stored only once
- History chains can become long; use `max_items` parameter to limit traversal
- File system performance impacts overall speed
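
The first two points can be sketched with a small, self-contained example that writes JSON files into a temporary directory. The helper `store_object`, the sorted-key serialization, and the two-character prefix directory are assumptions for illustration only; DbEngine's real serialization may differ, for instance in whether snapshot metadata is part of the hashed content.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_object(objects_dir: Path, obj) -> str:
    """Write obj as JSON under its SHA-256 digest, skipping existing files."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    target = objects_dir / digest[:2] / digest
    if not target.exists():  # identical content maps to the same file
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    first = store_object(objects_dir, {"theme": "dark", "language": "en"})
    second = store_object(objects_dir, {"language": "en", "theme": "dark"})
    third = store_object(objects_dir, {"theme": "light", "language": "en"})
    stored_files = [p for p in objects_dir.rglob("*") if p.is_file()]
    print(first == second, len(stored_files))  # True 2
```

Three saves produce only two files because the first two objects serialize to the same bytes. Each save still costs a hash computation and at least one file system lookup, which is why file system performance matters.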

## License

This is a personal implementation. Please check with the author for licensing terms.