MyDbEngine/README.md
2025-10-17 22:24:19 +02:00

DbEngine

A lightweight, git-inspired database engine for Python that maintains a complete history of all modifications.

Overview

DbEngine is a personal implementation of a versioned database engine that stores snapshots of data changes over time. Each modification creates a new immutable snapshot, allowing you to track the complete history of your data.

Key Features

  • Version Control: Every change creates a new snapshot identified by its SHA-256 digest
  • History Tracking: Access any previous version of your data
  • Multi-tenant Support: Isolated data storage per tenant
  • Thread-safe: Built-in locking mechanism for concurrent access
  • Git-inspired Architecture: Objects are stored in a content-addressable format
  • Efficient Storage: Identical objects are stored only once
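
The content-addressable idea behind these features can be illustrated with a minimal sketch. It assumes objects are serialized to JSON with sorted keys before hashing; the helper name compute_digest is hypothetical, and DbEngine's actual serialization may differ.

```python
import hashlib
import json


def compute_digest(obj) -> str:
    """Return the SHA-256 hex digest of an object's canonical JSON form."""
    # Sorting keys makes the serialization independent of dict insertion order
    # (an assumption about how a content-addressable store normalizes data).
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = compute_digest({"name": "John", "age": 30})
b = compute_digest({"age": 30, "name": "John"})   # same content, different key order
c = compute_digest({"name": "John", "age": 31})   # different content
```

Here a and b are equal because the content is the same, while c differs. This is what lets a store keep identical objects only once.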

Architecture

The engine uses a file-based storage system with the following structure:

.mytools_db/
├── {tenant_id}/
│   ├── head                    # Points to latest version of each entry
│   └── objects/
│       └── {digest_prefix}/
│           └── {full_digest}   # Actual object data
└── refs/                       # Shared references
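
As a rough sketch of how an object's location could be derived from this layout, the helper below builds the path from a tenant ID and a digest. The function name object_path and the two-character prefix length are assumptions; the layout above only specifies "{digest_prefix}".

```python
from pathlib import Path


def object_path(root: str, tenant_id: str, digest: str, prefix_len: int = 2) -> Path:
    """Build the path of an object file following the directory layout."""
    # prefix_len is hypothetical: the real engine may use a different prefix length.
    return Path(root) / tenant_id / "objects" / digest[:prefix_len] / digest


# Shortened, made-up digest used purely for illustration
p = object_path(".mytools_db", "my_company", "abc123def456")
```

Splitting objects into prefix directories, as git does, keeps any single directory from accumulating too many files.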

Installation

from db_engine import DbEngine

# Initialize with default root
db = DbEngine()

# Or specify custom root directory
db = DbEngine(root="/path/to/database")

Basic Usage

Initialize Database for a Tenant

tenant_id = "my_company"
db.init(tenant_id)

Save Data

# Save a complete object
user_id = "john_doe"
entry = "users"
data = {"name": "John", "age": 30}

digest = db.save(tenant_id, user_id, entry, data)

Load Data

# Load latest version
data = db.load(tenant_id, entry="users")

# Load specific version by digest
data = db.load(tenant_id, entry="users", digest="abc123...")

Work with Individual Records

# Add or update a single record
db.put(tenant_id, user_id, entry="users", key="john", value={"name": "John", "age": 30})

# Add or update multiple records at once
items = {
    "john": {"name": "John", "age": 30},
    "jane": {"name": "Jane", "age": 25}
}
db.put_many(tenant_id, user_id, entry="users", items=items)

# Get a specific record
user = db.get(tenant_id, entry="users", key="john")

# Get all records (omit key)
all_users = db.get(tenant_id, entry="users")

Check Existence

if db.exists(tenant_id, entry="users"):
    print("Entry exists")

Access History

# Get history of an entry (returns list of digests)
history = db.history(tenant_id, entry="users", max_items=10)

# Load the previous version (assumes newest-first order and at least two entries)
old_data = db.load(tenant_id, entry="users", digest=history[1])
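
To show how a history chain of this kind can be built, here is a minimal in-memory sketch that follows __parent__ links from the latest digest. The function walk_history, the toy store, and the newest-first ordering are assumptions for illustration, not DbEngine's actual code.

```python
def walk_history(objects: dict, head: str, max_items: int = 1000) -> list:
    """Follow __parent__ links from head, returning digests newest first."""
    chain = []
    current = head
    while current is not None and len(chain) < max_items:
        chain.append(current)
        current = objects[current].get("__parent__")
    return chain


# Toy in-memory store: digest -> snapshot (digests shortened for readability)
objects = {
    "d1": {"__parent__": None, "john": {"age": 30}},
    "d2": {"__parent__": "d1", "john": {"age": 31}},
    "d3": {"__parent__": "d2", "john": {"age": 32}},
}

full = walk_history(objects, "d3")
limited = walk_history(objects, "d3", max_items=2)
```

With this toy data, full contains all three digests and limited stops after two, which is the role max_items plays in bounding traversal.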

Metadata

Each snapshot automatically includes metadata:

  • __parent__: Digest of the previous version
  • __user__: ID of the user who made the change
  • __date__: Timestamp of the change (format: YYYYMMDD HH:MM:SS)
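
A minimal sketch of attaching these three fields to a snapshot might look as follows. The helper with_metadata is hypothetical, and storing the metadata alongside the data in the same dict is an assumption; only the field names and the date format come from the list above.

```python
from datetime import datetime


def with_metadata(data: dict, user_id: str, parent_digest, now: datetime = None) -> dict:
    """Return a copy of data with the three metadata fields attached."""
    now = now or datetime.now()
    snapshot = dict(data)                       # leave the caller's dict untouched
    snapshot["__parent__"] = parent_digest      # None for the first version
    snapshot["__user__"] = user_id
    snapshot["__date__"] = now.strftime("%Y%m%d %H:%M:%S")
    return snapshot


snap = with_metadata(
    {"name": "John"}, "john_doe", None, now=datetime(2025, 10, 17, 22, 24, 19)
)
```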

API Reference

Core Methods

init(tenant_id: str)

Initialize database structure for a tenant.

save(tenant_id: str, user_id: str, entry: str, obj: object) -> str

Save a complete snapshot. Returns the digest of the saved object.

load(tenant_id: str, entry: str, digest: str = None) -> object

Load a snapshot. If digest is None, loads the latest version.

put(tenant_id: str, user_id: str, entry: str, key: str, value: object) -> bool

Add or update a single record. Returns True if a new snapshot was created.

put_many(tenant_id: str, user_id: str, entry: str, items: list | dict) -> bool

Add or update multiple records. Returns True if a new snapshot was created.
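
The boolean return value of put() and put_many() suggests that some calls create no snapshot. One plausible reading, sketched below with a hypothetical in-memory TinyStore class, is that writing a value identical to the current one leaves the digest unchanged and is therefore skipped. This is an assumption about the behavior, not a description of DbEngine's implementation.

```python
import hashlib
import json


class TinyStore:
    """In-memory stand-in used only to illustrate the boolean return value."""

    def __init__(self):
        self.snapshots = {}   # digest -> records
        self.head = None      # digest of the latest snapshot

    def put(self, key, value) -> bool:
        # Copy the latest records (empty on first use) and apply the change.
        records = dict(self.snapshots.get(self.head, {}))
        records[key] = value
        payload = json.dumps(records, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self.head:
            return False      # nothing changed, so no new snapshot
        self.snapshots[digest] = records
        self.head = digest
        return True


store = TinyStore()
first = store.put("john", {"name": "John", "age": 30})    # new snapshot
repeat = store.put("john", {"name": "John", "age": 30})   # identical, skipped
changed = store.put("john", {"name": "John", "age": 31})  # new snapshot
```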

get(tenant_id: str, entry: str, key: str = None, digest: str = None) -> object

Retrieve record(s). If key is None, returns all records as a list.

exists(tenant_id: str, entry: str) -> bool

Check if an entry exists.

history(tenant_id: str, entry: str, digest: str = None, max_items: int = 1000) -> list

Get the history chain of digests for an entry. Use max_items to limit how far the chain is traversed.

get_digest(tenant_id: str, entry: str) -> str

Get the current digest for an entry.

Usage Patterns

Pattern 1: Snapshot-based (using save())

Best for saving complete states of complex objects.

config = {"theme": "dark", "language": "en"}
db.save(tenant_id, user_id, "config", config)

Pattern 2: Record-based (using put() / put_many())

Best for managing collections of items incrementally.

db.put(tenant_id, user_id, "settings", "theme", "dark")
db.put(tenant_id, user_id, "settings", "language", "en")

Note: Don't mix these patterns for the same entry, as they use different data structures.

Thread Safety

DbEngine guards its operations with an internal RLock (reentrant lock), making it safe to use from multiple threads.
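
The following toy example, unrelated to DbEngine's actual code, sketches why a reentrant lock is a natural fit when one locked method calls another. The Counter class is hypothetical.

```python
import threading


class Counter:
    """Toy class showing why a reentrant lock suits nested public methods."""

    def __init__(self):
        self._lock = threading.RLock()
        self.value = 0

    def add(self, n: int) -> None:
        with self._lock:
            self.value += n

    def add_many(self, values) -> None:
        # Holds the lock, then calls add(), which acquires it again.
        # With a plain threading.Lock this nested acquire would deadlock.
        with self._lock:
            for n in values:
                self.add(n)


counter = Counter()
threads = [
    threading.Thread(target=counter.add_many, args=([1] * 1000,)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A lock of this kind coordinates threads within one process only; it does not by itself protect against several processes writing to the same directory.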

Exceptions

  • DbException: Raised for database-related errors (missing entries, invalid parameters, etc.)
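
A common way to handle such errors is to fall back to a default value. The sketch below uses a stand-in DbException class so that it runs on its own; in real code you would import the exception from the engine's module (its exact import path is not stated here). The helper load_or_default is hypothetical.

```python
class DbException(Exception):
    """Stand-in for the engine's exception; import the real one in practice."""


def load_or_default(load, default):
    """Call load() and fall back to default if it raises DbException."""
    try:
        return load()
    except DbException:
        return default


def missing_entry():
    # Simulates loading an entry that does not exist.
    raise DbException("entry not found")


result = load_or_default(missing_entry, default={})
```

With a real engine instance, the callable could be something like lambda: db.load(tenant_id, entry="users").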

Performance Considerations

  • Objects are stored as JSON files
  • Identical objects (same SHA-256) are stored only once
  • History chains can become long; use the max_items parameter to limit traversal
  • File system performance impacts overall speed
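
The store-once behavior for identical objects can be sketched with a few lines of file handling. The helper store_object, the sorted-key JSON serialization, and the two-character directory prefix are assumptions made for this illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_object(objects_dir: Path, obj) -> str:
    """Write obj as JSON under its SHA-256 digest, skipping existing files."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    target = objects_dir / digest[:2] / digest   # 2-character prefix is an assumption
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    d1 = store_object(objects_dir, {"theme": "dark"})
    d2 = store_object(objects_dir, {"theme": "dark"})    # identical: no new file
    d3 = store_object(objects_dir, {"theme": "light"})
    file_count = sum(1 for p in objects_dir.rglob("*") if p.is_file())
```

Three calls produce only two files, since the second call finds its digest already on disk.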

License

This is a personal implementation. Please check with the author for licensing terms.