Added Claude management

2025-12-21 15:44:05 +01:00
parent fe09352bed
commit b17fc450a2
7 changed files with 765 additions and 41 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,327 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Available Personas
+
+This project uses specialized personas for different types of work. Use these commands to switch modes:
+
+- **`/developer`** - Full development mode with validation workflow (options-first, wait for approval before coding)
+- **`/unit-tester`** - Specialized mode for writing comprehensive unit tests for existing code
+- **`/technical-writer`** - User documentation writing mode (README, guides, tutorials)
+- **`/reset`** - Return to default Claude Code mode
+
+Each persona has specific rules and workflows defined in the `.claude/` directory. See the respective files for detailed guidelines.
+
+## Project Overview
+
+MyDbEngine is a lightweight, git-inspired versioned database engine for Python. It maintains a complete history of all data modifications using immutable snapshots with SHA-256 content addressing. The project supports multi-tenant storage with thread-safe operations.
+
+### Quick Start Example
+
+```python
+from dbengine.dbengine import DbEngine
+
+# Initialize engine
+engine = DbEngine(root=".mytools_db")
+engine.init("tenant_1")
+
+# Pattern 1: Snapshot-based (complete state saves)
+engine.save("tenant_1", "user_1", "config", {"theme": "dark", "lang": "en"})
+data = engine.load("tenant_1", "config")
+
+# Pattern 2: Record-based (incremental updates)
+engine.put("tenant_1", "user_1", "users", "john", {"name": "John", "age": 30})
+engine.put("tenant_1", "user_1", "users", "jane", {"name": "Jane", "age": 25})
+all_users = engine.get("tenant_1", "users")  # Returns list of all users
+```
+
+## Development Commands
+
+### Testing
+```bash
+# Run all tests
+pytest
+
+# Run specific test file
+pytest tests/test_dbengine.py
+pytest tests/test_serializer.py
+
+# Run single test function
+pytest tests/test_dbengine.py::test_i_can_save_and_load
+```
+
+### Building and Packaging
+```bash
+# Build package
+python -m build
+
+# Clean build artifacts
+make clean
+
+# Clean package artifacts only
+make clean-package
+```
+
+### Installation
+```bash
+# Install in development mode with test dependencies
+pip install -e ".[dev]"
+```
+
+## Architecture
+
+### Core Components
+
+**DbEngine** (`src/dbengine/dbengine.py`)
+- Main database engine class using RLock for thread safety
+- Manages tenant-specific storage in `.mytools_db/{tenant_id}/` structure
+- Tracks latest versions via `head` file (JSON mapping entry names to digests)
+- Stores objects in content-addressable format: `objects/{digest_prefix}/{full_digest}`
+- Shared `refs/` directory for cross-tenant pickle-based references
+
+**Serializer** (`src/dbengine/serializer.py`)
+- Converts Python objects to/from JSON-compatible dictionaries
+- Handles circular references using object ID tracking
+- Supports custom serialization via handlers (see handlers.py)
+- Special tags: `__object__`, `__id__`, `__tuple__`, `__set__`, `__ref__`, `__enum__`
+- Objects can define `use_refs()` method to specify fields that should be pickled instead of JSON-serialized
+
+**Handlers** (`src/dbengine/handlers.py`)
+- Extensible handler system for custom type serialization
+- BaseHandler interface: `is_eligible_for()`, `tag()`, `serialize()`, `deserialize()`
+- Currently implements DateHandler for datetime.date objects
+- Use `handlers.register_handler()` to add custom handlers
+
+**Utils** (`src/dbengine/utils.py`)
+- Type checking utilities: `is_primitive()`, `is_dictionary()`, `is_list()`, etc.
+- Class introspection: `get_full_qualified_name()`, `importable_name()`, `get_class()`
+- Stream digest computation with SHA-256
+
+### Storage Architecture
+
+```
+.mytools_db/
+├── {tenant_id}/
+│   ├── head                           # JSON: {"entry_name": "latest_digest"}
+│   └── objects/
+│       └── {digest_prefix}/          # First 24 chars of digest
+│           └── {full_digest}         # JSON snapshot with metadata
+└── refs/                             # Shared pickled references
+    └── {digest_prefix}/
+        └── {full_digest}
+```
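+
+The layout above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: `compute_digest` and `object_path` are hypothetical helper names, and the 24-character prefix is taken from the comment in the tree above.
+
+```python
+import hashlib
+import io
+import os
+
+PREFIX_LENGTH = 24  # assumption: matches "First 24 chars of digest" above
+
+
+def compute_digest(stream, chunk_size=8192):
+    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
+    sha256 = hashlib.sha256()
+    for chunk in iter(lambda: stream.read(chunk_size), b""):
+        sha256.update(chunk)
+    return sha256.hexdigest()
+
+
+def object_path(root, tenant_id, digest):
+    """Build {root}/{tenant_id}/objects/{digest_prefix}/{full_digest}."""
+    return os.path.join(root, tenant_id, "objects", digest[:PREFIX_LENGTH], digest)
+
+
+digest = compute_digest(io.BytesIO(b'{"theme": "dark"}'))
+print(object_path(".mytools_db", "tenant_1", digest))
+```
+
+Because the path depends only on the content, saving identical bytes twice resolves to the same file, which is where the deduplication comes from.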
+
+### Metadata System
+
+Each snapshot includes automatic metadata fields:
+- `__parent__`: List containing the digest of the previous version (or `[None]` for the first version)
+- `__user_id__`: ID of the user who created the snapshot (was `__user__` in the TAG constant)
+- `__date__`: Timestamp in the format `YYYYMMDD HH:MM:SS %z`
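+
+As an illustration, the metadata of a snapshot could be assembled as follows. The `build_metadata` helper is hypothetical, and the tag names and date format are copied from the list above rather than from the engine source.
+
+```python
+from datetime import datetime, timezone
+
+DATE_FORMAT = "%Y%m%d %H:%M:%S %z"  # YYYYMMDD HH:MM:SS followed by the UTC offset
+
+
+def build_metadata(user_id, parent_digest=None, now=None):
+    """Return the automatic metadata fields for a new snapshot (sketch)."""
+    now = now or datetime.now(timezone.utc)
+    return {
+        "__parent__": [parent_digest],  # always a list, [None] for the first version
+        "__user_id__": user_id,
+        "__date__": now.strftime(DATE_FORMAT),
+    }
+
+
+first = build_metadata("user_1", now=datetime(2025, 12, 21, 15, 44, 5, tzinfo=timezone.utc))
+print(first["__date__"])  # 20251221 15:44:05 +0000
+```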
+
+### Two Usage Patterns
+
+**Pattern 1: Snapshot-based (`save()`/`load()`)**
+- Save complete object states
+- Best for configuration objects or complete state snapshots
+- Direct control over what gets saved
+
+**Pattern 2: Record-based (`put()`/`put_many()`/`get()`)**
+- Incremental updates to dictionary-like collections
+- Automatically creates snapshots only when data changes
+- Returns `True` or `False` to indicate whether a snapshot was created
+- Best for managing collections of items
+
+**Important**: Do not mix patterns for the same entry - they expect different data structures.
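+
+The change detection behind the record-based pattern can be pictured with a small in-memory model. This is only a sketch of the idea under simplifying assumptions: `put_record` is a hypothetical function operating on a plain dictionary, not the engine's `put()`.
+
+```python
+def put_record(collection, key, value):
+    """Return (collection, changed) without mutating the input dictionary."""
+    if key in collection and collection[key] == value:
+        return collection, False  # unchanged: no new snapshot would be needed
+    updated = dict(collection)
+    updated[key] = value
+    return updated, True  # changed: a new snapshot would be written
+
+
+users = {}
+users, changed_first = put_record(users, "john", {"name": "John", "age": 30})
+users, changed_again = put_record(users, "john", {"name": "John", "age": 30})
+print(changed_first, changed_again)  # True False
+```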
+
+### Common Pitfalls
+
+⚠️ **Mixing save() and put() on the same entry**
+- `save()` expects to store complete snapshots (any object)
+- `put()` expects dictionary-like structures with key-value pairs
+- Using both on the same entry will cause data structure conflicts
+
+⚠️ **Refs are shared across tenants**
+- Objects stored via `use_refs()` go to shared `refs/` directory
+- Not isolated per tenant - identical objects reused across all tenants
+- Good for deduplication, but be aware of cross-tenant sharing
+
+⚠️ **Parent digest is always a list**
+- `__parent__` field is stored as `[digest]` or `[None]`
+- Always access as `data[TAG_PARENT][0]`, not `data[TAG_PARENT]`
+- This allows for future support of multiple parents (merge scenarios)
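+
+For example, walking an entry's history means following the first element of each parent list until it is `None`. The sketch below uses a plain in-memory dictionary in place of the object store; `iter_history` and the `snapshots` mapping are hypothetical and only illustrate the access pattern.
+
+```python
+TAG_PARENT = "__parent__"
+
+
+def iter_history(snapshots, digest):
+    """Yield digests from the given one back to the first version."""
+    while digest is not None:
+        yield digest
+        digest = snapshots[digest][TAG_PARENT][0]  # index 0: parent is stored as a list
+
+
+snapshots = {
+    "aaa": {TAG_PARENT: [None], "theme": "light"},
+    "bbb": {TAG_PARENT: ["aaa"], "theme": "dark"},
+}
+print(list(iter_history(snapshots, "bbb")))  # ['bbb', 'aaa']
+```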
+
+### Reference System
+
+Objects can opt into pickle-based storage for specific fields:
+1. Define `use_refs()` method returning set of field names
+2. Serializer stores those fields in shared `refs/` directory
+3. Reduces JSON snapshot size and enables cross-tenant deduplication
+4. Example: `DummyObjWithRef` in test_dbengine.py
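+
+The steps above can be sketched as follows. This is an illustration under assumptions, not the Serializer's implementation: `split_refs` and `Report` are hypothetical names, and the exact shape of the `__ref__` entry in the JSON part is a guess based on the tag listed earlier.
+
+```python
+import hashlib
+import json
+import pickle
+
+
+def split_refs(obj):
+    """Separate an object's fields into JSON-ready values and pickled refs."""
+    ref_fields = obj.use_refs() if hasattr(obj, "use_refs") else set()
+    json_part, refs = {}, {}
+    for name, value in vars(obj).items():
+        if name in ref_fields:
+            payload = pickle.dumps(value)
+            digest = hashlib.sha256(payload).hexdigest()
+            refs[digest] = payload  # would be written under the shared refs/ directory
+            json_part[name] = {"__ref__": digest}
+        else:
+            json_part[name] = value
+    return json_part, refs
+
+
+class Report:
+    def __init__(self, title, rows):
+        self.title = title
+        self.rows = rows
+
+    @staticmethod
+    def use_refs():
+        return {"rows"}
+
+
+json_part, refs = split_refs(Report("sales", [1, 2, 3]))
+print(json.dumps(json_part))
+```
+
+Only the digest appears in the JSON snapshot, so two objects holding identical pickled data point at the same stored reference.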
+
+## Extension Points
+
+### Custom Type Handlers
+
+To serialize custom types that aren't handled by default serialization:
+
+**1. Create a handler class:**
+```python
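+from dbengine.handlers import BaseHandler, TAG_SPECIAL
+
+# MyCustomType stands for your own class; it is assumed to provide
+# a to_dict() method and a from_dict() classmethod.
+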
+class MyCustomHandler(BaseHandler):
+    def is_eligible_for(self, obj):
+        return isinstance(obj, MyCustomType)
+
+    def tag(self):
+        return "MyCustomType"
+
+    def serialize(self, obj) -> dict:
+        return {
+            TAG_SPECIAL: self.tag(),
+            "data": obj.to_dict()
+        }
+
+    def deserialize(self, data: dict) -> object:
+        return MyCustomType.from_dict(data["data"])
+```
+
+**2. Register the handler:**
+```python
+from dbengine.handlers import handlers
+
+handlers.register_handler(MyCustomHandler())
+```
+
+**When to use handlers:**
+- Complex types that need custom serialization logic
+- Types that can't be pickled reliably
+- Types requiring validation during deserialization
+- Standard-library or third-party types not covered by default serialization (see the `datetime.date` example in handlers.py)
+
+### Using References (use_refs)
+
+For objects with large nested data structures that should be pickled instead of JSON-serialized:
+
+```python
+class MyDataObject:
+    def __init__(self, metadata, large_dataframe):
+        self.metadata = metadata
+        self.large_dataframe = large_dataframe  # pandas DataFrame, for example
+
+    @staticmethod
+    def use_refs():
+        """Return set of field names to pickle instead of JSON-serialize"""
+        return {"large_dataframe"}
+```
+
+**When to use refs:**
+- Large data structures (DataFrames, numpy arrays)
+- Objects that lose information in JSON conversion
+- Data shared across multiple snapshots/tenants (deduplication benefit)
+
+**Trade-offs:**
+- ✅ Smaller JSON snapshots
+- ✅ Cross-tenant deduplication
+- ❌ Less human-readable (binary pickle format)
+- ❌ Python version compatibility concerns with pickle
+
+## Testing Notes
+
+- Test fixtures use `DB_ENGINE_ROOT = "TestDBEngineRoot"` for isolation
+- Tests clean up temp directories using `shutil.rmtree()` in fixtures
+- Test classes like `DummyObj`, `DummyObjWithRef`, `DummyObjWithKey` demonstrate usage patterns
+- Thread safety is built-in via RLock but not explicitly tested
+
+## Key Design Decisions
+
+- **Immutability**: Snapshots never modified after creation (git-style)
+- **Content Addressing**: Identical objects stored only once (deduplication via SHA-256)
+- **Change Detection**: `put()` and `put_many()` skip saving if data unchanged
+- **Thread Safety**: All DbEngine operations protected by RLock
+- **No Dependencies**: Core engine has zero runtime dependencies (pytest only for dev)
+
+## Development Workflow and Guidelines
+
+### Development Process
+
+**Code must always be testable**. Before writing any code:
+
+1. **Explain available options first** - Present different approaches to solve the problem
+2. **Wait for validation** - Ensure mutual understanding of requirements before implementation
+3. **No code without approval** - Only proceed after explicit validation
+
+### Collaboration Style
+
+**Ask questions to clarify understanding or suggest alternative approaches:**
+- Ask questions **one at a time**
+- Wait for complete answer before asking the next question
+- Indicate progress: "Question 1/5" if multiple questions are needed
+- Never assume - always clarify ambiguities
+
+### Communication
+
+- **Conversations**: French or English
+- **Code, documentation, comments**: English only
+
+### Code Standards
+
+**Follow PEP 8** conventions strictly:
+- Variable and function names: `snake_case`
+- Explicit, descriptive naming
+- **No emojis in code**
+
+**Documentation**:
+- Use Google or NumPy docstring format
+- Document all public functions and classes
+- Include type hints where applicable
+
+### Dependency Management
+
+**When introducing new dependencies:**
+- List all external dependencies explicitly
+- Propose alternatives using Python standard library when possible
+- Explain why each dependency is needed
+
+### Unit Testing with pytest
+
+**Test naming patterns:**
+- Success-case tests: `test_i_can_xxx` - Behavior that should succeed
+- Error-case tests: `test_i_cannot_xxx` - Edge cases where the code should raise errors/exceptions
+
+**Test structure:**
+- Use **functions**, not classes (unless inheritance is required)
+- Before writing tests, **list all planned tests with explanations**
+- Wait for validation before implementing tests
+
+**Example:**
+```python
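+import pytest
+
+# Assumption: DbException is exposed by the same module as DbEngine.
+from dbengine.dbengine import DbEngine, DbException
+
+
+def test_i_can_save_and_load_object():
+    """Test that an object can be saved and loaded successfully."""
+    engine = DbEngine(root="test_db")
+    engine.init("tenant_1")
+    digest = engine.save("tenant_1", "user_1", "entry_1", {"key": "value"})
+    assert digest is not None
+    assert engine.load("tenant_1", "entry_1") is not None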
+
+def test_i_cannot_save_with_empty_tenant_id():
+    """Test that saving with empty tenant_id raises DbException."""
+    engine = DbEngine(root="test_db")
+    with pytest.raises(DbException):
+        engine.save("", "user_1", "entry_1", {"key": "value"})
+```
+
+### File Management
+
+**Always specify the full file path** when adding or modifying files:
+```
+✅ Modifying: src/dbengine/dbengine.py
+✅ Creating: tests/test_new_feature.py
+```
+
+### Error Handling
+
+**When errors occur:**
+1. **Explain the problem clearly first**
+2. **Do not propose a fix immediately**
+3. **Wait for validation** that the diagnosis is correct
+4. Only then propose solutions