Introducing columns formulas

2026-02-13 21:38:00 +01:00
parent 0df78c0513
commit e8443f07f9
29 changed files with 3889 additions and 15 deletions
@@ -0,0 +1,365 @@
+# DataGrid Formulas
+
+## Overview
+
+The DataGrid formula system adds computed columns to the DataGrid. A formula column applies a single expression to every
+row, producing derived values from existing data — within the same table or across tables.
+
+The system is designed for:
+
+- **Column-level formulas**: one formula per column, applied to all rows
+- **Cross-table references**: direct syntax to reference columns from other tables
+- **Reactive recalculation**: dirty flag propagation with page-aware computation
+- **Cell-level overrides** (planned): individual cells can override the column formula
+
+## Formula Language
+
+### Basic Syntax
+
+A formula is an expression that references columns with `{ColumnName}` and produces a value for each row:
+
+```
+{Price} * {Quantity}
+```
+
+References use curly braces `{}` to distinguish column names from keywords and functions. Column names are matched by ID
+or title.
+
+### Operators
+
+#### Arithmetic
+
+| Operator | Description    | Example                |
+|----------|----------------|------------------------|
+| `+`      | Addition       | `{Price} + {Tax}`      |
+| `-`      | Subtraction    | `{Total} - {Discount}` |
+| `*`      | Multiplication | `{Price} * {Quantity}` |
+| `/`      | Division       | `{Total} / {Count}`    |
+| `%`      | Modulo         | `{Value} % 2`          |
+| `^`      | Power          | `{Base} ^ 2`           |
+
+#### Comparison
+
+| Operator     | Description        | Example                         |
+|--------------|--------------------|---------------------------------|
+| `==`         | Equal              | `{Status} == "active"`          |
+| `!=`         | Not equal          | `{Status} != "deleted"`         |
+| `>`          | Greater than       | `{Price} > 100`                 |
+| `<`          | Less than          | `{Stock} < 10`                  |
+| `>=`         | Greater or equal   | `{Score} >= 80`                 |
+| `<=`         | Less or equal      | `{Age} <= 18`                   |
+| `contains`   | String contains    | `{Name} contains "Corp"`        |
+| `startswith` | String starts with | `{Code} startswith "ERR"`       |
+| `endswith`   | String ends with   | `{File} endswith ".csv"`        |
+| `in`         | Value in list      | `{Status} in ["active", "new"]` |
+| `between`    | Value in range     | `{Age} between 18 and 65`       |
+| `isempty`    | Value is empty     | `{Notes} isempty`               |
+| `isnotempty` | Value is not empty | `{Email} isnotempty`            |
+| `isnan`      | Value is NaN       | `{Score} isnan`                 |
+
+#### Logical
+
+| Operator | Description | Example                               |
+|----------|-------------|---------------------------------------|
+| `and`    | Logical AND | `{Age} > 18 and {Status} == "active"` |
+| `or`     | Logical OR  | `{Type} == "A" or {Type} == "B"`      |
+| `not`    | Negation    | `not {Status} == "deleted"`           |
+
+Parentheses control precedence: `({Type} == "A" or {Type} == "B") and {Active} == True`
+
+### Conditions (suffix-if)
+
+Conditions use a **suffix-if** syntax: the result expression comes first, then the condition. This keeps the focus on
+the output, not the branching logic.
+
+#### Simple condition (no else — result is None when false)
+
+```
+{Price} * 0.8 if {Country} == "FR"
+```
+
+#### With else
+
+```
+{Price} * 0.8 if {Country} == "FR" else {Price}
+```
+
+#### Chained conditions
+
+```
+{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
+```
+
+#### With logical operators
+
+```
+{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10 else {Price}
+```
+
+#### With grouping
+
+```
+{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
+```
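+
+The suffix-if forms above read like Python conditional expressions. As a minimal sketch of the intended semantics, the
+same rules can be written in plain Python, assuming each row is a dict and that a chained `else ... if` picks the first
+true condition, as it does in Python. The function names are hypothetical and not part of the formula system:
+
+```python
+from typing import Optional
+
+
+def chained_discount(row: dict) -> Optional[float]:
+    """Mirror of: {Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}"""
+    price, country = row["Price"], row["Country"]
+    # Conditions are tested left to right; the first true one decides the result.
+    return price * 0.8 if country == "FR" else price * 0.9 if country == "DE" else price
+
+
+def french_discount_only(row: dict) -> Optional[float]:
+    """Mirror of: {Price} * 0.8 if {Country} == "FR"   (no else, so None when false)"""
+    return row["Price"] * 0.8 if row["Country"] == "FR" else None
+```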
+
+### Functions
+
+#### Math
+
+| Function          | Description           | Example                       |
+|-------------------|-----------------------|-------------------------------|
+| `round(expr, n)`  | Round to n decimals   | `round({Price} * 1.2, 2)`     |
+| `abs(expr)`       | Absolute value        | `abs({Balance})`              |
+| `min(expr, expr)` | Minimum of two values | `min({Price}, {MaxPrice})`    |
+| `max(expr, expr)` | Maximum of two values | `max({Score}, 0)`             |
+| `sum(expr, ...)`  | Sum of values         | `sum({Q1}, {Q2}, {Q3}, {Q4})` |
+| `avg(expr, ...)`  | Average of values     | `avg({Q1}, {Q2}, {Q3}, {Q4})` |
+
+#### Text
+
+| Function            | Description         | Example                        |
+|---------------------|---------------------|--------------------------------|
+| `upper(expr)`       | Uppercase           | `upper({Name})`                |
+| `lower(expr)`       | Lowercase           | `lower({Email})`               |
+| `len(expr)`         | String length       | `len({Description})`           |
+| `concat(expr, ...)` | Concatenate strings | `concat({First}, " ", {Last})` |
+| `trim(expr)`        | Remove whitespace   | `trim({Input})`                |
+| `left(expr, n)`     | First n characters  | `left({Code}, 3)`              |
+| `right(expr, n)`    | Last n characters   | `right({Phone}, 4)`            |
+
+#### Date
+
+| Function               | Description        | Example                        |
+|------------------------|--------------------|--------------------------------|
+| `year(expr)`           | Extract year       | `year({CreatedAt})`            |
+| `month(expr)`          | Extract month      | `month({CreatedAt})`           |
+| `day(expr)`            | Extract day        | `day({CreatedAt})`             |
+| `today()`              | Current date       | `datediff({DueDate}, today())` |
+| `datediff(expr, expr)` | Difference in days | `datediff({End}, {Start})`     |
+
+#### Aggregation (for cross-table contexts)
+
+| Function      | Description  | Example                                             |
+|---------------|--------------|-----------------------------------------------------|
+| `sum(expr)`   | Sum values   | `sum({Orders.Amount WHERE Orders.ClientId = Id})`   |
+| `count(expr)` | Count values | `count({Orders.Id WHERE Orders.ClientId = Id})`     |
+| `avg(expr)`   | Average      | `avg({Reviews.Score WHERE Reviews.ProductId = Id})` |
+| `min(expr)`   | Minimum      | `min({Bids.Price WHERE Bids.ItemId = Id})`          |
+| `max(expr)`   | Maximum      | `max({Bids.Price WHERE Bids.ItemId = Id})`          |
+
+## Cross-Table References
+
+### Direct Reference
+
+Reference a column from another table using `{TableName.ColumnName}`:
+
+```
+{Products.Price} * {Quantity}
+```
+
+### Join Resolution (implicit)
+
+When referencing another table without a WHERE clause, the join is resolved automatically:
+
+1. **By `id` column**: if both tables have a column named `id`, rows are matched on equal `id` values
+2. **By row index**: otherwise (at least one table has no `id` column), rows are matched by their internal row index
+   (stable across sort/filter)
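+
+The two rules above can be sketched as follows. This is an illustration only: tables are modelled as dicts mapping a
+column name to a list of values, and `resolve_implicit` is a hypothetical helper, not the system's API.
+
+```python
+from typing import Any, Optional
+
+
+def resolve_implicit(current: dict, other: dict, column: str, row_index: int) -> Optional[Any]:
+    """Return the value of other[column] joined to row `row_index` of the current table."""
+    if "id" in current and "id" in other:
+        # Rule 1: both tables have an `id` column, so match on equal id values.
+        wanted = current["id"][row_index]
+        for i, other_id in enumerate(other["id"]):
+            if other_id == wanted:
+                return other[column][i]
+        return None  # no join match
+    # Rule 2: fall back to the row index.
+    values = other[column]
+    return values[row_index] if row_index < len(values) else None
+```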
+
+### Explicit Join (WHERE clause)
+
+For explicit control over which row of the other table to use:
+
+```
+{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
+```
+
+Inside the WHERE clause:
+
+- `Products.Code` refers to a column in the referenced table
+- `ProductCode` (no `Table.` prefix) refers to a column in the current table
+
+### Aggregation with Cross-Table
+
+When a cross-table reference matches multiple rows, use an aggregation function:
+
+```
+sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
+```
+
+Without aggregation, a multi-row match returns the first matching value.
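+
+A minimal sketch of the difference between an aggregated and a non-aggregated WHERE lookup, under the same assumption
+that a table is a dict of column lists. `matching_values` and `lookup` are hypothetical names used only here:
+
+```python
+from typing import Any, Callable, List, Optional
+
+
+def matching_values(other: dict, column: str, where_column: str, key: Any) -> List[Any]:
+    """Values of other[column] on the rows where other[where_column] equals key."""
+    return [value for value, k in zip(other[column], other[where_column]) if k == key]
+
+
+def lookup(other: dict, column: str, where_column: str, key: Any,
+           aggregate: Optional[Callable[[List[Any]], Any]] = None) -> Any:
+    values = matching_values(other, column, where_column, key)
+    if aggregate is not None:
+        # e.g. sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
+        return aggregate(values)
+    # Without aggregation: first matching value, or None when nothing matches.
+    return values[0] if values else None
+```
+
+With `order_lines = {"OrderId": [1, 1, 2], "Amount": [10, 5, 7]}`, calling `lookup(order_lines, "Amount", "OrderId", 1, sum)`
+adds both matching amounts, while the same call without `sum` returns only the first one.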
+
+## Calculation Engine
+
+### Dependency Graph (DAG)
+
+The formula system maintains a **Directed Acyclic Graph** of dependencies between columns:
+
+- **Nodes**: each formula column is a node, identified by `table_name.column_id`
+- **Edges**: if column A's formula references column B, an edge B → A exists ("A depends on B")
+- Both directions are tracked:
+    - **Precedents**: columns that a formula reads from
+    - **Dependents**: columns that need recalculation when this column changes
+
+Cross-table references create edges that span DataGrid instances, managed at the `DataGridsManager` level.
+
+### Dirty Flag Propagation
+
+When a source column's data changes:
+
+1. The source column is marked **dirty**
+2. All direct dependents are marked dirty
+3. Propagation continues recursively through the DAG
+4. Each dirty column maintains a **dirty row set**: the specific row indices that need recalculation
+
+Propagation is **immediate** and cheap: it only marks flags and performs no computation.
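+
+As an illustration of the four steps, here is a sketch that walks the dependents map breadth-first (an iterative
+stand-in for the recursive propagation described above). It assumes formulas within one table, so a dirty row index
+means the same row in every dependent column; `propagate_dirty` and its arguments are hypothetical names.
+
+```python
+from collections import deque
+from typing import Dict, Iterable, Set
+
+
+def propagate_dirty(dependents: Dict[str, Set[str]],
+                    dirty_rows: Dict[str, Set[int]],
+                    source: str,
+                    rows: Iterable[int]) -> None:
+    """Mark `rows` dirty on `source` and on every column that depends on it, directly or not."""
+    changed = set(rows)
+    queue = deque([source])
+    seen = {source}
+    while queue:
+        node = queue.popleft()
+        # Only flags are updated here; no formula is evaluated.
+        dirty_rows.setdefault(node, set()).update(changed)
+        for dependent in dependents.get(node, ()):
+            if dependent not in seen:
+                seen.add(dependent)
+                queue.append(dependent)
+```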
+
+### Recalculation Strategy (Hybrid)
+
+Actual computation is **deferred to rendering time**:
+
+1. On value change → dirty flags propagate instantly through the DAG
+2. On page render (`mk_body_content_page`) → only dirty rows within the visible page (up to 1000 rows) are recalculated
+3. Off-screen pages remain dirty until scrolled into view
+4. Calculation follows **topological order** of the DAG to ensure precedents are computed before dependents
+
+### Cycle Detection
+
+Before a formula is accepted, the engine checks the DAG for cycles by running a topological sort (Kahn's algorithm) with
+the new formula's edges included; a cycle exists when the sort cannot order every node. If a cycle is detected:
+
+- The formula is **rejected**
+- The editor displays an error identifying the circular dependency chain
+- The previous formula (if any) remains unchanged
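+
+A compact sketch of Kahn's algorithm used this way. It returns a valid calculation order, or `None` when a cycle
+prevents one. The input shape (a map from each column to the columns it reads) and the function name are assumptions,
+and the sketch only detects a cycle; it does not report the chain shown by the editor.
+
+```python
+from collections import deque
+from typing import Dict, List, Optional, Set
+
+
+def topological_order(precedents: Dict[str, Set[str]]) -> Optional[List[str]]:
+    """Order nodes so that every precedent comes before its dependents; None if there is a cycle."""
+    nodes = set(precedents)
+    for deps in precedents.values():
+        nodes |= deps
+    in_degree = {node: len(precedents.get(node, ())) for node in nodes}
+    dependents: Dict[str, List[str]] = {node: [] for node in nodes}
+    for node, deps in precedents.items():
+        for dep in deps:
+            dependents[dep].append(node)
+    # Start from the nodes that read from nothing (plain data columns).
+    queue = deque(sorted(node for node in nodes if in_degree[node] == 0))
+    order: List[str] = []
+    while queue:
+        node = queue.popleft()
+        order.append(node)
+        for child in dependents[node]:
+            in_degree[child] -= 1
+            if in_degree[child] == 0:
+                queue.append(child)
+    # Nodes left unordered are part of, or downstream of, a cycle.
+    return order if len(order) == len(nodes) else None
+```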
+
+### Caching
+
+Each formula column caches its computed values:
+
+- Results are stored in `ns_fast_access[col_id]` alongside raw data columns
+- The dirty row set tracks which cached values are stale
+- Non-dirty rows return their cached value without re-evaluation
+- Cache is invalidated per-row when source data changes
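+
+The interplay between the cache, the dirty row set and page-aware recalculation can be sketched as below. A plain list
+stands in for the cached column in `ns_fast_access`, `compute` stands in for formula evaluation on one row, and
+`recalc_page` is a hypothetical name; the page size of 1000 is the one quoted above.
+
+```python
+from typing import Callable, List, Set
+
+PAGE_SIZE = 1000
+
+
+def recalc_page(cache: List[float], dirty_rows: Set[int], page: int,
+                compute: Callable[[int], float]) -> int:
+    """Recompute only the dirty rows of the given page; return how many rows were recomputed."""
+    start = page * PAGE_SIZE
+    stop = min(start + PAGE_SIZE, len(cache))
+    visible_dirty = sorted(row for row in dirty_rows if start <= row < stop)
+    for row in visible_dirty:
+        cache[row] = compute(row)
+    # Rows on other pages stay dirty until their page is rendered.
+    dirty_rows.difference_update(visible_dirty)
+    return len(visible_dirty)
+```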
+
+## Evaluation
+
+### Row-by-Row Execution
+
+Formulas are evaluated **row-by-row** within the page being rendered. For each row:
+
+1. Resolve column references `{ColumnName}` to the cell value at the current row index
+2. Resolve cross-table references `{Table.Column}` via the join mechanism
+3. Evaluate the expression with resolved values
+4. Store the result in the cache (`ns_fast_access`)
+
+### Parser
+
+The formula language uses a **custom grammar** parsed with Lark (consistent with the formatting DSL). The parser:
+
+1. Tokenizes the formula string
+2. Builds an AST (Abstract Syntax Tree)
+3. Transforms the AST into an evaluable representation
+4. Extracts column references for dependency graph registration
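+
+Step 4 can be illustrated without the grammar. The sketch below uses a regular expression as a simplified stand-in
+for the Lark-based extraction, not the actual parser. It assumes column names contain no dots or braces, ignores braces
+inside string literals, and does not collect the current-table columns named inside a WHERE clause.
+
+```python
+import re
+from typing import List, Optional, Tuple
+
+REFERENCE = re.compile(r"\{([^{}]+)\}")
+
+
+def extract_references(formula: str) -> List[Tuple[Optional[str], str]]:
+    """Return (table, column) pairs; table is None for a column of the current table."""
+    references = []
+    for body in REFERENCE.findall(formula):
+        # Drop an optional WHERE clause, keeping only the referenced column.
+        target = body.split(" WHERE ", 1)[0].strip()
+        table, separator, column = target.partition(".")
+        references.append((table, column) if separator else (None, target))
+    return references
+```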
+
+### Error Handling
+
+| Error Type            | Behavior                                              |
+|-----------------------|-------------------------------------------------------|
+| Syntax error          | Editor highlights the error, formula not saved        |
+| Unknown column        | Editor highlights, autocompletion suggests fixes      |
+| Type mismatch         | Cell displays error indicator, other cells unaffected |
+| Division by zero      | Cell displays `#DIV/0!` or None                       |
+| Circular dependency   | Formula rejected, editor shows cycle chain            |
+| Cross-table not found | Editor highlights unknown table name                  |
+| No join match         | Cell displays None                                    |
+
+## User Interface
+
+### Creating a Formula Column
+
+Formula columns are created and edited through the **DataGridColumnsManager**:
+
+1. User opens the Columns Manager panel
+2. Adds a new column or edits an existing one
+3. Selects column type **"Formula"**
+4. A **DslEditor** (CodeMirror 5) opens for formula input
+5. The editor provides:
+    - **Syntax highlighting**: keywords, column references, functions, operators
+    - **Autocompletion**: column names (current table and other tables), function names, table names
+    - **Validation**: real-time syntax checking and dependency cycle detection
+    - **Error markers**: inline error indicators with descriptions
+
+### Formula Column Properties
+
+A formula column extends `DataGridColumnState` with:
+
+| Property   | Type          | Description                                    |
+|------------|---------------|------------------------------------------------|
+| `formula`  | `str` or None | The formula expression (None for data columns) |
+| `col_type` | `ColumnType`  | Set to `ColumnType.Formula`                    |
+
+Other properties (`title`, `visible`, `width`, `format`) remain unchanged.
+
+Formula columns are **read-only** in the grid body — cell values are computed, not editable. Formatting rules from the
+formatting DSL apply to formula columns like any other column.
+
+## Integration Points
+
+| Component                | Role                                                     |
+|--------------------------|----------------------------------------------------------|
+| `DataGridColumnState`    | Stores `formula` field and `ColumnType.Formula` type     |
+| `DatagridStore`          | `ns_fast_access` caches formula results as numpy arrays  |
+| `DataGridColumnsManager` | UI for creating/editing formula columns                  |
+| `DataGridsManager`       | Hosts the global dependency DAG across all tables        |
+| `DslEditor`              | CodeMirror 5 editor with highlighting and autocompletion |
+| `FormattingEngine`       | Applies formatting rules AFTER formula evaluation        |
+| `mk_body_content_page()` | Triggers formula computation for visible rows            |
+| `mk_body_cell_content()` | Reads computed values from `ns_fast_access`              |
+
+## Syntax Summary
+
+```
+# Basic arithmetic
+{Price} * {Quantity}
+
+# Function call
+round({Price} * 1.2, 2)
+
+# Simple condition (None if false)
+{Price} * 0.8 if {Country} == "FR"
+
+# Condition with else
+{Price} * 0.8 if {Country} == "FR" else {Price}
+
+# Chained conditions
+{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
+
+# Logical operators
+{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10
+
+# Grouping
+{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
+
+# Cross-table (implicit join on id)
+{Products.Price} * {Quantity}
+
+# Cross-table (explicit join)
+{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
+
+# Cross-table aggregation
+sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
+
+# Nested functions
+round(avg({Q1}, {Q2}, {Q3}, {Q4}), 1)
+
+# Text operations
+concat(upper(left({FirstName}, 1)), ". ", {LastName})
+```
+
+## Future: Cell-Level Overrides
+
+The architecture is designed so that cell-level formula overrides can be added later, at an estimated 20-30% of
+additional implementation work:
+
+- **Storage**: sparse dict `cell_formulas` keyed by `(col_id, row_index)` with the formula string as value (same pattern
+  as `cell_formats`)
+- **DAG**: new node type `table.column[row]` alongside existing `table.column` nodes
+- **Evaluation**: "does this cell have an override? If yes, use it. Otherwise, use the column formula."
+- **Node ID scheme**: designed to be extensible from the start (`table.column` for columns, `table.column[row]` for
+  cells)
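+
+A minimal sketch of the planned lookup and node ID scheme, assuming the sparse dict described above. Both function
+names are hypothetical:
+
+```python
+from typing import Dict, Optional, Tuple
+
+
+def effective_formula(cell_formulas: Dict[Tuple[str, int], str],
+                      column_formula: Optional[str],
+                      col_id: str,
+                      row_index: int) -> Optional[str]:
+    """Use the cell override when one exists, otherwise the column formula."""
+    return cell_formulas.get((col_id, row_index), column_formula)
+
+
+def node_id(table: str, column: str, row: Optional[int] = None) -> str:
+    """`table.column` for a column node, `table.column[row]` for a cell node."""
+    base = f"{table}.{column}"
+    return base if row is None else f"{base}[{row}]"
+```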