Introducing columns formulas

2026-02-13 21:38:00 +01:00
parent 0df78c0513
commit e8443f07f9
29 changed files with 3889 additions and 15 deletions

docs/Datagrid Formulas.md

@@ -0,0 +1,365 @@
# DataGrid Formulas
## Overview
The DataGrid formula system adds computed columns to the DataGrid. A formula column applies a single expression to every
row, producing derived values from existing data — within the same table or across tables.
The system is designed for:
- **Column-level formulas**: one formula per column, applied to all rows
- **Cross-table references**: direct syntax to reference columns from other tables
- **Reactive recalculation**: dirty flag propagation with page-aware computation
- **Cell-level overrides** (planned): individual cells can override the column formula
## Formula Language
### Basic Syntax
A formula is an expression that references columns with `{ColumnName}` and produces a value for each row:
```
{Price} * {Quantity}
```
References use curly braces `{}` to distinguish column names from keywords and functions. Column names are matched by ID
or title.
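
The following minimal Python sketch shows how `{...}` references could be pulled out of a formula and matched to columns. It uses a regular expression only for brevity, since the actual parser is Lark-based. It also assumes that columns are described by hypothetical `{"id": ..., "title": ...}` dicts and that an ID match takes priority over a title match. The document does not specify that order.

```python
import re

# A reference is whatever sits between a pair of braces, e.g. {Price} or {Products.Price}.
REF_PATTERN = re.compile(r"\{([^{}]+)\}")


def extract_references(formula: str) -> list[str]:
    """Return the text of every {...} reference, in order of appearance."""
    return REF_PATTERN.findall(formula)


def resolve_column(ref: str, columns: list[dict]) -> str:
    """Map a reference to a column ID, trying IDs first, then titles.

    `columns` is a hypothetical list of {"id": ..., "title": ...} dicts; the
    ID-before-title priority is an assumption of this sketch.
    """
    for col in columns:
        if col["id"] == ref:
            return col["id"]
    for col in columns:
        if col["title"] == ref:
            return col["id"]
    raise KeyError(f"Unknown column: {ref}")


columns = [{"id": "price", "title": "Price"}, {"id": "qty", "title": "Quantity"}]
resolved = [resolve_column(r, columns) for r in extract_references("{Price} * {Quantity}")]
print(resolved)  # ['price', 'qty']
```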
### Operators
#### Arithmetic
| Operator | Description | Example |
|----------|----------------|------------------------|
| `+` | Addition | `{Price} + {Tax}` |
| `-` | Subtraction | `{Total} - {Discount}` |
| `*` | Multiplication | `{Price} * {Quantity}` |
| `/` | Division | `{Total} / {Count}` |
| `%` | Modulo | `{Value} % 2` |
| `^` | Power | `{Base} ^ 2` |
#### Comparison
| Operator | Description | Example |
|--------------|--------------------|---------------------------------|
| `==` | Equal | `{Status} == "active"` |
| `!=` | Not equal | `{Status} != "deleted"` |
| `>` | Greater than | `{Price} > 100` |
| `<` | Less than | `{Stock} < 10` |
| `>=` | Greater or equal | `{Score} >= 80` |
| `<=` | Less or equal | `{Age} <= 18` |
| `contains` | String contains | `{Name} contains "Corp"` |
| `startswith` | String starts with | `{Code} startswith "ERR"` |
| `endswith` | String ends with | `{File} endswith ".csv"` |
| `in` | Value in list | `{Status} in ["active", "new"]` |
| `between` | Value in range | `{Age} between 18 and 65` |
| `isempty` | Value is empty | `{Notes} isempty` |
| `isnotempty` | Value is not empty | `{Email} isnotempty` |
| `isnan` | Value is NaN | `{Score} isnan` |
#### Logical
| Operator | Description | Example |
|----------|-------------|---------------------------------------|
| `and` | Logical AND | `{Age} > 18 and {Status} == "active"` |
| `or` | Logical OR | `{Type} == "A" or {Type} == "B"` |
| `not` | Negation | `not {Status} == "deleted"` |
Parentheses control precedence: `({Type} == "A" or {Type} == "B") and {Active} == True`
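
One way to picture the operator tables is as a dispatch from each keyword to a Python callable. The sketch below is an illustration, not the engine's evaluator, and it makes several assumptions:

- `None` and whitespace-only strings count as empty.
- `between` includes both bounds.
- `and`/`or` are evaluated eagerly, whereas a real evaluator would short-circuit.

```python
import math
import operator


def _is_empty(value) -> bool:
    # Assumption: None and whitespace-only strings count as empty.
    return value is None or (isinstance(value, str) and value.strip() == "")


def _is_nan(value) -> bool:
    return isinstance(value, float) and math.isnan(value)


# Operators that take a left and a right operand.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
    "^": operator.pow,  # power, not bitwise XOR as in Python
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda a, b: b in a,
    "startswith": lambda a, b: a.startswith(b),
    "endswith": lambda a, b: a.endswith(b),
    "in": lambda a, b: a in b,
    # Eager for simplicity; a real evaluator would short-circuit.
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}

# Prefix `not` and the postfix emptiness/NaN tests take a single operand.
UNARY_OPS = {
    "not": lambda a: not a,
    "isempty": _is_empty,
    "isnotempty": lambda a: not _is_empty(a),
    "isnan": _is_nan,
}


def between(value, low, high) -> bool:
    # Assumption: `{Age} between 18 and 65` includes both bounds.
    return low <= value <= high


# {Age} > 18 and {Status} == "active", with Age=30 and Status="active"
print(BINARY_OPS["and"](BINARY_OPS[">"](30, 18), BINARY_OPS["=="]("active", "active")))  # True
```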
### Conditions (suffix-if)
Conditions use a **suffix-if** syntax: the result expression comes first, then the condition. This keeps the focus on
the output, not the branching logic.
#### Simple condition (no else — result is None when false)
```
{Price} * 0.8 if {Country} == "FR"
```
#### With else
```
{Price} * 0.8 if {Country} == "FR" else {Price}
```
#### Chained conditions
```
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
```
#### With logical operators
```
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10 else {Price}
```
#### With grouping
```
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
```
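
Suffix-if reads the same way as Python's conditional expression, so the examples above translate almost one-to-one. The sketch below hand-translates two of them into functions over a hypothetical `row` dict keyed by column title. The explicit `else None` reproduces the documented result of a condition without `else`. Python groups chained conditionals as `A if c1 else (B if c2 else C)`, which is the reading the chained example relies on.

```python
def discounted_price(row: dict):
    # {Price} * 0.8 if {Country} == "FR"
    # No else clause in the formula, so the result is None when the condition is false.
    return row["Price"] * 0.8 if row["Country"] == "FR" else None


def tiered_price(row: dict):
    # {Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
    # Groups as: A if c1 else (B if c2 else C)
    return (
        row["Price"] * 0.8 if row["Country"] == "FR"
        else row["Price"] * 0.9 if row["Country"] == "DE"
        else row["Price"]
    )


print(discounted_price({"Price": 100, "Country": "US"}))  # None
```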
### Functions
#### Math
| Function | Description | Example |
|-------------------|-----------------------|-------------------------------|
| `round(expr, n)` | Round to n decimals | `round({Price} * 1.2, 2)` |
| `abs(expr)` | Absolute value | `abs({Balance})` |
| `min(expr, expr)` | Minimum of two values | `min({Price}, {MaxPrice})` |
| `max(expr, expr)` | Maximum of two values | `max({Score}, 0)` |
| `sum(expr, ...)` | Sum of values | `sum({Q1}, {Q2}, {Q3}, {Q4})` |
| `avg(expr, ...)` | Average of values | `avg({Q1}, {Q2}, {Q3}, {Q4})` |
#### Text
| Function | Description | Example |
|---------------------|---------------------|--------------------------------|
| `upper(expr)` | Uppercase | `upper({Name})` |
| `lower(expr)` | Lowercase | `lower({Email})` |
| `len(expr)` | String length | `len({Description})` |
| `concat(expr, ...)` | Concatenate strings | `concat({First}, " ", {Last})` |
| `trim(expr)` | Remove whitespace | `trim({Input})` |
| `left(expr, n)` | First n characters | `left({Code}, 3)` |
| `right(expr, n)` | Last n characters | `right({Phone}, 4)` |
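
The math and text functions map closely onto Python built-ins. The registry below is a minimal sketch with two assumptions:

- `None` arguments to `concat` are treated as empty strings.
- `round` inherits Python's round-half-to-even behavior, which may differ from what the DSL intends. For example, `round(2.5, 0)` gives `2.0`.

```python
FUNCTIONS = {
    # Math
    "round": lambda x, n: round(x, n),  # Python rounds half to even
    "abs": abs,
    "min": min,
    "max": max,
    "sum": lambda *xs: sum(xs),
    "avg": lambda *xs: sum(xs) / len(xs),
    # Text
    "upper": str.upper,
    "lower": str.lower,
    "len": len,
    # Assumption: None contributes an empty string rather than the text "None".
    "concat": lambda *xs: "".join("" if x is None else str(x) for x in xs),
    "trim": str.strip,
    "left": lambda s, n: s[:n],
    "right": lambda s, n: s[-n:] if n > 0 else "",  # s[-0:] would return the whole string
}


def call(name: str, *args):
    """Look up a function by its formula name and apply it."""
    return FUNCTIONS[name](*args)


# concat(upper(left({FirstName}, 1)), ". ", {LastName}) for FirstName="ada", LastName="Lovelace"
print(call("concat", call("upper", call("left", "ada", 1)), ". ", "Lovelace"))  # A. Lovelace
```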
#### Date
| Function | Description | Example |
|------------------------|--------------------|--------------------------------|
| `year(expr)` | Extract year | `year({CreatedAt})` |
| `month(expr)` | Extract month | `month({CreatedAt})` |
| `day(expr)` | Extract day | `day({CreatedAt})` |
| `today()` | Current date | `datediff({DueDate}, today())` |
| `datediff(expr, expr)` | Difference in days | `datediff({End}, {Start})` |
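
A minimal sketch of the date functions, assuming cells hold `date`/`datetime` objects or ISO-8601 date strings. It also assumes that `datediff(a, b)` returns `a - b` in whole days. The argument order is inferred from the `datediff({End}, {Start})` example, not stated in the table.

```python
from datetime import date, datetime


def _as_date(value) -> date:
    # Assumption: cells hold date/datetime objects or ISO-8601 date strings.
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(value)


def year(value) -> int:
    return _as_date(value).year


def month(value) -> int:
    return _as_date(value).month


def day(value) -> int:
    return _as_date(value).day


def today() -> date:
    return date.today()


def datediff(first, second) -> int:
    """Whole days from `second` to `first` (assumed order: datediff({End}, {Start}))."""
    return (_as_date(first) - _as_date(second)).days


print(datediff("2026-03-01", "2026-02-13"))  # 16
```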
#### Aggregation (for cross-table contexts)
| Function | Description | Example |
|---------------|--------------|-----------------------------------------------------|
| `sum(expr)` | Sum values | `sum({Orders.Amount WHERE Orders.ClientId = Id})` |
| `count(expr)` | Count values | `count({Orders.Id WHERE Orders.ClientId = Id})` |
| `avg(expr)` | Average | `avg({Reviews.Score WHERE Reviews.ProductId = Id})` |
| `min(expr)` | Minimum | `min({Bids.Price WHERE Bids.ItemId = Id})` |
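| `max(expr)`   | Maximum      | `max({Bids.Price WHERE Bids.ItemId = Id})`          |

`sum`, `avg`, `min`, and `max` share their names with the math functions above. Given a single cross-table reference with a `WHERE` clause, they aggregate over every matching row. Given several arguments, they combine values from the current row.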
## Cross-Table References
### Direct Reference
Reference a column from another table using `{TableName.ColumnName}`:
```
{Products.Price} * {Quantity}
```
### Join Resolution (implicit)
When a cross-table reference has no WHERE clause, the join is resolved automatically:
1. **By `id` column**: if both tables have a column named `id`, rows are matched on equal `id` values
2. **By row index**: if either table lacks an `id` column, rows are matched by their internal row index (stable
   across sort/filter)
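
The two implicit-join rules can be sketched over a columnar layout: a hypothetical `dict` of column name to list of values, loosely mirroring `ns_fast_access`. The sketch makes three assumptions:

- List position stands in for the internal row index.
- A duplicated `id` in the other table resolves to its first row.
- An unmatched row yields `None`, matching the "No join match" case in the error table.

```python
def implicit_join(current: dict[str, list], other: dict[str, list], column: str) -> list:
    """Fetch `other[column]` for each row of `current` using the implicit join rules."""
    n_rows = len(next(iter(current.values()), []))
    if "id" in current and "id" in other:
        # Rule 1: match on equal `id` values (first row wins if an id repeats).
        row_for_id = {}
        for idx, key in enumerate(other["id"]):
            row_for_id.setdefault(key, idx)
        matches = [row_for_id.get(key) for key in current["id"]]
    else:
        # Rule 2: match on row index; rows beyond the other table's length get no match.
        n_other = len(other[column])
        matches = [idx if idx < n_other else None for idx in range(n_rows)]
    return [None if m is None else other[column][m] for m in matches]


orders = {"id": [2, 1, 3], "Quantity": [5, 1, 2]}
products = {"id": [1, 2], "Price": [10.0, 20.0]}
print(implicit_join(orders, products, "Price"))  # [20.0, 10.0, None]
```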
### Explicit Join (WHERE clause)
For explicit control over which row of the other table to use:
```
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
```
Inside the WHERE clause:
- `Products.Code` refers to a column in the referenced table
- `ProductCode` (no `Table.` prefix) refers to a column in the current table
### Aggregation with Cross-Table
When a cross-table reference matches multiple rows, use an aggregation function:
```
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
```
Without aggregation, a multi-row match returns the first matching value.
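
For `WHERE` joins, one straightforward approach is to index the referenced table once on its join column and then look up each current row's key. The sketch below uses the same hypothetical columnar layout and returns the first match when no aggregation is applied, as described above. It assumes empty matches give `0` for `count` and `None` for the other aggregates. The document does not specify either value.

```python
from collections import defaultdict
from statistics import mean

# Aggregation names from the table above, mapped to Python equivalents.
AGGREGATES = {"sum": sum, "count": len, "avg": mean, "min": min, "max": max}


def build_index(table: dict[str, list], key_col: str) -> dict:
    """Map each value of `key_col` to the row indices that hold it (built once per join)."""
    index = defaultdict(list)
    for row_idx, key in enumerate(table[key_col]):
        index[key].append(row_idx)
    return index


def cross_ref(table: dict[str, list], index: dict, value_col: str, key, aggregate=None):
    """Evaluate {Table.value_col WHERE Table.key_col = <key of the current row>}."""
    values = [table[value_col][i] for i in index.get(key, [])]
    if aggregate is None:
        return values[0] if values else None  # first match, or None when nothing matches
    if not values and aggregate != "count":
        return None  # assumption: empty sum/avg/min/max yield None
    return AGGREGATES[aggregate](values)


order_lines = {"OrderId": [1, 1, 2], "Amount": [10.0, 5.5, 7.0]}
index = build_index(order_lines, "OrderId")
# sum({OrderLines.Amount WHERE OrderLines.OrderId = Id}) for a current row whose Id is 1
print(cross_ref(order_lines, index, "Amount", 1, "sum"))  # 15.5
```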
## Calculation Engine
### Dependency Graph (DAG)
The formula system maintains a **Directed Acyclic Graph** of dependencies between columns:
- **Nodes**: each formula column, and each column a formula reads from, is a node identified by `table_name.column_id`
- **Edges**: if column A's formula references column B, an edge B → A exists ("A depends on B")
- Both directions are tracked:
  - **Precedents**: columns that a formula reads from
  - **Dependents**: columns that need recalculation when this column changes
Cross-table references create edges that span DataGrid instances, managed at the `DataGridsManager` level.
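
The two-way bookkeeping can be sketched as a pair of adjacency maps keyed by `table_name.column_id` strings. `DependencyGraph` and `set_formula_refs` are hypothetical names. Keeping both maps lets an edited formula drop its old edges cheaply. Table-qualified IDs let cross-table edges live in the same structure, which is one way a manager-level graph could hold them.

```python
from collections import defaultdict


class DependencyGraph:
    """Column-level dependency graph; node IDs are "table_name.column_id" strings."""

    def __init__(self) -> None:
        self.precedents = defaultdict(set)  # node -> columns its formula reads
        self.dependents = defaultdict(set)  # node -> columns whose formulas read it

    def set_formula_refs(self, node: str, refs: set[str]) -> None:
        """Point `node` at the columns its (new) formula reads, replacing old edges."""
        for old in self.precedents.pop(node, set()):
            self.dependents[old].discard(node)
        for ref in refs:
            self.precedents[node].add(ref)  # node reads ref
            self.dependents[ref].add(node)  # edge ref -> node


graph = DependencyGraph()
graph.set_formula_refs("orders.total", {"orders.quantity", "products.price"})
print(graph.dependents["products.price"])  # {'orders.total'}
```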
### Dirty Flag Propagation
When a source column's data changes:
1. The source column is marked **dirty**
2. All direct dependents are marked dirty
3. Propagation continues recursively through the DAG
4. Each dirty column maintains a **dirty row set**: the specific row indices that need recalculation
This propagation is **immediate** (fast — only flag marking, no computation).
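
A minimal sketch of the flag-only propagation, walking the `dependents` map breadth-first. It assumes dependents are row-aligned with the source, which holds within one table. A cross-table dependent would need its rows mapped through the join instead. The function name and the `dirty` map shape are hypothetical.

```python
from collections import deque


def propagate_dirty(dependents: dict[str, set[str]], source: str, rows: set[int],
                    dirty: dict[str, set[int]]) -> None:
    """Flag `rows` as stale on `source` and on everything downstream of it.

    Nothing is recomputed here. Assumption: dependents are row-aligned with the
    source (same table); a cross-table dependent would need its rows mapped
    through the join instead.
    """
    queue = deque([source])
    seen = {source}
    while queue:  # breadth-first; reaches the same nodes as the recursive walk described above
        node = queue.popleft()
        dirty.setdefault(node, set()).update(rows)
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)


dependents = {"t.price": {"t.total"}, "t.total": {"t.total_ttc"}}
dirty = {}
propagate_dirty(dependents, "t.price", {3, 4}, dirty)
print(sorted(dirty))  # ['t.price', 't.total', 't.total_ttc']
```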
### Recalculation Strategy (Hybrid)
Actual computation is **deferred to rendering time**:
1. On value change → dirty flags propagate instantly through the DAG
2. On page render (`mk_body_content_page`) → only dirty rows within the visible page (up to 1000 rows) are recalculated
3. Off-screen pages remain dirty until scrolled into view
4. Calculation follows **topological order** of the DAG to ensure precedents are computed before dependents
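
A page-scoped recalculation might look like the sketch below. It makes three assumptions:

- A hypothetical `formulas` map sends each column ID to a callable that reads its precedents from the cache.
- Plain lists stand in for the numpy arrays in `ns_fast_access`.
- `order` is already sorted topologically.

Rows that are not dirty are never touched, so their cached values are served as-is.

```python
def recalc_page(order: list[str], formulas: dict, cache: dict[str, list],
                dirty: dict[str, set[int]], page: range) -> None:
    """Recompute the dirty rows of one page, column by column in topological order.

    order:    formula column IDs, precedents before dependents
    formulas: column ID -> callable(cache, row) returning that cell's value (hypothetical)
    cache:    column ID -> list of values (stand-in for ns_fast_access)
    dirty:    column ID -> set of stale row indices
    """
    visible = set(page)
    for col in order:
        stale = dirty.get(col, set()) & visible
        for row in sorted(stale):
            cache[col][row] = formulas[col](cache, row)
        if col in dirty:
            # Off-screen rows stay dirty until their page is rendered.
            dirty[col] -= stale


cache = {"price": [10, 20, 30], "qty": [1, 2, 3], "total": [None] * 3, "double": [None] * 3}
formulas = {
    "total": lambda c, r: c["price"][r] * c["qty"][r],
    "double": lambda c, r: c["total"][r] * 2,
}
dirty = {"total": {0, 1, 2}, "double": {0, 1, 2}}
recalc_page(["total", "double"], formulas, cache, dirty, range(0, 2))
print(cache["double"], dirty)  # [20, 80, None] {'total': {2}, 'double': {2}}
```

With the page size described above, a call would pass something like `range(start, start + 1000)`.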
### Cycle Detection
Before a formula is added, the engine checks the DAG for cycles during topological sorting with Kahn's algorithm: if the
sort cannot order every node, the graph contains a cycle. If a cycle is detected:
- The formula is **rejected**
- The editor displays an error identifying the circular dependency chain
- The previous formula (if any) remains unchanged
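
Kahn's algorithm repeatedly removes nodes with no remaining incoming edges. Any node it can never remove sits on a cycle or downstream of one. The sketch below runs it on a trial copy of the `dependents` map with the proposed formula's edges swapped in, so the live graph, and with it the previous formula, is untouched when the formula is rejected. The function names are hypothetical. Turning the leftover node set into a readable cycle chain for the editor is not shown.

```python
from collections import deque


def leftover_after_kahn(dependents: dict[str, set[str]]) -> set[str]:
    """Run Kahn's topological sort; return the nodes it could not order.

    An empty set means the graph is acyclic. Otherwise every returned node lies on a
    cycle or downstream of one.
    """
    nodes = set(dependents) | {d for ds in dependents.values() for d in ds}
    in_degree = dict.fromkeys(nodes, 0)
    for ds in dependents.values():
        for d in ds:
            in_degree[d] += 1
    ready = deque(n for n in nodes if in_degree[n] == 0)
    while ready:
        node = ready.popleft()
        for d in dependents.get(node, ()):
            in_degree[d] -= 1
            if in_degree[d] == 0:
                ready.append(d)
    return {n for n, deg in in_degree.items() if deg > 0}


def can_set_formula(dependents: dict[str, set[str]], node: str, refs: set[str]) -> bool:
    """Would giving `node` a formula that reads `refs` keep the graph acyclic?"""
    trial = {k: set(v) for k, v in dependents.items()}  # copy: the live graph is untouched
    for readers in trial.values():
        readers.discard(node)  # drop the edges of node's previous formula, if any
    for ref in refs:
        trial.setdefault(ref, set()).add(node)
    return not leftover_after_kahn(trial)


graph = {"t.a": {"t.b"}}  # t.b's formula reads t.a
print(can_set_formula(graph, "t.a", {"t.b"}))  # False: t.a -> t.b -> t.a
```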
### Caching
Each formula column caches its computed values:
- Results are stored in `ns_fast_access[col_id]` alongside raw data columns
- The dirty row set tracks which cached values are stale
- Non-dirty rows return their cached value without re-evaluation
- Cache is invalidated per-row when source data changes
## Evaluation
### Row-by-Row Execution
Formulas are evaluated **row-by-row** within the page being rendered. For each row:
1. Resolve column references `{ColumnName}` to the cell value at the current row index
2. Resolve cross-table references `{Table.Column}` via the join mechanism
3. Evaluate the expression with resolved values
4. Store the result in the cache (`ns_fast_access`)
### Parser
The formula language uses a **custom grammar** parsed with Lark (consistent with the formatting DSL). The parser:
1. Tokenizes the formula string
2. Builds an AST (Abstract Syntax Tree)
3. Transforms the AST into an evaluable representation
4. Extracts column references for dependency graph registration
### Error Handling
| Error Type | Behavior |
|-----------------------|-------------------------------------------------------|
| Syntax error | Editor highlights the error, formula not saved |
| Unknown column | Editor highlights, autocompletion suggests fixes |
| Type mismatch | Cell displays error indicator, other cells unaffected |
| Division by zero | Cell displays `#DIV/0!` or None |
| Circular dependency | Formula rejected, editor shows cycle chain |
| Cross-table not found | Editor highlights unknown table name |
| No join match | Cell displays None |
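
Per-cell isolation can be as simple as wrapping each evaluation in its own `try`. In this sketch `#DIV/0!` follows the table, while `#TYPE!` is a hypothetical stand-in for the unspecified error indicator. Plain Python raises `ZeroDivisionError`, whereas numpy array division returns `inf` or `nan` with a warning. A numpy-backed engine would therefore need an explicit zero check.

```python
DIV_ZERO = "#DIV/0!"
TYPE_ERROR = "#TYPE!"  # hypothetical marker; the document only says "error indicator"


def evaluate_cell(formula, row: dict):
    """Evaluate one cell; a failure yields a marker instead of aborting the column."""
    try:
        return formula(row)
    except ZeroDivisionError:
        return DIV_ZERO
    except TypeError:
        return TYPE_ERROR


def evaluate_column(formula, rows: list[dict]) -> list:
    return [evaluate_cell(formula, row) for row in rows]


def ratio(row: dict):
    # {Total} / {Count}
    return row["Total"] / row["Count"]


rows = [{"Total": 10, "Count": 4}, {"Total": 10, "Count": 0}, {"Total": "x", "Count": 2}]
print(evaluate_column(ratio, rows))  # [2.5, '#DIV/0!', '#TYPE!']
```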
## User Interface
### Creating a Formula Column
Formula columns are created and edited through the **DataGridColumnsManager**:
1. User opens the Columns Manager panel
2. Adds a new column or edits an existing one
3. Selects column type **"Formula"**
4. A **DslEditor** (CodeMirror 5) opens for formula input
5. The editor provides:
   - **Syntax highlighting**: keywords, column references, functions, operators
   - **Autocompletion**: column names (current table and other tables), function names, table names
   - **Validation**: real-time syntax checking and dependency cycle detection
   - **Error markers**: inline error indicators with descriptions
### Formula Column Properties
A formula column extends `DataGridColumnState` with:
| Property   | Type          | Description                                    |
|------------|---------------|------------------------------------------------|
| `formula`  | `str` or None | The formula expression (None for data columns) |
| `col_type` | `ColumnType`  | Set to `ColumnType.Formula`                    |

Other properties (`title`, `visible`, `width`, `format`) remain unchanged.
Formula columns are **read-only** in the grid body — cell values are computed, not editable. Formatting rules from the
formatting DSL apply to formula columns like any other column.
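
A minimal sketch of the column state as a Python dataclass. Only `formula`, `col_type`, `ColumnType.Formula` and the property names come from the table above. The following are assumptions:

- the `col_id` field
- the other `ColumnType` members
- every field type and default
- the `is_formula`/`editable` helpers

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    # Only Formula is named in this document; the other members are placeholders.
    Text = "text"
    Number = "number"
    Formula = "formula"


@dataclass
class DataGridColumnState:
    # Field types and defaults below are assumptions for illustration.
    col_id: str
    title: str
    col_type: ColumnType = ColumnType.Text
    visible: bool = True
    width: int = 100
    format: Optional[str] = None
    formula: Optional[str] = None  # None for data columns

    @property
    def is_formula(self) -> bool:
        return self.col_type is ColumnType.Formula and self.formula is not None

    @property
    def editable(self) -> bool:
        # Formula cells are computed, so the grid body treats them as read-only.
        return self.col_type is not ColumnType.Formula


total = DataGridColumnState("total", "Total", ColumnType.Formula, formula="{Price} * {Quantity}")
print(total.is_formula, total.editable)  # True False
```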
## Integration Points
| Component | Role |
|--------------------------|----------------------------------------------------------|
| `DataGridColumnState` | Stores `formula` field and `ColumnType.Formula` type |
| `DatagridStore` | `ns_fast_access` caches formula results as numpy arrays |
| `DataGridColumnsManager` | UI for creating/editing formula columns |
| `DataGridsManager` | Hosts the global dependency DAG across all tables |
| `DslEditor` | CodeMirror 5 editor with highlighting and autocompletion |
| `FormattingEngine` | Applies formatting rules AFTER formula evaluation |
| `mk_body_content_page()` | Triggers formula computation for visible rows |
| `mk_body_cell_content()` | Reads computed values from `ns_fast_access` |
## Syntax Summary
```
# Basic arithmetic
{Price} * {Quantity}
# Function call
round({Price} * 1.2, 2)
# Simple condition (None if false)
{Price} * 0.8 if {Country} == "FR"
# Condition with else
{Price} * 0.8 if {Country} == "FR" else {Price}
# Chained conditions
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
# Logical operators
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10
# Grouping
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
# Cross-table (implicit join on id)
{Products.Price} * {Quantity}
# Cross-table (explicit join)
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
# Cross-table aggregation
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
# Nested functions
round(avg({Q1}, {Q2}, {Q3}, {Q4}), 1)
# Text operations
concat(upper(left({FirstName}, 1)), ". ", {LastName})
```
## Future: Cell-Level Overrides
The architecture supports adding cell-level formula overrides with ~20-30% additional work:
- **Storage**: a sparse `cell_formulas` dict mapping `(col_id, row_index)` to a formula string (same pattern as `cell_formats`)
- **DAG**: new node type `table.column[row]` alongside existing `table.column` nodes
- **Evaluation**: "does this cell have an override? If yes, use it. Otherwise, use the column formula."
- **Node ID scheme**: designed to be extensible from the start (`table.column` for columns, `table.column[row]` for
cells)
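
The planned override rule and node ID scheme fit in a few lines. This is a sketch of the proposed design, not existing code. `node_id` and `formula_for_cell` are hypothetical names, the `(col_id, row_index)` key shape follows the storage bullet above, and the string type of `col_id` is an assumption.

```python
from typing import Optional


def node_id(table: str, column: str, row: Optional[int] = None) -> str:
    """`table.column` for a column node, `table.column[row]` for a cell node."""
    return f"{table}.{column}" if row is None else f"{table}.{column}[{row}]"


def formula_for_cell(col_id: str, row: int, column_formula: Optional[str],
                     cell_formulas: dict[tuple[str, int], str]) -> Optional[str]:
    """Use the cell's override if one exists, otherwise the column formula."""
    return cell_formulas.get((col_id, row), column_formula)


overrides = {("total", 7): "{Price}"}
print(formula_for_cell("total", 7, "{Price} * {Quantity}", overrides))  # {Price}
```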