# DataGrid Formulas

## Overview

The DataGrid formula system adds computed columns to the DataGrid. A formula column applies a single expression to every
row, producing derived values from existing data, either within the same table or across tables.

The system is designed for:

- **Column-level formulas**: one formula per column, applied to all rows
- **Cross-table references**: direct syntax to reference columns from other tables
- **Reactive recalculation**: dirty flag propagation with page-aware computation
- **Cell-level overrides** (planned): individual cells can override the column formula

## Formula Language

### Basic Syntax

A formula is an expression that references columns with `{ColumnName}` and produces a value for each row:

```
{Price} * {Quantity}
```

References use curly braces `{}` to distinguish column names from keywords and functions. Column names are matched by ID
or title.

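The reference rules above can be sketched in Python, assumed here because the engine is described in terms of Lark and numpy. The `Column` dataclass, both helper functions, and the ID-before-title precedence are illustrative assumptions, not the project's API. The pattern only handles plain `{Name}` references; cross-table references with a WHERE clause need the real grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Captures the text between one pair of curly braces: "{Price}" -> "Price".
REFERENCE_PATTERN = re.compile(r"\{([^{}]+)\}")


@dataclass
class Column:
    col_id: str  # internal identifier
    title: str   # display title


def extract_references(formula: str) -> list[str]:
    """Return the column references in the order they appear in the formula."""
    return [name.strip() for name in REFERENCE_PATTERN.findall(formula)]


def resolve_column(name: str, columns: list[Column]) -> Optional[Column]:
    """Match a reference against column IDs first, then titles (assumed precedence)."""
    for column in columns:
        if column.col_id == name:
            return column
    for column in columns:
        if column.title == name:
            return column
    return None


columns = [Column("price", "Unit Price"), Column("qty", "Quantity")]
refs = extract_references("{Unit Price} * {Quantity}")
resolved = [resolve_column(ref, columns).col_id for ref in refs]
```

Here `{Unit Price}` matches by title and `{Quantity}` also matches by title, since neither equals a column ID.
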
### Operators

#### Arithmetic

| Operator | Description    | Example                |
|----------|----------------|------------------------|
| `+`      | Addition       | `{Price} + {Tax}`      |
| `-`      | Subtraction    | `{Total} - {Discount}` |
| `*`      | Multiplication | `{Price} * {Quantity}` |
| `/`      | Division       | `{Total} / {Count}`    |
| `%`      | Modulo         | `{Value} % 2`          |
| `^`      | Power          | `{Base} ^ 2`           |

#### Comparison

| Operator     | Description        | Example                         |
|--------------|--------------------|---------------------------------|
| `==`         | Equal              | `{Status} == "active"`          |
| `!=`         | Not equal          | `{Status} != "deleted"`         |
| `>`          | Greater than       | `{Price} > 100`                 |
| `<`          | Less than          | `{Stock} < 10`                  |
| `>=`         | Greater or equal   | `{Score} >= 80`                 |
| `<=`         | Less or equal      | `{Age} <= 18`                   |
| `contains`   | String contains    | `{Name} contains "Corp"`        |
| `startswith` | String starts with | `{Code} startswith "ERR"`       |
| `endswith`   | String ends with   | `{File} endswith ".csv"`        |
| `in`         | Value in list      | `{Status} in ["active", "new"]` |
| `between`    | Value in range     | `{Age} between 18 and 65`       |
| `isempty`    | Value is empty     | `{Notes} isempty`               |
| `isnotempty` | Value is not empty | `{Email} isnotempty`            |
| `isnan`      | Value is NaN       | `{Score} isnan`                 |

#### Logical

| Operator | Description | Example                               |
|----------|-------------|---------------------------------------|
| `and`    | Logical AND | `{Age} > 18 and {Status} == "active"` |
| `or`     | Logical OR  | `{Type} == "A" or {Type} == "B"`      |
| `not`    | Negation    | `not {Status} == "deleted"`           |

Parentheses control precedence: `({Type} == "A" or {Type} == "B") and {Active} == True`

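A minimal sketch of how the operator tables could map onto Python callables; all names here are hypothetical. Note that `^` must map to exponentiation, because Python's own `^` is bitwise XOR. Treating `None`, `""` and whitespace-only strings as empty, and making `between` inclusive on both ends, are assumptions the tables do not settle.

```python
import math
import operator
from typing import Any, Callable


def is_empty(value: Any) -> bool:
    # Assumption: None, "" and whitespace-only strings count as empty.
    return value is None or (isinstance(value, str) and value.strip() == "")


BINARY_OPERATORS: dict[str, Callable[[Any, Any], Any]] = {
    # Arithmetic
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
    "^": operator.pow,  # power; Python's own ^ would be bitwise XOR
    # Comparison
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda text, part: part in text,
    "startswith": lambda text, prefix: text.startswith(prefix),
    "endswith": lambda text, suffix: text.endswith(suffix),
    "in": lambda value, options: value in options,
}

# Suffix operators take a single operand: `{Notes} isempty`.
SUFFIX_OPERATORS: dict[str, Callable[[Any], bool]] = {
    "isempty": is_empty,
    "isnotempty": lambda value: not is_empty(value),
    "isnan": lambda value: isinstance(value, float) and math.isnan(value),
}


def between(value: Any, low: Any, high: Any) -> bool:
    # `{Age} between 18 and 65`; both bounds assumed inclusive.
    return low <= value <= high

# `and`, `or` and `not` are not in these tables: an evaluator would handle them
# directly so that the right operand is only evaluated when needed.
```
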
### Conditions (suffix-if)

Conditions use a **suffix-if** syntax: the result expression comes first, then the condition. This keeps the focus on
the output, not the branching logic.

#### Simple condition (no else: result is None when false)

```
{Price} * 0.8 if {Country} == "FR"
```

#### With else

```
{Price} * 0.8 if {Country} == "FR" else {Price}
```

#### Chained conditions

```
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
```

#### With logical operators

```
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10 else {Price}
```

#### With grouping

```
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
```

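Suffix-if reads like Python's conditional expression, and chained conditions nest to the right: each `else` branch is itself a suffix-if. The sketch below models compiled expressions as functions of a row `dict`; the `suffix_if` helper and this row representation are hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Expr = Callable[[Row], Any]


def suffix_if(result: Expr, condition: Expr, otherwise: Optional[Expr] = None) -> Expr:
    """Build `result if condition [else otherwise]`; without else, a false condition gives None."""
    def evaluate(row: Row) -> Any:
        if condition(row):
            return result(row)
        return otherwise(row) if otherwise is not None else None
    return evaluate


# {Price} * 0.8 if {Country} == "FR"
fr_discount = suffix_if(lambda r: r["Price"] * 0.8, lambda r: r["Country"] == "FR")

# {Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
regional_price = suffix_if(
    lambda r: r["Price"] * 0.8,
    lambda r: r["Country"] == "FR",
    suffix_if(  # the else branch is itself a suffix-if
        lambda r: r["Price"] * 0.9,
        lambda r: r["Country"] == "DE",
        lambda r: r["Price"],
    ),
)
```

Only the branch that is selected gets evaluated, so a result expression that would fail on non-matching rows is never run for them.
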
### Functions

#### Math

| Function          | Description           | Example                       |
|-------------------|-----------------------|-------------------------------|
| `round(expr, n)`  | Round to n decimals   | `round({Price} * 1.2, 2)`     |
| `abs(expr)`       | Absolute value        | `abs({Balance})`              |
| `min(expr, expr)` | Minimum of two values | `min({Price}, {MaxPrice})`    |
| `max(expr, expr)` | Maximum of two values | `max({Score}, 0)`             |
| `sum(expr, ...)`  | Sum of values         | `sum({Q1}, {Q2}, {Q3}, {Q4})` |
| `avg(expr, ...)`  | Average of values     | `avg({Q1}, {Q2}, {Q3}, {Q4})` |

#### Text

| Function            | Description         | Example                        |
|---------------------|---------------------|--------------------------------|
| `upper(expr)`       | Uppercase           | `upper({Name})`                |
| `lower(expr)`       | Lowercase           | `lower({Email})`               |
| `len(expr)`         | String length       | `len({Description})`           |
| `concat(expr, ...)` | Concatenate strings | `concat({First}, " ", {Last})` |
| `trim(expr)`        | Remove whitespace   | `trim({Input})`                |
| `left(expr, n)`     | First n characters  | `left({Code}, 3)`              |
| `right(expr, n)`    | Last n characters   | `right({Phone}, 4)`            |

#### Date

| Function               | Description        | Example                        |
|------------------------|--------------------|--------------------------------|
| `year(expr)`           | Extract year       | `year({CreatedAt})`            |
| `month(expr)`          | Extract month      | `month({CreatedAt})`           |
| `day(expr)`            | Extract day        | `day({CreatedAt})`             |
| `today()`              | Current date       | `datediff({DueDate}, today())` |
| `datediff(expr, expr)` | Difference in days | `datediff({End}, {Start})`     |

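The function tables could be backed by a plain name-to-callable registry, as in this sketch. The `FUNCTIONS` and `call` names, the use of `datetime.date` values for date columns, and the argument order `datediff(end, start)` (read off the table's examples) are assumptions. Python's built-in `round` rounds halves to even, so a real implementation might need `decimal` if spreadsheet-style rounding is expected.

```python
import datetime as dt
from statistics import mean
from typing import Any, Callable

FUNCTIONS: dict[str, Callable[..., Any]] = {
    # Math (row-level: several values from the same row)
    "round": round,  # note: Python rounds halves to even
    "abs": abs,
    "min": min,
    "max": max,
    "sum": lambda *values: sum(values),
    "avg": lambda *values: mean(values),
    # Text
    "upper": str.upper,
    "lower": str.lower,
    "len": len,
    "concat": lambda *parts: "".join(str(part) for part in parts),
    "trim": str.strip,
    "left": lambda text, n: text[:n],
    "right": lambda text, n: text[-n:] if n > 0 else "",
    # Date (assumes both operands are datetime.date, or both datetime.datetime)
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "today": dt.date.today,
    "datediff": lambda end, start: (end - start).days,
}


def call(name: str, *args: Any) -> Any:
    """Dispatch a formula function call by name."""
    return FUNCTIONS[name](*args)


# concat(upper(left({FirstName}, 1)), ". ", {LastName})
initials = call("concat", call("upper", call("left", "ada", 1)), ". ", "Lovelace")
```
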
#### Aggregation (for cross-table contexts)

`sum`, `avg`, `min` and `max` also appear in the Math table, where they combine several values from the current row.
Called with a single cross-table reference, they aggregate over every matching row instead.

| Function      | Description  | Example                                             |
|---------------|--------------|-----------------------------------------------------|
| `sum(expr)`   | Sum values   | `sum({Orders.Amount WHERE Orders.ClientId = Id})`   |
| `count(expr)` | Count values | `count({Orders.Id WHERE Orders.ClientId = Id})`     |
| `avg(expr)`   | Average      | `avg({Reviews.Score WHERE Reviews.ProductId = Id})` |
| `min(expr)`   | Minimum      | `min({Bids.Price WHERE Bids.ItemId = Id})`          |
| `max(expr)`   | Maximum      | `max({Bids.Price WHERE Bids.ItemId = Id})`          |

## Cross-Table References

### Direct Reference

Reference a column from another table using `{TableName.ColumnName}`:

```
{Products.Price} * {Quantity}
```

### Join Resolution (implicit)

When a formula references another table without a WHERE clause, the join is resolved automatically:

1. **By `id` column**: if both tables have a column named `id`, rows are matched on equal `id` values
2. **By row index**: if the two tables do not both have an `id` column, rows are matched by their internal row index
   (stable across sort/filter)

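The two implicit join rules can be sketched as a per-row lookup. Tables are modelled here as hypothetical `dict`s mapping column name to a list of values, and a missing match returns `None`, in line with the "No join match" row of the error table. The linear scan keeps the sketch short; a real engine would more likely build an `id`-to-row map once per table.

```python
from typing import Any, Optional

Table = dict[str, list[Any]]  # hypothetical layout: column name -> column values


def implicit_lookup(current: Table, other: Table, column: str, row_index: int) -> Optional[Any]:
    """Resolve {Other.column} for one row of the current table without a WHERE clause."""
    if "id" in current and "id" in other:
        # Rule 1: match rows on equal `id` values.
        key = current["id"][row_index]
        for other_index, other_key in enumerate(other["id"]):
            if other_key == key:
                return other[column][other_index]
        return None  # no join match
    # Rule 2: fall back to the internal row index.
    values = other[column]
    return values[row_index] if row_index < len(values) else None


orders = {"id": [2, 1, 3], "Quantity": [3, 4, 1]}
products = {"id": [1, 2], "Price": [10.0, 20.0]}
# {Products.Price} for each order row
prices = [implicit_lookup(orders, products, "Price", i) for i in range(3)]
```
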
### Explicit Join (WHERE clause)

For explicit control over which row of the other table to use:

```
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
```

Inside the WHERE clause:

- `Products.Code` refers to a column in the referenced table
- `ProductCode` (no `Table.` prefix) refers to a column in the current table

### Aggregation with Cross-Table

When a cross-table reference matches multiple rows, use an aggregation function:

```
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
```

Without aggregation, a multi-row match returns the first matching value.

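Explicit joins and cross-table aggregation share one step: collect every value whose key column matches the current row, then either aggregate the list or take its first element. The sketch below assumes the same hypothetical `dict`-of-lists table layout; the function names are illustrative, and returning `0` for `sum` and `count` over no matches but `None` for the other aggregates is an assumption the document does not specify.

```python
from typing import Any, Callable, Optional

Table = dict[str, list[Any]]  # hypothetical layout: column name -> column values

AGGREGATES: dict[str, Callable[[list[Any]], Any]] = {
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values) if values else None,
    "min": lambda values: min(values) if values else None,
    "max": lambda values: max(values) if values else None,
}


def where_matches(other: Table, value_column: str, key_column: str, current_value: Any) -> list[Any]:
    """Values of other[value_column] on every row where other[key_column] == current_value."""
    return [
        value
        for value, key in zip(other[value_column], other[key_column])
        if key == current_value
    ]


def resolve_where(other: Table, value_column: str, key_column: str, current_value: Any,
                  aggregate: Optional[str] = None) -> Any:
    matches = where_matches(other, value_column, key_column, current_value)
    if aggregate is not None:
        return AGGREGATES[aggregate](matches)
    # Without aggregation the first matching value wins; no match gives None.
    return matches[0] if matches else None


order_lines = {"OrderId": [1, 1, 2], "Amount": [10.0, 5.0, 7.0]}
# sum({OrderLines.Amount WHERE OrderLines.OrderId = Id}) for the order whose Id is 1
order_total = resolve_where(order_lines, "Amount", "OrderId", 1, aggregate="sum")
```
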
## Calculation Engine

### Dependency Graph (DAG)

The formula system maintains a **Directed Acyclic Graph** of dependencies between columns:

- **Nodes**: each formula column is a node, identified by `table_name.column_id`
- **Edges**: if column A's formula references column B, an edge B → A exists ("A depends on B")
- Both directions are tracked:
  - **Precedents**: columns that a formula reads from
  - **Dependents**: columns that need recalculation when this column changes

Cross-table references create edges that span DataGrid instances, managed at the `DataGridsManager` level.

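One way to keep both edge directions in sync is to store them in two maps and rewrite a node's incoming edges whenever its formula changes. `DependencyGraph` and `set_formula_dependencies` are hypothetical names; node IDs follow the `table_name.column_id` scheme described above.

```python
from collections import defaultdict


class DependencyGraph:
    """Column dependencies keyed by 'table_name.column_id', tracked in both directions."""

    def __init__(self) -> None:
        self.precedents: dict[str, set[str]] = defaultdict(set)  # node -> nodes it reads
        self.dependents: dict[str, set[str]] = defaultdict(set)  # node -> nodes that read it

    def set_formula_dependencies(self, node: str, reads: set[str]) -> None:
        """Replace the edges of `node` with one edge source -> node per reference."""
        for old_source in self.precedents.pop(node, set()):
            self.dependents[old_source].discard(node)
        for source in reads:
            self.precedents[node].add(source)
            self.dependents[source].add(node)


graph = DependencyGraph()
# orders.total = {Price} * {Quantity}
graph.set_formula_dependencies("orders.total", {"orders.price", "orders.quantity"})
```
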
### Dirty Flag Propagation

When a source column's data changes:

1. The source column is marked **dirty**
2. All direct dependents are marked dirty
3. Propagation continues recursively through the DAG
4. Each dirty column maintains a **dirty row set**: the specific row indices that need recalculation

This propagation is **immediate** and cheap: it only marks flags and performs no computation.

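Dirty marking is a plain graph traversal. This sketch walks the dependents breadth-first (equivalent to the recursive description) and unions the changed rows into each node's dirty row set; `propagate_dirty` is a hypothetical name. It assumes a dependent is affected on the same row indices as its source, which holds within one table; a cross-table edge would need to map rows through the join, or mark the whole dependent column dirty.

```python
from collections import deque


def propagate_dirty(
    dependents: dict[str, set[str]],
    source: str,
    changed_rows: set[int],
    dirty_rows: dict[str, set[int]],
) -> None:
    """Flag `changed_rows` on `source` and on every node that depends on it, directly or not."""
    queue = deque([source])
    visited = {source}
    while queue:
        node = queue.popleft()
        dirty_rows.setdefault(node, set()).update(changed_rows)  # flag only, no computation
        for dependent in dependents.get(node, ()):
            if dependent not in visited:
                visited.add(dependent)
                queue.append(dependent)


# orders.price -> orders.subtotal -> orders.total
dependents = {"orders.price": {"orders.subtotal"}, "orders.subtotal": {"orders.total"}}
dirty: dict[str, set[int]] = {}
propagate_dirty(dependents, "orders.price", {3, 7}, dirty)
```
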
### Recalculation Strategy (Hybrid)

Actual computation is **deferred to rendering time**:

1. On value change → dirty flags propagate instantly through the DAG
2. On page render (`mk_body_content_page`) → only dirty rows within the visible page (up to 1000 rows) are recalculated
3. Off-screen pages remain dirty until scrolled into view
4. Calculation follows **topological order** of the DAG to ensure precedents are computed before dependents

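Deferred, page-aware recalculation then only has to walk the formula columns in topological order and compute the dirty rows that fall inside the visible page. `recalculate_page` and the `compute_cell` callback are hypothetical; `order` is assumed to list the formula columns already sorted so that precedents come first.

```python
from typing import Callable


def recalculate_page(
    order: list[str],
    dirty_rows: dict[str, set[int]],
    page: range,
    compute_cell: Callable[[str, int], None],
) -> None:
    """Recompute dirty cells of the visible page; off-page rows stay dirty."""
    for node in order:  # topological order: precedents before dependents
        pending = dirty_rows.get(node)
        if not pending:
            continue
        visible = {row for row in pending if row in page}
        for row in sorted(visible):
            compute_cell(node, row)
        pending -= visible  # in place: only the computed rows are cleared


computed: list[tuple[str, int]] = []
dirty = {"orders.subtotal": {1, 5, 2500}, "orders.total": {1}}
recalculate_page(
    ["orders.subtotal", "orders.total"],
    dirty,
    range(0, 1000),  # first page, up to 1000 rows
    lambda node, row: computed.append((node, row)),
)
```

Row 2500 of `orders.subtotal` stays in its dirty set until a page containing it is rendered.
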
### Cycle Detection

Before a formula is added, the engine checks the DAG for cycles using Kahn's algorithm: if the topological sort cannot
order every node, a cycle exists. When a cycle is detected:

- The formula is **rejected**
- The editor displays an error identifying the circular dependency chain
- The previous formula (if any) remains unchanged

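Kahn's algorithm repeatedly removes nodes that have no remaining incoming edges; if some nodes are never removed, the graph has a cycle. The sketch below uses a `dependents` map (node to the nodes that read it) and checks a candidate formula against a copy of the graph, so the live graph and the previous formula stay untouched when the check fails. Function names are hypothetical, and the error lists every node that could not be ordered (those on, or downstream of, a cycle) rather than the exact cycle chain the editor reports.

```python
from collections import deque


def topological_order(dependents: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm over edges source -> dependent; raises ValueError on a cycle."""
    nodes = set(dependents)
    for targets in dependents.values():
        nodes |= targets
    in_degree = {node: 0 for node in nodes}
    for targets in dependents.values():
        for target in targets:
            in_degree[target] += 1
    # Start from nodes with no precedents; sorting keeps the output deterministic.
    queue = deque(sorted(node for node, degree in in_degree.items() if degree == 0))
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in sorted(dependents.get(node, ())):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)
    if len(order) != len(nodes):
        stuck = sorted(node for node in nodes if in_degree[node] > 0)
        raise ValueError("circular dependency involving: " + ", ".join(stuck))
    return order


def can_set_formula(dependents: dict[str, set[str]], node: str, reads: set[str]) -> bool:
    """Check a new or edited formula on a copy, leaving the live graph untouched."""
    trial = {source: set(targets) for source, targets in dependents.items()}
    for targets in trial.values():
        targets.discard(node)  # drop the edges of the node's previous formula
    for source in reads:
        trial.setdefault(source, set()).add(node)
    try:
        topological_order(trial)
    except ValueError:
        return False
    return True


graph = {"t.price": {"t.subtotal"}, "t.subtotal": {"t.total"}}
```

Making `t.price` read `t.total` would close the loop `t.price → t.subtotal → t.total → t.price`, so `can_set_formula` rejects it.
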
### Caching

Each formula column caches its computed values:

- Results are stored in `ns_fast_access[col_id]` alongside raw data columns
- The dirty row set tracks which cached values are stale
- Non-dirty rows return their cached value without re-evaluation
- Cache is invalidated per-row when source data changes

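The cache and its dirty row set can be sketched as a small class. A plain Python list stands in for the numpy array stored in `ns_fast_access`, and `FormulaColumnCache` is a hypothetical name. Every row starts dirty, so nothing is computed until it is first read.

```python
from typing import Any, Callable


class FormulaColumnCache:
    """Cached values of one formula column plus the rows whose value is stale."""

    def __init__(self, n_rows: int, compute_row: Callable[[int], Any]) -> None:
        self.values: list[Any] = [None] * n_rows  # stands in for the numpy array in ns_fast_access
        self.dirty_rows: set[int] = set(range(n_rows))  # nothing computed yet
        self._compute_row = compute_row

    def invalidate(self, rows: set[int]) -> None:
        """Mark rows stale after their source data changed."""
        self.dirty_rows |= rows

    def get(self, row: int) -> Any:
        """Return the cached value, re-evaluating only if the row is dirty."""
        if row in self.dirty_rows:
            self.values[row] = self._compute_row(row)
            self.dirty_rows.discard(row)
        return self.values[row]


source = [1, 2, 3]
evaluations: list[int] = []


def double(row: int) -> int:
    evaluations.append(row)  # record each real evaluation
    return source[row] * 2


doubled = FormulaColumnCache(len(source), double)
```
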
## Evaluation

### Row-by-Row Execution

Formulas are evaluated **row-by-row** within the page being rendered. For each row:

1. Resolve column references `{ColumnName}` to the cell value at the current row index
2. Resolve cross-table references `{Table.Column}` via the join mechanism
3. Evaluate the expression with resolved values
4. Store the result in the cache (`ns_fast_access`)

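The four steps map onto a short loop. In this sketch the formula is assumed to be already compiled into a Python callable that takes a row `dict`; `evaluate_page` and its parameters are hypothetical, and cross-table resolution (step 2) is left as a comment.

```python
from typing import Any, Callable

Row = dict[str, Any]


def evaluate_page(
    columns: dict[str, list[Any]],
    references: list[str],
    expression: Callable[[Row], Any],
    page: range,
    cache: list[Any],
) -> None:
    """Evaluate a compiled formula row by row over one page, writing into `cache`."""
    for row_index in page:
        # 1. Resolve each {ColumnName} to its value at this row index.
        values = {name: columns[name][row_index] for name in references}
        # 2. Cross-table references would be resolved here through the join.
        # 3. Evaluate the expression with the resolved values.
        result = expression(values)
        # 4. Store the result in the column cache.
        cache[row_index] = result


data = {"Price": [2.0, 3.0, 4.0], "Quantity": [5, 1, 2]}
totals: list[Any] = [None] * 3
# {Price} * {Quantity}, evaluated for a page covering rows 0 and 1 only
evaluate_page(data, ["Price", "Quantity"], lambda r: r["Price"] * r["Quantity"], range(0, 2), totals)
```
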
### Parser

The formula language uses a **custom grammar** parsed with Lark (consistent with the formatting DSL). The parser:

1. Tokenizes the formula string
2. Builds an AST (Abstract Syntax Tree)
3. Transforms the AST into an evaluable representation
4. Extracts column references for dependency graph registration

### Error Handling

| Error Type            | Behavior                                              |
|-----------------------|-------------------------------------------------------|
| Syntax error          | Editor highlights the error, formula not saved        |
| Unknown column        | Editor highlights, autocompletion suggests fixes      |
| Type mismatch         | Cell displays error indicator, other cells unaffected |
| Division by zero      | Cell displays `#DIV/0!` or None                       |
| Circular dependency   | Formula rejected, editor shows cycle chain            |
| Cross-table not found | Editor highlights unknown table name                  |
| No join match         | Cell displays None                                    |

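Keeping runtime errors confined to a single cell amounts to catching them around each cell's evaluation rather than around the whole page. `evaluate_cell` and the `#TYPE!` marker are hypothetical; the table only promises an "error indicator" for type mismatches.

```python
from typing import Any, Callable

DIV_ZERO = "#DIV/0!"
TYPE_ERROR = "#TYPE!"  # hypothetical marker; the table only says "error indicator"


def evaluate_cell(expression: Callable[[], Any]) -> Any:
    """Evaluate one cell; an error only changes that cell's display value."""
    try:
        return expression()
    except ZeroDivisionError:
        return DIV_ZERO
    except TypeError:
        return TYPE_ERROR


totals = [10, 4, "n/a"]
counts = [2, 0, 1]
# {Total} / {Count} for three rows; the failing rows do not stop the others
averages = [evaluate_cell(lambda t=t, c=c: t / c) for t, c in zip(totals, counts)]
```
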
## User Interface

### Creating a Formula Column

Formula columns are created and edited through the **DataGridColumnsManager**:

1. The user opens the Columns Manager panel
2. Adds a new column or edits an existing one
3. Selects column type **"Formula"**
4. A **DslEditor** (CodeMirror 5) opens for formula input
5. The editor provides:
   - **Syntax highlighting**: keywords, column references, functions, operators
   - **Autocompletion**: column names (current table and other tables), function names, table names
   - **Validation**: real-time syntax checking and dependency cycle detection
   - **Error markers**: inline error indicators with descriptions

### Formula Column Properties

A formula column extends `DataGridColumnState` with:

| Property   | Type          | Description                                    |
|------------|---------------|------------------------------------------------|
| `formula`  | `str` or None | The formula expression (None for data columns) |
| `col_type` | `ColumnType`  | Set to `ColumnType.Formula`                    |

Other properties (`title`, `visible`, `width`, `format`) remain unchanged.

Formula columns are **read-only** in the grid body: cell values are computed, not editable. Formatting rules from the
formatting DSL apply to formula columns like any other column.

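For illustration, the column state could be modelled as a dataclass. Only `formula`, `col_type`, `ColumnType.Formula`, `title`, `visible`, `width` and `format` come from the document; `col_id`, the default values, the other `ColumnType` members, and the `is_formula` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    Text = "text"      # hypothetical data-column type
    Number = "number"  # hypothetical data-column type
    Formula = "formula"


@dataclass
class DataGridColumnState:
    col_id: str
    title: str
    col_type: ColumnType = ColumnType.Text
    visible: bool = True
    width: int = 100
    format: Optional[str] = None
    formula: Optional[str] = None  # None for data columns

    @property
    def is_formula(self) -> bool:
        return self.col_type is ColumnType.Formula and self.formula is not None


total = DataGridColumnState(
    "total", "Total", col_type=ColumnType.Formula, formula="{Price} * {Quantity}"
)
price = DataGridColumnState("price", "Price", col_type=ColumnType.Number)
```
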
## Integration Points

| Component                | Role                                                     |
|--------------------------|----------------------------------------------------------|
| `DataGridColumnState`    | Stores `formula` field and `ColumnType.Formula` type     |
| `DatagridStore`          | `ns_fast_access` caches formula results as numpy arrays  |
| `DataGridColumnsManager` | UI for creating/editing formula columns                  |
| `DataGridsManager`       | Hosts the global dependency DAG across all tables        |
| `DslEditor`              | CodeMirror 5 editor with highlighting and autocompletion |
| `FormattingEngine`       | Applies formatting rules AFTER formula evaluation        |
| `mk_body_content_page()` | Triggers formula computation for visible rows            |
| `mk_body_cell_content()` | Reads computed values from `ns_fast_access`              |

## Syntax Summary

```
# Basic arithmetic
{Price} * {Quantity}

# Function call
round({Price} * 1.2, 2)

# Simple condition (None if false)
{Price} * 0.8 if {Country} == "FR"

# Condition with else
{Price} * 0.8 if {Country} == "FR" else {Price}

# Chained conditions
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}

# Logical operators
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10

# Grouping
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10

# Cross-table (implicit join on id)
{Products.Price} * {Quantity}

# Cross-table (explicit join)
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}

# Cross-table aggregation
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})

# Nested functions
round(avg({Q1}, {Q2}, {Q3}, {Q4}), 1)

# Text operations
concat(upper(left({FirstName}, 1)), ". ", {LastName})
```

## Future: Cell-Level Overrides

The architecture supports adding cell-level formula overrides with ~20-30% additional work:

- **Storage**: sparse dict `cell_formulas: dict[(col_id, row_index), str]` (same pattern as `cell_formats`)
- **DAG**: new node type `table.column[row]` alongside existing `table.column` nodes
- **Evaluation**: "does this cell have an override? If yes, use it. Otherwise, use the column formula."
- **Node ID scheme**: designed to be extensible from the start (`table.column` for columns, `table.column[row]` for
  cells)

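The planned override lookup and the extensible node ID scheme fit in two small functions. Both are hypothetical sketches of the design described above, not existing code; `column_formulas` stands in for wherever column formulas are actually stored.

```python
from typing import Optional


def formula_for_cell(
    column_formulas: dict[str, str],
    cell_formulas: dict[tuple[str, int], str],
    col_id: str,
    row_index: int,
) -> Optional[str]:
    """A cell override wins; otherwise fall back to the column formula."""
    override = cell_formulas.get((col_id, row_index))
    return override if override is not None else column_formulas.get(col_id)


def node_id(table: str, column: str, row: Optional[int] = None) -> str:
    """'table.column' for column nodes, 'table.column[row]' for cell nodes."""
    return f"{table}.{column}" if row is None else f"{table}.{column}[{row}]"


column_formulas = {"total": "{Price} * {Quantity}"}
cell_formulas = {("total", 3): "0"}  # sparse: only overridden cells are stored
```
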