# DataGrid Formulas

## Overview

The DataGrid formula system adds computed columns to the DataGrid. A formula column applies a single expression to every row, producing derived values from existing data, within the same table or across tables.

The system is designed for:

- **Column-level formulas**: one formula per column, applied to all rows
- **Cross-table references**: direct syntax to reference columns from other tables
- **Reactive recalculation**: dirty flag propagation with page-aware computation
- **Cell-level overrides** (planned): individual cells can override the column formula

## Formula Language

### Basic Syntax

A formula is an expression that references columns with `{ColumnName}` and produces a value for each row:

```
{Price} * {Quantity}
```

References use curly braces `{}` to distinguish column names from keywords and functions. Column names are matched by ID or title.

### Operators

#### Arithmetic

| Operator | Description    | Example                |
|----------|----------------|------------------------|
| `+`      | Addition       | `{Price} + {Tax}`      |
| `-`      | Subtraction    | `{Total} - {Discount}` |
| `*`      | Multiplication | `{Price} * {Quantity}` |
| `/`      | Division       | `{Total} / {Count}`    |
| `%`      | Modulo         | `{Value} % 2`          |
| `^`      | Power          | `{Base} ^ 2`           |

#### Comparison

| Operator     | Description        | Example                         |
|--------------|--------------------|---------------------------------|
| `==`         | Equal              | `{Status} == "active"`          |
| `!=`         | Not equal          | `{Status} != "deleted"`         |
| `>`          | Greater than       | `{Price} > 100`                 |
| `<`          | Less than          | `{Stock} < 10`                  |
| `>=`         | Greater or equal   | `{Score} >= 80`                 |
| `<=`         | Less or equal      | `{Age} <= 18`                   |
| `contains`   | String contains    | `{Name} contains "Corp"`        |
| `startswith` | String starts with | `{Code} startswith "ERR"`       |
| `endswith`   | String ends with   | `{File} endswith ".csv"`        |
| `in`         | Value in list      | `{Status} in ["active", "new"]` |
| `between`    | Value in range     | `{Age} between 18 and 65`       |
| `isempty`    | Value is empty     | `{Notes} isempty`               |
| `isnotempty` | Value is not empty | `{Email} isnotempty`            |
| `isnan`      | Value is NaN       | `{Score} isnan`                 |

#### Logical

| Operator | Description | Example                               |
|----------|-------------|---------------------------------------|
| `and`    | Logical AND | `{Age} > 18 and {Status} == "active"` |
| `or`     | Logical OR  | `{Type} == "A" or {Type} == "B"`      |
| `not`    | Negation    | `not {Status} == "deleted"`           |

Parentheses control precedence: `({Type} == "A" or {Type} == "B") and {Active} == True`

### Conditions (suffix-if)

Conditions use a **suffix-if** syntax: the result expression comes first, then the condition. This keeps the focus on the output, not the branching logic.

#### Simple condition (no else; result is None when false)

```
{Price} * 0.8 if {Country} == "FR"
```

#### With else

```
{Price} * 0.8 if {Country} == "FR" else {Price}
```

#### Chained conditions

```
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
```

#### With logical operators

```
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10 else {Price}
```

#### With grouping

```
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
```

### Functions

#### Math

| Function          | Description           | Example                       |
|-------------------|-----------------------|-------------------------------|
| `round(expr, n)`  | Round to n decimals   | `round({Price} * 1.2, 2)`     |
| `abs(expr)`       | Absolute value        | `abs({Balance})`              |
| `min(expr, expr)` | Minimum of two values | `min({Price}, {MaxPrice})`    |
| `max(expr, expr)` | Maximum of two values | `max({Score}, 0)`             |
| `sum(expr, ...)`  | Sum of values         | `sum({Q1}, {Q2}, {Q3}, {Q4})` |
| `avg(expr, ...)`  | Average of values     | `avg({Q1}, {Q2}, {Q3}, {Q4})` |

#### Text

| Function            | Description         | Example                        |
|---------------------|---------------------|--------------------------------|
| `upper(expr)`       | Uppercase           | `upper({Name})`                |
| `lower(expr)`       | Lowercase           | `lower({Email})`               |
| `len(expr)`         | String length       | `len({Description})`           |
| `concat(expr, ...)` | Concatenate strings | `concat({First}, " ", {Last})` |
| `trim(expr)`        | Remove whitespace   | `trim({Input})`                |
| `left(expr, n)`     | First n characters  | `left({Code}, 3)`              |
| `right(expr, n)`    | Last n characters   | `right({Phone}, 4)`            |

#### Date

| Function               | Description        | Example                        |
|------------------------|--------------------|--------------------------------|
| `year(expr)`           | Extract year       | `year({CreatedAt})`            |
| `month(expr)`          | Extract month      | `month({CreatedAt})`           |
| `day(expr)`            | Extract day        | `day({CreatedAt})`             |
| `today()`              | Current date       | `datediff({DueDate}, today())` |
| `datediff(expr, expr)` | Difference in days | `datediff({End}, {Start})`     |

#### Aggregation (for cross-table contexts)

| Function      | Description  | Example                                             |
|---------------|--------------|-----------------------------------------------------|
| `sum(expr)`   | Sum values   | `sum({Orders.Amount WHERE Orders.ClientId = Id})`   |
| `count(expr)` | Count values | `count({Orders.Id WHERE Orders.ClientId = Id})`     |
| `avg(expr)`   | Average      | `avg({Reviews.Score WHERE Reviews.ProductId = Id})` |
| `min(expr)`   | Minimum      | `min({Bids.Price WHERE Bids.ItemId = Id})`          |
| `max(expr)`   | Maximum      | `max({Bids.Price WHERE Bids.ItemId = Id})`          |

## Cross-Table References

### Direct Reference

Reference a column from another table using `{TableName.ColumnName}`:

```
{Products.Price} * {Quantity}
```

### Join Resolution (implicit)

When referencing another table without a WHERE clause, the join is resolved automatically:

1. **By `id` column**: if both tables have a column named `id`, rows are matched on equal `id` values
2. **By row index**: if no `id` column exists in both tables, rows are matched by their internal row index (stable across sort/filter)

### Explicit Join (WHERE clause)

For explicit control over which row of the other table to use:

```
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
```

Inside the WHERE clause:

- `Products.Code` refers to a column in the referenced table
- `ProductCode` (no `Table.` prefix) refers to a column in the current table

### Aggregation with Cross-Table

When a cross-table reference matches multiple rows, use an aggregation function:

```
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
```

Without aggregation, a multi-row match returns the first matching value.

## Calculation Engine

### Dependency Graph (DAG)

The formula system maintains a **Directed Acyclic Graph** of dependencies between columns:

- **Nodes**: each formula column is a node, identified by `table_name.column_id`
- **Edges**: if column A's formula references column B, an edge B → A exists ("A depends on B")
- Both directions are tracked:
  - **Precedents**: columns that a formula reads from
  - **Dependents**: columns that need recalculation when this column changes

Cross-table references create edges that span DataGrid instances, managed at the `DataGridsManager` level.

### Dirty Flag Propagation

When a source column's data changes:

1. The source column is marked **dirty**
2. All direct dependents are marked dirty
3. Propagation continues recursively through the DAG
4. Each dirty column maintains a **dirty row set**: the specific row indices that need recalculation

This propagation is **immediate** (fast: only flag marking, no computation).

### Recalculation Strategy (Hybrid)

Actual computation is **deferred to rendering time**:

1. On value change → dirty flags propagate instantly through the DAG
2. On page render (`mk_body_content_page`) → only dirty rows within the visible page (up to 1000 rows) are recalculated
3. Off-screen pages remain dirty until scrolled into view
4. Calculation follows **topological order** of the DAG to ensure precedents are computed before dependents

### Cycle Detection

Before adding a formula, the engine checks for cycles in the DAG using Kahn's algorithm during topological sort. If a cycle is detected:

- The formula is **rejected**
- The editor displays an error identifying the circular dependency chain
- The previous formula (if any) remains unchanged

### Caching

Each formula column caches its computed values:

- Results are stored in `ns_fast_access[col_id]` alongside raw data columns
- The dirty row set tracks which cached values are stale
- Non-dirty rows return their cached value without re-evaluation
- Cache is invalidated per-row when source data changes

## Evaluation

### Row-by-Row Execution

Formulas are evaluated **row-by-row** within the page being rendered. For each row:

1. Resolve column references `{ColumnName}` to the cell value at the current row index
2. Resolve cross-table references `{Table.Column}` via the join mechanism
3. Evaluate the expression with resolved values
4. Store the result in the cache (`ns_fast_access`)

### Parser

The formula language uses a **custom grammar** parsed with Lark (consistent with the formatting DSL). The parser:

1. Tokenizes the formula string
2. Builds an AST (Abstract Syntax Tree)
3. Transforms the AST into an evaluable representation
4. Extracts column references for dependency graph registration

### Error Handling

| Error Type            | Behavior                                              |
|-----------------------|-------------------------------------------------------|
| Syntax error          | Editor highlights the error, formula not saved        |
| Unknown column        | Editor highlights, autocompletion suggests fixes      |
| Type mismatch         | Cell displays error indicator, other cells unaffected |
| Division by zero      | Cell displays `#DIV/0!` or None                       |
| Circular dependency   | Formula rejected, editor shows cycle chain            |
| Cross-table not found | Editor highlights unknown table name                  |
| No join match         | Cell displays None                                    |

## User Interface

### Creating a Formula Column

Formula columns are created and edited through the **DataGridColumnsManager**:

1. User opens the Columns Manager panel
2. Adds a new column or edits an existing one
3. Selects column type **"Formula"**
4. A **DslEditor** (CodeMirror 5) opens for formula input
5. The editor provides:
   - **Syntax highlighting**: keywords, column references, functions, operators
   - **Autocompletion**: column names (current table and other tables), function names, table names
   - **Validation**: real-time syntax checking and dependency cycle detection
   - **Error markers**: inline error indicators with descriptions

### Formula Column Properties

A formula column extends `DataGridColumnState` with:

| Property   | Type          | Description                                    |
|------------|---------------|------------------------------------------------|
| `formula`  | `str` or None | The formula expression (None for data columns) |
| `col_type` | `ColumnType`  | Set to `ColumnType.Formula`                    |

Other properties (`title`, `visible`, `width`, `format`) remain unchanged.

Formula columns are **read-only** in the grid body: cell values are computed, not editable. Formatting rules from the formatting DSL apply to formula columns like any other column.
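
As a rough illustration of the properties listed above, the column state could be modelled as a dataclass. This is only a sketch under assumptions: the `col_id` field, the `Text` and `Number` enum members, the default values, and the two helper methods are hypothetical and not taken from the actual `DataGridColumnState`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    # Only Formula is named in this document; the other members are assumed.
    Text = "text"
    Number = "number"
    Formula = "formula"


@dataclass
class DataGridColumnState:
    col_id: str                       # hypothetical identifier field
    title: str
    col_type: ColumnType = ColumnType.Text
    visible: bool = True
    width: int = 150                  # assumed default
    format: Optional[str] = None
    formula: Optional[str] = None     # None for plain data columns

    def is_formula(self) -> bool:
        # Hypothetical helper: a formula column has both the type and an expression.
        return self.col_type is ColumnType.Formula and self.formula is not None

    def is_editable(self) -> bool:
        # Formula cells are computed, so the grid body treats them as read-only.
        return not self.is_formula()
```

With this shape, a data column keeps `formula=None` and stays editable, while a column created with `col_type=ColumnType.Formula` and a formula string reports itself as read-only.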
## Integration Points

| Component                | Role                                                     |
|--------------------------|----------------------------------------------------------|
| `DataGridColumnState`    | Stores `formula` field and `ColumnType.Formula` type     |
| `DatagridStore`          | `ns_fast_access` caches formula results as numpy arrays  |
| `DataGridColumnsManager` | UI for creating/editing formula columns                  |
| `DataGridsManager`       | Hosts the global dependency DAG across all tables        |
| `DslEditor`              | CodeMirror 5 editor with highlighting and autocompletion |
| `FormattingEngine`       | Applies formatting rules AFTER formula evaluation        |
| `mk_body_content_page()` | Triggers formula computation for visible rows            |
| `mk_body_cell_content()` | Reads computed values from `ns_fast_access`              |

## Syntax Summary

```
# Basic arithmetic
{Price} * {Quantity}

# Function call
round({Price} * 1.2, 2)

# Simple condition (None if false)
{Price} * 0.8 if {Country} == "FR"

# Condition with else
{Price} * 0.8 if {Country} == "FR" else {Price}

# Chained conditions
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}

# Logical operators
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10

# Grouping
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10

# Cross-table (implicit join on id)
{Products.Price} * {Quantity}

# Cross-table (explicit join)
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}

# Cross-table aggregation
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})

# Nested functions
round(avg({Q1}, {Q2}, {Q3}, {Q4}), 1)

# Text operations
concat(upper(left({FirstName}, 1)), ". ", {LastName})
```

## Future: Cell-Level Overrides

The architecture supports adding cell-level formula overrides with ~20-30% additional work:

- **Storage**: sparse dict `cell_formulas: dict[(col_id, row_index), str]` (same pattern as `cell_formats`)
- **DAG**: new node type `table.column[row]` alongside existing `table.column` nodes
- **Evaluation**: "does this cell have an override? If yes, use it. Otherwise, use the column formula."
- **Node ID scheme**: designed to be extensible from the start (`table.column` for columns, `table.column[row]` for cells)
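
The override lookup and node ID scheme described above might look like the following. This is a minimal sketch, not the planned implementation: the function names `node_id` and `effective_formula` and the `column_formulas` mapping are assumptions made for illustration. Only the `cell_formulas` shape and the two ID formats come from the list above.

```python
from typing import Dict, Optional, Tuple

CellKey = Tuple[str, int]  # (col_id, row_index), as in the sparse cell_formulas dict


def node_id(table: str, column: str, row: Optional[int] = None) -> str:
    """Build a DAG node ID: table.column for columns, table.column[row] for cells."""
    base = f"{table}.{column}"
    return base if row is None else f"{base}[{row}]"


def effective_formula(
    col_id: str,
    row_index: int,
    column_formulas: Dict[str, str],    # hypothetical: col_id -> column formula
    cell_formulas: Dict[CellKey, str],  # sparse: only overridden cells appear
) -> Optional[str]:
    """Return the cell override if one exists, otherwise the column formula."""
    override = cell_formulas.get((col_id, row_index))
    if override is not None:
        return override
    return column_formulas.get(col_id)


# Usage: row 3 of "total" is overridden, every other row uses the column formula.
column_formulas = {"total": "{Price} * {Quantity}"}
cell_formulas = {("total", 3): "{Price} * {Quantity} * 0.5"}

print(effective_formula("total", 3, column_formulas, cell_formulas))
print(effective_formula("total", 4, column_formulas, cell_formulas))
print(node_id("orders", "total"), node_id("orders", "total", 3))
```

Keeping the override store sparse means a table with no overrides pays only one dictionary lookup per evaluated cell.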