# DataGrid Formulas

## Overview

The DataGrid formula system adds computed columns to the DataGrid. A formula column applies a single expression to every
row, producing derived values from existing data — within the same table or across tables.

The system is designed for:

- **Column-level formulas**: one formula per column, applied to all rows
- **Cross-table references**: direct syntax to reference columns from other tables
- **Reactive recalculation**: dirty flag propagation with page-aware computation
- **Cell-level overrides** (planned): individual cells can override the column formula

## Formula Language

### Basic Syntax

A formula is an expression that references columns with `{ColumnName}` and produces a value for each row:

```
{Price} * {Quantity}
```

References use curly braces `{}` to distinguish column names from keywords and functions. Column names are matched by ID
or title.
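Here is a minimal sketch of how references might be pulled out of a formula and matched against columns. The helper names, the dict-based column records, and the rule that an ID match wins over a title match are all assumptions made for illustration. The parser described below extracts references from the AST, not with a regex.

```python
import re
from typing import Optional

# Hypothetical helper: captures the text between each pair of braces,
# e.g. "Price" or "Products.Price WHERE Products.Code = ProductCode".
REF_PATTERN = re.compile(r"\{([^{}]+)\}")


def extract_references(formula: str) -> list[str]:
    """Return the raw reference texts in order of appearance, without braces."""
    return [m.group(1).strip() for m in REF_PATTERN.finditer(formula)]


def resolve_column(ref: str, columns: list[dict]) -> Optional[dict]:
    """Match a reference by column ID first, then by title (assumed precedence)."""
    for col in columns:
        if col["id"] == ref:
            return col
    for col in columns:
        if col["title"] == ref:
            return col
    return None


refs = extract_references("{Price} * {Quantity}")  # ["Price", "Quantity"]
```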

### Operators

#### Arithmetic

| Operator | Description    | Example                |
|----------|----------------|------------------------|
| `+`      | Addition       | `{Price} + {Tax}`      |
| `-`      | Subtraction    | `{Total} - {Discount}` |
| `*`      | Multiplication | `{Price} * {Quantity}` |
| `/`      | Division       | `{Total} / {Count}`    |
| `%`      | Modulo         | `{Value} % 2`          |
| `^`      | Power          | `{Base} ^ 2`           |

#### Comparison

| Operator     | Description        | Example                         |
|--------------|--------------------|---------------------------------|
| `==`         | Equal              | `{Status} == "active"`          |
| `!=`         | Not equal          | `{Status} != "deleted"`         |
| `>`          | Greater than       | `{Price} > 100`                 |
| `<`          | Less than          | `{Stock} < 10`                  |
| `>=`         | Greater or equal   | `{Score} >= 80`                 |
| `<=`         | Less or equal      | `{Age} <= 18`                   |
| `contains`   | String contains    | `{Name} contains "Corp"`        |
| `startswith` | String starts with | `{Code} startswith "ERR"`       |
| `endswith`   | String ends with   | `{File} endswith ".csv"`        |
| `in`         | Value in list      | `{Status} in ["active", "new"]` |
| `between`    | Value in range     | `{Age} between 18 and 65`       |
| `isempty`    | Value is empty     | `{Notes} isempty`               |
| `isnotempty` | Value is not empty | `{Email} isnotempty`            |
| `isnan`      | Value is NaN       | `{Score} isnan`                 |
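The comparison operators could be backed by a dispatch table that maps each operator token to a Python callable. This is a sketch, not the engine's implementation. Two details are assumptions the table above leaves open: that `isempty` means None or a blank string, and that `between` includes both bounds.

```python
import math
import operator


def _is_empty(value) -> bool:
    # Assumption: "empty" means None or a string containing only whitespace.
    return value is None or (isinstance(value, str) and value.strip() == "")


# Binary operators: left operand is the column value, right operand the literal.
BINARY_COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": operator.contains,  # contains(a, b) is `b in a`
    "startswith": lambda a, b: a.startswith(b),
    "endswith": lambda a, b: a.endswith(b),
    "in": lambda a, b: a in b,  # b is the bracketed list
}

# Postfix operators that take only the column value.
UNARY_CHECKS = {
    "isempty": _is_empty,
    "isnotempty": lambda v: not _is_empty(v),
    "isnan": lambda v: isinstance(v, float) and math.isnan(v),
}


def between(value, low, high) -> bool:
    """`{Age} between 18 and 65`, assumed inclusive on both ends."""
    return low <= value <= high
```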

#### Logical

| Operator | Description | Example                               |
|----------|-------------|---------------------------------------|
| `and`    | Logical AND | `{Age} > 18 and {Status} == "active"` |
| `or`     | Logical OR  | `{Type} == "A" or {Type} == "B"`      |
| `not`    | Negation    | `not {Status} == "deleted"`           |

Parentheses control precedence: `({Type} == "A" or {Type} == "B") and {Active} == True`

### Conditions (suffix-if)

Conditions use a **suffix-if** syntax: the result expression comes first, then the condition. This keeps the focus on
the output, not the branching logic.

#### Simple condition (no else — result is None when false)

```
{Price} * 0.8 if {Country} == "FR"
```

#### With else

```
{Price} * 0.8 if {Country} == "FR" else {Price}
```

#### Chained conditions

```
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
```

#### With logical operators

```
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10 else {Price}
```

#### With grouping

```
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10
```
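Suffix-if reads like Python's conditional expression. This sketch assumes the grammar nests chains to the right, as Python does, so `A if c1 else B if c2 else C` means `A if c1 else (B if c2 else C)`. The hypothetical `SuffixIf` node holds its branches as callables over a row dict. A missing `else` yields None, as stated above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


@dataclass
class SuffixIf:
    """Hypothetical AST node for `<then> if <cond> [else <otherwise>]`."""

    then: Callable[[Row], Any]
    cond: Callable[[Row], bool]
    otherwise: Optional[Callable[[Row], Any]] = None

    def __call__(self, row: Row) -> Any:
        if self.cond(row):
            return self.then(row)
        # Without an else branch the cell evaluates to None.
        return self.otherwise(row) if self.otherwise is not None else None


# {Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}
# parsed as: A if c1 else (B if c2 else C)
tiered_price = SuffixIf(
    then=lambda r: r["Price"] * 0.8,
    cond=lambda r: r["Country"] == "FR",
    otherwise=SuffixIf(
        then=lambda r: r["Price"] * 0.9,
        cond=lambda r: r["Country"] == "DE",
        otherwise=lambda r: r["Price"],
    ),
)
```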

### Functions

#### Math

| Function          | Description           | Example                       |
|-------------------|-----------------------|-------------------------------|
| `round(expr, n)`  | Round to n decimals   | `round({Price} * 1.2, 2)`     |
| `abs(expr)`       | Absolute value        | `abs({Balance})`              |
| `min(expr, expr)` | Minimum of two values | `min({Price}, {MaxPrice})`    |
| `max(expr, expr)` | Maximum of two values | `max({Score}, 0)`             |
| `sum(expr, ...)`  | Sum of values         | `sum({Q1}, {Q2}, {Q3}, {Q4})` |
| `avg(expr, ...)`  | Average of values     | `avg({Q1}, {Q2}, {Q3}, {Q4})` |
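Here is a sketch of the math functions as plain Python callables. The document does not say how empty cells are treated, so the policy below is an assumption: None arguments are skipped, and the result is None when every argument is None.

```python
# Hypothetical implementations of the math functions.
# Assumed policy: None arguments are skipped; if nothing remains, the result is None.


def _present(values):
    return [v for v in values if v is not None]


def f_sum(*values):
    vals = _present(values)
    return sum(vals) if vals else None


def f_avg(*values):
    vals = _present(values)
    return sum(vals) / len(vals) if vals else None


def f_round(value, n=0):
    return None if value is None else round(value, n)


MATH_FUNCTIONS = {
    "round": f_round,
    "abs": lambda v: None if v is None else abs(v),
    "min": lambda *vs: min(_present(vs), default=None),
    "max": lambda *vs: max(_present(vs), default=None),
    "sum": f_sum,
    "avg": f_avg,
}
```

Python's built-in `round` rounds halves to even (`round(2.5)` is `2`), and binary floats make some two-decimal cases surprising (`round(2.675, 2)` gives `2.67`). Whether the formula `round` should inherit that behavior or use `decimal` with `ROUND_HALF_UP` is a choice worth making explicitly.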

#### Text

| Function            | Description         | Example                        |
|---------------------|---------------------|--------------------------------|
| `upper(expr)`       | Uppercase           | `upper({Name})`                |
| `lower(expr)`       | Lowercase           | `lower({Email})`               |
| `len(expr)`         | String length       | `len({Description})`           |
| `concat(expr, ...)` | Concatenate strings | `concat({First}, " ", {Last})` |
| `trim(expr)`        | Remove whitespace   | `trim({Input})`                |
| `left(expr, n)`     | First n characters  | `left({Code}, 3)`              |
| `right(expr, n)`    | Last n characters   | `right({Phone}, 4)`            |
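The text functions map almost directly onto `str` methods and slicing. In this sketch, every function except `concat` passes None through unchanged, and `concat` treats None as an empty string. Both behaviors are assumptions.

```python
def _text(fn):
    # Hypothetical wrapper: pass None through, otherwise coerce the value to str.
    def wrapper(value, *args):
        return None if value is None else fn(str(value), *args)
    return wrapper


TEXT_FUNCTIONS = {
    "upper": _text(str.upper),
    "lower": _text(str.lower),
    "len": _text(len),
    "trim": _text(str.strip),
    "left": _text(lambda s, n: s[:n]),
    # Guard n <= 0: s[-0:] would return the whole string.
    "right": _text(lambda s, n: s[-n:] if n > 0 else ""),
    # Assumption: concat treats None as an empty string.
    "concat": lambda *parts: "".join("" if p is None else str(p) for p in parts),
}
```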

#### Date

| Function               | Description        | Example                        |
|------------------------|--------------------|--------------------------------|
| `year(expr)`           | Extract year       | `year({CreatedAt})`            |
| `month(expr)`          | Extract month      | `month({CreatedAt})`           |
| `day(expr)`            | Extract day        | `day({CreatedAt})`             |
| `today()`              | Current date       | `datediff({DueDate}, today())` |
| `datediff(expr, expr)` | Difference in days | `datediff({End}, {Start})`     |
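Here is a sketch of the date functions using the standard `datetime` module. It makes two assumptions: `datediff(a, b)` returns `a - b` in whole days, which matches the `datediff({End}, {Start})` example, and datetimes are truncated to dates before subtracting.

```python
from datetime import date, datetime


def _as_date(value):
    # datetime is a subclass of date, so check it first and drop the time part.
    return value.date() if isinstance(value, datetime) else value


def datediff(end, start):
    """Assumed semantics: whole days from `start` to `end` (end - start)."""
    if end is None or start is None:
        return None
    return (_as_date(end) - _as_date(start)).days


def _part(attr):
    return lambda v: None if v is None else getattr(v, attr)


DATE_FUNCTIONS = {
    "year": _part("year"),
    "month": _part("month"),
    "day": _part("day"),
    "today": date.today,
    "datediff": datediff,
}
```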

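#### Aggregation (for cross-table contexts)

These functions share their names with the math functions above. When given a single cross-table reference with a WHERE clause, they aggregate over every matching row instead of over their arguments. `count` exists only in this form.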

| Function      | Description  | Example                                             |
|---------------|--------------|-----------------------------------------------------|
| `sum(expr)`   | Sum values   | `sum({Orders.Amount WHERE Orders.ClientId = Id})`   |
| `count(expr)` | Count values | `count({Orders.Id WHERE Orders.ClientId = Id})`     |
| `avg(expr)`   | Average      | `avg({Reviews.Score WHERE Reviews.ProductId = Id})` |
| `min(expr)`   | Minimum      | `min({Bids.Price WHERE Bids.ItemId = Id})`          |
| `max(expr)`   | Maximum      | `max({Bids.Price WHERE Bids.ItemId = Id})`          |

## Cross-Table References

### Direct Reference

Reference a column from another table using `{TableName.ColumnName}`:

```
{Products.Price} * {Quantity}
```

### Join Resolution (implicit)

When a cross-table reference has no WHERE clause, the join is resolved automatically:

1. **By `id` column**: if both tables have a column named `id`, rows are matched on equal `id` values.
2. **By row index**: if either table lacks an `id` column, rows are matched by their internal row index (which stays stable
   across sort and filter).
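The two implicit rules can be combined into a single lookup. The table layout (column name mapped to a list of values) and the function name are assumptions. The linear scan keeps the example short; a real implementation would likely build an `id` to row-index map once per page.

```python
from typing import Any, Optional

# Assumed layout: column name -> list of values indexed by internal row index.
Table = dict[str, list[Any]]


def implicit_join(current: Table, other: Table, row: int, column: str) -> Optional[Any]:
    """Value of other[column] for the current table's row `row`."""
    if "id" in current and "id" in other:
        # Rule 1: both tables have an `id` column, so match on equal id values.
        target = current["id"][row]
        for i, other_id in enumerate(other["id"]):
            if other_id == target:
                return other[column][i]
        return None  # no join match: the cell shows None
    # Rule 2: otherwise fall back to the internal row index (stable across sort/filter).
    values = other[column]
    return values[row] if 0 <= row < len(values) else None
```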

### Explicit Join (WHERE clause)

For explicit control over which row of the other table to use:

```
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}
```

Inside the WHERE clause:

- `Products.Code` refers to a column in the referenced table
- `ProductCode` (no `Table.` prefix) refers to a column in the current table
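For an explicit join, an index from the referenced table's key column to its row positions turns each per-row lookup into a dictionary access. The helper names are hypothetical. The example evaluates `{Products.Price WHERE Products.Code = ProductCode} * {Quantity}` for a single order row.

```python
from collections import defaultdict
from typing import Any


def build_join_index(keys: list[Any]) -> dict[Any, list[int]]:
    """Map each value of the referenced key column (e.g. Products.Code) to its rows."""
    index: dict[Any, list[int]] = defaultdict(list)
    for row, key in enumerate(keys):
        index[key].append(row)
    return dict(index)


def explicit_join(local_key: Any, index: dict[Any, list[int]], values: list[Any]) -> list[Any]:
    """Values of the referenced column on every row whose key equals `local_key`."""
    return [values[row] for row in index.get(local_key, [])]


# {Products.Price WHERE Products.Code = ProductCode} * {Quantity}, for one order row
products_index = build_join_index(["P1", "P2"])
product_prices = [9.5, 4.0]
order = {"ProductCode": "P2", "Quantity": 3}
matches = explicit_join(order["ProductCode"], products_index, product_prices)
line_total = matches[0] * order["Quantity"] if matches else None  # first match if several
```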

### Aggregation with Cross-Table

When a cross-table reference matches multiple rows, use an aggregation function:

```
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})
```

Without aggregation, a multi-row match returns the first matching value.
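The sketch below contrasts an aggregated reference with a plain multi-row reference. `statistics.mean` stands in for `avg`. Returning 0 for a `count` with no matches, and None for the other aggregates, is an assumption.

```python
from statistics import mean
from typing import Any, Callable, Optional

AGGREGATES: dict[str, Callable[[list[Any]], Any]] = {
    "sum": sum,
    "count": len,
    "avg": mean,
    "min": min,
    "max": max,
}


def cross_table_value(other_keys: list[Any], other_values: list[Any], local_key: Any,
                      aggregate: Optional[str] = None) -> Any:
    """Resolve {Other.Col WHERE Other.Key = LocalKey}, optionally wrapped in an aggregate."""
    matches = [v for k, v in zip(other_keys, other_values) if k == local_key]
    if aggregate is None:
        # Plain reference: first matching value in row order, or None if nothing matches.
        return matches[0] if matches else None
    if not matches:
        # Assumption: count of nothing is 0; the other aggregates give None.
        return 0 if aggregate == "count" else None
    return AGGREGATES[aggregate](matches)
```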

## Calculation Engine

### Dependency Graph (DAG)

The formula system maintains a **Directed Acyclic Graph** of dependencies between columns:

- **Nodes**: each column taking part in a formula (a formula column or a column it reads) is a node, identified by
  `table_name.column_id`
- **Edges**: if column A's formula references column B, an edge B → A exists ("A depends on B")
- Both directions are tracked:
    - **Precedents**: columns that a formula reads from
    - **Dependents**: columns that need recalculation when this column changes

Cross-table references create edges that span DataGrid instances, managed at the `DataGridsManager` level.
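Here is a minimal sketch of the graph, with both directions stored as adjacency sets keyed by `table.column` node IDs. The class and method names are hypothetical. The key point is that editing a formula must remove the old edges before adding the new ones, so that `dependents` never points at stale readers.

```python
from collections import defaultdict


class FormulaDag:
    """Hypothetical dependency graph over 'table.column' node IDs."""

    def __init__(self) -> None:
        self.precedents: dict[str, set[str]] = defaultdict(set)  # node -> nodes it reads
        self.dependents: dict[str, set[str]] = defaultdict(set)  # node -> nodes reading it

    def set_formula_refs(self, node: str, refs: set[str]) -> None:
        """Replace the references of `node`, e.g. after its formula is edited."""
        for old in self.precedents.get(node, set()):
            self.dependents[old].discard(node)  # drop stale edge old -> node
        self.precedents[node] = set(refs)
        for ref in refs:
            self.dependents[ref].add(node)  # edge ref -> node: node depends on ref
```

A cross-table reference such as `products.price` simply becomes another node ID, which is why a single graph at the manager level can hold edges between tables.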

### Dirty Flag Propagation

When a source column's data changes:

1. The source column is marked **dirty**
2. All direct dependents are marked dirty
3. Propagation continues recursively through the DAG
4. Each dirty column maintains a **dirty row set**: the specific row indices that need recalculation

This propagation is **immediate** (fast — only flag marking, no computation).
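Propagation can be written as a breadth-first walk over `dependents` that only unions row sets. This sketch assumes a dependent becomes dirty for the same row indices as its source. That holds for same-table references; a cross-table dependent would need its rows mapped through the join, which is left out here. A node that gains no new dirty rows is not expanded again, so diamond-shaped graphs are not walked repeatedly.

```python
from collections import deque


def propagate_dirty(dependents: dict[str, set[str]], source: str, rows: set[int],
                    dirty: dict[str, set[int]]) -> None:
    """Mark `source` and everything downstream dirty for `rows`; no computation happens."""
    queue = deque([source])
    while queue:
        node = queue.popleft()
        current = dirty.setdefault(node, set())
        if rows <= current and node != source:
            continue  # already dirty for these rows, so its dependents were marked too
        current |= rows  # in-place union updates dirty[node]
        queue.extend(dependents.get(node, ()))
```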

### Recalculation Strategy (Hybrid)

Actual computation is **deferred to rendering time**:

1. On value change → dirty flags propagate instantly through the DAG
2. On page render (`mk_body_content_page`) → only dirty rows within the visible page (up to 1000 rows) are recalculated
3. Off-screen pages remain dirty until scrolled into view
4. Calculation follows **topological order** of the DAG to ensure precedents are computed before dependents
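Recalculation for one page then walks the formula columns in topological order and computes only the dirty rows inside the visible range. `compute_cell` is a hypothetical callback that stands in for evaluation plus the cache write.

```python
from typing import Callable


def recalc_page(order: list[str], dirty: dict[str, set[int]], page: range,
                compute_cell: Callable[[str, int], None]) -> list[tuple[str, int]]:
    """Recompute dirty rows inside `page`; rows outside it stay dirty.

    `order` lists formula columns in topological order (precedents first).
    """
    computed = []
    for column in order:
        rows = dirty.get(column)
        if not rows:
            continue
        visible = sorted(r for r in rows if r in page)
        for row in visible:
            compute_cell(column, row)
            computed.append((column, row))
        rows.difference_update(visible)  # only the rendered rows become clean
    return computed
```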

### Cycle Detection

Before adding a formula, the engine checks whether its new edges would introduce a cycle, using Kahn's algorithm: if the
topological sort cannot place every node, the graph contains a cycle. When a cycle is detected:

- The formula is **rejected**
- The editor displays an error identifying the circular dependency chain
- The previous formula (if any) remains unchanged
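The check can be sketched in three steps: apply the new formula's references to a copy of the graph, run Kahn's algorithm, and reject the formula if some nodes never reach in-degree zero. The function names and the `ValueError` are assumptions. The leftover set contains the cycle plus anything downstream of it, so reporting the exact chain in the editor would need an extra step, such as a depth-first search from one of those nodes.

```python
from collections import deque


def unsorted_nodes(precedents: dict[str, set[str]]) -> set[str]:
    """Kahn's algorithm; returns the nodes it could not place (empty if acyclic)."""
    nodes = set(precedents)
    for refs in precedents.values():
        nodes |= refs
    in_degree = {n: len(precedents.get(n, ())) for n in nodes}
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for node, refs in precedents.items():
        for ref in refs:
            dependents[ref].append(node)
    queue = deque(n for n in nodes if in_degree[n] == 0)
    while queue:
        n = queue.popleft()
        for d in dependents[n]:
            in_degree[d] -= 1
            if in_degree[d] == 0:
                queue.append(d)
    return {n for n, deg in in_degree.items() if deg > 0}


def try_set_formula(precedents: dict[str, set[str]], node: str, refs: set[str]) -> None:
    """Accept the new references only if the resulting graph stays acyclic."""
    candidate = {k: set(v) for k, v in precedents.items()}
    candidate[node] = set(refs)
    stuck = unsorted_nodes(candidate)
    if stuck:
        # Rejected: `precedents` is untouched, so the previous formula stays in place.
        raise ValueError("circular dependency involving " + ", ".join(sorted(stuck)))
    precedents[node] = set(refs)
```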

### Caching

Each formula column caches its computed values:

- Results are stored in `ns_fast_access[col_id]` alongside raw data columns
- The dirty row set tracks which cached values are stale
- Non-dirty rows return their cached value without re-evaluation
- Cache is invalidated per-row when source data changes
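The cache contract amounts to stored values plus a set of stale rows. In this sketch, a plain list stands in for the numpy array held in `ns_fast_access[col_id]`. The class name and the `evaluations` counter are hypothetical; the counter exists only to make cache hits observable.

```python
from typing import Any, Callable


class FormulaColumnCache:
    """Hypothetical per-column cache: computed values plus stale row indices."""

    def __init__(self, n_rows: int, evaluate: Callable[[int], Any]) -> None:
        self.values: list[Any] = [None] * n_rows  # stand-in for the numpy array
        self.dirty: set[int] = set(range(n_rows))  # nothing computed yet
        self.evaluate = evaluate
        self.evaluations = 0

    def invalidate(self, rows: set[int]) -> None:
        """Mark rows stale after their source data changed."""
        self.dirty |= rows

    def get(self, row: int) -> Any:
        """Return the cached value, re-evaluating only if the row is stale."""
        if row in self.dirty:
            self.values[row] = self.evaluate(row)
            self.evaluations += 1
            self.dirty.discard(row)
        return self.values[row]
```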

## Evaluation

### Row-by-Row Execution

Formulas are evaluated **row-by-row** within the page being rendered. For each row:

1. Resolve column references `{ColumnName}` to the cell value at the current row index
2. Resolve cross-table references `{Table.Column}` via the join mechanism
3. Evaluate the expression with resolved values
4. Store the result in the cache (`ns_fast_access`)
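The per-row loop can be sketched assuming the formula has already been compiled to a Python callable that takes a dict of resolved references. The `compiled` callable, the `cross_refs` join callbacks, and all parameter names are assumptions standing in for the parser's output and the join mechanism.

```python
from typing import Any, Callable, Iterable


def evaluate_rows(
    compiled: Callable[[dict[str, Any]], Any],
    local_refs: list[str],
    cross_refs: dict[str, Callable[[int], Any]],
    columns: dict[str, list[Any]],
    rows: Iterable[int],
    cache: list[Any],
) -> None:
    """Evaluate one formula for each row in `rows`, following steps 1-4 above."""
    for row in rows:
        env = {name: columns[name][row] for name in local_refs}  # step 1
        env.update({name: join(row) for name, join in cross_refs.items()})  # step 2
        cache[row] = compiled(env)  # steps 3 and 4
```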

### Parser

The formula language uses a **custom grammar** parsed with Lark (consistent with the formatting DSL). The parser:

1. Tokenizes the formula string
2. Builds an AST (Abstract Syntax Tree)
3. Transforms the AST into an evaluable representation
4. Extracts column references for dependency graph registration

### Error Handling

| Error Type            | Behavior                                              |
|-----------------------|-------------------------------------------------------|
| Syntax error          | Editor highlights the error, formula not saved        |
| Unknown column        | Editor highlights, autocompletion suggests fixes      |
| Type mismatch         | Cell displays error indicator, other cells unaffected |
| Division by zero      | Cell displays `#DIV/0!` or None                       |
| Circular dependency   | Formula rejected, editor shows cycle chain            |
| Cross-table not found | Editor highlights unknown table name                  |
| No join match         | Cell displays None                                    |
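Runtime errors such as type mismatches and division by zero affect a single cell, so evaluation can wrap each cell individually. In this sketch, `CellError` is a hypothetical marker type and `#TYPE!` a hypothetical label. Only `#DIV/0!` comes from the table above, and the `div0_as_none` flag mirrors its "or None" alternative.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class CellError:
    """Hypothetical marker rendered as an error indicator in a single cell."""
    kind: str


def safe_cell(compiled: Callable[[dict], Any], env: dict, div0_as_none: bool = False) -> Any:
    """Evaluate one cell so that a runtime error affects only that cell."""
    try:
        return compiled(env)
    except ZeroDivisionError:
        return None if div0_as_none else CellError("#DIV/0!")
    except TypeError:
        return CellError("#TYPE!")  # hypothetical label for a type mismatch
```

If evaluation is vectorized with numpy instead, float division by zero does not raise. It produces `inf` or `nan` with a `RuntimeWarning`, so the check would have to be done explicitly.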

## User Interface

### Creating a Formula Column

Formula columns are created and edited through the **DataGridColumnsManager**:

1. User opens the Columns Manager panel
2. Adds a new column or edits an existing one
3. Selects column type **"Formula"**
4. A **DslEditor** (CodeMirror 5) opens for formula input
5. The editor provides:
    - **Syntax highlighting**: keywords, column references, functions, operators
    - **Autocompletion**: column names (current table and other tables), function names, table names
    - **Validation**: real-time syntax checking and dependency cycle detection
    - **Error markers**: inline error indicators with descriptions

### Formula Column Properties

A formula column extends `DataGridColumnState` with:

| Property   | Type          | Description                                    |
|------------|---------------|------------------------------------------------|
| `formula`  | `str` or None | The formula expression (None for data columns) |
| `col_type` | `ColumnType`  | Set to `ColumnType.Formula`                    |

Other properties (`title`, `visible`, `width`, `format`) remain unchanged.

Formula columns are **read-only** in the grid body — cell values are computed, not editable. Formatting rules from the
formatting DSL apply to formula columns like any other column.
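Here is a sketch of the extended column state as a dataclass. Only `formula`, `col_type`, `ColumnType.Formula`, and the property names listed above come from this document. The field types, the `col_id` field, the `ColumnType.Data` member, and the `is_editable` helper are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    Data = "data"  # hypothetical member for regular columns
    Formula = "formula"


@dataclass
class DataGridColumnState:
    # Existing fields (types assumed for illustration).
    col_id: str
    title: str
    visible: bool = True
    width: Optional[int] = None
    format: Optional[str] = None
    # Formula additions.
    col_type: ColumnType = ColumnType.Data
    formula: Optional[str] = None  # None for data columns

    @property
    def is_editable(self) -> bool:
        # Formula columns are read-only in the grid body.
        return self.col_type is not ColumnType.Formula
```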

## Integration Points

| Component                | Role                                                     |
|--------------------------|----------------------------------------------------------|
| `DataGridColumnState`    | Stores `formula` field and `ColumnType.Formula` type     |
| `DatagridStore`          | `ns_fast_access` caches formula results as numpy arrays  |
| `DataGridColumnsManager` | UI for creating/editing formula columns                  |
| `DataGridsManager`       | Hosts the global dependency DAG across all tables        |
| `DslEditor`              | CodeMirror 5 editor with highlighting and autocompletion |
| `FormattingEngine`       | Applies formatting rules AFTER formula evaluation        |
| `mk_body_content_page()` | Triggers formula computation for visible rows            |
| `mk_body_cell_content()` | Reads computed values from `ns_fast_access`              |

## Syntax Summary

```
# Basic arithmetic
{Price} * {Quantity}

# Function call
round({Price} * 1.2, 2)

# Simple condition (None if false)
{Price} * 0.8 if {Country} == "FR"

# Condition with else
{Price} * 0.8 if {Country} == "FR" else {Price}

# Chained conditions
{Price} * 0.8 if {Country} == "FR" else {Price} * 0.9 if {Country} == "DE" else {Price}

# Logical operators
{Price} * 0.8 if {Country} == "FR" and {Quantity} > 10

# Grouping
{Price} * 0.8 if ({Country} == "FR" or {Country} == "DE") and {Quantity} > 10

# Cross-table (implicit join on id)
{Products.Price} * {Quantity}

# Cross-table (explicit join)
{Products.Price WHERE Products.Code = ProductCode} * {Quantity}

# Cross-table aggregation
sum({OrderLines.Amount WHERE OrderLines.OrderId = Id})

# Nested functions
round(avg({Q1}, {Q2}, {Q3}, {Q4}), 1)

# Text operations
concat(upper(left({FirstName}, 1)), ". ", {LastName})
```

## Future: Cell-Level Overrides

The architecture is designed so that cell-level formula overrides can be added later, for an estimated 20-30% additional work:

- **Storage**: a sparse dict keyed by `(col_id, row_index)`, `cell_formulas: dict[tuple[str, int], str]` (same pattern as
  `cell_formats`)
- **DAG**: new node type `table.column[row]` alongside existing `table.column` nodes
- **Evaluation**: "does this cell have an override? If yes, use it. Otherwise, use the column formula."
- **Node ID scheme**: designed to be extensible from the start (`table.column` for columns, `table.column[row]` for
  cells)
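The override lookup and the node ID scheme are small enough to sketch directly. The function names are hypothetical, and `col_id` is assumed to be a string.

```python
from typing import Optional


def column_node(table: str, column: str) -> str:
    """Node ID for a column-level formula, e.g. 'orders.total'."""
    return f"{table}.{column}"


def cell_node(table: str, column: str, row: int) -> str:
    """Node ID for a cell-level override, e.g. 'orders.total[7]'."""
    return f"{table}.{column}[{row}]"


def formula_for_cell(col_id: str, row: int, column_formula: Optional[str],
                     cell_formulas: dict[tuple[str, int], str]) -> Optional[str]:
    """Use the cell override if present, otherwise fall back to the column formula."""
    return cell_formulas.get((col_id, row), column_formula)
```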