Kodjo Sossouvi 72d6cce6ff Adding Profiler module

2026-03-21 18:08:34 +01:00

Profiler — Design & Implementation Plan

Context

Performance issues were identified during keyboard navigation in the DataGrid: each command call takes 173 ms server-side. The HTMX debug traces (via htmx_debug.js) confirmed that the bottleneck is on the server. A persistent, in-application profiling system is needed for continuous analysis across sessions and for future investigations.

Design Decisions

Data Collection Strategy

Two complementary levels:

Level A (route handler): One trace per /myfasthtml/commands call. Captures total server-side duration including lookup, execution, and HTMX swap overhead.
Level B (granular spans): Decomposition of each trace into named phases. Activated by placing probes in the code.

Both levels are active simultaneously. Level A gives the global picture; Level B gives the breakdown.

Probe Mechanisms

Five complementary mechanisms, chosen based on the context:

1. Context manager — partial block instrumentation

with profiler.span("oob_swap"):
    # only this block is timed
    result = build_oob_elements(...)

Metadata can be attached during execution:

with profiler.span("query") as span:
    rows = db.query(...)
    span.set("row_count", len(rows))
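
The context manager can be sketched as follows. This is only an illustration under assumptions: the `Span` class, its `duration_ms` field, and the placeholder list standing in for `db.query(...)` are hypothetical and not taken from the actual module.

```python
import time


class Span:
    """Hypothetical stand-in for the object returned by profiler.span()."""

    def __init__(self, name):
        self.name = name
        self.data = {}            # metadata attached via set()
        self.duration_ms = None   # filled in when the block exits

    def set(self, key, value):
        self.data[key] = value

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Record the elapsed time even if the block raised.
        self.duration_ms = (time.perf_counter() - self._start) * 1000.0
        return False  # never swallow exceptions from the timed block


with Span("query") as span:
    rows = [1, 2, 3]  # placeholder for db.query(...)
    span.set("row_count", len(rows))
```

Returning `False` from `__exit__` keeps the probe transparent: an exception inside the timed block still propagates, but its duration is recorded.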

2. Decorator — full function instrumentation

@profiler.span("callback")
def execute_callback(self, client_response):
    ...

Function arguments are captured automatically. Metadata can be attached via current_span():

@profiler.span("process")
def process(self, rows):
    result = do_work(rows)
    profiler.current_span().set("row_count", len(result))
    return result
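
Automatic argument capture could look like the sketch below. It assumes a decorator-only `span` factory that writes to a plain list (`sink`); both are hypothetical simplifications, since the real `profiler.span` is also usable as a context manager and stores spans in the trace tree.

```python
import functools
import inspect
import time


def span(name, sink):
    """Hypothetical decorator factory; `sink` is a list receiving one record per call."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to their parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            captured = {k: v for k, v in bound.arguments.items() if k != "self"}
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                sink.append({
                    "name": name,
                    "args": captured,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                })

        return wrapper

    return decorator


records = []


@span("process", records)
def process(rows, factor=2):
    return [r * factor for r in rows]


result = process([1, 2, 3])
```

Binding through `inspect.signature` means default values are captured as well, which tends to make traces easier to read than raw `args`/`kwargs` tuples.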

3. Cumulative span — loop instrumentation

Intended for loops with many iterations. It aggregates timings into a single span instead of creating one span per iteration.

for row in rows:
    with profiler.cumulative_span("process_row"):
        process(row)

# or as a decorator
@profiler.cumulative_span("process_row")
def process_row(self, row):
    ...

Exposes: count, total, min, max, avg. Single entry in the trace tree regardless of iteration count.
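
A minimal sketch of the aggregation, assuming one reusable object per name. In the design, `profiler.cumulative_span("process_row")` would look that object up by name on each iteration; the standalone class here is a hypothetical simplification.

```python
import time


class CumulativeSpan:
    """Aggregates many timings into count/total/min/max/avg."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total_ms = 0.0
        self.min_ms = float("inf")
        self.max_ms = 0.0

    @property
    def avg_ms(self):
        return self.total_ms / self.count if self.count else 0.0

    def add(self, elapsed_ms):
        self.count += 1
        self.total_ms += elapsed_ms
        self.min_ms = min(self.min_ms, elapsed_ms)
        self.max_ms = max(self.max_ms, elapsed_ms)

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.add((time.perf_counter() - self._start) * 1000.0)
        return False


row_span = CumulativeSpan("process_row")
for row in range(100):
    with row_span:
        _ = row * row  # placeholder for process(row)
```

Memory use stays constant however many iterations run, because only five numbers are updated per iteration.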

4. `trace_all` — class-level static instrumentation

Wraps all methods of a class at definition time. No runtime overhead beyond the spans themselves.

@profiler.trace_all
class DataGrid(MultipleInstance):
    def navigate_cell(self, ...):  # auto-spanned
        ...

# Exclude specific methods
@profiler.trace_all(exclude=["__ft__", "render"])
class DataGrid(MultipleInstance):
    ...

Implementation: uses inspect to iterate over methods and wraps each with @profiler.span(). No sys.settrace() involved — pure static wrapping.
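
The static wrapping might be sketched as below. The `traced` helper is a placeholder for `profiler.span()` that only records call names, and `Grid` is a hypothetical class; the point is the dual bare/parameterized decorator form and the `exclude` filter.

```python
import functools
import inspect

calls = []  # stand-in for the trace buffer


def traced(name):
    """Placeholder for profiler.span(): only records that a call happened."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def trace_all(cls=None, *, exclude=()):
    """Usable bare (@trace_all) or with arguments (@trace_all(exclude=[...]))."""

    def wrap_class(target):
        # vars() only sees attributes defined on this class, not inherited ones.
        for attr_name, attr in list(vars(target).items()):
            if attr_name in exclude or not inspect.isfunction(attr):
                continue
            wrapped = traced(f"{target.__name__}.{attr_name}")(attr)
            setattr(target, attr_name, wrapped)
        return target

    return wrap_class(cls) if cls is not None else wrap_class


@trace_all(exclude=["render"])
class Grid:
    def navigate_cell(self):
        return "moved"

    def render(self):
        return "<table>"


grid = Grid()
grid.navigate_cell()
grid.render()
```

In this sketch `staticmethod` and `classmethod` objects are skipped because `inspect.isfunction` returns false for them; whether the real module wraps them is not specified here.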

5. `trace_calls` — sub-call exploration

Traces all function calls made within a single function, recursively. Used for exploration when the bottleneck location is unknown.

@profiler.trace_calls
def navigate_cell(self, ...):
    self._update_selection()   # auto-traced as child span
    self._compute_visible()    # auto-traced as child span
    db.query(...)              # auto-traced as child span

Implementation: uses sys.setprofile() scoped to the decorated function's execution only. Overhead is localized to that function's call stack. This is an exploration tool — use it to identify hotspots, then replace with explicit probes.
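
The scoping of `sys.setprofile()` could be sketched as follows. This is an assumption-laden illustration: the flat `last_events` list, the depth counter, and the example functions are hypothetical, and the real implementation would create child spans instead.

```python
import functools
import sys
import time


def trace_calls(func):
    """Record every Python-level call made while `func` runs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events = []  # (depth, function name, duration_ms)
        stack = []   # start times of calls still in progress

        def hook(frame, event, arg):
            if event == "call":
                stack.append(time.perf_counter())
            elif event == "return" and stack:
                started = stack.pop()
                elapsed_ms = (time.perf_counter() - started) * 1000.0
                events.append((len(stack), frame.f_code.co_name, elapsed_ms))
            # c_call / c_return events (built-ins) are ignored in this sketch.

        previous = sys.getprofile()
        sys.setprofile(hook)
        try:
            result = func(*args, **kwargs)
        finally:
            sys.setprofile(previous)  # always restore the earlier hook
        wrapper.last_events = events
        return result

    wrapper.last_events = []
    return wrapper


def update_selection():
    return 1


def compute_visible():
    return sum(range(10))


@trace_calls
def navigate_cell():
    update_selection()
    compute_visible()
    return "done"


outcome = navigate_cell()
```

Events are appended when a call returns, so children appear before their parent in `last_events`; the depth value is what allows the tree to be rebuilt.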

Span Hierarchy

Hierarchy is determined by code nesting via a ContextVar stack (async-safe). No explicit parent references required.

with profiler.span("execute"):           # root
    with profiler.span("callback"):      # child of execute
        result = self.callback(...)
    with profiler.span("oob_swap"):      # sibling of callback
        ...

When a command calls another command, the second command's spans automatically become children of the first command's active span.

profiler.current_span() provides access to the active span from anywhere in the call stack.
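
A minimal sketch of nesting-driven hierarchy with `contextvars`. It stores only the innermost active node and relies on reset tokens to restore the parent, which behaves like an implicit stack; the `Node` class and the module-level functions are hypothetical.

```python
import contextvars
from contextlib import contextmanager

# Holds the innermost active span for the current thread or asyncio task.
_current = contextvars.ContextVar("current_span", default=None)


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


@contextmanager
def span(name):
    node = Node(name)
    parent = _current.get()
    if parent is not None:
        parent.children.append(node)  # code nesting becomes tree nesting
    token = _current.set(node)
    try:
        yield node
    finally:
        _current.reset(token)  # the parent becomes the active span again


def current_span():
    return _current.get()


with span("execute") as root:
    with span("callback"):
        inner = current_span().name
    with span("oob_swap"):
        pass
```

Because each asyncio task runs in its own copy of the context, concurrent requests would not see each other's active span in this arrangement.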

Storage

Scope: Global (all sessions). Profiling measures server behavior, not per-user state.
Structure: deque with a configurable maximum size.
Default size: 500 traces (constant PROFILER_MAX_TRACES).
Eviction: Oldest traces are dropped when the buffer is full (FIFO).
Persistence: In-memory only. Lost on server restart.
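
The FIFO eviction described above is what `collections.deque` provides when created with `maxlen`. A short sketch, using placeholder dictionaries instead of real traces:

```python
from collections import deque

PROFILER_MAX_TRACES = 500  # default size from the design above

traces = deque(maxlen=PROFILER_MAX_TRACES)

# Appending beyond maxlen silently discards items from the opposite end.
for i in range(PROFILER_MAX_TRACES + 3):
    traces.append({"command_name": f"cmd-{i}"})
```

After the loop, the three oldest entries have been dropped and the buffer still holds exactly `PROFILER_MAX_TRACES` items.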

Toggle and Clear

profiler.enabled: boolean flag. When False, all probe mechanisms are no-ops (negligible overhead: each probe still checks the flag).
profiler.clear(): empties the trace buffer.
Both are controllable from the UI control.

Overhead Measurement

The ProfilingManager self-profiles its own span.__enter__ and span.__exit__ calls. Exposes:

overhead_per_span_ns — average cost of one span boundary in nanoseconds
total_overhead_ms — estimated total overhead across all active spans

Visible in the UI to verify the profiler does not bias measurements significantly.
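
One possible way to obtain such figures is a calibration loop over empty spans. This differs from the self-profiling described above and is only a rough sketch: the `Span` class, the iteration count, and `span_count` are assumptions, and the measurement includes the enter/exit pair plus loop cost rather than a single boundary.

```python
import time


class Span:
    """Bare-bones timed span used only for calibration."""

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ms = (time.perf_counter() - self._start) * 1000.0
        return False


def estimate_overhead_per_span_ns(iterations=10_000):
    """Time many empty spans; return the average cost of one span in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        with Span():
            pass
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


overhead_per_span_ns = estimate_overhead_per_span_ns()
span_count = 42  # hypothetical number of spans recorded in one trace
total_overhead_ms = overhead_per_span_ns * span_count / 1_000_000
```

Comparing `total_overhead_ms` with a trace's `total_duration_ms` gives a quick sense of how much of the measured time may be attributable to the profiler itself.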

Data Model

ProfilingTrace
  command_name: str
  command_id: str
  kwargs: dict
  timestamp: datetime
  total_duration_ms: float
  root_span: ProfilingSpan

ProfilingSpan
  name: str
  start: float          (perf_counter)
  duration_ms: float
  data: dict            (attached via span.set())
  children: list[ProfilingSpan | CumulativeSpan]

CumulativeSpan
  name: str
  count: int
  total_ms: float
  min_ms: float
  max_ms: float
  avg_ms: float
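
The model above maps naturally onto dataclasses. The sketch below is an assumption about how that could look: default values, `avg_ms` as a derived property, and the example values (`"c-1"`, durations) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CumulativeSpan:
    name: str
    count: int = 0
    total_ms: float = 0.0
    min_ms: float = 0.0
    max_ms: float = 0.0

    @property
    def avg_ms(self) -> float:
        # Derived rather than stored, so it cannot drift from total/count.
        return self.total_ms / self.count if self.count else 0.0


@dataclass
class ProfilingSpan:
    name: str
    start: float = 0.0        # time.perf_counter() value at entry
    duration_ms: float = 0.0
    data: dict = field(default_factory=dict)
    children: "list[ProfilingSpan | CumulativeSpan]" = field(default_factory=list)

    def set(self, key, value):
        self.data[key] = value


@dataclass
class ProfilingTrace:
    command_name: str
    command_id: str
    kwargs: dict
    timestamp: datetime
    total_duration_ms: float
    root_span: ProfilingSpan


root = ProfilingSpan(name="command")
root.children.append(ProfilingSpan(name="callback", duration_ms=12.5))
root.children.append(
    CumulativeSpan(name="process_row", count=4, total_ms=2.0, min_ms=0.25, max_ms=1.0)
)
trace = ProfilingTrace(
    command_name="navigate_cell",
    command_id="c-1",
    kwargs={},
    timestamp=datetime.now(),
    total_duration_ms=15.0,
    root_span=root,
)
```

Using `field(default_factory=...)` gives each span its own `data` dict and `children` list instead of sharing mutable defaults between instances.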

Existing Code Hooks

`src/myfasthtml/core/utils.py` — route handler (Level A)

@utils_rt(Routes.Commands)
async def post(session, c_id: str, client_response: dict = None):
    with profiler.span("command", args={"c_id": c_id}):
        command = CommandsManager.get_command(c_id)
        return await command.execute(client_response)

`src/myfasthtml/core/commands.py` — execution phases (Level B)

def execute(self, client_response=None):
    with profiler.span("before_commands"):
        ...
    with profiler.span("callback"):
        result = self.callback(...)
    with profiler.span("after_commands"):
        ...
    with profiler.span("oob_swap"):
        ...

Implementation Plan

Phase 1 — Core

File: src/myfasthtml/core/profiler.py

ProfilingSpan dataclass
CumulativeSpan dataclass
ProfilingTrace dataclass
ProfilingManager class with all probe mechanisms
profiler singleton
Hook into utils.py (Level A)
Hook into commands.py (Level B)

Tests: tests/core/test_profiler.py

| Test | Description |
|---|---|
| `test_i_can_create_a_span` | Basic span creation and timing |
| `test_i_can_nest_spans` | Child spans are correctly parented |
| `test_i_can_use_span_as_decorator` | Decorator captures args automatically |
| `test_i_can_use_cumulative_span` | Aggregates count/total/min/max/avg |
| `test_i_can_attach_data_to_span` | `span.set()` and `current_span().set()` |
| `test_i_can_clear_traces` | Buffer is emptied after `clear()` |
| `test_i_can_enable_disable_profiler` | Probes are no-ops when disabled |
| `test_i_can_measure_overhead` | Overhead metrics are exposed |
| `test_i_can_use_trace_all_on_class` | All methods of a class are wrapped |
| `test_i_can_use_trace_calls_on_function` | Sub-calls are traced via setprofile |

Phase 2 — Controls

src/myfasthtml/controls/ProfilerList.py (SingleInstance)

Table of all traces: command name / total duration / timestamp
Right panel: trace detail (kwargs, span breakdown)
Buttons: enable/disable, clear
Click on a trace → opens ProfilerDetail

src/myfasthtml/controls/ProfilerDetail.py (MultipleInstance)

Hierarchical span tree for a single trace
Two display modes: list and pie chart
Click on a span → zooms into its children (if any)
Displays cumulative spans with count/min/max/avg
Shows overhead metrics

src/myfasthtml/controls/ProfilerPieChart.py (future)

Pie chart visualization of span distribution at a given zoom level

Naming Conventions

Control files: ProfilerXxx.py
CSS classes: mf-profiler-xxx
Logger: logging.getLogger("Profiler")
Constant: PROFILER_MAX_TRACES = 500 in src/myfasthtml/core/constants.py
