MyFastHtml/docs/Profiler.md
2026-03-21 18:08:34 +01:00

Profiler — Design & Implementation Plan

Context

Performance issues were identified during keyboard navigation in the DataGrid (173 ms of server-side time per command call). The HTMX debug traces (via htmx_debug.js) confirmed that the bottleneck is server-side. A persistent, in-application profiling system is needed for continuous analysis across sessions and for future investigations.

Design Decisions

Data Collection Strategy

Two complementary levels:

  • Level A (route handler): One trace per /myfasthtml/commands call. Captures total server-side duration including lookup, execution, and HTMX swap overhead.
  • Level B (granular spans): Decomposition of each trace into named phases. Activated by placing probes in the code.

Both levels are active simultaneously. Level A gives the global picture; Level B gives the breakdown.

Probe Mechanisms

Five complementary mechanisms, chosen based on the context:

1. Context manager — partial block instrumentation

with profiler.span("oob_swap"):
    # only this block is timed
    result = build_oob_elements(...)

Metadata can be attached during execution:

with profiler.span("query") as span:
    rows = db.query(...)
    span.set("row_count", len(rows))
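
One way the context-manager probe could work is sketched below. This is a minimal illustration, not the planned implementation: the Span class and the module-level span() function are hypothetical stand-ins for profiler.span, and the list assigned to rows replaces the real db.query(...) call.

```python
import time
from contextlib import contextmanager


class Span:
    """Hypothetical minimal span: a name, a duration, and free-form metadata."""

    def __init__(self, name):
        self.name = name
        self.duration_ms = 0.0
        self.data = {}

    def set(self, key, value):
        # Attach metadata while the block is still running.
        self.data[key] = value


@contextmanager
def span(name):
    """Time only the enclosed block and yield the span so callers can annotate it."""
    current = Span(name)
    start = time.perf_counter()
    try:
        yield current
    finally:
        # Recorded even if the block raises, so failed calls are still measured.
        current.duration_ms = (time.perf_counter() - start) * 1000.0


with span("query") as s:
    rows = [1, 2, 3]  # stand-in for db.query(...)
    s.set("row_count", len(rows))
```

Recording the duration in a finally clause means a block that raises still produces a timing, which is usually what you want when hunting slow error paths.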

2. Decorator — full function instrumentation

@profiler.span("callback")
def execute_callback(self, client_response):
    ...

Function arguments are captured automatically. Metadata can be attached via current_span():

@profiler.span("process")
def process(self, rows):
    result = do_work(rows)
    profiler.current_span().set("row_count", len(result))
    return result
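
The same probe name can serve both as a context manager and as a decorator if the span object implements __call__. The sketch below assumes that approach and shows one possible way to capture arguments with inspect.signature. The span class and the last_span attribute are hypothetical and exist only for this illustration.

```python
import functools
import inspect
import time


class span:
    """Hypothetical probe usable as `with span(...)` or as `@span(...)`."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.duration_ms = 0.0

    def set(self, key, value):
        self.data[key] = value

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ms = (time.perf_counter() - self._start) * 1000.0
        return False  # never swallow exceptions

    def __call__(self, func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A fresh span per call, so recursive or concurrent calls share no state.
            with span(self.name) as active:
                bound = signature.bind(*args, **kwargs)
                bound.apply_defaults()
                # Store reprs rather than objects to avoid keeping large arguments alive.
                active.set("args", {k: repr(v) for k, v in bound.arguments.items()})
                wrapper.last_span = active  # exposed only for this demo
                return func(*args, **kwargs)

        return wrapper


@span("process")
def process(rows, factor=2):
    return [r * factor for r in rows]


result = process([1, 2])
```

Storing repr() strings is a design choice made for this sketch; a real implementation might truncate long values or skip self.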

3. Cumulative span — loop instrumentation

Intended for loops with many iterations. Timings are aggregated into a single entry instead of creating one span per iteration.

for row in rows:
    with profiler.cumulative_span("process_row"):
        process(row)

# or as a decorator
@profiler.cumulative_span("process_row")
def process_row(self, row):
    ...

Exposes: count, total, min, max, avg. Single entry in the trace tree regardless of iteration count.

4. trace_all — class-level static instrumentation

Wraps all methods of a class at definition time. No runtime overhead beyond the spans themselves.

@profiler.trace_all
class DataGrid(MultipleInstance):
    def navigate_cell(self, ...):  # auto-spanned
        ...

# Exclude specific methods
@profiler.trace_all(exclude=["__ft__", "render"])
class DataGrid(MultipleInstance):
    ...

Implementation: uses inspect to iterate over methods and wraps each with @profiler.span(). No sys.settrace() involved — pure static wrapping.
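
A minimal sketch of such static wrapping follows. It assumes that only plain functions defined directly on the class are wrapped. The span() decorator here is a hypothetical stand-in that just records the span name, and Grid is a toy class used in place of DataGrid.

```python
import functools
import inspect

calls = []  # stand-in for the trace buffer; records the names of spanned calls


def span(name):
    """Hypothetical stand-in for profiler.span used as a decorator."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(name)
            return func(*args, **kwargs)
        return wrapper
    return decorate


def trace_all(cls=None, *, exclude=()):
    """Wrap every plain method defined on the class, once, at definition time."""
    def apply(target):
        # vars() limits wrapping to methods defined on this class, not inherited ones.
        for attr, value in list(vars(target).items()):
            if attr in exclude or not inspect.isfunction(value):
                continue  # skips excluded names, properties, static and class methods
            setattr(target, attr, span(f"{target.__name__}.{attr}")(value))
        return target

    # Support both @trace_all and @trace_all(exclude=[...]).
    return apply(cls) if cls is not None else apply


@trace_all(exclude=["render"])
class Grid:
    def navigate_cell(self):
        return "moved"

    def render(self):
        return "<div/>"


grid = Grid()
grid.navigate_cell()
grid.render()
```

Whether inherited methods, dunder methods, or static methods should also be wrapped is left open here; the sketch takes the narrowest option.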

5. trace_calls — sub-call exploration

Traces all function calls made within a single function, recursively. Used for exploration when the bottleneck location is unknown.

@profiler.trace_calls
def navigate_cell(self, ...):
    self._update_selection()   # auto-traced as child span
    self._compute_visible()    # auto-traced as child span
    db.query(...)              # auto-traced as child span

Implementation: uses sys.setprofile() scoped to the decorated function's execution only. Overhead is localized to that function's call stack. This is an exploration tool — use it to identify hotspots, then replace with explicit probes.
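
The scoping described above can be sketched as follows. This is an assumption-laden illustration: it records only Python-level calls as (depth, function name) pairs, ignores C calls, and stores the result on a hypothetical calls attribute instead of building child spans.

```python
import functools
import sys


def trace_calls(func):
    """Record every Python-level call made while `func` runs, with its depth."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        seen = []
        depth = 0

        def profile(frame, event, arg):
            nonlocal depth
            if event == "call":
                seen.append((depth, frame.f_code.co_name))
                depth += 1
            elif event == "return":
                depth -= 1
            # c_call / c_return / c_exception events are ignored in this sketch.

        previous = sys.getprofile()
        sys.setprofile(profile)  # active only for the current thread
        try:
            return func(*args, **kwargs)
        finally:
            sys.setprofile(previous)  # restore whatever profiler was installed before
            wrapper.calls = seen
    return wrapper


def update_selection():
    return 1


def compute_visible():
    return update_selection() + 1


@trace_calls
def navigate_cell():
    update_selection()
    return compute_visible()


value = navigate_cell()
```

Because sys.setprofile() applies per thread, calls made from other threads during the decorated function are not captured by this approach.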

Span Hierarchy

Hierarchy is determined by code nesting via a ContextVar stack (async-safe). No explicit parent references required.

with profiler.span("execute"):           # root
    with profiler.span("callback"):      # child of execute
        result = self.callback(...)
    with profiler.span("oob_swap"):      # sibling of callback
        ...

When a command calls another command, the second command's spans automatically become children of the first command's active span.

profiler.current_span() provides access to the active span from anywhere in the call stack.
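
The nesting rule can be illustrated with a small sketch built on contextvars. Node, roots, and the module-level span() and current_span() functions are hypothetical. Instead of an explicit stack, the sketch stores only the innermost open span and relies on ContextVar tokens to restore the parent, which behaves like a stack.

```python
from contextlib import contextmanager
from contextvars import ContextVar


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


# Holds the innermost open span for the current task or thread (None at top level).
_active = ContextVar("active_span", default=None)
roots = []


def current_span():
    return _active.get()


@contextmanager
def span(name):
    node = Node(name)
    parent = _active.get()
    # Attach to the enclosing span if there is one, otherwise start a new tree.
    (parent.children if parent is not None else roots).append(node)
    token = _active.set(node)
    try:
        yield node
    finally:
        _active.reset(token)  # restore the parent as the active span


with span("execute") as root:
    with span("callback"):
        inner = current_span().name
    with span("oob_swap"):
        pass
```

Each asyncio task runs in its own copy of the context, so spans opened in concurrent tasks do not become children of one another.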

Storage

  • Scope: Global (all sessions). Profiling measures server behavior, not per-user state.
  • Structure: deque with a configurable maximum size.
  • Default size: 500 traces (constant PROFILER_MAX_TRACES).
  • Eviction: Oldest traces are dropped when the buffer is full (FIFO).
  • Persistence: In-memory only. Lost on server restart.
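
The eviction behavior matches what collections.deque provides when constructed with maxlen. A small sketch, with the buffer shrunk to three entries and plain dictionaries standing in for traces (make_buffer is a hypothetical helper):

```python
from collections import deque

PROFILER_MAX_TRACES = 500  # default from the design; shrunk below for the demo


def make_buffer(max_traces=PROFILER_MAX_TRACES):
    # A bounded deque silently discards the oldest item when a new one is appended.
    return deque(maxlen=max_traces)


traces = make_buffer(max_traces=3)
for command_id in range(5):
    traces.append({"command_id": command_id})

kept = [t["command_id"] for t in traces]  # the two oldest traces were evicted
traces.clear()
```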

Toggle and Clear

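  • profiler.enabled — boolean flag. When False, all probe mechanisms are no-ops: nothing is timed or stored, and the remaining cost is limited to the flag check (plus the wrapper call for decorators).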
  • profiler.clear() — empties the trace buffer.
  • Both can be controlled from the profiler UI control.
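
One possible way to make disabled probes cheap is to return a shared do-nothing span. The sketch below assumes that design; _NullSpan, _TimedSpan, and this Profiler class are hypothetical, and a plain list stands in for the trace buffer.

```python
import time


class _NullSpan:
    """Shared do-nothing span returned while profiling is disabled."""

    def set(self, key, value):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


class _TimedSpan:
    def __init__(self, name, sink):
        self.name, self.data, self._sink = name, {}, sink

    def set(self, key, value):
        self.data[key] = value

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ms = (time.perf_counter() - self._start) * 1000.0
        self._sink.append(self)
        return False


class Profiler:
    _NULL = _NullSpan()

    def __init__(self):
        self.enabled = True
        self.traces = []

    def span(self, name):
        # The flag is checked once per probe; nothing is timed or stored when off.
        return _TimedSpan(name, self.traces) if self.enabled else self._NULL

    def clear(self):
        self.traces.clear()


profiler = Profiler()
with profiler.span("on"):
    pass
profiler.enabled = False
with profiler.span("off") as s:
    s.set("ignored", 1)  # safe to call, silently dropped
recorded = [t.name for t in profiler.traces]
profiler.clear()
```

Callers never need to test the flag themselves, because span.set() remains valid on the null object.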

Overhead Measurement

The ProfilingManager self-profiles its own span.__enter__ and span.__exit__ calls. Exposes:

  • overhead_per_span_ns — average cost of one span boundary in nanoseconds
  • total_overhead_ms — estimated total overhead across all active spans

Both values are shown in the UI so that users can verify the profiler does not significantly bias measurements.
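
A rough way to obtain these figures is to time a batch of empty spans. The sketch below uses that calibration approach, which is an assumption and differs from continuous self-profiling. The measured value covers one __enter__/__exit__ pair and includes a little loop overhead, so it slightly overestimates. The span count of 250 is a made-up input.

```python
import time


class Span:
    """Hypothetical span that only records its own duration."""

    def __enter__(self):
        self._start = time.perf_counter_ns()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ns = time.perf_counter_ns() - self._start
        return False


def measure_overhead_per_span_ns(samples=10_000):
    """Average wall-clock cost of entering and leaving one empty span."""
    start = time.perf_counter_ns()
    for _ in range(samples):
        with Span():
            pass
    return (time.perf_counter_ns() - start) / samples


overhead_per_span_ns = measure_overhead_per_span_ns()
span_count = 250  # hypothetical number of spans recorded so far
total_overhead_ms = overhead_per_span_ns * span_count / 1_000_000
```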


Data Model

ProfilingTrace
  command_name: str
  command_id: str
  kwargs: dict
  timestamp: datetime
  total_duration_ms: float
  root_span: ProfilingSpan

ProfilingSpan
  name: str
  start: float          (perf_counter)
  duration_ms: float
  data: dict            (attached via span.set())
  children: list[ProfilingSpan | CumulativeSpan]

CumulativeSpan
  name: str
  count: int
  total_ms: float
  min_ms: float
  max_ms: float
  avg_ms: float
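
The model above maps naturally onto dataclasses. The following sketch is one possible rendering, with two assumptions: default values are added for convenience, and avg_ms is derived from total_ms and count rather than stored. The sample values ("navigate_cell", "c-1", the durations) are invented for the demo.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Union


@dataclass
class CumulativeSpan:
    name: str
    count: int = 0
    total_ms: float = 0.0
    min_ms: float = 0.0
    max_ms: float = 0.0

    @property
    def avg_ms(self) -> float:
        # Derived so it can never disagree with total_ms and count.
        return self.total_ms / self.count if self.count else 0.0


@dataclass
class ProfilingSpan:
    name: str
    start: float = 0.0  # time.perf_counter() value at span entry
    duration_ms: float = 0.0
    data: dict = field(default_factory=dict)
    children: List[Union["ProfilingSpan", CumulativeSpan]] = field(default_factory=list)

    def set(self, key, value):
        self.data[key] = value


@dataclass
class ProfilingTrace:
    command_name: str
    command_id: str
    kwargs: dict
    timestamp: datetime
    total_duration_ms: float
    root_span: ProfilingSpan


root = ProfilingSpan(name="command", duration_ms=173.0)
root.children.append(ProfilingSpan(name="callback", duration_ms=150.0))
root.children.append(
    CumulativeSpan(name="process_row", count=4, total_ms=20.0, min_ms=4.0, max_ms=6.0)
)
trace = ProfilingTrace("navigate_cell", "c-1", {}, datetime.now(), root.duration_ms, root)
```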

Existing Code Hooks

src/myfasthtml/core/utils.py — route handler (Level A)

@utils_rt(Routes.Commands)
async def post(session, c_id: str, client_response: dict = None):
    with profiler.span("command", args={"c_id": c_id}):
        command = CommandsManager.get_command(c_id)
        return await command.execute(client_response)

src/myfasthtml/core/commands.py — execution phases (Level B)

def execute(self, client_response=None):
    with profiler.span("before_commands"):
        ...
    with profiler.span("callback"):
        result = self.callback(...)
    with profiler.span("after_commands"):
        ...
    with profiler.span("oob_swap"):
        ...

Implementation Plan

Phase 1 — Core

File: src/myfasthtml/core/profiler.py

  1. ProfilingSpan dataclass
  2. CumulativeSpan dataclass
  3. ProfilingTrace dataclass
  4. ProfilingManager class with all probe mechanisms
  5. profiler singleton
  6. Hook into utils.py (Level A)
  7. Hook into commands.py (Level B)

Tests: tests/core/test_profiler.py

Test                                      Description
test_i_can_create_a_span                  Basic span creation and timing
test_i_can_nest_spans                     Child spans are correctly parented
test_i_can_use_span_as_decorator          Decorator captures args automatically
test_i_can_use_cumulative_span            Aggregates count/total/min/max/avg
test_i_can_attach_data_to_span            span.set() and current_span().set()
test_i_can_clear_traces                   Buffer is emptied after clear()
test_i_can_enable_disable_profiler        Probes are no-ops when disabled
test_i_can_measure_overhead               Overhead metrics are exposed
test_i_can_use_trace_all_on_class         All methods of a class are wrapped
test_i_can_use_trace_calls_on_function    Sub-calls are traced via setprofile

Phase 2 — Controls

src/myfasthtml/controls/ProfilerList.py (SingleInstance)

  • Table of all traces: command name / total duration / timestamp
  • Right panel: trace detail (kwargs, span breakdown)
  • Buttons: enable/disable, clear
  • Click on a trace → opens ProfilerDetail

src/myfasthtml/controls/ProfilerDetail.py (MultipleInstance)

  • Hierarchical span tree for a single trace
  • Two display modes: list and pie chart
  • Click on a span → zooms into its children (if any)
  • Displays cumulative spans with count/min/max/avg
  • Shows overhead metrics

src/myfasthtml/controls/ProfilerPieChart.py (future)

  • Pie chart visualization of span distribution at a given zoom level

Naming Conventions

  • Control files: ProfilerXxx.py
  • CSS classes: mf-profiler-xxx
  • Logger: logging.getLogger("Profiler")
  • Constant: PROFILER_MAX_TRACES = 500 in src/myfasthtml/core/constants.py