Profiler — Design & Implementation Plan
Context
Performance issues were identified during keyboard navigation in the DataGrid (173ms server-side
per command call). The HTMX debug traces (via htmx_debug.js) confirmed the bottleneck is
server-side. A persistent, in-application profiling system is needed for continuous analysis
across sessions and future investigations.
Design Decisions
Data Collection Strategy
Two complementary levels:
- Level A (route handler): One trace per /myfasthtml/commands call. Captures total server-side duration including lookup, execution, and HTMX swap overhead.
- Level B (granular spans): Decomposition of each trace into named phases. Activated by placing probes in the code.
Both levels are active simultaneously. Level A gives the global picture; Level B gives the breakdown.
Probe Mechanisms
Five complementary mechanisms, chosen based on the context:
1. Context manager — partial block instrumentation
with profiler.span("oob_swap"):
    # only this block is timed
    result = build_oob_elements(...)
Metadata can be attached during execution:
with profiler.span("query") as span:
    rows = db.query(...)
    span.set("row_count", len(rows))
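As an illustration of how such a context manager could work, here is a minimal sketch. The `Span` class, the `span()` helper, and the `finished` list are hypothetical stand-ins, not the planned `ProfilingManager` API; the sketch only assumes that timing uses `time.perf_counter()`, as the data model in this plan suggests.

```python
import time
from contextlib import contextmanager


class Span:
    """Hypothetical span record: a name, a duration, and free-form metadata."""

    def __init__(self, name):
        self.name = name
        self.duration_ms = 0.0
        self.data = {}

    def set(self, key, value):
        # Attach metadata while the block is still running.
        self.data[key] = value


@contextmanager
def span(name, sink):
    """Time the enclosed block and append the finished span to `sink`."""
    current = Span(name)
    start = time.perf_counter()
    try:
        yield current
    finally:
        # Recorded even if the block raises, so failed calls stay visible.
        current.duration_ms = (time.perf_counter() - start) * 1000.0
        sink.append(current)


finished = []
with span("query", finished) as s:
    rows = [1, 2, 3]  # stand-in for db.query(...)
    s.set("row_count", len(rows))
```

Closing the span in a `finally` clause is a design choice of this sketch: a block that raises still produces a span with a duration.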
2. Decorator — full function instrumentation
@profiler.span("callback")
def execute_callback(self, client_response):
    ...
Function arguments are captured automatically. Metadata can be attached via current_span():
@profiler.span("process")
def process(self, rows):
    result = do_work(rows)
    profiler.current_span().set("row_count", len(result))
    return result
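One way the automatic argument capture could be implemented is with `inspect.signature`, which maps positional and keyword arguments back to parameter names. The sketch below is an assumption about the mechanism, not the planned code: `span_decorator`, `calls`, and `Grid` are hypothetical names, and timing is left out to keep the focus on capture.

```python
import functools
import inspect


def span_decorator(name, sink):
    """Hypothetical decorator: records the call's bound arguments under `name`."""

    def wrap(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Skip `self` so the instance is not kept alive by the trace.
            captured = {k: v for k, v in bound.arguments.items() if k != "self"}
            sink.append({"name": name, "args": captured})
            return func(*args, **kwargs)

        return wrapper

    return wrap


calls = []


class Grid:
    @span_decorator("process", calls)
    def process(self, rows, limit=10):
        return rows[:limit]


result = Grid().process([1, 2, 3], limit=2)
```

Storing argument values by reference keeps large objects alive as long as the trace is in the buffer, so a real implementation may prefer to store a truncated `repr()` instead.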
3. Cumulative span — loop instrumentation
For loops with many iterations. Aggregates instead of creating one span per iteration.
for row in rows:
    with profiler.cumulative_span("process_row"):
        process(row)
# or as a decorator
@profiler.cumulative_span("process_row")
def process_row(self, row):
    ...
Exposes: count, total, min, max, avg. Single entry in the trace tree regardless of
iteration count.
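The aggregation described above can be sketched as a single reusable object that updates running statistics on every exit. `CumulativeStats` is a hypothetical name, and deriving `avg_ms` as a property instead of storing it is a choice of this sketch.

```python
import time
from dataclasses import dataclass


@dataclass
class CumulativeStats:
    """Hypothetical aggregate: one record however many iterations ran."""

    name: str
    count: int = 0
    total_ms: float = 0.0
    min_ms: float = float("inf")
    max_ms: float = 0.0

    @property
    def avg_ms(self):
        return self.total_ms / self.count if self.count else 0.0

    def add(self, elapsed_ms):
        self.count += 1
        self.total_ms += elapsed_ms
        self.min_ms = min(self.min_ms, elapsed_ms)
        self.max_ms = max(self.max_ms, elapsed_ms)

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.add((time.perf_counter() - self._start) * 1000.0)
        return False  # never swallow exceptions


stats = CumulativeStats("process_row")
for row in range(100):
    with stats:
        row * 2  # stand-in for process(row)
```

Memory use stays constant because only five numbers are kept, at the price of losing the timing of individual iterations.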
4. trace_all — class-level static instrumentation
Wraps all methods of a class at definition time. No runtime overhead beyond the spans themselves.
@profiler.trace_all
class DataGrid(MultipleInstance):
    def navigate_cell(self, ...):  # auto-spanned
        ...
# Exclude specific methods
@profiler.trace_all(exclude=["__ft__", "render"])
class DataGrid(MultipleInstance):
    ...
Implementation: uses inspect to iterate over methods and wraps each with @profiler.span().
No sys.settrace() involved — pure static wrapping.
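The static wrapping could look roughly like the following sketch. It assumes that only functions defined directly on the class are wrapped (inherited methods, `staticmethod` and `classmethod` objects are left alone); `traced()` is a hypothetical stand-in for `profiler.span()` that only records the method name.

```python
import functools
import inspect

recorded = []  # stand-in for the trace buffer


def traced(name):
    """Stand-in for profiler.span(): only records that the method was entered."""

    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            recorded.append(name)
            return func(*args, **kwargs)

        return wrapper

    return wrap


def trace_all(cls=None, *, exclude=()):
    """Wrap every plain function defined on the class, once, at definition time."""

    def decorate(target):
        # vars() yields only what this class defines; copy it before mutating.
        for attr, value in list(vars(target).items()):
            if attr in exclude or not inspect.isfunction(value):
                continue
            setattr(target, attr, traced(f"{target.__name__}.{attr}")(value))
        return target

    # Support both @trace_all and @trace_all(exclude=[...]).
    return decorate(cls) if cls is not None else decorate


@trace_all(exclude=["render"])
class Grid:
    def navigate_cell(self):
        return "moved"

    def render(self):
        return "<div/>"


grid = Grid()
grid.navigate_cell()
grid.render()
```

Only `Grid.navigate_cell` ends up in `recorded`, because `render` is excluded.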
5. trace_calls — sub-call exploration
Traces all function calls made within a single function, recursively. Used for exploration when the bottleneck location is unknown.
@profiler.trace_calls
def navigate_cell(self, ...):
    self._update_selection()  # auto-traced as child span
    self._compute_visible()   # auto-traced as child span
    db.query(...)             # auto-traced as child span
Implementation: uses sys.setprofile() scoped to the decorated function's execution only.
Overhead is localized to that function's call stack. This is an exploration tool — use it
to identify hotspots, then replace with explicit probes.
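A rough sketch of the scoping idea: install a profile hook just before the decorated function runs and restore the previous hook afterwards. This version only logs Python-level calls with their nesting depth; turning them into timed child spans is omitted, and the `last_calls` attribute is a hypothetical convenience for inspection.

```python
import functools
import sys


def trace_calls(func):
    """Hypothetical sketch: log Python-level calls made while `func` runs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls = []
        depth = 0

        def hook(frame, event, arg):
            nonlocal depth
            if event == "call":
                calls.append((depth, frame.f_code.co_name))
                depth += 1
            elif event == "return":
                depth -= 1
            # c_call / c_return events (built-ins) are ignored here.

        previous = sys.getprofile()
        sys.setprofile(hook)
        try:
            result = func(*args, **kwargs)
        finally:
            # Restore whatever hook was installed before, even on exceptions.
            sys.setprofile(previous)
        wrapper.last_calls = calls
        return result

    return wrapper


def helper_a():
    return 1


def helper_b():
    return helper_a() + 1


@trace_calls
def navigate():
    return helper_a() + helper_b()


navigate()
```

`sys.setprofile()` affects only the current thread, so calls made from other threads during the decorated function are not seen by this hook.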
Span Hierarchy
Hierarchy is determined by code nesting via a ContextVar stack (async-safe). No explicit
parent references required.
with profiler.span("execute"):           # root
    with profiler.span("callback"):      # child of execute
        result = self.callback(...)
    with profiler.span("oob_swap"):      # sibling of callback
        ...
When a command calls another command, the second command's spans automatically become children of the first command's active span.
profiler.current_span() provides access to the active span from anywhere in the call stack.
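The nesting rule can be sketched with `contextvars`. Instead of an explicit stack, this sketch stores only the innermost open span and relies on the token returned by `ContextVar.set()` to restore the parent on exit, which has the same effect. `Node`, `span()`, and `current_span()` are hypothetical stand-ins for the profiler's own types.

```python
import contextvars
from contextlib import contextmanager

# Holds the innermost open span for the current thread or asyncio task.
_active = contextvars.ContextVar("active_span", default=None)


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


@contextmanager
def span(name):
    node = Node(name)
    parent = _active.get()
    if parent is not None:
        parent.children.append(node)  # nesting in code becomes nesting in the tree
    token = _active.set(node)
    try:
        yield node
    finally:
        _active.reset(token)  # the parent becomes the active span again


def current_span():
    return _active.get()


with span("execute") as root:
    with span("callback"):
        inner = current_span().name
    with span("oob_swap"):
        pass
```

Each asyncio task runs in a copy of the context it was created from, so spans opened in concurrent tasks do not overwrite each other's active span.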
Storage
- Scope: Global (all sessions). Profiling measures server behavior, not per-user state.
- Structure: deque with a configurable maximum size.
- Default size: 500 traces (constant PROFILER_MAX_TRACES).
- Eviction: Oldest traces are dropped when the buffer is full (FIFO).
- Persistence: In-memory only. Lost on server restart.
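The eviction behavior comes for free with `collections.deque` when `maxlen` is set, as this small sketch shows. The string entries are placeholders for real trace objects.

```python
from collections import deque

PROFILER_MAX_TRACES = 500  # default named in this plan

traces = deque(maxlen=PROFILER_MAX_TRACES)
for i in range(PROFILER_MAX_TRACES + 3):
    # Appending to a full deque silently drops the oldest entry.
    traces.append(f"trace-{i}")
```

After the loop the buffer still holds 500 entries, and the three oldest ones are gone.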
Toggle and Clear
- profiler.enabled: boolean flag. When False, all probe mechanisms are no-ops (zero overhead).
- profiler.clear(): empties the trace buffer.
- Both are controllable from the UI control.
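One possible way to make disabled probes cheap is to hand out a shared do-nothing span, so that calling code (including `span.set()`) keeps working unchanged. `MiniProfiler`, `_NullSpan`, and `_RecordingSpan` are hypothetical names, and timing is omitted for brevity.

```python
class _NullSpan:
    """Shared do-nothing span handed out while profiling is disabled."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def set(self, key, value):
        pass  # metadata is discarded


class _RecordingSpan:
    def __init__(self, owner, name):
        self.owner, self.name, self.data = owner, name, {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.owner.traces.append(self)
        return False

    def set(self, key, value):
        self.data[key] = value


class MiniProfiler:
    """Hypothetical stand-in for ProfilingManager."""

    _NULL = _NullSpan()

    def __init__(self):
        self.enabled = True
        self.traces = []

    def span(self, name):
        # On the disabled path the only work is this flag check.
        return _RecordingSpan(self, name) if self.enabled else self._NULL

    def clear(self):
        self.traces.clear()


mini = MiniProfiler()
with mini.span("kept") as s:
    s.set("row_count", 3)

mini.enabled = False
with mini.span("dropped") as s:
    s.set("row_count", 3)  # still valid to call, but nothing is stored
```

In this sketch the disabled path still pays for one attribute lookup and one method call per probe, so "zero overhead" is best read as "no allocation and no clock reads".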
Overhead Measurement
The ProfilingManager self-profiles its own span.__enter__ and span.__exit__ calls.
Exposes:
- overhead_per_span_ns: average cost of one span boundary in nanoseconds
- total_overhead_ms: estimated total overhead across all active spans
Visible in the UI to verify the profiler does not bias measurements significantly.
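As a rough illustration, the two metrics could be derived by calibration: time a large number of empty spans and divide. This is a simplification of the self-profiling described above, and it measures a full enter/exit pair, not a single boundary. `TimedSpan` and both helper functions are hypothetical.

```python
import time


class TimedSpan:
    """Hypothetical span that only reads the clock on entry and exit."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ms = (time.perf_counter() - self.start) * 1000.0
        return False


def measure_overhead_per_span_ns(iterations=10_000):
    """Average cost of one empty span, i.e. the enter/exit bookkeeping alone."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        with TimedSpan():
            pass
    return (time.perf_counter_ns() - start) / iterations


def estimate_total_overhead_ms(overhead_per_span_ns, span_count):
    # nanoseconds -> milliseconds
    return overhead_per_span_ns * span_count / 1_000_000


overhead_per_span_ns = measure_overhead_per_span_ns()
total_overhead_ms = estimate_total_overhead_ms(overhead_per_span_ns, span_count=250)
```

For example, 500 spans at 2000 ns each add up to an estimated 1 ms of profiler overhead, which can then be compared against the trace's total duration.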
Data Model
ProfilingTrace
    command_name: str
    command_id: str
    kwargs: dict
    timestamp: datetime
    total_duration_ms: float
    root_span: ProfilingSpan
ProfilingSpan
    name: str
    start: float (perf_counter)
    duration_ms: float
    data: dict (attached via span.set())
    children: list[ProfilingSpan | CumulativeSpan]
CumulativeSpan
    name: str
    count: int
    total_ms: float
    min_ms: float
    max_ms: float
    avg_ms: float
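Translated directly into Python dataclasses, the model above could look like the sketch below. The field names and types come from this plan; the default values and the sample trace at the end are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Union


@dataclass
class CumulativeSpan:
    name: str
    count: int = 0
    total_ms: float = 0.0
    min_ms: float = 0.0
    max_ms: float = 0.0
    avg_ms: float = 0.0


@dataclass
class ProfilingSpan:
    name: str
    start: float = 0.0  # time.perf_counter() value at span entry
    duration_ms: float = 0.0
    data: dict = field(default_factory=dict)  # filled via span.set()
    children: List[Union["ProfilingSpan", CumulativeSpan]] = field(default_factory=list)


@dataclass
class ProfilingTrace:
    command_name: str
    command_id: str
    kwargs: dict
    timestamp: datetime
    total_duration_ms: float
    root_span: ProfilingSpan


# Sample values below are invented for the example.
root = ProfilingSpan("command", duration_ms=12.5)
root.children.append(ProfilingSpan("callback", duration_ms=9.0))
root.children.append(CumulativeSpan("process_row", count=3, total_ms=3.0,
                                    min_ms=0.5, max_ms=1.5, avg_ms=1.0))
trace = ProfilingTrace("navigate_cell", "c-1", {}, datetime.now(), 12.5, root)
```

Using `field(default_factory=...)` for `data` and `children` gives every span its own dict and list instead of sharing one mutable default.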
Existing Code Hooks
src/myfasthtml/core/utils.py — route handler (Level A)
@utils_rt(Routes.Commands)
async def post(session, c_id: str, client_response: dict = None):
    with profiler.span("command", args={"c_id": c_id}):
        command = CommandsManager.get_command(c_id)
        return await command.execute(client_response)
src/myfasthtml/core/commands.py — execution phases (Level B)
def execute(self, client_response=None):
    with profiler.span("before_commands"):
        ...
    with profiler.span("callback"):
        result = self.callback(...)
    with profiler.span("after_commands"):
        ...
    with profiler.span("oob_swap"):
        ...
Implementation Plan
Phase 1 — Core
File: src/myfasthtml/core/profiler.py
- ProfilingSpan dataclass
- CumulativeSpan dataclass
- ProfilingTrace dataclass
- ProfilingManager class with all probe mechanisms
- profiler singleton
- Hook into utils.py (Level A)
- Hook into commands.py (Level B)
Tests: tests/core/test_profiler.py
| Test | Description |
|---|---|
| test_i_can_create_a_span | Basic span creation and timing |
| test_i_can_nest_spans | Child spans are correctly parented |
| test_i_can_use_span_as_decorator | Decorator captures args automatically |
| test_i_can_use_cumulative_span | Aggregates count/total/min/max/avg |
| test_i_can_attach_data_to_span | span.set() and current_span().set() |
| test_i_can_clear_traces | Buffer is emptied after clear() |
| test_i_can_enable_disable_profiler | Probes are no-ops when disabled |
| test_i_can_measure_overhead | Overhead metrics are exposed |
| test_i_can_use_trace_all_on_class | All methods of a class are wrapped |
| test_i_can_use_trace_calls_on_function | Sub-calls are traced via setprofile |
Phase 2 — Controls
src/myfasthtml/controls/ProfilerList.py (SingleInstance)
- Table of all traces: command name / total duration / timestamp
- Right panel: trace detail (kwargs, span breakdown)
- Buttons: enable/disable, clear
- Click on a trace → opens ProfilerDetail
src/myfasthtml/controls/ProfilerDetail.py (MultipleInstance)
- Hierarchical span tree for a single trace
- Two display modes: list and pie chart
- Click on a span → zooms into its children (if any)
- Displays cumulative spans with count/min/max/avg
- Shows overhead metrics
src/myfasthtml/controls/ProfilerPieChart.py (future)
- Pie chart visualization of span distribution at a given zoom level
Naming Conventions
- Control files: ProfilerXxx.py
- CSS classes: mf-profiler-xxx
- Logger: logging.getLogger("Profiler")
- Constant: PROFILER_MAX_TRACES = 500 in src/myfasthtml/core/constants.py