diff --git a/docs/Profiler.md b/docs/Profiler.md
new file mode 100644
index 0000000..67934e5
--- /dev/null
+++ b/docs/Profiler.md
@@ -0,0 +1,271 @@
+# Profiler — Design & Implementation Plan
+
+## Context
+
+Performance issues were identified during keyboard navigation in the DataGrid (173ms server-side
+per command call). The HTMX debug traces (via `htmx_debug.js`) confirmed the bottleneck is
+server-side. A persistent, in-application profiling system is needed for continuous analysis
+across sessions and future investigations.
+
+## Design Decisions
+
+### Data Collection Strategy
+
+Two complementary levels:
+
+- **Level A** (route handler): One trace per `/myfasthtml/commands` call. Captures total
+  server-side duration including lookup, execution, and HTMX swap overhead.
+- **Level B** (granular spans): Decomposition of each trace into named phases. Activated
+  by placing probes in the code.
+
+Both levels are active simultaneously. Level A gives the global picture; Level B gives the
+breakdown.
+
+### Probe Mechanisms
+
+Four complementary mechanisms, chosen based on the context:
+
+#### 1. Context manager — partial block instrumentation
+
+```python
+with profiler.span("oob_swap"):
+    # only this block is timed
+    result = build_oob_elements(...)
+```
+
+Metadata can be attached during execution:
+
+```python
+with profiler.span("query") as span:
+    rows = db.query(...)
+    span.set("row_count", len(rows))
+```
+
+#### 2. Decorator — full function instrumentation
+
+```python
+@profiler.span("callback")
+def execute_callback(self, client_response):
+    ...
+```
+
+Function arguments are captured automatically. Metadata can be attached via `current_span()`:
+
+```python
+@profiler.span("process")
+def process(self, rows):
+    result = do_work(rows)
+    profiler.current_span().set("row_count", len(result))
+    return result
+```
+
+#### 3. Cumulative span — loop instrumentation
+
+For loops with many iterations. Aggregates instead of creating one span per iteration.
+
+```python
+for row in rows:
+    with profiler.cumulative_span("process_row"):
+        process(row)
+
+# or as a decorator
+@profiler.cumulative_span("process_row")
+def process_row(self, row):
+    ...
+```
+
+Exposes: `count`, `total`, `min`, `max`, `avg`. Single entry in the trace tree regardless of
+iteration count.
+
+#### 4. `trace_all` — class-level static instrumentation
+
+Wraps all methods of a class at definition time. No runtime overhead beyond the spans themselves.
+
+```python
+@profiler.trace_all
+class DataGrid(MultipleInstance):
+    def navigate_cell(self, ...):  # auto-spanned
+        ...
+
+# Exclude specific methods
+@profiler.trace_all(exclude=["__ft__", "render"])
+class DataGrid(MultipleInstance):
+    ...
+```
+
+Implementation: uses `inspect` to iterate over methods and wraps each with `@profiler.span()`.
+No `sys.settrace()` involved — pure static wrapping.
+
+#### 5. `trace_calls` — sub-call exploration
+
+Traces all function calls made within a single function, recursively. Used for exploration
+when the bottleneck location is unknown.
+
+```python
+@profiler.trace_calls
+def navigate_cell(self, ...):
+    self._update_selection()   # auto-traced as child span
+    self._compute_visible()    # auto-traced as child span
+    db.query(...)              # auto-traced as child span
+```
+
+Implementation: uses `sys.setprofile()` scoped to the decorated function's execution only.
+Overhead is localized to that function's call stack. This is an exploration tool — use it
+to identify hotspots, then replace with explicit probes.
+
+### Span Hierarchy
+
+Hierarchy is determined by code nesting via a `ContextVar` stack (async-safe). No explicit
+parent references required.
+
+```python
+with profiler.span("execute"):           # root
+    with profiler.span("callback"):      # child of execute
+        result = self.callback(...)
+    with profiler.span("oob_swap"):      # sibling of callback
+        ...
+```
+
+When a command calls another command, the second command's spans automatically become children
+of the first command's active span.
+
+`profiler.current_span()` provides access to the active span from anywhere in the call stack.
+
+### Storage
+
+- **Scope**: Global (all sessions). Profiling measures server behavior, not per-user state.
+- **Structure**: `deque` with a configurable maximum size.
+- **Default size**: 500 traces (constant `PROFILER_MAX_TRACES`).
+- **Eviction**: Oldest traces are dropped when the buffer is full (FIFO).
+- **Persistence**: In-memory only. Lost on server restart.
+
+### Toggle and Clear
+
+- `profiler.enabled` — boolean flag. When `False`, all probe mechanisms are no-ops (zero overhead).
+- `profiler.clear()` — empties the trace buffer.
+- Both are controllable from the UI control.
+
+### Overhead Measurement
+
+The `ProfilingManager` self-profiles its own `span.__enter__` and `span.__exit__` calls.
+Exposes:
+
+- `overhead_per_span_ns` — average cost of one span boundary in nanoseconds
+- `total_overhead_ms` — estimated total overhead across all active spans
+
+Visible in the UI to verify the profiler does not bias measurements significantly.
+
+---
+
+## Data Model
+
+```
+ProfilingTrace
+  command_name: str
+  command_id: str
+  kwargs: dict
+  timestamp: datetime
+  total_duration_ms: float
+  root_span: ProfilingSpan
+
+ProfilingSpan
+  name: str
+  start: float          (perf_counter)
+  duration_ms: float
+  data: dict            (attached via span.set())
+  children: list[ProfilingSpan | CumulativeSpan]
+
+CumulativeSpan
+  name: str
+  count: int
+  total_ms: float
+  min_ms: float
+  max_ms: float
+  avg_ms: float
+```
+
+---
+
+## Existing Code Hooks
+
+### `src/myfasthtml/core/utils.py` — route handler (Level A)
+
+```python
+@utils_rt(Routes.Commands)
+async def post(session, c_id: str, client_response: dict = None):
+    with profiler.span("command", args={"c_id": c_id}):
+        command = CommandsManager.get_command(c_id)
+        return await command.execute(client_response)
+```
+
+### `src/myfasthtml/core/commands.py` — execution phases (Level B)
+
+```python
+def execute(self, client_response=None):
+    with profiler.span("before_commands"):
+        ...
+    with profiler.span("callback"):
+        result = self.callback(...)
+    with profiler.span("after_commands"):
+        ...
+    with profiler.span("oob_swap"):
+        ...
+```
+
+---
+
+## Implementation Plan
+
+### Phase 1 — Core
+
+**File**: `src/myfasthtml/core/profiler.py`
+
+1. `ProfilingSpan` dataclass
+2. `CumulativeSpan` dataclass
+3. `ProfilingTrace` dataclass
+4. `ProfilingManager` class with all probe mechanisms
+5. `profiler` singleton
+6. Hook into `utils.py` (Level A)
+7. Hook into `commands.py` (Level B)
+
+**Tests**: `tests/core/test_profiler.py`
+
+| Test | Description |
+|------|-------------|
+| `test_i_can_create_a_span` | Basic span creation and timing |
+| `test_i_can_nest_spans` | Child spans are correctly parented |
+| `test_i_can_use_span_as_decorator` | Decorator captures args automatically |
+| `test_i_can_use_cumulative_span` | Aggregates count/total/min/max/avg |
+| `test_i_can_attach_data_to_span` | `span.set()` and `current_span().set()` |
+| `test_i_can_clear_traces` | Buffer is emptied after `clear()` |
+| `test_i_can_enable_disable_profiler` | Probes are no-ops when disabled |
+| `test_i_can_measure_overhead` | Overhead metrics are exposed |
+| `test_i_can_use_trace_all_on_class` | All methods of a class are wrapped |
+| `test_i_can_use_trace_calls_on_function` | Sub-calls are traced via setprofile |
+
+### Phase 2 — Controls
+
+**`src/myfasthtml/controls/ProfilerList.py`** (SingleInstance)
+- Table of all traces: command name / total duration / timestamp
+- Right panel: trace detail (kwargs, span breakdown)
+- Buttons: enable/disable, clear
+- Click on a trace → opens ProfilerDetail
+
+**`src/myfasthtml/controls/ProfilerDetail.py`** (MultipleInstance)
+- Hierarchical span tree for a single trace
+- Two display modes: list and pie chart
+- Click on a span → zooms into its children (if any)
+- Displays cumulative spans with count/min/max/avg
+- Shows overhead metrics
+
+**`src/myfasthtml/controls/ProfilerPieChart.py`** (future)
+- Pie chart visualization of span distribution at a given zoom level
+
+---
+
+## Naming Conventions
+
+- Control files: `ProfilerXxx.py`
+- CSS classes: `mf-profiler-xxx`
+- Logger: `logging.getLogger("Profiler")`
+- Constant: `PROFILER_MAX_TRACES = 500` in `src/myfasthtml/core/constants.py`
diff --git a/src/myfasthtml/assets/core/htmx_debug.js b/src/myfasthtml/assets/core/htmx_debug.js
new file mode 100644
index 0000000..f70c48c
--- /dev/null
+++ b/src/myfasthtml/assets/core/htmx_debug.js
@@ -0,0 +1,87 @@
+/**
+ * HTMX debug tracing — toggle with window.HTMX_DEBUG = true in the browser console.
+ *
+ * Each request gets a unique ID (#1, #2, ...). Timings are deltas from beforeRequest.
+ *
+ * Full event sequence (→ = network boundary):
+ *   beforeRequest  — request about to be sent
+ *   beforeSend     — XHR about to be sent (after HTMX setup)
+ *                 → network round-trip ←
+ *   beforeOnLoad   — response received, HTMX about to process it
+ *   beforeSwap     — DOM swap about to happen
+ *   afterSwap      — DOM swap done
+ *   afterSettle    — settle phase complete
+ *   afterRequest   — full request lifecycle complete
+ *   sendError      — network error
+ *   responseError  — non-2xx response
+ */
+
+window.HTMX_DEBUG = false;
+(function () {
+  console.log('Debug HTMX: htmx.logAll();');
+  console.log('Perf HTMX: window.HTMX_DEBUG=true;');
+})();
+
+(function () {
+  const EVENTS = [
+    'htmx:beforeRequest',
+    'htmx:beforeSend',
+    'htmx:beforeOnLoad',
+    'htmx:beforeSwap',
+    'htmx:afterSwap',
+    'htmx:afterSettle',
+    'htmx:afterRequest',
+    'htmx:sendError',
+    'htmx:responseError',
+  ];
+
+  let counter = 0;
+  const requests = new WeakMap();
+
+  function getInfo(detail) {
+    const key = detail?.requestConfig ?? detail?.xhr ?? null;
+    if (!key || !requests.has(key)) return null;
+    return requests.get(key);
+  }
+
+  EVENTS.forEach(eventName => {
+    document.addEventListener(eventName, (e) => {
+      if (!window.HTMX_DEBUG) return;
+
+      const short = eventName.replace('htmx:', '').padEnd(14);
+      const path = e.detail?.requestConfig?.path ?? e.detail?.pathInfo?.requestPath ?? '';
+      const isError = eventName === 'htmx:sendError' || eventName === 'htmx:responseError';
+
+      let prefix;
+
+      if (eventName === 'htmx:beforeRequest') {
+        const key = e.detail?.requestConfig ?? null;
+        if (key) {
+          const id = ++counter;
+          const now = performance.now();
+          requests.set(key, {id, start: now, last: now});
+          prefix = `#${String(id).padStart(3)}  +    0.0ms  (Δ    0.0ms)`;
+        } else {
+          prefix = `#  ?  +    0.0ms  (Δ    0.0ms)`;
+        }
+      } else {
+        const info = getInfo(e.detail);
+        if (info) {
+          const now  = performance.now();
+          const total = (now - info.start).toFixed(1);
+          const step  = (now - info.last).toFixed(1);
+          info.last   = now;
+          prefix = `#${String(info.id).padStart(3)}  +${String(total).padStart(7)}ms  (Δ${String(step).padStart(7)}ms)`;
+        } else {
+          prefix = `#  ?  +    ?.?ms  (Δ    ?.?ms)`;
+        }
+      }
+
+      if (isError) {
+        console.warn(`[HTMX] ${prefix}  ${short}`, path, e.detail);
+      } else {
+        console.debug(`[HTMX] ${prefix}  ${short}`, path);
+      }
+    });
+  });
+})();
diff --git a/src/myfasthtml/core/constants.py b/src/myfasthtml/core/constants.py
index cde3cba..a1b29c0 100644
--- a/src/myfasthtml/core/constants.py
+++ b/src/myfasthtml/core/constants.py
@@ -2,6 +2,8 @@ from enum import Enum
 
 NO_DEFAULT_VALUE = object()
 
+PROFILER_MAX_TRACES = 500
+
 ROUTE_ROOT = "/myfasthtml"
 
 # Datagrid
diff --git a/src/myfasthtml/core/profiler.py b/src/myfasthtml/core/profiler.py
new file mode 100644
index 0000000..049961e
--- /dev/null
+++ b/src/myfasthtml/core/profiler.py
@@ -0,0 +1,546 @@
+import functools
+import inspect
+import logging
+import sys
+import time
+from collections import deque
+from contextvars import ContextVar
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Optional
+
+logger = logging.getLogger("Profiler")
+
+
+# ---------------------------------------------------------------------------
+# No-op sentinel — used when profiler is disabled
+# ---------------------------------------------------------------------------
+
+class _NullSpan:
+    """No-op span returned when profiler is disabled."""
+
+    def set(self, key: str, value) -> '_NullSpan':
+        return self
+
+    def __enter__(self) -> '_NullSpan':
+        return self
+
+    def __exit__(self, *args):
+        pass
+
+    def __call__(self, fn):
+        return fn
+
+
+# ---------------------------------------------------------------------------
+# Data model
+# ---------------------------------------------------------------------------
+
+@dataclass
+class CumulativeSpan:
+    """Aggregated span for loops — one entry regardless of iteration count.
+
+    Attributes:
+        name: Span name.
+        count: Number of iterations recorded.
+        total_ms: Cumulative duration in milliseconds.
+        min_ms: Shortest recorded iteration in milliseconds.
+        max_ms: Longest recorded iteration in milliseconds.
+    """
+
+    name: str
+    count: int = 0
+    total_ms: float = 0.0
+    min_ms: float = float('inf')
+    max_ms: float = 0.0
+
+    @property
+    def avg_ms(self) -> float:
+        """Average duration per iteration in milliseconds."""
+        return self.total_ms / self.count if self.count > 0 else 0.0
+
+    def record(self, duration_ms: float):
+        """Record one iteration.
+
+        Args:
+            duration_ms: Duration of this iteration in milliseconds.
+        """
+        self.count += 1
+        self.total_ms += duration_ms
+        if duration_ms < self.min_ms:
+            self.min_ms = duration_ms
+        if duration_ms > self.max_ms:
+            self.max_ms = duration_ms
+
+
+@dataclass
+class ProfilingSpan:
+    """A named timing segment.
+
+    Attributes:
+        name: Span name.
+        data: Arbitrary metadata attached via span.set().
+        children: Nested spans and cumulative spans.
+        duration_ms: Duration of this span in milliseconds (set on finish).
+    """
+
+    name: str
+    data: dict = field(default_factory=dict)
+    children: list = field(default_factory=list)
+    _start: float = field(default_factory=time.perf_counter, repr=False)
+    duration_ms: float = field(default=0.0)
+
+    def set(self, key: str, value) -> 'ProfilingSpan':
+        """Attach metadata to this span.
+
+        Args:
+            key: Metadata key.
+            value: Metadata value.
+
+        Returns:
+            Self, for chaining.
+        """
+        self.data[key] = value
+        return self
+
+    def finish(self):
+        """Stop timing and record the duration."""
+        self.duration_ms = (time.perf_counter() - self._start) * 1000
+
+
+@dataclass
+class ProfilingTrace:
+    """One complete command execution, from route handler to response.
+
+    Attributes:
+        command_name: Name of the executed command.
+        command_id: UUID of the command.
+        kwargs: Arguments received from the client.
+        timestamp: When the command was received.
+        root_span: Top-level span wrapping the full execution.
+        total_duration_ms: Total server-side duration in milliseconds.
+    """
+
+    command_name: str
+    command_id: str
+    kwargs: dict
+    timestamp: datetime
+    root_span: Optional[ProfilingSpan] = None
+    total_duration_ms: float = 0.0
+
+
+# ---------------------------------------------------------------------------
+# Context managers / decorators
+# ---------------------------------------------------------------------------
+
+class _ActiveSpan:
+    """Context manager and decorator for a single named span.
+
+    When used as a context manager, returns the ProfilingSpan so callers can
+    attach metadata via ``span.set()``.  When used as a decorator, captures
+    function arguments automatically.
+    """
+
+    def __init__(self, manager: 'ProfilingManager', name: str, args: dict = None):
+        self._manager = manager
+        self._name = name
+        self._args = args
+        self._span: Optional[ProfilingSpan] = None
+        self._token = None
+
+    def __enter__(self) -> ProfilingSpan:
+        if not self._manager.enabled:
+            return _NullSpan()
+
+        overhead_start = time.perf_counter()
+        self._span = ProfilingSpan(name=self._name)
+        if self._args:
+            self._span.data.update(self._args)
+        self._token = self._manager.push_span(self._span)
+        self._manager.record_overhead(time.perf_counter() - overhead_start)
+        return self._span
+
+    def __exit__(self, *args):
+        if self._span is not None:
+            overhead_start = time.perf_counter()
+            self._manager.pop_span(self._span, self._token)
+            self._manager.record_overhead(time.perf_counter() - overhead_start)
+
+    def __call__(self, fn):
+        """Use as decorator — enabled check deferred to call time."""
+        manager = self._manager
+        name = self._name
+
+        @functools.wraps(fn)
+        def wrapper(*args, **kwargs):
+            if not manager.enabled:
+                return fn(*args, **kwargs)
+            captured = manager.capture_args(fn, args, kwargs)
+            with _ActiveSpan(manager, name, captured):
+                return fn(*args, **kwargs)
+
+        return wrapper
+
+
+class _CumulativeActiveSpan:
+    """Context manager and decorator for cumulative spans.
+
+    Finds or creates a CumulativeSpan in the current parent and records
+    each iteration without adding a new child entry.
+    """
+
+    def __init__(self, manager: 'ProfilingManager', name: str):
+        self._manager = manager
+        self._name = name
+        self._cumulative_span: Optional[CumulativeSpan] = None
+        self._iter_start: float = 0.0
+
+    def _get_or_create(self) -> CumulativeSpan:
+        parent = self._manager.current_span()
+        if isinstance(parent, ProfilingSpan):
+            for child in parent.children:
+                if isinstance(child, CumulativeSpan) and child.name == self._name:
+                    return child
+            cumulative_span = CumulativeSpan(name=self._name)
+            parent.children.append(cumulative_span)
+            return cumulative_span
+        return CumulativeSpan(name=self._name)
+
+    def __enter__(self) -> CumulativeSpan:
+        if not self._manager.enabled:
+            return _NullSpan()
+        self._cumulative_span = self._get_or_create()
+        self._iter_start = time.perf_counter()
+        return self._cumulative_span
+
+    def __exit__(self, *args):
+        if self._cumulative_span is not None:
+            duration_ms = (time.perf_counter() - self._iter_start) * 1000
+            self._cumulative_span.record(duration_ms)
+
+    def __call__(self, fn):
+        manager = self._manager
+        name = self._name
+
+        @functools.wraps(fn)
+        def wrapper(*args, **kwargs):
+            with _CumulativeActiveSpan(manager, name):
+                return fn(*args, **kwargs)
+
+        return wrapper
+
+
+class _CommandSpan:
+    """Context manager that creates both a ProfilingTrace and its root span.
+
+    Used exclusively by the route handler to wrap the full command execution.
+    """
+
+    def __init__(self, manager: 'ProfilingManager', command_name: str,
+                 command_id: str, kwargs: dict):
+        self._manager = manager
+        self._command_name = command_name
+        self._command_id = command_id
+        self._kwargs = kwargs
+        self._trace: Optional[ProfilingTrace] = None
+        self._span: Optional[ProfilingSpan] = None
+        self._token = None
+
+    def __enter__(self) -> ProfilingSpan:
+        self._trace = ProfilingTrace(
+            command_name=self._command_name,
+            command_id=self._command_id,
+            kwargs=dict(self._kwargs) if self._kwargs else {},
+            timestamp=datetime.now(),
+        )
+        self._span = ProfilingSpan(name=self._command_name)
+        self._token = self._manager.push_span(self._span)
+        return self._span
+
+    def __exit__(self, *args):
+        self._manager.pop_span(self._span, self._token)
+        self._trace.root_span = self._span
+        self._trace.total_duration_ms = self._span.duration_ms
+        self._manager.add_trace(self._trace)
+
+
+# ---------------------------------------------------------------------------
+# Manager
+# ---------------------------------------------------------------------------
+
+class ProfilingManager:
+    """Global in-memory profiling manager.
+
+    All probe mechanisms check ``enabled`` at call time, so the profiler can
+    be toggled without restarting the server.  Use the module-level ``profiler``
+    singleton rather than instantiating this class directly.
+    """
+
+    def __init__(self, max_traces: int = None):
+        from myfasthtml.core.constants import PROFILER_MAX_TRACES
+        self.enabled: bool = False
+        self._traces: deque = deque(maxlen=max_traces or PROFILER_MAX_TRACES)
+        self._current_span: ContextVar[Optional[ProfilingSpan]] = ContextVar(
+            'profiler_current_span', default=None
+        )
+        self._overhead_samples: list = []
+
+    # --- Span lifecycle ---
+
+    def push_span(self, span: ProfilingSpan) -> object:
+        """Register a span as the current context and attach it to the parent.
+
+        Args:
+            span: The span to activate.
+
+        Returns:
+            A reset token to pass to pop_span().
+        """
+        parent = self._current_span.get()
+        if isinstance(parent, ProfilingSpan):
+            parent.children.append(span)
+        return self._current_span.set(span)
+
+    def pop_span(self, span: ProfilingSpan, token: object) -> None:
+        """Finish a span and restore the previous context.
+
+        Args:
+            span: The span to finish.
+            token: The reset token returned by push_span().
+        """
+        span.finish()
+        self._current_span.reset(token)
+
+    def add_trace(self, trace: ProfilingTrace) -> None:
+        """Add a completed trace to the buffer.
+
+        Args:
+            trace: The trace to store.
+        """
+        self._traces.appendleft(trace)
+
+    def record_overhead(self, duration_s: float) -> None:
+        """Record a span boundary overhead sample.
+
+        Args:
+            duration_s: Duration in seconds of the profiler's own housekeeping.
+        """
+        self._overhead_samples.append(duration_s * 1e6)
+        if len(self._overhead_samples) > 1000:
+            self._overhead_samples = self._overhead_samples[-1000:]
+
+    @staticmethod
+    def capture_args(fn, args, kwargs) -> dict:
+        """Capture function arguments as a truncated string dict.
+
+        Args:
+            fn: The function whose signature is inspected.
+            args: Positional arguments passed to the function.
+            kwargs: Keyword arguments passed to the function.
+
+        Returns:
+            Dict of parameter names to string-truncated values.
+        """
+        try:
+            sig = inspect.signature(fn)
+            bound = sig.bind(*args, **kwargs)
+            bound.apply_defaults()
+            params = dict(bound.arguments)
+            params.pop('self', None)
+            params.pop('cls', None)
+            return {k: str(v)[:200] for k, v in params.items()}
+        except Exception:
+            return {}
+
+    # --- Public interface ---
+
+    @property
+    def traces(self) -> list[ProfilingTrace]:
+        """All recorded traces, most recent first."""
+        return list(self._traces)
+
+    def current_span(self) -> Optional[ProfilingSpan]:
+        """Return the active span in the current async context.
+
+        Can be called from anywhere within a span to attach metadata::
+
+            profiler.current_span().set("row_count", len(rows))
+        """
+        return self._current_span.get()
+
+    def clear(self):
+        """Empty the trace buffer."""
+        self._traces.clear()
+        logger.debug("Profiler traces cleared.")
+
+    # --- Probe mechanisms ---
+
+    def span(self, name: str, args: dict = None) -> _ActiveSpan:
+        """Context manager and decorator for a named span.
+
+        The enabled check is deferred to call/enter time, so this can be used
+        as a static decorator without concern for startup order.
+
+        Args:
+            name: Span name.
+            args: Optional metadata to attach immediately.
+
+        Returns:
+            An object usable as a context manager or decorator.
+        """
+        return _ActiveSpan(self, name, args)
+
+    def cumulative_span(self, name: str) -> _CumulativeActiveSpan:
+        """Context manager and decorator for loop spans.
+
+        Aggregates all iterations into a single entry (count, total, min, max, avg).
+
+        Args:
+            name: Span name.
+
+        Returns:
+            An object usable as a context manager or decorator.
+        """
+        return _CumulativeActiveSpan(self, name)
+
+    def command_span(self, command_name: str, command_id: str,
+                     kwargs: dict) -> '_CommandSpan | _NullSpan':
+        """Context manager for the route handler.
+
+        Creates a ProfilingTrace and its root span together.  When the context
+        exits, the trace is added to the buffer.
+
+        Args:
+            command_name: Human-readable command name.
+            command_id: UUID string of the command.
+            kwargs: Client arguments received by the route handler.
+
+        Returns:
+            An object usable as a context manager.
+        """
+        if not self.enabled:
+            return _NullSpan()
+        return _CommandSpan(self, command_name, command_id, kwargs)
+
+    def trace_all(self, cls=None, *, exclude: list = None):
+        """Class decorator — statically wraps all non-dunder methods with spans.
+
+        Wrapping happens at class definition time; the enabled check is deferred
+        to call time via _ActiveSpan.__call__.
+
+        Args:
+            cls: The class to instrument (when used without parentheses).
+            exclude: List of method names to skip.
+
+        Usage::
+
+            @profiler.trace_all
+            class MyClass: ...
+
+            @profiler.trace_all(exclude=["render"])
+            class MyClass: ...
+        """
+        _exclude = set(exclude or [])
+
+        def decorator(klass):
+            for attr_name, method in inspect.getmembers(klass, predicate=inspect.isfunction):
+                if attr_name in _exclude:
+                    continue
+                if attr_name.startswith('__') and attr_name.endswith('__'):
+                    continue
+                setattr(klass, attr_name, _ActiveSpan(self, attr_name)(method))
+            return klass
+
+        if cls is not None:
+            return decorator(cls)
+        return decorator
+
+    def trace_calls(self, fn):
+        """Function decorator — traces all sub-calls via sys.setprofile().
+
+        Use for exploration when the bottleneck location is unknown.
+        sys.setprofile() is scoped to this function's execution only;
+        the global profiler is restored on exit.
+
+        The root span for ``fn`` itself is created before setprofile is
+        activated so that profiler internals are not captured as children.
+
+        Args:
+            fn: The function to instrument.
+        """
+        manager = self
+
+        @functools.wraps(fn)
+        def wrapper(*args, **kwargs):
+            if not manager.enabled:
+                return fn(*args, **kwargs)
+
+            call_stack: list[tuple[ProfilingSpan, object]] = []
+            # Skip the first call event (fn itself — already represented by root_span)
+            skip_first = [True]
+
+            def _profile(frame, event, arg):
+                if event == 'call':
+                    if skip_first[0]:
+                        skip_first[0] = False
+                        return
+                    span = ProfilingSpan(name=frame.f_code.co_name)
+                    token = manager.push_span(span)
+                    call_stack.append((span, token))
+                elif event in ('return', 'exception'):
+                    if call_stack:
+                        span, token = call_stack.pop()
+                        manager.pop_span(span, token)
+
+            # Build root span BEFORE activating setprofile so that profiler
+            # internals (capture_args, ProfilingSpan.__init__, etc.) are not
+            # captured as children.
+            captured = manager.capture_args(fn, args, kwargs)
+            root_span = ProfilingSpan(name=fn.__name__)
+            root_span.data.update(captured)
+            root_token = manager.push_span(root_span)
+
+            old_profile = sys.getprofile()
+            sys.setprofile(_profile)
+            try:
+                result = fn(*args, **kwargs)
+            finally:
+                sys.setprofile(old_profile)
+                manager.pop_span(root_span, root_token)
+
+            return result
+
+        return wrapper
+
+    # --- Overhead measurement ---
+
+    @property
+    def overhead_per_span_us(self) -> float:
+        """Average overhead per span boundary in microseconds."""
+        if not self._overhead_samples:
+            return 0.0
+        return sum(self._overhead_samples) / len(self._overhead_samples)
+
+    @property
+    def total_overhead_ms(self) -> float:
+        """Estimated total overhead across all recorded traces."""
+        total_spans = sum(
+            self._count_spans(t.root_span) for t in self._traces if t.root_span
+        )
+        return (total_spans * self.overhead_per_span_us * 2) / 1000
+
+    def _count_spans(self, span: ProfilingSpan) -> int:
+        if span is None:
+            return 0
+        count = 1
+        for child in span.children:
+            if isinstance(child, ProfilingSpan):
+                count += self._count_spans(child)
+        return count
+
+
+# ---------------------------------------------------------------------------
+# Singleton
+# ---------------------------------------------------------------------------
+
+profiler = ProfilingManager()
diff --git a/tests/core/test_profiler.py b/tests/core/test_profiler.py
new file mode 100644
index 0000000..513da28
--- /dev/null
+++ b/tests/core/test_profiler.py
@@ -0,0 +1,660 @@
+import time
+from datetime import datetime
+
+import pytest
+
+from myfasthtml.core.profiler import (
+    ProfilingManager,
+    ProfilingSpan,
+    CumulativeSpan,
+    ProfilingTrace,
+    _NullSpan,
+)
+
+
+@pytest.fixture(autouse=True)
+def fresh_profiler():
+    """Provide a fresh ProfilingManager for each test."""
+    p = ProfilingManager(max_traces=10)
+    p.enabled = True
+    return p
+
+
+# ---------------------------------------------------------------------------
+# TestProfilingModels — data model tests (ProfilingSpan, CumulativeSpan)
+# ---------------------------------------------------------------------------
+
+class TestProfilingModels:
+
+    def test_i_can_compute_avg_ms_when_no_iterations(self):
+        """Test that avg_ms returns 0.0 on a fresh CumulativeSpan to avoid division by zero."""
+        cum = CumulativeSpan(name="empty")
+        assert cum.avg_ms == 0.0
+        assert cum.total_ms == 0.0
+        assert cum.count == 0
+        assert cum.min_ms == float('inf')
+        assert cum.max_ms == 0
+
+    @pytest.mark.parametrize("durations, expected_min, expected_max, expected_total, expected_avg", [
+        ([10.0, 5.0, 8.0], 5.0, 10.0, 23.0, 23.0 / 3),
+        ([1.0],            1.0,  1.0,  1.0,  1.0),
+        ([3.0, 3.0, 3.0],  3.0,  3.0,  9.0,  3.0),
+    ])
+    def test_i_can_aggregate_iterations_in_cumulative_span(
+        self, durations, expected_min, expected_max, expected_total, expected_avg
+    ):
+        """Test that CumulativeSpan correctly aggregates all metrics across iterations."""
+        cum = CumulativeSpan(name="probe")
+        for d in durations:
+            cum.record(d)
+
+        assert cum.count == len(durations)
+        assert cum.min_ms == expected_min
+        assert cum.max_ms == expected_max
+        assert cum.total_ms == expected_total
+        assert cum.avg_ms == pytest.approx(expected_avg)
+
+    def test_i_can_chain_set_calls_on_span(self, fresh_profiler):
+        """Test that set() returns self to allow fluent chaining."""
+        p = fresh_profiler
+        with p.span("query") as s:
+            result = s.set("table", "orders").set("rows", 42)
+
+        assert result is s
+        assert s.data["table"] == "orders"
+        assert s.data["rows"] == 42
+
+    def test_i_can_access_span_data_after_context_exits(self, fresh_profiler):
+        """Test that span data and duration persist after the with block has exited."""
+        p = fresh_profiler
+        with p.span("query") as s:
+            s.set("key", "value")
+
+        assert s.data["key"] == "value"
+        assert s.duration_ms > 0
+
+
+# ---------------------------------------------------------------------------
+# TestSpan — profiler.span() context manager and decorator
+# ---------------------------------------------------------------------------
+
+class TestSpan:
+
+    def test_i_can_create_a_span(self, fresh_profiler):
+        """Test that a span is created with correct name and measured duration."""
+        p = fresh_profiler
+        with p.span("my_span") as s:
+            time.sleep(0.01)
+
+        assert isinstance(s, ProfilingSpan)
+        assert s.name == "my_span"
+        assert s.duration_ms >= 10
+
+    def test_i_can_nest_spans(self, fresh_profiler):
+        """Test that a span created inside another becomes its child."""
+        p = fresh_profiler
+        with p.span("parent") as parent:
+            with p.span("child") as child:
+                pass
+
+        assert len(parent.children) == 1
+        assert parent.children[0] is child
+        assert child.name == "child"
+
+    def test_i_can_have_sibling_spans(self, fresh_profiler):
+        """Test that consecutive spans under the same parent are recorded as siblings."""
+        p = fresh_profiler
+        with p.span("parent") as parent:
+            with p.span("child_a"):
+                pass
+            with p.span("child_b"):
+                pass
+
+        assert len(parent.children) == 2
+        assert parent.children[0].name == "child_a"
+        assert parent.children[1].name == "child_b"
+
+    def test_i_can_create_two_spans_with_same_name_under_same_parent(self, fresh_profiler):
+        """Test that two normal spans with the same name under the same parent are two distinct entries.
+
+        This contrasts with cumulative spans, which merge same-name entries into one aggregated entry.
+        """
+        p = fresh_profiler
+        with p.span("parent") as parent:
+            with p.span("work"):
+                pass
+            with p.span("work"):
+                pass
+
+        assert len(parent.children) == 2, "Normal spans with the same name must remain separate entries"
+        assert parent.children[0] is not parent.children[1]
+
+    def test_i_can_use_same_span_name_under_different_parents(self, fresh_profiler):
+        """Test that spans with the same name under different parents are independent objects."""
+        p = fresh_profiler
+        with p.span("root"):
+            with p.span("parent_a") as parent_a:
+                with p.span("work"):
+                    pass
+            with p.span("parent_b") as parent_b:
+                with p.span("work"):
+                    pass
+
+        span_a = parent_a.children[0]
+        span_b = parent_b.children[0]
+        assert span_a is not span_b, "Same name under different parents must be separate objects"
+
+    def test_i_can_use_span_as_decorator(self, fresh_profiler):
+        """Test that @span wraps a function and captures all positional arguments."""
+        p = fresh_profiler
+
+        @p.span("decorated")
+        def my_func(x, y):
+            return x + y
+
+        with p.span("root") as root:
+            result = my_func(1, 2)
+
+        assert result == 3
+        assert len(root.children) == 1
+        child = root.children[0]
+        assert child.name == "decorated"
+        assert child.data.get("x") == "1"
+        assert child.data.get("y") == "2"
+
+    def test_i_can_capture_all_args_with_span_decorator(self, fresh_profiler):
+        """Test that all positional and keyword arguments are captured by the @span decorator."""
+        p = fresh_profiler
+
+        @p.span("compute")
+        def my_func(x, y, z=10):
+            return x + y + z
+
+        with p.span("root") as root:
+            my_func(1, 2, z=3)
+
+        child = root.children[0]
+        assert child.data.get("x") == "1"
+        assert child.data.get("y") == "2"
+        assert child.data.get("z") == "3"
+
+    def test_i_can_exclude_self_from_captured_args_with_span_decorator(self, fresh_profiler):
+        """Test that 'self' is not included in the captured args of a decorated method."""
+        p = fresh_profiler
+
+        class MyClass:
+            @p.span("method")
+            def my_method(self, value):
+                return value
+
+        with p.span("root") as root:
+            MyClass().my_method(42)
+
+        child = root.children[0]
+        assert "self" not in child.data
+        assert child.data.get("value") == "42"
+
+    def test_i_can_use_span_decorator_without_parent(self, fresh_profiler):
+        """Test that a decorated function runs correctly with no active parent span."""
+        p = fresh_profiler
+
+        @p.span("solo")
+        def my_func():
+            return 42
+
+        result = my_func()
+        assert result == 42
+
+    def test_i_can_attach_data_to_span_via_context_manager(self, fresh_profiler):
+        """Test that metadata can be attached to a span inside the with block."""
+        p = fresh_profiler
+        with p.span("query") as s:
+            s.set("row_count", 42)
+            s.set("table", "users")
+
+        assert s.data["row_count"] == 42
+        assert s.data["table"] == "users"
+
+    def test_i_can_attach_data_via_current_span(self, fresh_profiler):
+        """Test that current_span().set() attaches metadata to the active span from anywhere."""
+        p = fresh_profiler
+
+        @p.span("process")
+        def process():
+            p.current_span().set("result", "ok")
+
+        with p.span("root") as root:
+            process()
+
+        child = root.children[0]
+        assert child.data["result"] == "ok"
+
+    def test_i_can_pass_args_to_span_context_manager(self, fresh_profiler):
+        """Test that metadata can be pre-attached to a span via the args parameter."""
+        p = fresh_profiler
+        with p.span("query", args={"table": "orders"}) as s:
+            pass
+
+        assert s.data["table"] == "orders"
+
+    def test_i_can_get_none_from_current_span_when_no_span_active(self, fresh_profiler):
+        """Test that current_span() returns None when called outside any span context."""
+        p = fresh_profiler
+        assert p.current_span() is None
+
+    def test_i_can_get_innermost_span_via_current_span(self, fresh_profiler):
+        """Test that current_span() returns the innermost active span in nested contexts."""
+        p = fresh_profiler
+        with p.span("outer"):
+            with p.span("inner") as inner:
+                assert p.current_span() is inner
+
+
+# ---------------------------------------------------------------------------
+# TestCumulativeSpan — profiler.cumulative_span() context manager and decorator
+# ---------------------------------------------------------------------------
+
+class TestCumulativeSpan:
+
+    def test_i_can_use_cumulative_span(self, fresh_profiler):
+        """Test that repeated iterations are aggregated into a single child entry."""
+        p = fresh_profiler
+        with p.span("loop") as loop_span:
+            for _ in range(5):
+                with p.cumulative_span("item"):
+                    time.sleep(0.001)
+
+        assert len(loop_span.children) == 1
+        cum = loop_span.children[0]
+        assert isinstance(cum, CumulativeSpan)
+        assert cum.count == 5
+        assert cum.total_ms >= 5
+        assert cum.min_ms <= cum.avg_ms <= cum.max_ms
+
+    def test_i_can_use_cumulative_span_as_decorator(self, fresh_profiler):
+        """Test that @cumulative_span aggregates all decorated function calls."""
+        p = fresh_profiler
+
+        @p.cumulative_span("item")
+        def process_item(x):
+            return x * 2
+
+        with p.span("loop") as loop_span:
+            for i in range(3):
+                process_item(i)
+
+        assert len(loop_span.children) == 1
+        cum = loop_span.children[0]
+        assert cum.count == 3
+
+    def test_i_can_continue_using_the_cumulative_span(self, fresh_profiler):
+        """Test that the same cumulative span accumulates across separate loop blocks."""
+        p = fresh_profiler
+        with p.span("parent") as parent:
+            for _ in range(3):
+                with p.cumulative_span("reads"):
+                    pass
+            for _ in range(2):
+                with p.cumulative_span("reads"):
+                    pass
+
+        assert len(parent.children) == 1
+        reads = parent.children[0]
+        assert reads.count == 5
+
+    def test_i_can_have_two_cumulative_spans_with_different_names(self, fresh_profiler):
+        """Test that two cumulative spans with different names create separate entries in the parent."""
+        p = fresh_profiler
+        with p.span("parent") as parent:
+            for _ in range(3):
+                with p.cumulative_span("reads"):
+                    pass
+            for _ in range(2):
+                with p.cumulative_span("writes"):
+                    pass
+
+        assert len(parent.children) == 2
+        reads = next(c for c in parent.children if c.name == "reads")
+        writes = next(c for c in parent.children if c.name == "writes")
+        assert reads.count == 3
+        assert writes.count == 2
+
+    def test_i_can_use_same_cumulative_span_name_under_different_parents(self, fresh_profiler):
+        """Test that cumulative spans with the same name under different parents are independent."""
+        p = fresh_profiler
+        with p.span("root"):
+            with p.span("parent_a") as parent_a:
+                for _ in range(3):
+                    with p.cumulative_span("items"):
+                        pass
+            with p.span("parent_b") as parent_b:
+                for _ in range(2):
+                    with p.cumulative_span("items"):
+                        pass
+
+        items_a = parent_a.children[0]
+        items_b = parent_b.children[0]
+        assert items_a is not items_b, "Same name under different parents must be separate objects"
+        assert items_a.count == 3
+        assert items_b.count == 2
+
+
+# ---------------------------------------------------------------------------
+# TestCommandSpan — profiler.command_span() and trace recording
+# ---------------------------------------------------------------------------
+
+class TestCommandSpan:
+
+    def test_i_can_record_a_trace_via_command_span(self, fresh_profiler):
+        """Test that command_span creates a complete trace with root span and children."""
+        p = fresh_profiler
+        with p.command_span("NavigateCell", "abc-123", {"row": "5"}):
+            with p.span("callback"):
+                time.sleep(0.01)
+
+        assert len(p.traces) == 1
+        trace = p.traces[0]
+        assert isinstance(trace, ProfilingTrace)
+        assert trace.command_name == "NavigateCell"
+        assert trace.command_id == "abc-123"
+        assert trace.kwargs == {"row": "5"}
+        assert trace.total_duration_ms >= 10
+        assert len(trace.root_span.children) == 1
+        assert trace.root_span.children[0].name == "callback"
+
+    def test_i_cannot_record_trace_when_profiler_disabled(self, fresh_profiler):
+        """Test that command_span is a no-op when the profiler is disabled."""
+        p = fresh_profiler
+        p.enabled = False
+        with p.command_span("cmd", "id", {}):
+            pass
+
+        assert len(p.traces) == 0
+
+    def test_i_can_record_up_to_max_traces(self, fresh_profiler):
+        """Test that the trace buffer respects the max_traces limit (FIFO eviction)."""
+        p = fresh_profiler
+        for i in range(15):
+            with p.command_span(f"cmd_{i}", str(i), {}):
+                pass
+
+        assert len(p.traces) == 10, "Buffer should cap at max_traces=10"
+
+    def test_i_can_access_trace_timestamp(self, fresh_profiler):
+        """Test that a recorded trace contains a valid datetime timestamp."""
+        p = fresh_profiler
+        before = datetime.now()
+        with p.command_span("MyCmd", "id-001", {}):
+            pass
+        after = datetime.now()
+
+        trace = p.traces[0]
+        assert before <= trace.timestamp <= after
+
+    def test_i_can_verify_kwargs_are_copied_in_command_span(self, fresh_profiler):
+        """Test that mutating the original kwargs dict does not affect the recorded trace."""
+        p = fresh_profiler
+        kwargs = {"row": "5", "col": "2"}
+        with p.command_span("MyCmd", "id-001", kwargs):
+            pass
+
+        kwargs["row"] = "99"
+        trace = p.traces[0]
+        assert trace.kwargs["row"] == "5"
+
+
+# ---------------------------------------------------------------------------
+# TestTraceAll — profiler.trace_all() class decorator
+# ---------------------------------------------------------------------------
+
+class TestTraceAll:
+
+    def test_i_can_use_trace_all_on_class(self, fresh_profiler):
+        """Test that trace_all wraps all non-dunder methods of a class."""
+        p = fresh_profiler
+
+        @p.trace_all
+        class MyClass:
+            def method_a(self):
+                return "a"
+
+            def method_b(self):
+                return "b"
+
+            def __repr__(self):
+                return "MyClass()"
+
+        obj = MyClass()
+        with p.span("root") as root:
+            obj.method_a()
+            obj.method_b()
+
+        assert len(root.children) == 2
+        assert root.children[0].name == "method_a"
+        assert root.children[1].name == "method_b"
+
+    def test_i_can_use_trace_all_with_exclude(self, fresh_profiler):
+        """Test that excluded methods are not wrapped by trace_all."""
+        p = fresh_profiler
+
+        @p.trace_all(exclude=["method_b"])
+        class MyClass:
+            def method_a(self):
+                return "a"
+
+            def method_b(self):
+                return "b"
+
+        obj = MyClass()
+        with p.span("root") as root:
+            obj.method_a()
+            obj.method_b()
+
+        assert len(root.children) == 1
+        assert root.children[0].name == "method_a"
+
+    def test_i_can_confirm_trace_all_skips_dunder_methods(self, fresh_profiler):
+        """Test that trace_all does not wrap dunder methods like __repr__."""
+        p = fresh_profiler
+        call_log = []
+
+        @p.trace_all
+        class MyClass:
+            def __repr__(self):
+                call_log.append("repr")
+                return "MyClass()"
+
+            def method_a(self):
+                return "a"
+
+        obj = MyClass()
+        with p.span("root") as root:
+            repr(obj)
+            obj.method_a()
+
+        child_names = [c.name for c in root.children]
+        assert "method_a" in child_names
+        assert "__repr__" not in child_names
+        assert "repr" in call_log
+
+    def test_i_can_use_trace_all_with_parentheses_and_no_exclude(self, fresh_profiler):
+        """Test that @profiler.trace_all() with parentheses and no args behaves like @profiler.trace_all."""
+        p = fresh_profiler
+
+        @p.trace_all()
+        class MyClass:
+            def method_a(self):
+                return "a"
+
+        obj = MyClass()
+        with p.span("root") as root:
+            obj.method_a()
+
+        assert len(root.children) == 1
+        assert root.children[0].name == "method_a"
+
+
+# ---------------------------------------------------------------------------
+# TestTraceCalls — profiler.trace_calls() function decorator
+# ---------------------------------------------------------------------------
+
+class TestTraceCalls:
+
+    def test_i_can_use_trace_calls_on_function(self, fresh_profiler):
+        """Test that trace_calls traces all sub-calls as children of the decorated function span.
+
+        Verifies both direct children (helper_a, helper_b under main_func) and nested hierarchy
+        (grandchild under helper_a, not under main_func).
+        """
+        p = fresh_profiler
+
+        def grandchild():
+            return 0
+
+        def helper_a():
+            grandchild()
+            return 1
+
+        def helper_b():
+            return 2
+
+        @p.trace_calls
+        def main_func():
+            helper_a()
+            helper_b()
+            return 42
+
+        with p.span("root") as root:
+            main_func()
+
+        assert len(root.children) == 1
+        main_span = root.children[0]
+        assert main_span.name == "main_func"
+        assert len(main_span.children) == 2, "main_func should have exactly 2 direct children"
+        child_names = [c.name for c in main_span.children]
+        assert "helper_a" in child_names
+        assert "helper_b" in child_names
+
+        helper_a_span = next(c for c in main_span.children if c.name == "helper_a")
+        assert len(helper_a_span.children) == 1, "grandchild must be nested under helper_a, not main_func"
+        assert helper_a_span.children[0].name == "grandchild"
+
+    def test_i_cannot_use_trace_calls_when_disabled(self, fresh_profiler):
+        """Test that trace_calls creates no spans when the profiler is disabled at call time."""
+        p = fresh_profiler
+
+        @p.trace_calls
+        def main_func():
+            return 99
+
+        with p.span("root") as root:
+            p.enabled = False
+            main_func()
+            p.enabled = True
+
+        assert len(root.children) == 0, "trace_calls should not create spans when profiler is disabled"
+
+    def test_i_can_verify_trace_calls_captures_function_args(self, fresh_profiler):
+        """Test that trace_calls captures the decorated function's arguments in the root span data."""
+        p = fresh_profiler
+
+        @p.trace_calls
+        def compute(x, y):
+            return x + y
+
+        with p.span("root") as root:
+            compute(3, 7)
+
+        main_span = root.children[0]
+        assert main_span.name == "compute"
+        assert main_span.data.get("x") == "3"
+        assert main_span.data.get("y") == "7"
+
+
+# ---------------------------------------------------------------------------
+# TestProfilingManager — enable/disable, clear, overhead
+# ---------------------------------------------------------------------------
+
+class TestProfilingManager:
+
+    def test_i_can_enable_disable_profiler(self, fresh_profiler):
+        """Test that disabling the profiler makes span() return a no-op NullSpan."""
+        p = fresh_profiler
+        p.enabled = False
+
+        with p.span("ignored") as s:
+            pass
+
+        assert isinstance(s, _NullSpan)
+
+    def test_i_can_use_decorator_when_profiler_is_disabled(self, fresh_profiler):
+        """Test that a @span decorated function still executes correctly when profiler is disabled."""
+        p = fresh_profiler
+
+        @p.span("my_span")
+        def my_func():
+            return "result"
+
+        p.enabled = False
+        result = my_func()
+        assert result == "result"
+
+    def test_i_can_toggle_profiler_at_runtime(self, fresh_profiler):
+        """Test that spans are captured only while the profiler is enabled.
+
+        Three phases: enabled -> disabled -> re-enabled, verifying capture behavior at each step.
+        """
+        p = fresh_profiler
+
+        @p.span("traced")
+        def my_func():
+            return 1
+
+        # Phase 1: enabled — span must be captured
+        with p.span("root") as root:
+            my_func()
+        assert len(root.children) == 1, "Span should be captured when profiler is enabled"
+
+        # Phase 2: disabled — p.span() returns NullSpan, nothing captured
+        p.enabled = False
+        with p.span("root_disabled") as root_disabled:
+            my_func()
+        assert isinstance(root_disabled, _NullSpan), "p.span() should return NullSpan when disabled"
+
+        # Phase 3: re-enabled — span must be captured again
+        p.enabled = True
+        with p.span("root_reenabled") as root_reenabled:
+            my_func()
+        assert len(root_reenabled.children) == 1, "Span should be captured again after re-enabling"
+
+    def test_i_can_clear_traces(self, fresh_profiler):
+        """Test that clear() empties the trace buffer completely."""
+        p = fresh_profiler
+        with p.command_span("cmd", "uuid-1", {}):
+            pass
+        with p.command_span("cmd", "uuid-2", {}):
+            pass
+
+        assert len(p.traces) == 2
+        p.clear()
+        assert len(p.traces) == 0
+
+    def test_i_can_measure_overhead(self, fresh_profiler):
+        """Test that overhead metrics are populated after spans are recorded."""
+        p = fresh_profiler
+        for _ in range(20):
+            with p.span("probe"):
+                pass
+
+        assert p.overhead_per_span_us >= 0
+        assert p.total_overhead_ms >= 0
+
+    def test_i_can_get_zero_overhead_when_no_samples(self, fresh_profiler):
+        """Test that overhead_per_span_us returns 0.0 when no spans have been recorded."""
+        p = fresh_profiler
+        assert p.overhead_per_span_us == 0.0
+
+    def test_i_can_get_zero_total_overhead_when_buffer_empty(self, fresh_profiler):
+        """Test that total_overhead_ms returns 0.0 when the trace buffer is empty."""
+        p = fresh_profiler
+        assert p.total_overhead_ms == 0.0