demo
Where the latency actually went
The waterfall view every production AI app needs. Spans for retrieval, tools, model calls, guards. Click any span to see tags, status, and cost. The view your on-call rotation lives in.
How to read a waterfall
Each row is a span — one operation with a duration. Spans nest: a parent span (e.g., "handle request") contains children (e.g., "retrieve docs", "model call", "guard output"). The horizontal position is wall-clock time; the length is duration. Click a span to see its tags, status, and cost.
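The span model above can be sketched in a few lines of plain Python. This is a hypothetical minimal structure, not Phoenix's or any vendor's API: a span with a name, start/end times, tags, and children, plus a renderer that prints each span as an indented row with its offset and duration, which is exactly what a waterfall encodes.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One operation with a duration; children nest under a parent."""
    name: str
    start: float          # seconds, wall-clock
    end: float
    tags: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def waterfall(span, t0=None, depth=0, out=None):
    """Render a span tree as text rows: indent = nesting,
    +offset = horizontal position, duration = bar length."""
    if out is None:
        out = []
    t0 = span.start if t0 is None else t0
    offset_ms = (span.start - t0) * 1000
    dur_ms = (span.end - span.start) * 1000
    out.append(f"{'  ' * depth}{span.name}: +{offset_ms:.0f}ms {dur_ms:.0f}ms")
    for child in span.children:
        waterfall(child, t0, depth + 1, out)
    return out

# A trace shaped like the example above: generation dominates.
root = Span("handle request", 0.0, 2.5, children=[
    Span("retrieve docs", 0.0, 0.3),
    Span("model call", 0.3, 2.2, tags={"prompt-tokens": 812}),
    Span("guard output", 2.2, 2.5),
])
```

Printing `"\n".join(waterfall(root))` makes the reading rules concrete: the model-call row starts at +300ms and runs 1900ms, visibly dwarfing retrieval and the guard.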
Try this — predict before you click
- Look at the first trace. Predict: the longest span is the model call, not the retrieval or guard step. This is normal — generation dominates wall-clock time on most production traces.
- Find a span that fires twice in a row (a retry). Predict: the second attempt has the same shape but adds latency to the critical path. Retry budgets exist because every retry shows up here as a visible cost.
- Switch traces. Predict: the slow trace has either (a) a tool that took 5+ seconds, (b) a retry, or (c) a long generation. The waterfall makes the answer obvious in seconds — without it, you'd be grepping through logs.
- Click a model-call span. Predict: tags show prompt-tokens, completion-tokens, model-id, temperature. These are what production observability tools (Langfuse, Arize, custom) store on every call. Cost roll-up per trace: sum over model calls of (prompt_tokens × price_in + completion_tokens × price_out) / 1,000,000, with prices quoted in $ per million tokens.
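The cost roll-up can be checked by hand. A minimal sketch, assuming hypothetical per-million-token prices and the tag names listed above; real pricing comes from your provider's rate card.

```python
# Hypothetical rates in $ per 1M tokens; substitute your provider's pricing.
RATES = {"example-model": {"in": 2.50, "out": 10.00}}

def span_cost(tags, rates=RATES):
    """Cost of one model-call span from its token tags."""
    r = rates[tags["model-id"]]
    return (tags["prompt-tokens"] * r["in"]
            + tags["completion-tokens"] * r["out"]) / 1_000_000

def trace_cost(model_calls, rates=RATES):
    """Roll-up: sum span cost over every model call in the trace."""
    return sum(span_cost(tags, rates) for tags in model_calls)

# Two calls because of a retry: the retry shows up in cost, not just latency.
calls = [
    {"model-id": "example-model", "prompt-tokens": 1200, "completion-tokens": 400},
    {"model-id": "example-model", "prompt-tokens": 1200, "completion-tokens": 380},
]
```

Here `trace_cost(calls)` comes to $0.0138, and the retry alone is $0.0068 of it, which is why retry budgets treat every attempt as a visible cost.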
Anchored to 13-production/observability-and-tracing.
Code-side: /ship/12 — observability with Phoenix.