demo
Where the latency actually went
The waterfall view every production AI app needs. Spans for retrieval, tools, model calls, guards. Click any span to see tags, status, and cost. The view your on-call rotation lives in.
How to read a waterfall
Each row is a span — one operation with a duration. Spans nest: a parent span (e.g., "handle request") contains children (e.g., "retrieve docs", "model call", "guard output"). The horizontal position is wall-clock time; the length is duration. Click a span to see its tags, status, and cost.
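The span model above can be sketched in a few lines of plain Python. This is a hypothetical minimal structure, not Phoenix's or any vendor's API: a span with a name, start/end times, tags, and children, plus a renderer that prints each span as an indented row with its offset and duration, which is exactly what a waterfall encodes.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One operation with a duration; children nest under a parent."""
    name: str
    start: float          # seconds, wall-clock
    end: float
    tags: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def waterfall(span, t0=None, depth=0, out=None):
    """Render a span tree as text rows: indent = nesting,
    +offset = horizontal position, duration = bar length."""
    if out is None:
        out = []
    t0 = span.start if t0 is None else t0
    offset_ms = (span.start - t0) * 1000
    dur_ms = (span.end - span.start) * 1000
    out.append(f"{'  ' * depth}{span.name}: +{offset_ms:.0f}ms {dur_ms:.0f}ms")
    for child in span.children:
        waterfall(child, t0, depth + 1, out)
    return out

# A trace shaped like the example above: generation dominates.
root = Span("handle request", 0.0, 2.5, children=[
    Span("retrieve docs", 0.0, 0.3),
    Span("model call", 0.3, 2.2, tags={"prompt-tokens": 812}),
    Span("guard output", 2.2, 2.5),
])
```

Printing `"\n".join(waterfall(root))` makes the reading rules concrete: the model-call row starts at +300ms and runs 1900ms, visibly dwarfing retrieval and the guard.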
Try this — predict before you click
- Look at the first trace. Predict: the longest span is the model call, not the retrieval or guard step. This is normal — generation dominates wall-clock time on most production traces.
- Find a span that fires twice in a row (a retry). Predict: the second attempt has the same shape but adds latency to the critical path. Retry budgets exist because every retry shows up here as a visible cost.
- Switch traces. Predict: the slow trace has either (a) a tool that took 5+ seconds, (b) a retry, or (c) a long generation. The waterfall makes the answer obvious in seconds — without it, you'd be grepping through logs.
- Click a model-call span. Predict: tags show prompt-tokens, completion-tokens, model-id, temperature. These are what production observability tools (Langfuse, Arize, custom) store on every call. Cost roll-up per trace: sum over model calls of (prompt_tokens × price_in + completion_tokens × price_out) / 1,000,000, with prices quoted in $ per million tokens.
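The cost roll-up can be checked by hand. A minimal sketch, assuming hypothetical per-million-token prices and the tag names listed above; real pricing comes from your provider's rate card.

```python
# Hypothetical rates in $ per 1M tokens; substitute your provider's pricing.
RATES = {"example-model": {"in": 2.50, "out": 10.00}}

def span_cost(tags, rates=RATES):
    """Cost of one model-call span from its token tags."""
    r = rates[tags["model-id"]]
    return (tags["prompt-tokens"] * r["in"]
            + tags["completion-tokens"] * r["out"]) / 1_000_000

def trace_cost(model_calls, rates=RATES):
    """Roll-up: sum span cost over every model call in the trace."""
    return sum(span_cost(tags, rates) for tags in model_calls)

# Two calls because of a retry: the retry shows up in cost, not just latency.
calls = [
    {"model-id": "example-model", "prompt-tokens": 1200, "completion-tokens": 400},
    {"model-id": "example-model", "prompt-tokens": 1200, "completion-tokens": 380},
]
```

Here `trace_cost(calls)` comes to $0.0138, and the retry alone is $0.0068 of it, which is why retry budgets treat every attempt as a visible cost.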
Anchored to 13-production/observability-and-tracing.
Code-side: /ship/12 — observability with Phoenix.