Production

Models are stochastic, inputs are open-ended, and costs scale unpredictably. Evaluation discipline is the single biggest gap between prototype and production; observability, guardrails, and cost-and-latency tuning form the operational layer.

8 articles
46 min to read
6 demos
5 books
If you only do one thing

A demo with no eval is a coin flip; with an eval, it's a tracked, improvable system. Tune rubric weights and watch the winner change — the rubric, not the answers, often decides 'which model is better.'
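A minimal sketch of that rubric effect, with invented scores for two hypothetical models: the answers never change, yet the "better" model flips when the weights do.

```python
# Two models scored on a three-criterion rubric (0-10 scale).
# Scores and criteria are illustrative, not from a real eval.
scores = {
    "model_a": {"accuracy": 9, "style": 5, "brevity": 4},
    "model_b": {"accuracy": 6, "style": 8, "brevity": 9},
}

def winner(weights):
    """Return the model with the highest weighted rubric total."""
    totals = {
        model: sum(weights[c] * v for c, v in crits.items())
        for model, crits in scores.items()
    }
    return max(totals, key=totals.get)

# Accuracy-heavy rubric: model_a wins (7.7 vs 6.7).
print(winner({"accuracy": 0.7, "style": 0.2, "brevity": 0.1}))  # model_a
# UX-heavy rubric: model_b wins (8.0 vs 5.4) -- same answers, new "best model".
print(winner({"accuracy": 0.2, "style": 0.4, "brevity": 0.4}))  # model_b
```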

Articles in this stage

  1. 01 Cost & Latency
  2. 02 Data Systems for AI Products
  3. 03 Deployment Architectures
  4. 04 Enterprise Considerations
  5. 05 Evaluation & Benchmarks
  6. 06 Guardrails
  7. 07 Hallucination Mitigation
  8. 08 Observability & Tracing

Stage 13 — Production

Shipping AI is different from shipping a feature. Models are stochastic. Inputs are open-ended. Costs scale unpredictably. This stage covers the operational layer: deploying, evaluating, monitoring, and trusting AI in production.

Prerequisites

  • Stages 08–11 (you’ve built something to ship)

Learning ladder

  1. Deployment architectures — API providers, self-hosted, edge
  2. Evaluation & benchmarks — offline, online, LLM-as-judge
  3. Guardrails — input/output filters, schema validation, jailbreak defense
  4. Observability & tracing
  5. Cost & latency — caching, batching, speculative decoding, model routing
  6. Hallucination mitigation
  7. Data systems for AI — ingestion, skew, drift, feedback loops, lineage
  8. Enterprise considerations — security, compliance, data boundaries
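As one concrete slice of the ladder, step 3's schema validation fits in a few lines. This is a sketch of a hypothetical `validate_output` guardrail that checks an LLM's raw text output against an expected JSON shape; the field names and types are invented for illustration.

```python
import json

# Expected output schema: field name -> required Python type (assumed shape).
REQUIRED = {"answer": str, "confidence": float, "sources": list}

def validate_output(raw: str):
    """Output guardrail: parse model text as JSON and check the schema.
    Returns (ok, parsed_or_error) so the caller can retry or fall back."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not JSON: {exc}"
    for key, typ in REQUIRED.items():
        if key not in data:
            return False, f"missing field: {key}"
        if not isinstance(data[key], typ):
            return False, f"wrong type for {key}"
    return True, data

ok, result = validate_output('{"answer": "42", "confidence": 0.9, "sources": []}')
```

Returning a reason string instead of raising lets the calling code decide between a retry with a repair prompt and a hard fallback.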

MVU

You can:

  • Take a working LLM feature from prototype to production with confidence
  • Set up evals, observability, and guardrails before shipping
  • Estimate per-query cost and latency
  • Diagnose a regression in production within hours, not days
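The per-query cost estimate is back-of-envelope arithmetic: token counts times per-token prices. A sketch with placeholder prices, not any provider's real rates:

```python
# Placeholder prices in dollars per 1M tokens (illustrative, not real rates).
PRICE_IN_PER_M = 3.00    # prompt tokens
PRICE_OUT_PER_M = 15.00  # completion tokens

def query_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated dollar cost of one query at the placeholder prices."""
    return (prompt_tokens * PRICE_IN_PER_M
            + completion_tokens * PRICE_OUT_PER_M) / 1_000_000

# A RAG query: 2k-token context, 300-token answer.
cost = query_cost(2_000, 300)  # 0.0105 -> about a penny per query
```

Multiply by expected daily query volume and the latency budget falls out the same way: tokens divided by the model's observed tokens-per-second.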

Exercise

Take a Stage 09 RAG or a Stage 11 agent. Add: end-to-end evals, production observability, two layers of guardrails, a cost cap. Run for a week with real or simulated traffic. Read the traces. Find one bug.
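The cost cap the exercise asks for can start as a running total that refuses calls over budget. A minimal sketch, with persistence and thread safety deliberately omitted:

```python
class CostCap:
    """Minimal daily spend cap -- one candidate guardrail layer.
    A real version would persist the total and reset it daily."""

    def __init__(self, daily_budget_usd: float):
        self.budget = daily_budget_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record a query's cost; return False if it would exceed the budget."""
        if self.spent + cost_usd > self.budget:
            return False
        self.spent += cost_usd
        return True

cap = CostCap(daily_budget_usd=1.00)
assert cap.charge(0.60)      # within budget
assert not cap.charge(0.50)  # would exceed $1.00 -- call is blocked
```

Checking before the model call (using an estimated cost) rather than after is the difference between a cap and an alarm.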

Hands-on companions

The entire production-ops layer of this stage has a code-side companion in /ship’s Production release.

For products that put all of this into one running service, the /case-studies section ships four — pick the one closest to what you’re shipping.

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. Free badges mark books with author-authorized full text online.

  1. Designing Machine Learning Systems

     Chip Huyen · O'Reilly, 2022

     The canonical pre-LLM ML systems book. Chapters 3, 4, 5, 8, 10 are still core.

  2. Designing Data-Intensive Applications

     Martin Kleppmann · O'Reilly, 2017

     The systems-engineering book the AI-eng stack assumes you've read.

  3. Building Machine Learning Powered Applications

     Emmanuel Ameisen · O'Reilly, 2020

     Idea-to-product walkthroughs that complement the case-studies arc.

  4. Practical MLOps

     Noah Gift, Alfredo Deza · O'Reilly, 2021

     CI/CD, deployment strategies, model registries.