Stage 13 — Production
Shipping AI is different from shipping a feature. Models are stochastic. Inputs are open-ended. Costs scale unpredictably. This stage covers the operational layer: deploying, evaluating, monitoring, and trusting AI in production.
Prerequisites
- Stages 08–11 (you’ve built something to ship)
Learning ladder
- Deployment architectures — API providers, self-hosted, edge
- Evaluation & benchmarks — offline, online, LLM-as-judge
- Guardrails — input/output filters, schema validation, jailbreak defense (schema validation is sketched after this list)
- Observability & tracing
- Cost & latency — caching, batching, speculative decoding, model routing (caching is sketched after this list)
- Hallucination mitigation
- Data systems for AI — ingestion, skew, drift, feedback loops, lineage
- Enterprise considerations — security, compliance, data boundaries
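
To make the guardrails item concrete, here is a minimal sketch of output-side schema validation, assuming Pydantic v2. The `TicketTriage` schema and the sample payloads are hypothetical stand-ins; the pattern is what matters: parse the model's output against a strict schema, and refuse anything that does not fit.

```python
# Output-guardrail sketch, assuming Pydantic v2. Schema and payloads are
# illustrative; swap in your feature's actual response contract.
from pydantic import BaseModel, ValidationError

class TicketTriage(BaseModel):   # hypothetical schema for a support-triage feature
    category: str
    priority: int
    needs_human: bool

def validate_output(raw_json: str) -> TicketTriage | None:
    """Second-layer guardrail: reject any model output that fails the schema."""
    try:
        return TicketTriage.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller retries, falls back, or escalates

good = validate_output('{"category": "billing", "priority": 2, "needs_human": false}')
bad = validate_output('{"category": "billing", "priority": "urgent"}')  # wrong type -> None
assert good is not None and bad is None
```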
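And for the cost & latency item, a sketch of the simplest lever: exact-match response caching. `call_model` is a stand-in for your provider client, and the process-local dict is for illustration only; a production deployment would typically back this with Redis or similar.

```python
# Exact-match response caching: the cheapest of the cost/latency levers.
# In-process dict for clarity; use a shared store (e.g. Redis) in production.
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, model: str, call_model) -> str:
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key in _cache:
        return _cache[key]        # cache hit: zero tokens billed, near-zero latency
    response = call_model(prompt=prompt, model=model)
    _cache[key] = response
    return response
```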
MVU
You can:
- Take a working LLM feature from prototype to production with confidence
- Set up evals, observability, and guardrails before shipping
- Estimate per-query cost and latency (back-of-envelope math is sketched after this list)
- Diagnose a regression in production within hours, not days
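
The cost estimate is back-of-envelope arithmetic, assuming per-million-token pricing. The rates and token counts below are placeholders, not any provider's real numbers.

```python
# Per-query cost from token counts and per-million-token prices (placeholder rates).
def cost_per_query(input_tokens: int, output_tokens: int,
                   price_in_per_m: float, price_out_per_m: float) -> float:
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# e.g. a RAG query: ~3k prompt tokens (question + retrieved chunks), ~500 output tokens
print(f"${cost_per_query(3_000, 500, 3.00, 15.00):.4f}")  # -> $0.0165
```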
Exercise
Take your Stage 09 RAG system or Stage 11 agent. Add: end-to-end evals, production observability, two layers of guardrails, a cost cap. Run it for a week with real or simulated traffic. Read the traces. Find one bug.
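
One way to seed the end-to-end evals, as a sketch: a tiny exact-substring regression suite. The cases and the `answer_fn` signature are hypothetical; plug in your RAG or agent entry point, and grow the suite from real traces as you read them.

```python
# Minimal regression suite: run per deploy, alert on a score drop.
# CASES and answer_fn are hypothetical; replace with your own entry point.
CASES = [
    {"q": "What is our refund window?", "must_contain": "30 days"},
    {"q": "Who is the CEO?",            "must_contain": "Jane Doe"},
]

def run_regression(answer_fn) -> float:
    passed = 0
    for case in CASES:
        answer = answer_fn(case["q"])
        if case["must_contain"].lower() in answer.lower():
            passed += 1
        else:
            print(f"FAIL: {case['q']!r} -> {answer[:80]!r}")
    return passed / len(CASES)  # track this per deploy; a drop is a regression
```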
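And a minimal sketch of the cost cap, kept in-process for clarity. Thread safety, persistence, and per-tenant budgets are deliberately out of scope, and the limit is illustrative.

```python
# In-process cost cap: charge before each call, refuse once the budget is spent.
class CostCap:
    def __init__(self, weekly_limit_usd: float):
        self.limit = weekly_limit_usd
        self.spent = 0.0

    def charge(self, query_cost_usd: float) -> None:
        if self.spent + query_cost_usd > self.limit:
            raise RuntimeError("Cost cap reached: refusing further LLM calls")
        self.spent += query_cost_usd

cap = CostCap(weekly_limit_usd=25.0)
cap.charge(0.0165)  # call once per query, using the estimator from the MVU section
```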
Hands-on companions
The entire production-ops layer of this stage has a code-side companion in /ship’s Production release:
- /ship/12 — observability with Phoenix — OpenTelemetry instrumentation for every LLM call, tool, retrieval, agent step (the core pattern is sketched after this list)
- /ship/13 — evaluation in production — regression suite, paired A/B prompt testing, drift detection, feedback-to-eval pipeline
- /ship/14 — cost and latency tuning — five levers ranked by ROI, with a real benchmark table (4× cheaper, 4× faster, quality flat)
- /ship/15 — deploy it for real — Modal / Replicate / RunPod / VPS, with the trade-offs and the deploy command for each
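
To show the shape of the instrumentation /ship/12 builds on, here is a generic OpenTelemetry sketch, not Phoenix's actual setup code: one span per LLM call, with attributes you can query later. `call_model` and the attribute names are assumptions; requires the opentelemetry-sdk package.

```python
# Generic OpenTelemetry span around an LLM call. ConsoleSpanExporter prints
# spans to stdout; a real setup would export to a collector or Phoenix.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-app")

def traced_completion(prompt: str, call_model) -> str:
    with tracer.start_as_current_span("llm.call") as span:
        span.set_attribute("llm.prompt_chars", len(prompt))   # attribute names assumed
        response = call_model(prompt)
        span.set_attribute("llm.response_chars", len(response))
        return response
```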
For products that put all of this into one running service, the /case-studies section ships four such services — pick the one closest to what you’re shipping.