Production

Models are stochastic, inputs are open-ended, and costs scale unpredictably. Evaluation discipline is the single biggest gap between prototype and production; observability, guardrails, and cost-and-latency tuning form the operational layer.

8 articles
46 min to read
6 demos
5 books
If you only do one thing

A demo with no eval is a coin flip; with an eval, it's a tracked, improvable system. Tune rubric weights and watch the winner change — the rubric, not the answers, often decides 'which model is better.'
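A minimal sketch of that rubric effect, with invented scores for two hypothetical models: the answers never change, yet the "better" model flips when the weights do.

```python
# Two models scored on a three-criterion rubric (0-10 scale).
# Scores and criteria are illustrative, not from a real eval.
scores = {
    "model_a": {"accuracy": 9, "style": 5, "brevity": 4},
    "model_b": {"accuracy": 6, "style": 8, "brevity": 9},
}

def winner(weights):
    """Return the model with the highest weighted rubric total."""
    totals = {
        model: sum(weights[c] * v for c, v in crits.items())
        for model, crits in scores.items()
    }
    return max(totals, key=totals.get)

# Accuracy-heavy rubric: model_a wins (7.7 vs 6.7).
print(winner({"accuracy": 0.7, "style": 0.2, "brevity": 0.1}))  # model_a
# UX-heavy rubric: model_b wins (8.0 vs 5.4) -- same answers, new "best model".
print(winner({"accuracy": 0.2, "style": 0.4, "brevity": 0.4}))  # model_b
```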

Articles in this stage

  1. 01 Cost & Latency
  2. 02 Data Systems for AI Products
  3. 03 Deployment Architectures
  4. 04 Enterprise Considerations
  5. 05 Evaluation & Benchmarks
  6. 06 Guardrails
  7. 07 Hallucination Mitigation
  8. 08 Observability & Tracing

Stage 13 — Production

Shipping AI is different from shipping a feature. Models are stochastic. Inputs are open-ended. Costs scale unpredictably. This stage covers the operational layer: deploying, evaluating, monitoring, and trusting AI in production.

Prerequisites

  • Stages 08–11 (you’ve built something to ship)

Learning ladder

  1. Deployment architectures — API providers, self-hosted, edge
  2. Evaluation & benchmarks — offline, online, LLM-as-judge
  3. Guardrails — input/output filters, schema validation, jailbreak defense
  4. Observability & tracing
  5. Cost & latency — caching, batching, speculative decoding, model routing
  6. Hallucination mitigation
  7. Data systems for AI — ingestion, skew, drift, feedback loops, lineage
  8. Enterprise considerations — security, compliance, data boundaries
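As one concrete slice of the ladder, step 3's schema validation fits in a few lines. This is a sketch of a hypothetical `validate_output` guardrail that checks an LLM's raw text output against an expected JSON shape; the field names and types are invented for illustration.

```python
import json

# Expected output schema: field name -> required Python type (assumed shape).
REQUIRED = {"answer": str, "confidence": float, "sources": list}

def validate_output(raw: str):
    """Output guardrail: parse model text as JSON and check the schema.
    Returns (ok, parsed_or_error) so the caller can retry or fall back."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not JSON: {exc}"
    for key, typ in REQUIRED.items():
        if key not in data:
            return False, f"missing field: {key}"
        if not isinstance(data[key], typ):
            return False, f"wrong type for {key}"
    return True, data

ok, result = validate_output('{"answer": "42", "confidence": 0.9, "sources": []}')
```

Returning a reason string instead of raising lets the calling code decide between a retry with a repair prompt and a hard fallback.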

MVU

You can:

  • Take a working LLM feature from prototype to production with confidence
  • Set up evals, observability, and guardrails before shipping
  • Estimate per-query cost and latency
  • Diagnose a regression in production within hours, not days
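The per-query cost estimate is back-of-envelope arithmetic: token counts times per-token prices. A sketch with placeholder prices, not any provider's real rates:

```python
# Placeholder prices in dollars per 1M tokens (illustrative, not real rates).
PRICE_IN_PER_M = 3.00    # prompt tokens
PRICE_OUT_PER_M = 15.00  # completion tokens

def query_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated dollar cost of one query at the placeholder prices."""
    return (prompt_tokens * PRICE_IN_PER_M
            + completion_tokens * PRICE_OUT_PER_M) / 1_000_000

# A RAG query: 2k-token context, 300-token answer.
cost = query_cost(2_000, 300)  # 0.0105 -> about a penny per query
```

Multiply by expected daily query volume and the latency budget falls out the same way: tokens divided by the model's observed tokens-per-second.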

Exercise

Take a Stage 09 RAG or a Stage 11 agent. Add: end-to-end evals, production observability, two layers of guardrails, a cost cap. Run for a week with real or simulated traffic. Read the traces. Find one bug.
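The cost cap the exercise asks for can start as a running total that refuses calls over budget. A minimal sketch, with persistence and thread safety deliberately omitted:

```python
class CostCap:
    """Minimal daily spend cap -- one candidate guardrail layer.
    A real version would persist the total and reset it daily."""

    def __init__(self, daily_budget_usd: float):
        self.budget = daily_budget_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record a query's cost; return False if it would exceed the budget."""
        if self.spent + cost_usd > self.budget:
            return False
        self.spent += cost_usd
        return True

cap = CostCap(daily_budget_usd=1.00)
assert cap.charge(0.60)      # within budget
assert not cap.charge(0.50)  # would exceed $1.00 -- call is blocked
```

Checking before the model call (using an estimated cost) rather than after is the difference between a cap and an alarm.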

Hands-on companions

The entire production-ops layer of this stage has a code-side companion in /ship’s Production release.

For products that put all of this into one running service, the /case-studies section ships four — pick the one closest to what you’re shipping.

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. Free badges mark books with author-authorized full text online.

  1. Designing Machine Learning Systems

     Chip Huyen · O'Reilly, 2022

     The canonical pre-LLM ML systems book. Chapters 3, 4, 5, 8, 10 are still core.

  2. Designing Data-Intensive Applications

     Martin Kleppmann · O'Reilly, 2017

     The systems-engineering book the AI-eng stack assumes you've read.

  3. Building Machine Learning Powered Applications

     Emmanuel Ameisen · O'Reilly, 2020

     Idea-to-product walkthroughs that complement the case-studies arc.

  4. Practical MLOps

     Noah Gift, Alfredo Deza · O'Reilly, 2021

     CI/CD, deployment strategies, model registries.