Stage 08 — Prompting

The cheapest way to change an LLM’s behavior. No training, no data, no GPU — just better instructions. Prompting alone gets you 80% of the way for many production problems.

Prerequisites

  • Stage 07 (modern LLMs)
  • Working API access (Anthropic, OpenAI, or a local model)

Learning ladder

  1. Prompt fundamentals — system, user, assistant; chat templates; context windows (sketched in code after this list)
  2. Few-shot & chain-of-thought
  3. Structured outputs — JSON mode, tool calling, schemas
  4. Advanced techniques — self-consistency, ReAct, tree of thoughts, reflexion
  5. Sampling & decoding — temperature, top-p/k, beam, speculative
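
The first two rungs are mostly about getting the message list right. A minimal sketch, assuming the OpenAI Python SDK (v1+) with OPENAI_API_KEY set; the model name and the labels are placeholders, and any chat-completions-compatible client (including a local server) works the same way:

```python
# Minimal sketch: system / user / assistant roles plus few-shot examples.
# Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

messages = [
    # System prompt: role, constraints, output contract.
    {"role": "system",
     "content": "You are a support triage assistant. Reply with exactly one "
                "label: billing, bug, or feature_request."},
    # Few-shot examples written as prior user/assistant turns.
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "The export button crashes the app."},
    {"role": "assistant", "content": "bug"},
    # The actual query.
    {"role": "user", "content": "Could you add dark mode?"},
]

resp = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=messages,
    temperature=0,         # near-deterministic for classification
)
print(resp.choices[0].message.content)
```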

MVU

You can:

  • Reach for prompting first; know when to escalate to RAG, fine-tuning, or agents
  • Write a robust system prompt that survives adversarial inputs
  • Force valid JSON output 100/100 times for a non-trivial schema (a validate-and-retry sketch follows this list)
  • Pick sampling parameters for a given task (creative vs deterministic)
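
For the JSON bullet, one workable recipe is provider JSON mode plus client-side schema validation plus a bounded retry loop. A minimal sketch, assuming the jsonschema package; `call_model(messages)` is a hypothetical stand-in for whatever chat client you use:

```python
# Sketch: enforce a JSON schema by validating client-side and retrying with
# the validation error fed back to the model. `call_model` is a hypothetical
# helper that sends a message list and returns the model's text.
import json
from jsonschema import validate, ValidationError

SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "feature_request"]},
        "urgency": {"type": "integer", "minimum": 1, "maximum": 5},
        "summary": {"type": "string", "maxLength": 200},
    },
    "required": ["category", "urgency", "summary"],
    "additionalProperties": False,
}

def extract_json(messages, call_model, max_retries=3):
    """Ask for JSON, validate it, and retry with the exact error on failure."""
    for _ in range(max_retries):
        raw = call_model(messages)
        try:
            obj = json.loads(raw)
            validate(instance=obj, schema=SCHEMA)
            return obj
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the failure back so the model can repair its own output.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Invalid JSON for the schema: {err}. "
                                            "Return only corrected JSON."},
            ]
    raise RuntimeError("No schema-valid JSON after retries")
```

Native structured-output or tool-calling modes make the retry fire less often; the client-side validator is what turns "usually valid" into 100/100.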

Exercise

Write a prompt that classifies emails into 8 categories with 95%+ accuracy on a held-out set of 100 examples. Then break it with adversarial cases. Iterate.
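
A small scoring harness keeps the iteration honest. A minimal sketch, assuming a held-out CSV with text and label columns; `classify(email_text)` is a hypothetical wrapper around your prompt, and the eight labels are placeholders:

```python
# Sketch of the exercise loop: score a prompt on a held-out set, then read
# the misses before editing the prompt. `classify` wraps your prompt.
import csv

CATEGORIES = {"billing", "bug", "feature_request", "spam",
              "sales", "hr", "legal", "other"}   # placeholder 8-way label set

def evaluate(path, classify):
    hits, misses = 0, []
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))           # expects columns: text, label
    for row in rows:
        pred = classify(row["text"]).strip().lower()
        if pred not in CATEGORIES:
            misses.append((row["text"][:80], row["label"], f"off-label: {pred}"))
        elif pred == row["label"]:
            hits += 1
        else:
            misses.append((row["text"][:80], row["label"], pred))
    print(f"accuracy: {hits}/{len(rows)} = {hits/len(rows):.1%}")
    return misses

# misses = evaluate("heldout_100.csv", classify)
```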

Hands-on companions

Watch it interactively:

  • Sampling Knobs — same real GPT-2 logits, three sliders (T, top-p, top-k), four very different outputs (a from-scratch version of the knobs is sketched after this list).
  • Beam Search Lab — greedy / beam / sample on real GPT-2 distributions.
  • Structured Outputs — editable response, live JSON-schema validator, five named “break it” experiments.
  • CoT Lab — reasoning-depth slider showing how partial reasoning lands on different (sometimes wrong) answers.
  • Few-Shot Lab — how k=0..3 examples change format adherence and label coverage.
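
If you want the sampling knobs as code rather than sliders, here is a minimal from-scratch sketch of temperature, top-k, and top-p applied to a single logits vector (NumPy only; the logits are invented, not real GPT-2 values):

```python
# Sketch: temperature, top-k, and top-p (nucleus) filtering on one logits vector.
import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    if top_k > 0:                      # keep only the k highest-scoring tokens
        kth = np.sort(logits)[-min(top_k, len(logits))]
        logits = np.where(logits < kth, -np.inf, logits)

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_p < 1.0:                    # nucleus: smallest set with mass >= top_p
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        nucleus = np.zeros_like(probs)
        nucleus[order[:cutoff]] = probs[order[:cutoff]]
        probs = nucleus / nucleus.sum()

    return rng.choice(len(probs), p=probs)

# Same logits, different personalities: low T is near-greedy, high T is chaotic,
# top_k and top_p trim the tail in different ways.
fake_logits = [2.0, 1.5, 0.3, 0.1, -0.5, -1.0, -2.0]
for kwargs in ({"temperature": 0.2}, {"temperature": 1.5},
               {"top_k": 3}, {"top_p": 0.9}):
    print(kwargs, "->", sample_token(fake_logits, **kwargs))
```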

Ship the stack:

See also