Stage 08 — Prompting
The cheapest way to change an LLM’s behavior. No training, no data, no GPU — just better instructions. Prompting alone gets you 80% of the way for many production problems.
Prerequisites
- Stage 07 (modern LLMs)
- Working API access (Anthropic, OpenAI, or local model)
Learning ladder
- Prompt fundamentals — system, user, assistant; chat templates; context windows
- Few-shot & chain-of-thought
- Structured outputs — JSON mode, tool calling, schemas
- Advanced techniques — self-consistency, ReAct, tree of thoughts, reflexion
- Sampling & decoding — temperature, top-p/k, beam, speculative
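The sampling knobs in the last rung can be sketched in a few lines. This is a minimal toy decoder, not any library's implementation: it applies temperature scaling, then top-k and top-p (nucleus) filtering, then samples from the surviving tokens. All names here are made up for illustration.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token id from raw logits with the usual decoding knobs.

    temperature (> 0): < 1 sharpens the distribution, > 1 flattens it.
    top_k: keep only the k highest-probability tokens (0 = disabled).
    top_p: keep the smallest prefix of tokens whose cumulative mass >= top_p.
    """
    rng = rng or random.Random()
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        order = order[:top_k]
    # Nucleus (top-p) cut on the sorted ids.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the survivors and draw one.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With `top_k=1` (or a tiny `top_p`) this collapses to greedy decoding, which is why "deterministic" tasks are usually run with aggressive truncation or temperature near zero.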
MVU
You can:
- Reach for prompting first; know when to escalate to RAG, fine-tuning, or agents
- Write a robust system prompt that survives adversarial inputs
- Force valid JSON output 100/100 times for a non-trivial schema
- Pick sampling parameters for a given task (creative vs deterministic)
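The "valid JSON 100/100 times" goal usually comes down to validate-and-retry: parse the model's reply, check it against the schema, and re-prompt with the failure if it doesn't conform. A minimal sketch, assuming a caller-supplied `call_model` function and a toy two-field schema (both are illustrative, not a real API):

```python
import json

# Toy schema: required keys and their expected Python types.
REQUIRED_KEYS = {"category": str, "confidence": float}

def validate(raw: str):
    """Return the parsed object if it matches the toy schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, typ in REQUIRED_KEYS.items():
        if key not in obj or not isinstance(obj[key], typ):
            return None
    return obj

def get_json(call_model, prompt, max_retries=3):
    """Call the model, re-prompting on invalid output until it validates."""
    msg = prompt
    for _ in range(max_retries):
        raw = call_model(msg)
        obj = validate(raw)
        if obj is not None:
            return obj
        # Feed the failure back; many models self-correct on the next turn.
        msg = prompt + "\nYour last reply was not valid JSON for the schema. Reply with JSON only."
    raise ValueError("model never produced valid JSON")
```

In production you would swap the hand-rolled `validate` for a real JSON-schema validator and, where the provider supports it, turn on native JSON mode or tool calling so most replies validate on the first try.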
Exercise
Write a prompt that classifies emails into 8 categories with 95%+ accuracy on a held-out set of 100 examples. Then break it with adversarial cases. Iterate.
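"95%+ accuracy on a held-out set" implies a scoring loop, not vibes. A minimal sketch of the measurement side of the exercise, with hypothetical names (`classify` stands in for your prompted LLM call):

```python
def accuracy(classify, golden):
    """Fraction of (email, label) pairs the classifier gets right.

    classify: any callable email_text -> category (e.g. a prompted LLM call).
    golden:   list of (email_text, expected_label) pairs held out from iteration.
    """
    hits = sum(1 for email, label in golden if classify(email) == label)
    return hits / len(golden)
```

Run it after every prompt change; if the number drops, the change regressed, no matter how much better the prompt reads.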
Hands-on companions
Watch it interactively:
- Sampling Knobs — same real GPT-2 logits, three sliders (T, top-p, top-k), four very different outputs.
- Beam Search Lab — greedy / beam / sample on real GPT-2 distributions.
- Structured Outputs — editable response, live JSON-schema validator, five named “break it” experiments.
- CoT Lab — reasoning-depth slider showing how partial reasoning lands on different (sometimes wrong) answers.
- Few-Shot Lab — how k=0..3 examples change format adherence and label coverage.
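The k=0..3 knob in the Few-Shot Lab is just how many worked examples you prepend before the query. A minimal prompt-builder sketch (names are illustrative):

```python
def few_shot_prompt(instructions, examples, query, k):
    """Build a few-shot prompt: instructions, k worked examples, then the query.

    examples: list of (email_text, label) pairs; k=0 gives a zero-shot prompt.
    """
    parts = [instructions]
    for email, label in examples[:k]:
        parts.append(f"Email: {email}\nCategory: {label}")
    # Leave the final label blank for the model to complete.
    parts.append(f"Email: {query}\nCategory:")
    return "\n\n".join(parts)
```

Format adherence tends to improve with each example because the model is pattern-completing the transcript; label coverage matters too, since labels absent from the examples are guessed less reliably.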
Ship the stack:
- /ship/04 — build the eval harness — measure prompt changes against a golden set, replace vibes with numbers.
- /ship/13 — evals in production — A/B prompt testing with statistical rigor (paired t-test, effect-size threshold).
See also
- Stage 09 — RAG — when prompting alone isn’t enough
- Stage 11 — Agents — prompts in a loop
- Stage 13 — Evaluation

