Learning Path — Detailed Walk-Through
This is the long-form companion to README.md. For each stage, it lists the prerequisite knowledge, the articles inside, the “minimum viable understanding” (MVU) you should have before moving on, and the exercises that prove it.
If README.md is the map, this is the trail guide.
Stage 01 — Math Foundations
Prereqs: high-school algebra.
Articles:
- Linear algebra — vectors, matrices, dot product, eigenvectors
- Probability & statistics — distributions, expectation, MLE, Bayes
- Calculus & optimization — derivatives, gradients, gradient descent
- Information theory — entropy, KL divergence, cross-entropy
MVU: You can compute a dot product by hand, explain why softmax outputs sum to 1, and write the gradient descent update rule from memory.
Exercise: Implement gradient descent on y = (x - 3)^2 in NumPy. Plot the loss over 100 steps.
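A minimal sketch of this exercise (learning rate and starting point are arbitrary choices; swap `print` for a matplotlib plot of `losses` to see the curve):

```python
import numpy as np

# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2(x - 3).
lr = 0.1          # learning rate — tune freely
x = 0.0           # starting point
losses = []
for step in range(100):
    grad = 2 * (x - 3)        # analytic gradient
    x -= lr * grad            # the update rule from the MVU
    losses.append((x - 3) ** 2)

print(x)  # converges toward the minimum at x = 3
```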
Stage 02 — ML Fundamentals
Prereqs: Stage 01.
Articles:
- Supervised learning — regression, classification, the train/val/test split
- Unsupervised learning — clustering, dimensionality reduction
- Loss functions & optimization — MSE, cross-entropy, SGD, momentum
- Evaluation & metrics — accuracy, precision/recall, F1, AUC, calibration
- Regularization & generalization — bias/variance, L1/L2, early stopping
- Classical algorithms — linear/logistic regression, trees, ensembles
MVU: You can explain the bias-variance tradeoff to a non-ML engineer in 90 seconds.
Exercise: Train a logistic regression on Iris in scikit-learn. Compute precision, recall, and F1 by hand from the confusion matrix.
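The "by hand" half of the exercise reduces to three formulas. A sketch with made-up counts (use the ones from your own confusion matrix):

```python
# Metrics by hand from a binary confusion matrix. Counts are illustrative.
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)          # of predicted positives, how many were right
recall    = tp / (tp + fn)          # of actual positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)  # 0.8, ~0.889, ~0.842
```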
Stage 03 — Neural Networks
Prereqs: Stages 01–02.
Articles:
- Perceptrons & MLPs
- Backpropagation — chain rule, computational graphs, autograd
- Activations & initialization
- Optimizers — SGD, Adam, AdamW, schedulers
- Regularization techniques — dropout, batch/layer norm, weight decay
- Architectures: CNN & RNN
MVU: You can hand-derive backprop through a 2-layer MLP and explain why we need non-linear activations.
Exercise: Train an MLP on MNIST in raw PyTorch (no nn.Sequential). Hit >97% test accuracy.
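The hand-derivation in the MVU can be checked numerically. A NumPy sketch of forward and backward through a 2-layer MLP (shapes and MSE loss are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4, input dim 3
y = rng.normal(size=(4, 2))          # regression targets, output dim 2
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 2)) * 0.1

def forward(W1, W2):
    h = np.tanh(x @ W1)              # hidden layer with a non-linearity
    out = h @ W2
    loss = ((out - y) ** 2).mean()
    return loss, h, out

loss, h, out = forward(W1, W2)

# Backprop via the chain rule:
d_out = 2 * (out - y) / y.size       # dL/d(out) for mean squared error
dW2 = h.T @ d_out
d_h = d_out @ W2.T
dW1 = x.T @ (d_h * (1 - h ** 2))     # tanh'(z) = 1 - tanh(z)^2

# Numerical check on one entry of W1:
eps = 1e-5
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, W2)[0] - loss) / eps
print(abs(num - dW1[0, 0]))          # should be tiny
```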
Stage 04 — Language Modeling
Prereqs: Stage 03.
Articles:
- n-gram models
- Neural language models — Bengio 2003, word embeddings as a side effect
- RNNs & LSTMs
- Why transformers — the failure modes RNNs couldn’t escape
MVU: You can explain perplexity, the vanishing gradient problem, and why parallelism matters for scaling.
Exercise: Train a character-level RNN on a small corpus (~1MB) and generate text. Notice how it forgets context past ~50 chars.
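Perplexity itself is a one-liner once you have per-token probabilities: exponentiate the average negative log-likelihood. A toy sketch with a character-level unigram model (the corpus and model are deliberately trivial):

```python
import math
from collections import Counter

# Unigram "language model": character frequencies as probabilities.
train = "abracadabra"
counts = Counter(train)
prob = {c: n / len(train) for c, n in counts.items()}

test = "abra"
nll = -sum(math.log(prob[c]) for c in test) / len(test)  # avg negative log-likelihood
perplexity = math.exp(nll)
print(perplexity)  # lower is better; a uniform model over 5 chars would score 5.0
```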
Stage 05 — Tokens & Embeddings
Prereqs: Stages 03–04.
Articles:
- Tokenization — BPE, WordPiece, SentencePiece, byte-level tradeoffs
- Static embeddings — Word2Vec, GloVe, FastText
- Contextual embeddings — ELMo, BERT, modern embedding models
- Semantic geometry — cosine similarity, dimensionality, why “king − man + woman ≈ queen” works
MVU: Given a sentence, you can describe what happens between raw text and the first transformer layer.
Exercise: Take 1000 product reviews, embed them with sentence-transformers/all-MiniLM-L6-v2, and find the 5 nearest neighbors of a query review.
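The core of BPE fits in a few lines: repeatedly merge the most frequent adjacent pair. A sketch on a toy string (real tokenizers work over byte sequences with merges learned from a large corpus):

```python
from collections import Counter

tokens = list("low lower lowest")     # start from individual characters

def most_frequent_pair(toks):
    return Counter(zip(toks, toks[1:])).most_common(1)[0][0]

def merge(toks, pair):
    out, i = [], 0
    while i < len(toks):
        if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
            out.append(toks[i] + toks[i + 1])  # fuse the pair into one token
            i += 2
        else:
            out.append(toks[i])
            i += 1
    return out

for _ in range(3):                    # three merge rounds
    tokens = merge(tokens, most_frequent_pair(tokens))
print(tokens)                         # "low" emerges as a single token
```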
Stage 06 — Transformers
Prereqs: Stages 03–05.
Articles:
- Self-attention (KQV) — the central mechanism
- Multi-head attention
- Positional encoding — sinusoidal, learned, RoPE, ALiBi
- The transformer block — attention + MLP + residual + norm
- GPT from scratch — minimal PyTorch implementation
MVU: You can draw the transformer block on a whiteboard from memory and explain every arrow.
Exercise: Implement a 6-layer GPT in <300 lines of PyTorch. Train on TinyShakespeare. Generate convincing fake Shakespeare.
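Before the full GPT, the central mechanism alone is ~10 lines. A single-head, unmasked self-attention sketch in NumPy (dimensions are arbitrary; a causal model would also mask future positions):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no causal mask)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # mix values by attention

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                          # 5 tokens, model dim 8
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```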
Stage 07 — Modern LLMs
Prereqs: Stage 06.
Articles:
- Scaling laws — Chinchilla, compute-optimal training
- Mixture of Experts — routing, sparse activation
- Reasoning models — o-series, R1, test-time compute
- Long context — 1M+ tokens, attention variants, retrieval interplay
- Frontier architectures — what 2026 frontier models look like
MVU: You can describe what’s different between a 2022 LLM and a 2026 frontier model along five dimensions.
Exercise: Run the same complex reasoning prompt through a non-reasoning model (Haiku) and a reasoning model (o-series). Compare outputs.
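The Mixture-of-Experts idea can be sketched without any training: a router scores experts per token, and only the top-k run. Everything below (dimensions, random weights, softmax gating over the selected logits) is a simplified stand-in for a real MoE layer:

```python
import numpy as np

rng = np.random.default_rng(0)
E, k, d = 4, 2, 8                      # experts, active experts per token, dim
tokens = rng.normal(size=(6, d))
router = rng.normal(size=(d, E))       # projects each token to expert logits
experts = [rng.normal(size=(d, d)) for _ in range(E)]  # stand-in expert weights

logits = tokens @ router
topk = np.argsort(logits, axis=-1)[:, -k:]            # indices of top-k experts
out = np.zeros_like(tokens)
for t in range(len(tokens)):
    sel = logits[t, topk[t]]
    gates = np.exp(sel - sel.max()); gates /= gates.sum()  # softmax over top-k
    for g, e in zip(gates, topk[t]):
        out[t] += g * (tokens[t] @ experts[e])        # weighted expert outputs

print(out.shape)  # only k of E experts ran per token — sparse activation
```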
Stage 08 — Prompting
Prereqs: Stage 06 (mechanically); useful from Stage 04 (operationally).
Articles:
- Prompt fundamentals — system/user/assistant, context windows
- Few-shot & chain-of-thought
- Structured outputs — JSON mode, tool calling, schemas
- Advanced techniques — self-consistency, ReAct, tree of thoughts, reflexion
- Sampling & decoding — temperature, top-p/k, beam, speculative
MVU: Given a task, you can predict whether prompting alone will solve it or whether you need RAG/fine-tuning/agents.
Exercise: Write a system prompt that gets a model to output valid JSON for a non-trivial schema 100/100 times.
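The other half of the structured-output exercise is checking the model's reply. A minimal validate-or-reject sketch; the schema check is hand-rolled for illustration — in practice you would reach for `jsonschema` or pydantic:

```python
import json

REQUIRED = {"name": str, "price": float, "tags": list}  # hypothetical schema

def validate(raw: str):
    """Return the parsed object if it matches the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(isinstance(obj.get(k), t) for k, t in REQUIRED.items()):
        return None
    return obj

good = '{"name": "mug", "price": 4.5, "tags": ["kitchen"]}'
bad  = 'Sure! Here is the JSON: {"name": "mug"}'
print(validate(good))   # parsed dict
print(validate(bad))    # None -> re-prompt the model
```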
Stage 09 — RAG
Prereqs: Stages 05, 08.
Articles:
- RAG fundamentals
- Chunking strategies — fixed, semantic, structural, late chunking
- Embedding models for retrieval
- Vector databases — pgvector, Qdrant, LanceDB, when each wins
- Hybrid search & reranking — BM25 + dense, cross-encoders
- Advanced retrieval patterns — HyDE, FLARE, query decomposition, GraphRAG
- Evaluating RAG — retrieval@k, faithfulness, Ragas, golden sets
MVU: You can build a RAG system end-to-end and explain every failure mode it could exhibit in production.
Exercise: Build a RAG over your own notes. Then break it (ambiguous queries, multi-hop questions). Fix one failure mode.
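A sensible first chunker for this exercise is fixed-size with overlap, so sentences that straddle a boundary stay visible to both chunks. Sizes below are arbitrary defaults:

```python
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Fixed-size character chunking with overlap — the simplest baseline."""
    step = size - overlap
    chunks = []
    for i in range(0, len(text), step):
        chunks.append(text[i:i + size])
        if i + size >= len(text):     # last chunk reached the end of the text
            break
    return chunks

doc = "word " * 100                   # toy 500-character document
pieces = chunk(doc)
print(len(pieces), len(pieces[0]))
```

Semantic and structural chunking usually retrieve better, but this baseline makes failure modes easy to reason about.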
Stage 10 — Fine-Tuning
Prereqs: Stages 06, 08.
Articles:
- When to fine-tune — decision flow vs prompting and RAG
- Supervised fine-tuning (SFT)
- LoRA & QLoRA
- RLHF, DPO, GRPO — preference and reward-based training
- Embedding fine-tuning
- Data & tooling — TRL, Axolotl, Unsloth, dataset design
MVU: Given a use case, you can pick the right fine-tuning method and estimate cost/data needs.
Exercise: LoRA-fine-tune a 7B model on a 1k-row instruction dataset. Evaluate against the base model on a held-out test set.
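The LoRA idea itself is two matrices: freeze W and learn a low-rank update B @ A. A NumPy sketch of the math and the parameter savings (dimensions and the alpha value are typical but arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 1024, 8                       # model dim and LoRA rank (typical: 4-64)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                 # trainable, zero-init so the delta starts at 0
alpha = 16                           # scaling hyperparameter

def adapted_forward(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)   # base path + LoRA path

full = d * d                         # params to fine-tune the whole matrix
lora = d * r * 2                     # params LoRA actually trains
print(full / lora)                   # 64x fewer trainable parameters
```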
Stage 11 — Agents
Prereqs: Stages 08, 09.
Articles:
- Agent loop & architecture
- Tool use & function calling
- Memory systems — working, episodic, semantic
- Planning & reflection — ReAct, plan-and-execute, reflexion
- Multi-agent orchestration — supervisor, swarm, debate
- Guardrails & safety
- Browser & vision agents
MVU: You can build a single-agent loop in <100 lines of code and articulate when to add a second agent versus giving a single agent more tools.
Exercise: Build an agent that can search the web, read pages, and answer multi-hop questions. No framework allowed for the first version.
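The shape of the framework-free version is below. `call_model` is a stub standing in for a real LLM API — a real loop would parse the model's tool-call output instead of hard-coding it — but the loop structure (act, observe, repeat, with a step cap) is the whole pattern:

```python
import json

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def call_model(messages):
    """Stub: pretend the model requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}."}

def agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):                 # cap steps to avoid infinite loops
        action = call_model(messages)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])   # execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "gave up"

print(agent("What is 2 + 3?"))
```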
Stage 12 — Multimodal
Prereqs: Stages 05, 06.
Articles:
- Multimodal embeddings (CLIP)
- Vision-language models — VLMs, image-to-text, doc understanding
- Text-to-image diffusion
- Video generation — Sora, Wan, Veo
- Speech & TTS — Whisper, modern TTS, real-time voice
- Synthetic data
MVU: You know which modalities to combine for a given application and can pick a stack.
Exercise: Build a “search my photos by description” feature using CLIP embeddings.
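The retrieval pattern behind this exercise is the same regardless of modality: embed photos and the text query into CLIP's shared space, normalize, and rank by dot product. The embeddings below are random stand-ins — in the real feature they come from a CLIP model:

```python
import numpy as np

rng = np.random.default_rng(0)
photo_embs = rng.normal(size=(1000, 512))               # one row per photo
photo_embs /= np.linalg.norm(photo_embs, axis=1, keepdims=True)

query_emb = rng.normal(size=512)                        # embedded text query
query_emb /= np.linalg.norm(query_emb)

scores = photo_embs @ query_emb                         # cosine similarity
top5 = np.argsort(scores)[::-1][:5]                     # best-matching photos
print(top5)
```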
Stage 13 — Production
Prereqs: Stages 08–11.
Articles:
- Deployment architectures
- Evaluation & benchmarks — offline, online, LLM-as-judge
- Guardrails — input/output filters, schema validation, jailbreak defense
- Observability & tracing
- Cost & latency — caching, batching, speculative decoding, model routing
- Hallucination mitigation
- Data systems for AI — ingestion, skew, drift, feedback loops, lineage
- Enterprise considerations — security, compliance, data boundaries
MVU: You can ship an LLM feature to production and have it stay shipped.
Exercise: Add evals + observability + a guardrail to a Stage 09 RAG. Run for a week with real traffic. Read the traces.
Stage 14 — Applications
Prereqs: Stages 09–13.
Articles:
- Text-to-SQL
- Text-to-code — Copilot, Cursor, Claude Code patterns
- Browser agents
- Financial reasoning
- Case studies
MVU: You can lift a pattern from any of these applications and apply it to your own domain.
Exercise: Pick one application area and build a working v0 in a weekend.
Stage 15 — Engineering & Career
Prereqs: None — read at any time.
Articles:
- AI engineer roles — applied, research, infra, product
- Learning roadmap — what to build to get hired
- Staying current — feeds, papers, communities
MVU: You have a 90-day plan for what you’ll build next.
Exercise: Ship something — a side project, a blog post, a fine-tuned model on HuggingFace. Anything public.
Going beyond
When you finish stage 15, the path doesn’t end — the field moves too fast for any path to be “complete.” Pick a sub-area (e.g. agent eval, post-training, multimodal retrieval) and go deep. Read the latest papers. Reproduce one. Write about what you learn. That’s how you stay sharp.