Learning Path — Detailed Walk-Through
This is the long-form companion to README.md. For each stage, it lists the prerequisite knowledge, the articles inside, the “minimum viable understanding” (MVU) you should have before moving on, and the exercises that prove it.
If README.md is the map, this is the trail guide.
Stage 01 — Math Foundations
Prereqs: high-school algebra.
Articles:
- Linear algebra — vectors, matrices, dot product, eigenvectors
- Probability & statistics — distributions, expectation, MLE, Bayes
- Calculus & optimization — derivatives, gradients, gradient descent
- Information theory — entropy, KL divergence, cross-entropy
MVU: You can compute a dot product by hand, explain why softmax outputs sum to 1, and write the gradient descent update rule from memory.
Exercise: Implement gradient descent on y = (x - 3)^2 in NumPy. Plot the loss over 100 steps.
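A minimal sketch of this exercise (learning rate and starting point are arbitrary choices; swap `print` for a matplotlib plot of `losses` to see the curve):

```python
import numpy as np

# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2(x - 3).
lr = 0.1          # learning rate — tune freely
x = 0.0           # starting point
losses = []
for step in range(100):
    grad = 2 * (x - 3)        # analytic gradient
    x -= lr * grad            # the update rule from the MVU
    losses.append((x - 3) ** 2)

print(x)  # converges toward the minimum at x = 3
```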
Stage 02 — ML Fundamentals
Prereqs: Stage 01.
Articles:
- Supervised learning — regression, classification, the train/val/test split
- Unsupervised learning — clustering, dimensionality reduction
- Loss functions & optimization — MSE, cross-entropy, SGD, momentum
- Evaluation & metrics — accuracy, precision/recall, F1, AUC, calibration
- Regularization & generalization — bias/variance, L1/L2, early stopping
- Classical algorithms — linear/logistic regression, trees, ensembles
MVU: You can explain the bias-variance tradeoff to a non-ML engineer in 90 seconds.
Exercise: Train a logistic regression on Iris in scikit-learn. Compute precision, recall, and F1 by hand from the confusion matrix.
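The "by hand" half of the exercise reduces to three formulas. A sketch with made-up counts (use the ones from your own confusion matrix):

```python
# Metrics by hand from a binary confusion matrix. Counts are illustrative.
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)          # of predicted positives, how many were right
recall    = tp / (tp + fn)          # of actual positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)  # 0.8, ~0.889, ~0.842
```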
Stage 03 — Neural Networks
Prereqs: Stages 01–02.
Articles:
- Perceptrons & MLPs
- Backpropagation — chain rule, computational graphs, autograd
- Activations & initialization
- Optimizers — SGD, Adam, AdamW, schedulers
- Regularization techniques — dropout, batch/layer norm, weight decay
- Architectures: CNN & RNN
MVU: You can hand-derive backprop through a 2-layer MLP and explain why we need non-linear activations.
Exercise: Train an MLP on MNIST in raw PyTorch (no nn.Sequential). Hit >97% test accuracy.
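The hand-derivation in the MVU can be checked numerically. A NumPy sketch of forward and backward through a 2-layer MLP (shapes and MSE loss are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4, input dim 3
y = rng.normal(size=(4, 2))          # regression targets, output dim 2
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 2)) * 0.1

def forward(W1, W2):
    h = np.tanh(x @ W1)              # hidden layer with a non-linearity
    out = h @ W2
    loss = ((out - y) ** 2).mean()
    return loss, h, out

loss, h, out = forward(W1, W2)

# Backprop via the chain rule:
d_out = 2 * (out - y) / y.size       # dL/d(out) for mean squared error
dW2 = h.T @ d_out
d_h = d_out @ W2.T
dW1 = x.T @ (d_h * (1 - h ** 2))     # tanh'(z) = 1 - tanh(z)^2

# Numerical check on one entry of W1:
eps = 1e-5
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, W2)[0] - loss) / eps
print(abs(num - dW1[0, 0]))          # should be tiny
```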
Stage 04 — Language Modeling
Prereqs: Stage 03.
Articles:
- n-gram models
- Neural language models — Bengio 2003, word embeddings as a side effect
- RNNs & LSTMs
- Why transformers — the failure modes RNNs couldn’t escape
MVU: You can explain perplexity, the vanishing gradient problem, and why parallelism matters for scaling.
Exercise: Train a character-level RNN on a small corpus (~1MB) and generate text. Notice how it forgets context past ~50 chars.
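Perplexity itself is a one-liner once you have per-token probabilities: exponentiate the average negative log-likelihood. A toy sketch with a character-level unigram model (the corpus and model are deliberately trivial):

```python
import math
from collections import Counter

# Unigram "language model": character frequencies as probabilities.
train = "abracadabra"
counts = Counter(train)
prob = {c: n / len(train) for c, n in counts.items()}

test = "abra"
nll = -sum(math.log(prob[c]) for c in test) / len(test)  # avg negative log-likelihood
perplexity = math.exp(nll)
print(perplexity)  # lower is better; a uniform model over 5 chars would score 5.0
```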
Stage 05 — Tokens & Embeddings
Prereqs: Stages 03–04.
Articles:
- Tokenization — BPE, WordPiece, SentencePiece, byte-level tradeoffs
- Static embeddings — Word2Vec, GloVe, FastText
- Contextual embeddings — ELMo, BERT, modern embedding models
- Semantic geometry — cosine similarity, dimensionality, why “king − man + woman ≈ queen” works
MVU: Given a sentence, you can describe what happens between raw text and the first transformer layer.
Exercise: Take 1000 product reviews, embed them with sentence-transformers/all-MiniLM-L6-v2, and find the 5 nearest neighbors of a query review.
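The core of BPE fits in a few lines: repeatedly merge the most frequent adjacent pair. A sketch on a toy string (real tokenizers work over byte sequences with merges learned from a large corpus):

```python
from collections import Counter

tokens = list("low lower lowest")     # start from individual characters

def most_frequent_pair(toks):
    return Counter(zip(toks, toks[1:])).most_common(1)[0][0]

def merge(toks, pair):
    out, i = [], 0
    while i < len(toks):
        if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
            out.append(toks[i] + toks[i + 1])  # fuse the pair into one token
            i += 2
        else:
            out.append(toks[i])
            i += 1
    return out

for _ in range(3):                    # three merge rounds
    tokens = merge(tokens, most_frequent_pair(tokens))
print(tokens)                         # "low" emerges as a single token
```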
Stage 06 — Transformers
Prereqs: Stages 03–05.
Articles:
- Self-attention (KQV) — the central mechanism
- Multi-head attention
- Positional encoding — sinusoidal, learned, RoPE, ALiBi
- The transformer block — attention + MLP + residual + norm
- GPT from scratch — minimal PyTorch implementation
MVU: You can draw the transformer block on a whiteboard from memory and explain every arrow.
Exercise: Implement a 6-layer GPT in <300 lines of PyTorch. Train on TinyShakespeare. Generate convincing fake Shakespeare.
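Before the full GPT, the central mechanism alone is ~10 lines. A single-head, unmasked self-attention sketch in NumPy (dimensions are arbitrary; a causal model would also mask future positions):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no causal mask)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # mix values by attention

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                          # 5 tokens, model dim 8
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```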
Stage 07 — Modern LLMs
Prereqs: Stage 06.
Articles:
- Scaling laws — Chinchilla, compute-optimal training
- Mixture of Experts — routing, sparse activation
- Reasoning models — o-series, R1, test-time compute
- Long context — 1M+ tokens, attention variants, retrieval interplay
- Frontier architectures — what 2026 frontier models look like
MVU: You can describe what’s different between a 2022 LLM and a 2026 frontier model along five dimensions.
Exercise: Run the same complex reasoning prompt through a non-reasoning model (Haiku) and a reasoning model (o-series). Compare outputs.
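The Mixture-of-Experts idea can be sketched without any training: a router scores experts per token, and only the top-k run. Everything below (dimensions, random weights, softmax gating over the selected logits) is a simplified stand-in for a real MoE layer:

```python
import numpy as np

rng = np.random.default_rng(0)
E, k, d = 4, 2, 8                      # experts, active experts per token, dim
tokens = rng.normal(size=(6, d))
router = rng.normal(size=(d, E))       # projects each token to expert logits
experts = [rng.normal(size=(d, d)) for _ in range(E)]  # stand-in expert weights

logits = tokens @ router
topk = np.argsort(logits, axis=-1)[:, -k:]            # indices of top-k experts
out = np.zeros_like(tokens)
for t in range(len(tokens)):
    sel = logits[t, topk[t]]
    gates = np.exp(sel - sel.max()); gates /= gates.sum()  # softmax over top-k
    for g, e in zip(gates, topk[t]):
        out[t] += g * (tokens[t] @ experts[e])        # weighted expert outputs

print(out.shape)  # only k of E experts ran per token — sparse activation
```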
Stage 08 — Prompting
Prereqs: Stage 06 (mechanically); useful from Stage 04 (operationally).
Articles:
- Prompt fundamentals — system/user/assistant, context windows
- Few-shot & chain-of-thought
- Structured outputs — JSON mode, tool calling, schemas
- Advanced techniques — self-consistency, ReAct, tree of thoughts, reflexion
- Sampling & decoding — temperature, top-p/k, beam, speculative
MVU: Given a task, you can predict whether prompting alone will solve it or whether you need RAG/fine-tuning/agents.
Exercise: Write a system prompt that gets a model to output valid JSON for a non-trivial schema 100/100 times.
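The other half of the structured-output exercise is checking the model's reply. A minimal validate-or-reject sketch; the schema check is hand-rolled for illustration — in practice you would reach for `jsonschema` or pydantic:

```python
import json

REQUIRED = {"name": str, "price": float, "tags": list}  # hypothetical schema

def validate(raw: str):
    """Return the parsed object if it matches the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(isinstance(obj.get(k), t) for k, t in REQUIRED.items()):
        return None
    return obj

good = '{"name": "mug", "price": 4.5, "tags": ["kitchen"]}'
bad  = 'Sure! Here is the JSON: {"name": "mug"}'
print(validate(good))   # parsed dict
print(validate(bad))    # None -> re-prompt the model
```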
Stage 09 — RAG
Prereqs: Stages 05, 08.
Articles:
- RAG fundamentals
- Chunking strategies — fixed, semantic, structural, late chunking
- Embedding models for retrieval
- Vector databases — pgvector, Qdrant, LanceDB, when each wins
- Hybrid search & reranking — BM25 + dense, cross-encoders
- Advanced retrieval patterns — HyDE, FLARE, query decomposition, GraphRAG
- Evaluating RAG — retrieval@k, faithfulness, Ragas, golden sets
MVU: You can build a RAG system end-to-end and explain every failure mode it could exhibit in production.
Exercise: Build a RAG over your own notes. Then break it (ambiguous queries, multi-hop questions). Fix one failure mode.
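A sensible first chunker for this exercise is fixed-size with overlap, so sentences that straddle a boundary stay visible to both chunks. Sizes below are arbitrary defaults:

```python
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Fixed-size character chunking with overlap — the simplest baseline."""
    step = size - overlap
    chunks = []
    for i in range(0, len(text), step):
        chunks.append(text[i:i + size])
        if i + size >= len(text):     # last chunk reached the end of the text
            break
    return chunks

doc = "word " * 100                   # toy 500-character document
pieces = chunk(doc)
print(len(pieces), len(pieces[0]))
```

Semantic and structural chunking usually retrieve better, but this baseline makes failure modes easy to reason about.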
Stage 10 — Fine-Tuning
Prereqs: Stages 06, 08.
Articles:
- When to fine-tune — decision flow vs prompting and RAG
- Supervised fine-tuning (SFT)
- LoRA & QLoRA
- RLHF, DPO, GRPO — preference and reward-based training
- Embedding fine-tuning
- Data & tooling — TRL, Axolotl, Unsloth, dataset design
MVU: Given a use case, you can pick the right fine-tuning method and estimate cost/data needs.
Exercise: LoRA-fine-tune a 7B model on a 1k-row instruction dataset. Evaluate against the base model on a held-out test set.
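The LoRA idea itself is two matrices: freeze W and learn a low-rank update B @ A. A NumPy sketch of the math and the parameter savings (dimensions and the alpha value are typical but arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 1024, 8                       # model dim and LoRA rank (typical: 4-64)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                 # trainable, zero-init so the delta starts at 0
alpha = 16                           # scaling hyperparameter

def adapted_forward(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)   # base path + LoRA path

full = d * d                         # params to fine-tune the whole matrix
lora = d * r * 2                     # params LoRA actually trains
print(full / lora)                   # 64x fewer trainable parameters
```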
Stage 11 — Agents
Prereqs: Stages 08, 09.
Articles:
- Agent loop & architecture
- Tool use & function calling
- Memory systems — working, episodic, semantic
- Planning & reflection — ReAct, plan-and-execute, reflexion
- Multi-agent orchestration — supervisor, swarm, debate
- Guardrails & safety
- Browser & vision agents
MVU: You can build a single-agent loop in <100 lines of code and articulate when to add a second agent versus giving a single agent more tools.
Exercise: Build an agent that can search the web, read pages, and answer multi-hop questions. No framework allowed for the first version.
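The shape of the framework-free version is below. `call_model` is a stub standing in for a real LLM API — a real loop would parse the model's tool-call output instead of hard-coding it — but the loop structure (act, observe, repeat, with a step cap) is the whole pattern:

```python
import json

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def call_model(messages):
    """Stub: pretend the model requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}."}

def agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):                 # cap steps to avoid infinite loops
        action = call_model(messages)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])   # execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "gave up"

print(agent("What is 2 + 3?"))
```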
Stage 12 — Multimodal
Prereqs: Stages 05, 06.
Articles:
- Multimodal embeddings (CLIP)
- Vision-language models — VLMs, image-to-text, doc understanding
- Text-to-image diffusion
- Video generation — Sora, Wan, Veo
- Speech & TTS — Whisper, modern TTS, real-time voice
- Synthetic data
MVU: You know which modalities to combine for a given application and can pick a stack.
Exercise: Build a “search my photos by description” feature using CLIP embeddings.
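The retrieval pattern behind this exercise is the same regardless of modality: embed photos and the text query into CLIP's shared space, normalize, and rank by dot product. The embeddings below are random stand-ins — in the real feature they come from a CLIP model:

```python
import numpy as np

rng = np.random.default_rng(0)
photo_embs = rng.normal(size=(1000, 512))               # one row per photo
photo_embs /= np.linalg.norm(photo_embs, axis=1, keepdims=True)

query_emb = rng.normal(size=512)                        # embedded text query
query_emb /= np.linalg.norm(query_emb)

scores = photo_embs @ query_emb                         # cosine similarity
top5 = np.argsort(scores)[::-1][:5]                     # best-matching photos
print(top5)
```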
Stage 13 — Production
Prereqs: Stages 08–11.
Articles:
- Deployment architectures
- Evaluation & benchmarks — offline, online, LLM-as-judge
- Guardrails — input/output filters, schema validation, jailbreak defense
- Observability & tracing
- Cost & latency — caching, batching, speculative decoding, model routing
- Hallucination mitigation
- Data systems for AI — ingestion, skew, drift, feedback loops, lineage
- Enterprise considerations — security, compliance, data boundaries
MVU: You can ship an LLM feature to production and have it stay shipped.
Exercise: Add evals + observability + a guardrail to a Stage 09 RAG. Run for a week with real traffic. Read the traces.
Stage 14 — Applications
Prereqs: Stages 09–13.
Articles:
- Text-to-SQL
- Text-to-code — Copilot, Cursor, Claude Code patterns
- Browser agents
- Financial reasoning
- Case studies
MVU: You can lift a pattern from any of these applications and apply it to your own domain.
Exercise: Pick one application area and build a working v0 in a weekend.
Stage 15 — Engineering & Career
Prereqs: None — read at any time.
Articles:
- AI engineer roles — applied, research, infra, product
- Learning roadmap — what to build to get hired
- Staying current — feeds, papers, communities
MVU: You have a 90-day plan for what you’ll build next.
Exercise: Ship something — a side project, a blog post, a fine-tuned model on HuggingFace. Anything public.
Going beyond
When you finish stage 15, the path doesn’t end — the field moves too fast for any path to be “complete.” Pick a sub-area (e.g. agent eval, post-training, multimodal retrieval) and go deep. Read the latest papers. Reproduce one. Write about what you learn. That’s how you stay sharp.