# AI / ML / AI Engineering — Learning Path
A structured, opinionated learning path that takes you from linear algebra to building production AI systems. Each stage is a folder; read in order or jump in where you fit.
Note on grounding. Earlier wiki content in `../wiki/` is strictly grounded in the `../raw/` source PDFs. This `organized/` tree is pedagogy-first — it draws from the curriculum, canonical books, and modern practice (through 2026) without forcing every claim back to a citation. Where a specific source matters (e.g. a paper, a book chapter), it’s cited inline. Treat this as a textbook you can navigate, not a citation index.
## How to use this path

- Full pass or vertical slice. The 15 stages are sequential, but you can also vertical-slice: read the foundation of each stage (its `README.md`) and then drill into one application area (e.g. RAG → production → applications).
- Read the stage README first. Every stage opens with prerequisites, a learning ladder, key concepts, and a “minimum viable understanding” checklist. Don’t skip it.
- Build, then read more. Most stages have an “exercises” section. The fastest way through this material is to implement a small thing in each stage before moving on.
- Backtrack freely. Hit a wall in stage 6 because vector spaces feel hand-wavy? Go back to stage 1. The path is a graph, not a staircase.
## The 15 stages
| # | Stage | Why it’s here | Time |
|---|---|---|---|
| 01 | Math foundations | Linear algebra, probability, calculus, info theory — the language ML is written in | 2–4 weeks |
| 02 | ML fundamentals | Supervised/unsupervised, loss & optimization, evaluation, classical algorithms | 2–4 weeks |
| 03 | Neural networks | Perceptrons → MLPs → backprop → optimizers — how networks actually learn | 2–3 weeks |
| 04 | Language modeling | n-grams → RNNs → why transformers won | 1–2 weeks |
| 05 | Tokens & embeddings | How text becomes vectors; static vs contextual embeddings | 1 week |
| 06 | Transformers | Self-attention (KQV), multi-head, positional encoding, GPT from scratch | 2–3 weeks |
| 07 | Modern LLMs | Scaling laws, MoE, reasoning models, long-context, frontier architectures | 1–2 weeks |
| 08 | Prompting | Zero/few-shot, CoT, structured output, sampling, prompt patterns | 1 week |
| 09 | RAG | Fundamentals → chunking → vector DBs → hybrid search → reranking → eval | 2 weeks |
| 10 | Fine-tuning | When to FT, SFT, LoRA/QLoRA, RLHF/DPO/GRPO, embedding FT, datasets | 2–3 weeks |
| 11 | Agents | Agent loops, tools, memory, planning, multi-agent, browser/vision agents | 2 weeks |
| 12 | Multimodal | CLIP, VLMs, diffusion, video gen, TTS, synthetic data | 1–2 weeks |
| 13 | Production | Evals, guardrails, observability, scaling, hallucinations, enterprise | 2–3 weeks |
| 14 | Applications | Text-to-SQL, code gen, browser agents, financial reasoning, case studies | 1–2 weeks |
| 15 | Engineering & career | Roles, roadmap, staying current, what to actually build | ongoing |
Total: 24–36 weeks if you build alongside reading. Less if you’re already partway up the stack.
## Three reading tracks
Not everyone needs all 15 stages. Pick a track. Each one has a week-by-week cheat sheet:
| Track | For | Time | Cheat sheet |
|---|---|---|---|
| A — SWE → AI Product Engineer | You can ship code; you want to ship AI features | 8–12 weeks | `TRACK_A_SWE_TO_AI.md` |
| B — ML Engineer → LLM Specialist | You know ML; you want LLM depth | 12–18 weeks | `TRACK_B_ML_TO_LLM.md` |
| C — Complete from scratch | Starting fresh; want the full foundation | 24–36 weeks | `TRACK_C_FROM_SCRATCH.md` |
Each cheat sheet has weekly reading + building targets, milestones, and pivot signals.
## Exercise solutions

Worked solutions to the most concrete exercises live in `EXERCISE_SOLUTIONS/`. Try each exercise yourself first; check the solution after.
## Prerequisites

- Python. All examples are Python. If you’re new, do one Python crash course (e.g. *Automate the Boring Stuff*) first.
- High-school math. Algebra, basic geometry. Stage 01 covers everything beyond that.
- A GPU is helpful but optional. Most exercises run on CPU or Colab/Modal/Lightning. Heavy training (GPT from scratch, fine-tuning) is the only place a GPU is required; see the device check after this list.
- Curiosity over credentials. None of this requires a CS degree.
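If you want to know what hardware you have before starting, a two-line probe is enough. A minimal sketch, assuming PyTorch is installed (it appears in the tooling table below):

```python
import torch

# Pick the best available device: NVIDIA GPU, Apple Silicon GPU, or CPU fallback.
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Running on: {device}")
```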
## Tooling you’ll meet
| Layer | Tools |
|---|---|
| Numerical | NumPy, PyTorch, JAX |
| Tokenization | tiktoken, SentencePiece, HuggingFace tokenizers |
| LLM access | Anthropic SDK, OpenAI SDK, LiteLLM, Ollama |
| Embeddings | sentence-transformers, OpenAI/Voyage/Cohere APIs |
| Vector DBs | pgvector, Qdrant, Weaviate, LanceDB, Pinecone |
| Fine-tuning | TRL, PEFT, Axolotl, LlamaFactory, Unsloth |
| Agents | Anthropic Agent SDK, LangGraph, smolagents, raw loops |
| Eval | promptfoo, Inspect, Braintrust, Langfuse |
| Observability | Langfuse, LangSmith, Phoenix, OpenTelemetry |
We’ll introduce them as needed. No pre-reading required.
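As a small taste of the early layers, here’s what tokenization looks like with tiktoken — a minimal sketch, assuming `pip install tiktoken`; `cl100k_base` is just one common encoding choice:

```python
import tiktoken

# Load a BPE encoding; "cl100k_base" is the one used by several OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Attention is all you need.")
print(tokens)              # a list of integer token IDs
print(enc.decode(tokens))  # decoding round-trips back to the original string
```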
## Conventions

- Code blocks use triple-backtick fences with language tags. Run them in order; they accumulate (see the example after this list).
- Callouts use blockquotes with prefixes: `> Why:`, `> Pitfall:`, `> Aside:`.
- “See also” sections at the bottom of each article link forward and backward in the path.
- Year markers appear when something is recent, e.g. (2025). The path is current as of early 2026.
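For example, two consecutive blocks in an article behave like this, with the second reusing names from the first (a contrived illustration, not taken from any stage):

```python
# Block 1: define a toy vocabulary.
vocab = {"the": 0, "cat": 1, "sat": 2}
```

```python
# Block 2: assumes block 1 has already run, so `vocab` exists.
ids = [vocab[w] for w in "the cat sat".split()]
print(ids)  # [0, 1, 2]
```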
## What this path is not

- Not a research curriculum. We don’t derive proofs. Goodfellow et al.’s *Deep Learning* is the textbook for that.
- Not a benchmark race. We talk about state-of-the-art conceptually; specific leaderboard numbers move every month.
- Not vendor-locked. Models from Anthropic, OpenAI, Google, Meta, and open-source all show up. Pick what works for you.
## Updating this path
The path is a living document. When something fundamental shifts (e.g. a new training paradigm, a new architecture class), the relevant stage gets updated and the change is noted at the top of that stage’s README.
Ready? Start with `01-math-foundations` or jump to your track.