# AI / ML / AI Engineering — Learning Path
A structured, opinionated learning path that takes you from linear algebra to building production AI systems. Each stage is a folder; read in order or jump in where you fit.
Note on grounding. Earlier wiki content in `../wiki/` is strictly grounded in the `../raw/` source PDFs. This `organized/` tree is pedagogy-first — it draws from the curriculum, canonical books, and modern practice (through 2026) without forcing every claim back to a citation. Where a specific source matters (e.g. a paper, a book chapter), it’s cited inline. Treat this as a textbook you can navigate, not a citation index.
## How to use this path

- Full pass or vertical slice. The 15 stages are sequential, but you can also vertical-slice: read the foundation of each stage (its `README.md`) and then drill into one application area (e.g. RAG → production → applications).
- Read the stage README first. Every stage opens with prerequisites, a learning ladder, key concepts, and a “minimum viable understanding” checklist. Don’t skip it.
- Build, then read more. Most stages have an “exercises” section. The fastest way through this material is to implement a small thing in each stage before moving on.
- Backtrack freely. Hit a wall in stage 6 because vector spaces feel hand-wavy? Go back to stage 1. The path is a graph, not a staircase.
## The 15 stages
| # | Stage | Why it’s here | Time |
|---|---|---|---|
| 01 | Math foundations | Linear algebra, probability, calculus, info theory — the language ML is written in | 2–4 weeks |
| 02 | ML fundamentals | Supervised/unsupervised, loss & optimization, evaluation, classical algorithms | 2–4 weeks |
| 03 | Neural networks | Perceptrons → MLPs → backprop → optimizers — how networks actually learn | 2–3 weeks |
| 04 | Language modeling | n-grams → RNNs → why transformers won | 1–2 weeks |
| 05 | Tokens & embeddings | How text becomes vectors; static vs contextual embeddings | 1 week |
| 06 | Transformers | Self-attention (KQV), multi-head, positional encoding, GPT from scratch | 2–3 weeks |
| 07 | Modern LLMs | Scaling laws, MoE, reasoning models, long-context, frontier architectures | 1–2 weeks |
| 08 | Prompting | Zero/few-shot, CoT, structured output, sampling, prompt patterns | 1 week |
| 09 | RAG | Fundamentals → chunking → vector DBs → hybrid search → reranking → eval | 2 weeks |
| 10 | Fine-tuning | When to FT, SFT, LoRA/QLoRA, RLHF/DPO/GRPO, embedding FT, datasets | 2–3 weeks |
| 11 | Agents | Agent loops, tools, memory, planning, multi-agent, browser/vision agents | 2 weeks |
| 12 | Multimodal | CLIP, VLMs, diffusion, video gen, TTS, synthetic data | 1–2 weeks |
| 13 | Production | Evals, guardrails, observability, scaling, hallucinations, enterprise | 2–3 weeks |
| 14 | Applications | Text-to-SQL, code gen, browser agents, financial reasoning, case studies | 1–2 weeks |
| 15 | Engineering & career | Roles, roadmap, staying current, what to actually build | ongoing |
Total: 24–36 weeks if you build alongside reading. Less if you’re already partway up the stack.
## Three reading tracks
Not everyone needs all 15 stages. Pick a track. Each one has a week-by-week cheat sheet:
| Track | For | Time | Cheat sheet |
|---|---|---|---|
| A — SWE → AI Product Engineer | You can ship code; you want to ship AI features | 8–12 weeks | `TRACK_A_SWE_TO_AI.md` |
| B — ML Engineer → LLM Specialist | You know ML; you want LLM depth | 12–18 weeks | `TRACK_B_ML_TO_LLM.md` |
| C — Complete from scratch | Starting fresh; want the full foundation | 24–36 weeks | `TRACK_C_FROM_SCRATCH.md` |
Each cheat sheet has weekly reading + building targets, milestones, and pivot signals.
## Exercise solutions

Worked solutions to the most concrete exercises live in `EXERCISE_SOLUTIONS/`. Try each exercise yourself first; check the solution after.
## Prerequisites

- Python. All examples are Python. If you’re new, do one Python crash course (e.g. *Automate the Boring Stuff*) first.
- High-school math. Algebra, basic geometry. Stage 01 covers everything beyond that.
- A GPU is helpful but optional. Most exercises run on CPU or Colab/Modal/Lightning. Heavy training (GPT from scratch, fine-tuning) is the only place a GPU is required; see the device check after this list.
- Curiosity over credentials. None of this requires a CS degree.
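If you want to know what hardware you have before starting, a two-line probe is enough. A minimal sketch, assuming PyTorch is installed (it appears in the tooling table below):

```python
import torch

# Pick the best available device: NVIDIA GPU, Apple Silicon GPU, or CPU fallback.
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Running on: {device}")
```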
## Tooling you’ll meet
| Layer | Tools |
|---|---|
| Numerical | NumPy, PyTorch, JAX |
| Tokenization | tiktoken, SentencePiece, HuggingFace tokenizers |
| LLM access | Anthropic SDK, OpenAI SDK, LiteLLM, Ollama |
| Embeddings | sentence-transformers, OpenAI/Voyage/Cohere APIs |
| Vector DBs | pgvector, Qdrant, Weaviate, LanceDB, Pinecone |
| Fine-tuning | TRL, PEFT, Axolotl, LlamaFactory, Unsloth |
| Agents | Anthropic Agent SDK, LangGraph, smolagents, raw loops |
| Eval | promptfoo, Inspect, Braintrust, Langfuse |
| Observability | Langfuse, LangSmith, Phoenix, OpenTelemetry |
We’ll introduce them as needed. No pre-reading required.
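As a small taste of the early layers, here’s what tokenization looks like with tiktoken — a minimal sketch, assuming `pip install tiktoken`; `cl100k_base` is just one common encoding choice:

```python
import tiktoken

# Load a BPE encoding; "cl100k_base" is the one used by several OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Attention is all you need.")
print(tokens)              # a list of integer token IDs
print(enc.decode(tokens))  # decoding round-trips back to the original string
```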
## Conventions

- Code blocks use triple-backtick fences with language tags. Run them in order; they accumulate (see the example after this list).
- Callouts use blockquotes with prefixes: `> Why:`, `> Pitfall:`, `> Aside:`.
- “See also” sections at the bottom of each article link forward and backward in the path.
- Year markers appear when something is recent, e.g. (2025). The path is current as of early 2026.
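For example, two consecutive blocks in an article behave like this, with the second reusing names from the first (a contrived illustration, not taken from any stage):

```python
# Block 1: define a toy vocabulary.
vocab = {"the": 0, "cat": 1, "sat": 2}
```

```python
# Block 2: assumes block 1 has already run, so `vocab` exists.
ids = [vocab[w] for w in "the cat sat".split()]
print(ids)  # [0, 1, 2]
```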
## What this path is not

- Not a research curriculum. We don’t derive proofs. Goodfellow et al.’s *Deep Learning* is the textbook for that.
- Not a benchmark race. We talk about state-of-the-art conceptually; specific leaderboard numbers move every month.
- Not vendor-locked. Models from Anthropic, OpenAI, Google, Meta, and open-source all show up. Pick what works for you.
## Updating this path
The path is a living document. When something fundamental shifts (e.g. a new training paradigm, a new architecture class), the relevant stage gets updated and the change is noted at the top of that stage’s README.
Ready? Start with `01-math-foundations` or jump to your track.