Track A — Software Engineer → AI Product Engineer
For someone who can already write code at a senior-engineer level and wants to ship AI-powered features in production. You’ll skip the math/ML internals on the first pass and circle back when something breaks that requires it.
Time: 8–12 weeks at ~10 hours/week. Endpoint: you can scope, build, evaluate, and operate an AI feature — RAG, agent, or both — at production quality.
What you skip (for now)
- Stages 1–4 — math, ML fundamentals, neural network internals, language modeling history. You don’t need them to ship; come back when you hit a debugging wall.
You’ll also skim, rather than read closely, large parts of:
- Stage 6’s GPT-from-scratch.
- Stage 10’s RLHF/DPO/GRPO mechanics.
You can build a great AI product without ever training a model. But you do need to understand what’s happening enough to debug.
Week-by-week
Week 1 — Mental model + first calls
Read:
- Top README and LEARNING_PATH.md.
- Stage 4 — Why transformers (skim — you need the intuition, not the history).
- Stage 5 — README, Tokenization.
- Stage 8 — Prompt fundamentals.
Build:
- 100 API calls to Claude or GPT through the official SDK.
- A CLI tool: stdin → model → stdout, with streaming.
- Try varying temperature and top-p; observe how the outputs change.
Goal at end of week: “I can call an LLM and explain what every parameter does.”
Week 2 — Prompting and structured output
Read:
Build:
- An email classifier that takes a subject + body and returns one of 8 categories with reasons, as JSON output.
- Run it on 100 test emails. Calculate accuracy.
- Make it output JSON 100/100 times reliably (use strict mode / tool calls).
Goal: “I trust my LLM calls to give me the structure I asked for.”
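Even with strict mode or tool calls, a validate-and-retry wrapper at the boundary is what gets you to 100/100. A minimal sketch with the model call stubbed out; the category names are placeholders:

```python
import json

# Placeholder category set; substitute your own 8 categories.
CATEGORIES = {"billing", "support", "sales", "spam", "bug-report", "feature-request", "legal", "other"}

def validate(raw: str) -> dict:
    """Parse and schema-check the model's output; raise on any violation."""
    data = json.loads(raw)
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unknown category: {data.get('category')!r}")
    if not isinstance(data.get("reason"), str) or not data["reason"]:
        raise ValueError("missing or empty reason")
    return data

def classify(subject: str, body: str, call_model, max_retries: int = 3) -> dict:
    """call_model(subject, body) -> raw model text. Retry until the output validates."""
    last_err = None
    for _ in range(max_retries):
        raw = call_model(subject, body)
        try:
            return validate(raw)
        except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
            last_err = e  # in a real loop, feed the error back into the next prompt
    raise RuntimeError(f"no valid output after {max_retries} tries: {last_err}")

# Stubbed model call for illustration:
stub = lambda subject, body: '{"category": "billing", "reason": "mentions an invoice"}'
result = classify("Invoice overdue", "Please pay invoice #42", stub)
```

The same harness doubles as your accuracy runner: loop it over the 100 test emails and count matches.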
Week 3 — Embeddings and search
Read:
Build:
- Take 1k documents (your notes, Wikipedia subset, ArXiv abstracts).
- Embed with `text-embedding-3-small` or `bge-large-en-v1.5`.
- Store in pgvector or ChromaDB.
- Query CLI: text in → top-5 nearest neighbors out.
- Evaluate: write 30 query/expected-doc pairs; measure recall@5.
Goal: “I can build semantic search and tell you when it’s broken.”
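The recall@5 measurement is a few lines once you have a ranked `search` function. A sketch with a toy index standing in for your vector store:

```python
def recall_at_k(eval_pairs, search, k=5):
    """eval_pairs: list of (query, expected_doc_id); search(query) -> ranked doc ids.
    Returns the fraction of queries whose expected doc appears in the top k."""
    hits = sum(1 for query, expected in eval_pairs if expected in search(query)[:k])
    return hits / len(eval_pairs)

# Toy ranked results standing in for real vector search:
toy = {"apple": ["d1", "d3", "d2"], "banana": ["d4", "d5"]}
pairs = [("apple", "d3"), ("banana", "d9")]
print(recall_at_k(pairs, lambda q: toy.get(q, []), k=5))  # → 0.5
```

Thirty pairs is enough to notice when a chunking or embedding change quietly breaks retrieval.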
Week 4 — Full RAG
Read:
Build:
- Wrap your search from week 3 with a generation step: retrieve → prompt → answer with citations.
- Add hybrid search (BM25 + dense).
- Add a reranker (Cohere `rerank-3.5`, `bge-reranker-v2-m3`, or LLM-as-judge).
- Build a 50-query eval set with expected sources.
- Measure: recall@10, faithfulness via LLM judge.
Goal: “My RAG works, and I can prove it.”
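One common way to merge the BM25 and dense result lists is Reciprocal Rank Fusion; a sketch, where `k=60` is the commonly used smoothing constant:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each doc scores sum(1 / (k + rank)) over all lists.
    rankings: list of ranked doc-id lists. Returns a single fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]   # lexical ranking
dense = ["d3", "d1", "d4"]  # embedding ranking
print(rrf([bm25, dense]))   # → ['d1', 'd3', 'd2', 'd4']
```

RRF needs no score calibration between the two retrievers, which is why it's a common default before the reranker stage.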
Week 5 — Advanced RAG and agents
Read:
- Stage 9 — Advanced retrieval patterns (skim — pick 2 to understand deeply).
- Stage 11 — Agent loop & architecture.
- Stage 11 — Tool use.
Build:
- An agent loop in <100 lines, no framework.
- Five tools: search the web, read URL, search your KB, calculator, finalize.
- Make it answer a multi-hop question correctly: “Who was the CEO of Apple when the iPhone 7 launched?”
Goal: “I can write an agent loop from scratch and explain every step.”
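The loop itself is small. A sketch with the model call stubbed out — in the real version, `call_model` wraps an SDK call that returns either the model's chosen tool or its final answer:

```python
import json

def agent_loop(question, call_model, tools, max_steps=10):
    """Minimal agent loop, no framework.
    call_model(messages) -> {"tool": name, "args": {...}} or {"final": text}.
    tools: dict of tool name -> callable."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = call_model(messages)
        if "final" in action:
            return action["final"]
        result = tools[action["tool"]](**action["args"])
        # Feed the tool result back so the model can take the next step.
        messages.append({"role": "tool", "content": json.dumps({"tool": action["tool"], "result": result})})
    raise RuntimeError("step budget exhausted")

# Stubbed model: calls the calculator once, then finalizes.
def stub_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "calculator", "args": {"expr": "6*7"}}
    return {"final": "42"}

# Toy calculator; never eval untrusted input in production.
tools = {"calculator": lambda expr: eval(expr, {"__builtins__": {}})}
print(agent_loop("What is 6*7?", stub_model, tools))  # → 42
```

Everything else — the five tools, the prompt that elicits tool calls — plugs into this skeleton without changing its shape.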
Week 6 — Agent depth
Read:
- Stage 11 — Memory systems.
- Stage 11 — Planning & reflection.
- Stage 11 — Guardrails & safety.
- Stage 7 — Reasoning models.
Build:
- Add: budget caps, retry-on-tool-failure, conversation summarization for long sessions, basic input filter (length cap, profanity).
- Try the same agent with a reasoning model (Claude with extended thinking, o-series). Compare quality, cost, latency.
Goal: “I know when to use a reasoning model and when it’s overkill.”
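Budget caps and retry-on-tool-failure are both thin wrappers around what you built in week 5. A sketch:

```python
import time

def with_retry(tool, attempts=3, backoff=0.0):
    """Wrap a tool so transient failures are retried; surface the last error
    as text so the model can see and react to the failure."""
    def wrapped(**kwargs):
        for i in range(attempts):
            try:
                return tool(**kwargs)
            except Exception as e:
                if i == attempts - 1:
                    return f"tool failed after {attempts} attempts: {e}"
                time.sleep(backoff * (2 ** i))  # exponential backoff between attempts
    return wrapped

class Budget:
    """Hard spend cap per session: charge after each model call, raise once exceeded."""
    def __init__(self, max_usd):
        self.max_usd, self.spent = max_usd, 0.0
    def charge(self, usd):
        self.spent += usd
        if self.spent > self.max_usd:
            raise RuntimeError(f"budget exceeded: ${self.spent:.4f} > ${self.max_usd:.4f}")
```

Returning the failure as a string (rather than raising) lets the agent route around a broken tool; the budget raising is deliberate, because overspend should stop the loop.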
Week 7 — Production discipline
Read:
Build:
- Add tracing to your RAG and your agent (Langfuse, Phoenix, or LangSmith).
- Add cost monitoring per request.
- Add prompt caching for static prefixes.
- Set up two-tier routing (cheap model first, fallback to frontier).
Goal: “I can debug a production issue from the trace alone.”
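Per-request cost monitoring is just token accounting. A sketch with made-up per-million-token prices and placeholder model names — check your provider's pricing page for real numbers:

```python
# Hypothetical (input_price, output_price) per 1M tokens, in USD.
PRICES = {"cheap-model": (0.25, 1.25), "frontier-model": (3.00, 15.00)}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request from the usage block the API returns."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

cost = request_cost("cheap-model", 12_000, 800)
print(f"${cost:.6f}")  # → $0.004000
```

Log this next to each trace span and the two-tier routing decision becomes a measurable tradeoff instead of a guess.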
Week 8 — Guardrails, hallucination, and evals
Read:
Build:
- Add: input validation (PII detection, length caps, prompt injection scan), output validation (schema check, citation verification).
- Build a regression eval: 50 cases, run on every prompt change.
- Add an LLM-as-judge faithfulness check.
- Wire all of this into a CI pipeline.
Goal: “I won’t ship a regression.”
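The regression eval can be a plain function your CI calls. A sketch with a stubbed system under test:

```python
def run_regression(cases, system):
    """cases: list of (input, check_fn); system(input) -> output string.
    Returns failing (input, output) pairs; a non-empty list should fail the build."""
    failures = []
    for inp, check in cases:
        out = system(inp)
        if not check(out):
            failures.append((inp, out))
    return failures

cases = [
    ("2+2", lambda out: "4" in out),
    ("capital of France", lambda out: "Paris" in out),
]
stub_system = lambda q: {"2+2": "4", "capital of France": "Paris"}.get(q, "")
print(run_regression(cases, stub_system))  # → []
```

In CI, exit nonzero when the list is non-empty, and run it on every prompt change — that is what "I won't ship a regression" means in practice.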
Weeks 9–12 — Ship something real
Pick one of:
- A vertical agent for a domain you know (legal contract review, recipe assistant, code reviewer for a specific framework).
- A vertical RAG over a corpus you care about (your own notes, a public dataset, internal docs at work).
- A real workflow (transcribe meetings → action items, summarize daily news, monitor a topic for changes).
Polish to “would show a stranger” quality. Write up the engineering decisions. Post on GitHub + your blog or LinkedIn.
Goal at end: something public with your name on it.
When to backtrack into the foundations
Skip Stages 1–4 until one of these happens:
- Embedding similarity is doing weird things. → Stage 5 — Semantic geometry and a refresher on linear algebra (cosine, dot products, dimensionality).
- You’re choosing between models and don’t know what the “B” in 7B means. → Stage 6 — Transformer block.
- Someone asks you why their fine-tune broke and you can’t even ask the right questions. → Stage 10 — When to fine-tune.
- You hit a real classification problem that prompting can’t solve. → Stage 2 — ML Fundamentals.
You can always come back. The path is a graph, not a staircase.
What “done with Track A” looks like
You can:
- Take a fuzzy product idea (“we should add AI to X”) and design a concrete system.
- Estimate cost and latency before writing code.
- Build a RAG or agent end-to-end with proper evals and guardrails.
- Debug a production failure from a trace in <30 minutes.
- Articulate when fine-tuning would help and when it wouldn’t.
- Ship without breaking the bank.
That’s most of what an AI product engineer does day-to-day. From here, the next investments are:
- Stage 14 case studies to lift patterns from real products.
- Stage 15 career to think about specialization.
- Stages 1–7 when curiosity or job needs pull you there.