Track A — Software Engineer → AI Product Engineer

For someone who can already write code at a senior-engineer level and wants to ship AI-powered features in production. You’ll skip the math/ML internals on the first pass and circle back when something breaks that requires it.

Time: 8–12 weeks at ~10 hours/week. Endpoint: you can scope, build, evaluate, and operate an AI feature — RAG, agent, or both — at production quality.


What you skip (for now)

  • Stages 1–4 — math, ML fundamentals, neural network internals, language modeling history. You don’t need them to ship; come back when you hit a debugging wall.

You’ll also skim, rather than read closely, large parts of:

  • Stage 6’s GPT-from-scratch.
  • Stage 10’s RLHF/DPO/GRPO mechanics.

You can build a great AI product without ever training a model. But you do need to understand enough of what’s happening under the hood to debug it.


Week-by-week

Week 1 — Mental model + first calls

Read:

Build:

  • 100 API calls to Claude or GPT through the official SDK.
  • A CLI tool: stdin → model → stdout, with streaming.
  • Vary temperature and top-p; observe how the outputs change.
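To see what temperature and top-p actually do, you can reproduce them locally on a toy logit vector — no API call needed. This is a sketch of the standard definitions (temperature rescales logits before softmax; top-p keeps the smallest nucleus of tokens reaching cumulative probability p); providers may differ in small details like tie-breaking.

```python
import math

def next_token_distribution(logits, temperature=1.0, top_p=1.0):
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p (nucleus) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    z = sum(probs[i] for i in kept)
    return {i: probs[i] / z for i in kept}

# Lower temperature concentrates mass on the top token; top-p then
# drops the long tail entirely.
dist = next_token_distribution([2.0, 1.0, 0.1], temperature=0.5, top_p=0.9)
```

Run it with temperature 1.0 vs 0.5 and top_p 1.0 vs 0.9 and watch the distribution shift — that’s the whole intuition behind “why does my output get repetitive at low temperature.”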

Goal at end of week: “I can call an LLM and explain what every parameter does.”

Week 2 — Prompting and structured output

Read:

Build:

  • An email classifier that takes a subject + body and returns one of 8 categories with reasons, as JSON.
  • Run it on 100 test emails. Calculate accuracy.
  • Make it output JSON 100/100 times reliably (use strict mode / tool calls).
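Strict mode / tool calls are the sturdy way to guarantee structure; a validate-and-retry loop is the fallback pattern when you can’t use them. A minimal sketch — `call_model` is a hypothetical stand-in for your SDK call, and the category set is illustrative:

```python
import json

CATEGORIES = {"billing", "bug", "feature", "spam", "sales",
              "support", "legal", "other"}  # example 8 categories

def classify(email, call_model, max_retries=2):
    prompt = (
        f"Classify this email into one of {sorted(CATEGORIES)}.\n"
        'Return only JSON: {"category": "...", "reasons": ["..."]}\n\n'
        f"{email}"
    )
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            out = json.loads(raw)
            # Validate the shape, not just the syntax.
            if out.get("category") not in CATEGORIES:
                raise ValueError("category not in allowed set")
            if not isinstance(out.get("reasons"), list):
                raise ValueError("reasons must be a list")
            return out
        except (json.JSONDecodeError, ValueError) as e:
            # Feed the error back so the retry can self-correct.
            prompt += f"\n\nYour last output was invalid ({e}). Return only valid JSON."
    raise ValueError("model never produced valid JSON")
```

The same validate-or-retry skeleton is what you’ll reuse for every structured-output call; with strict mode or tool calls the retry branch should almost never fire.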

Goal: “I trust my LLM calls to give me the structure I asked for.”

Week 3 — Embeddings and semantic search

Read:

Build:

  • Take 1k documents (your notes, a Wikipedia subset, arXiv abstracts).
  • Embed with text-embedding-3-small or bge-large-en-v1.5.
  • Store in pgvector or ChromaDB.
  • Query CLI: text in → top-5 nearest neighbors out.
  • Evaluate: write 30 query/expected-doc pairs; measure recall@5.
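The recall@5 metric from the last step is a few lines of plain Python. A sketch, where `search` is whatever function you built (here a hypothetical stand-in returning ranked doc ids):

```python
def recall_at_k(pairs, search, k=5):
    """Fraction of (query, expected_doc) pairs where the expected
    doc appears in the top-k results."""
    hits = 0
    for query, expected_doc in pairs:
        if expected_doc in search(query)[:k]:
            hits += 1
    return hits / len(pairs)
```

Thirty pairs is enough to notice a broken index or a bad embedding model; it is not enough to compare two models that differ by a few points.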

Goal: “I can build semantic search and tell you when it’s broken.”

Week 4 — Full RAG

Read:

Build:

  • Wrap your search from week 3 with a generation step: retrieve → prompt → answer with citations.
  • Add hybrid search (BM25 + dense).
  • Add a reranker (Cohere rerank-3.5, bge-reranker-v2-m3, or LLM-as-judge).
  • Build a 50-query eval set with expected sources.
  • Measure: recall@10, faithfulness via LLM judge.

Goal: “My RAG works, and I can prove it.”

Week 5 — Advanced RAG and agents

Read:

Build:

  • An agent loop in <100 lines, no framework.
  • Five tools: search the web, read URL, search your KB, calculator, finalize.
  • Make it answer a multi-hop question correctly: “Who was the CEO of Apple when the iPhone 7 launched?”
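The no-framework agent loop is genuinely this small. A sketch, assuming `call_model` returns a parsed action dict (in practice you’d get this from native tool-call support); `tools` maps names to plain functions, with `finalize` as the exit:

```python
import json

def run_agent(question, call_model, tools, max_steps=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        # Expected shape: {"tool": "<name>", "args": {...}}
        action = call_model(messages)
        if action["tool"] == "finalize":
            return action["args"]["answer"]
        # Execute the tool and feed the result back into the transcript.
        result = tools[action["tool"]](**action["args"])
        messages.append({"role": "assistant", "content": json.dumps(action)})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    raise RuntimeError("step budget exhausted without finalize")
```

Everything a framework adds — parallel tool calls, memory, tracing — is layered on top of exactly this loop, which is why writing it once by hand pays off.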

Goal: “I can write an agent loop from scratch and explain every step.”

Week 6 — Agent depth

Read:

Build:

  • Add: budget caps, retry-on-tool-failure, conversation summarization for long sessions, and a basic input filter (length cap, profanity check).
  • Try the same agent with a reasoning model (Claude with extended thinking, o-series). Compare quality, cost, latency.
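Two of the hardening steps above sketched as standalone helpers — a per-session budget cap and retry-with-backoff around tool calls. Names and the exception type are illustrative, not from any framework:

```python
import time

class BudgetExceeded(Exception):
    pass

def make_charger(tracker, cap_usd):
    """Returns a charge(amount) function that raises once the
    session's cumulative spend crosses cap_usd."""
    def charge(amount_usd):
        tracker["spent"] += amount_usd
        if tracker["spent"] > cap_usd:
            raise BudgetExceeded(
                f"spent ${tracker['spent']:.4f} > cap ${cap_usd:.4f}")
    return charge

def call_tool_with_retry(tool, args, retries=2, backoff=0.5):
    """Retry transient tool failures with exponential backoff;
    re-raise on the final attempt so the agent loop can surface it."""
    for attempt in range(retries + 1):
        try:
            return tool(**args)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(backoff * (2 ** attempt))
```

Call `charge()` after every model and tool invocation inside the loop; a runaway agent then fails fast with a budget error instead of a surprise invoice.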

Goal: “I know when to use a reasoning model and when it’s overkill.”

Week 7 — Production discipline

Read:

Build:

  • Add tracing to your RAG and your agent (Langfuse, Phoenix, or LangSmith).
  • Add cost monitoring per request.
  • Add prompt caching for static prefixes.
  • Set up two-tier routing (cheap model first, fall back to a frontier model).
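The two-tier router is a small function once you have a confidence check. A sketch with stand-ins: `cheap`, `frontier`, and `is_confident` are placeholders for your own model calls and heuristic (self-reported confidence, a logprob threshold, or a cheap classifier):

```python
def route(prompt, cheap, frontier, is_confident):
    """Answer with the cheap model; escalate to the frontier model
    when the confidence check fails. Returns (answer, tier) so the
    trace records which tier served the request."""
    answer = cheap(prompt)
    if is_confident(answer):
        return answer, "cheap"
    return frontier(prompt), "frontier"
```

Log the tier per request alongside cost: the escalation rate tells you whether the cheap model is pulling its weight, and a rate near 100% means the router is pure overhead.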

Goal: “I can debug a production issue from the trace alone.”

Week 8 — Guardrails, hallucination, and evals

Read:

Build:

  • Add: input validation (PII detection, length caps, prompt injection scan), output validation (schema check, citation verification).
  • Build a regression eval: 50 cases, run on every prompt change.
  • Add an LLM-as-judge faithfulness check.
  • Wire all of this into a CI pipeline.
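The regression eval that gates CI can be a plain function over your 50 cases. A sketch — `system` is your RAG or agent entry point and `judge` is any pass/fail check (exact match, schema check, or an LLM judge), both stand-ins here:

```python
def run_regression(cases, system, judge):
    """Run every case through the system; return the inputs that fail
    the judge. CI goes red if this list is non-empty."""
    failures = []
    for case in cases:
        output = system(case["input"])
        if not judge(output, case["expected"]):
            failures.append(case["input"])
    return failures
```

In CI, run this on every prompt or retrieval change and fail the build on any non-empty result; the returned inputs are your debugging starting point.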

Goal: “I won’t ship a regression.”

Weeks 9–12 — Ship something real

Pick one of:

  • A vertical agent for a domain you know (legal contract review, recipe assistant, code reviewer for a specific framework).
  • A vertical RAG over a corpus you care about (your own notes, a public dataset, internal docs at work).
  • A real workflow (transcribe meetings → action items, summarize daily news, monitor a topic for changes).

Polish to “would show a stranger” quality. Write up the engineering decisions. Post on GitHub + your blog or LinkedIn.

Goal at end: something public with your name on it.


When to backtrack into the foundations

Skip Stages 1–4 until one of these happens:

You can always come back. The path is a graph, not a staircase.


What “done with Track A” looks like

You can:

  • Take a fuzzy product idea (“we should add AI to X”) and design a concrete system.
  • Estimate cost and latency before writing code.
  • Build a RAG or agent end-to-end with proper evals and guardrails.
  • Debug a production failure from a trace in <30 minutes.
  • Articulate when fine-tuning would help and when it wouldn’t.
  • Ship without breaking the bank.

That’s most of what an AI product engineer does day-to-day. From here, the next investments are:

  • Stage 14 case studies to lift patterns from real products.
  • Stage 15 career to think about specialization.
  • Stages 1–7 when curiosity or job needs pull you there.

See also