Planning & Reflection
The naive agent loop reacts step by step. For complex tasks, it can drift, miss obvious shortcuts, or fail without learning. Planning and reflection patterns add structure and self-correction.
ReAct (Reason + Act)
Yao et al. (2022). The original agentic prompting pattern:
Thought: I need to find X. Let me search.
Action: search("X")
Observation: <result>
Thought: That's not quite right. Let me try a different query.
Action: search("Y")
Observation: <result>
Thought: Now I have enough to answer.
Final Answer: ...
Before native tool use existed, this was implemented by parsing Action: lines out of the model's text. Today, native tool calling handles the mechanics more cleanly, but the underlying pattern — reason, act, observe, repeat — is the same.
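A minimal sketch of this loop with native tool calling, here using the Anthropic Messages API; run_tool is a hypothetical dispatcher that maps a tool name and input to your own implementation:

import anthropic

client = anthropic.Anthropic()

def react_loop(task: str, tools: list[dict], max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-20250514",  # any tool-capable model
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # no action requested: the text blocks are the final answer
            return "".join(b.text for b in resp.content if b.type == "text")
        # act: run each requested tool, then feed the observations back
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": str(run_tool(b.name, b.input))}
            for b in resp.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max turns exceeded")

The Thought/Action/Observation trace is still there; it just lives in structured message blocks instead of parsed text.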
Plan-and-Execute
Before acting, generate a high-level plan:
Step 1: Plan
Plan:
1. Search for X
2. Read top 3 results
3. Compare findings
4. Synthesize answer
Step 2: Execute each step
Step 1: search("X") → results
Step 2: read(results[0]), read(results[1]), read(results[2]) → contents
Step 3: <reason about contents>
Step 4: produce answer
Pros:
- Forces upfront thinking; less drift.
- Easier to display progress to users.
- Sub-steps can be parallelized.
Cons:
- Plans can be wrong; replanning is needed.
- Adds an extra LLM call before action.
A common pattern in research agents and customer-support agents.
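In code, the shape is two phases; make_plan, execute, and synthesize here are hypothetical stand-ins for your own LLM and tool calls:

def plan_and_execute(task):
    plan = make_plan(task)                      # one extra LLM call; returns a list of steps
    results = [execute(step) for step in plan]  # independent steps can run in parallel
    return synthesize(task, results)            # final LLM call over the collected results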
Plan-Replan loop
Fixed plans rarely survive contact with reality. A robust agent replans when:
- An action fails unexpectedly.
- The current plan no longer makes sense given new information.
- A budget cap is approached.
plan = make_plan(task)
while plan:
    step = plan.pop(0)                   # take the next step off the plan
    result = execute(step)
    if needs_replan(result, plan):
        # re-plan from scratch, with the history of what happened so far
        plan = make_plan(task, history)
Reflection / Reflexion
Shinn et al. (2023). After failing a task, the agent generates a written reflection (“here’s what I tried, here’s what didn’t work, here’s what I’ll do differently”) and stores it in memory. Subsequent attempts use the reflection as context.
Attempt 1: <output> → fail (test fails)
Reflection: I assumed the API returns objects but it returns lists.
Attempt 2 (with reflection in context): <better output>
For tasks with feedback signals (test results, user thumbs-up/down), Reflexion meaningfully improves an iterative agent's success rate.
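A sketch of the attempt-reflect-retry loop, assuming a test suite as the feedback signal; agent, run_tests, and reflect are hypothetical:

reflections = []                            # persists across attempts ("memory")
for attempt in range(3):
    output = agent(task, context=reflections)
    passed, feedback = run_tests(output)    # any feedback signal works here
    if passed:
        break
    # ask the model to write down what went wrong and what to change
    reflections.append(reflect(task, output, feedback))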
Self-critique
Before submitting a final answer, the agent critiques its own draft:
Draft: <answer>
Critique: Does this address the user's question? Is anything missing? Any errors?
Final: <revised answer>
Often catches obvious mistakes, at the cost of roughly 1.5–2× the latency of a single pass.
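One way to wire this up, where llm is a hypothetical single-completion helper and question is the user's question:

draft = llm(f"Answer the user's question:\n\n{question}")
critique = llm(
    "Critique this draft: does it address the question? "
    "Is anything missing? Any errors?\n\n"
    f"Question: {question}\n\nDraft: {draft}"
)
final = llm(f"Revise the draft using the critique.\n\nDraft: {draft}\n\nCritique: {critique}")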
Tree of Thoughts (ToT) for agents
Yao et al. (2023). Branch on different actions, explore each, score, pick best. For agents, this is action-level search:
Current state
├── Try action A → result A → score
├── Try action B → result B → score
└── Try action C → result C → score
→ pick best path; continue from there
Expensive (API calls multiply with the branching factor at every step), but sometimes the only way to handle adversarial or high-uncertainty problems.
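A sketch of one search step; simulate, score, and execute are hypothetical. simulate should be a cheap or sandboxed rollout, and score can be a heuristic or an LLM judge:

def choose_action(state, candidate_actions):
    scored = []
    for action in candidate_actions:
        result = simulate(state, action)              # explore the branch cheaply
        scored.append((score(state, result), action))
    best_action = max(scored, key=lambda s: s[0])[1]  # keep the best-scoring branch
    return execute(state, best_action)                # commit and continue from there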
Verification loops
Some tasks have a verifier — a way to check if the answer is correct (compiler, tests, schema validator). Use it inside the agent:
while True:
    answer = generate()
    if verify(answer):
        return answer
    feedback = verifier_message(answer)
    # add feedback to context, try again
The agent gets ground-truth feedback without a human in the loop. Used heavily by code agents (Claude Code, Cursor’s agent) and math agents.
Decomposition
For multi-part tasks, decompose into independent subtasks. Each subtask is a sub-agent or a function call:
Task: "Compare the security features of products A, B, C"
↓
Subtasks:
- Get security features of A
- Get security features of B
- Get security features of C
- Compare and summarize
↓
Each subtask runs (potentially in parallel); results aggregated.
Critical for keeping context windows manageable on big tasks.
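A sketch of the fan-out/aggregate shape with a thread pool; run_subagent is a hypothetical function that runs one sub-agent in its own fresh context and returns text:

from concurrent.futures import ThreadPoolExecutor

subtasks = [f"Get the security features of product {p}" for p in ("A", "B", "C")]
with ThreadPoolExecutor() as pool:
    findings = list(pool.map(run_subagent, subtasks))  # parallel, isolated contexts
summary = run_subagent("Compare and summarize:\n\n" + "\n\n".join(findings))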
Reasoning models change this
Modern reasoning models (Claude with extended thinking, o-series, R1) internalize much of this:
- They plan before acting.
- They reflect during reasoning.
- They self-critique implicitly.
Asking a reasoning model to “make a plan, then execute” can be redundant — it does this naturally.
For non-reasoning models, explicit planning patterns still help.
When reasoning is overkill
For simple, narrow tool use (e.g. “look up the order status”), reasoning patterns are wasteful. Use them when:
- Tasks have multiple plausible approaches.
- Failure modes are subtle.
- The agent needs to recover from errors gracefully.
- The cost of a wrong answer is high.
Operational concerns
Loops
Reflection and planning patterns can trigger loops where the agent keeps “reflecting” without progress.
Mitigations:
- Per-task max-iterations.
- Detect repetitive plans/reflections (a minimal guard is sketched after this list).
- Force progress: each loop iteration must do something (call a tool, output a deliverable).
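A minimal guard combining the first two mitigations; agent_step is a hypothetical function that runs one iteration and returns its output as text:

from collections import deque

MAX_ITERATIONS = 20
recent = deque(maxlen=3)                 # last few outputs, for repetition checks

for i in range(MAX_ITERATIONS):          # hard per-task cap
    output = agent_step()
    if output.strip() in recent:         # same plan/reflection again: we're stuck
        break                            # escalate or abort rather than spin
    recent.append(output.strip())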
Context cost
Plans, reflections, and critiques all consume tokens. An agent that reasons heavily through a 50-step task can spend 100k+ tokens on reasoning alone.
Mitigations:
- Compact / summarize old plans and reflections (sketched after this list).
- Use a smaller, cheaper model for sub-tasks (e.g. planning by Haiku, execution by Sonnet).
- Cap reasoning budgets per step.
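A sketch of compaction; count_tokens, llm, and render are hypothetical helpers, and history is a list of messages:

def maybe_compact(history, budget=50_000, keep_recent=5):
    if count_tokens(history) < budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # collapse stale plans and reflections into one short summary message
    summary = llm("Summarize the plans, reflections, and results so far:\n" + render(old))
    return [{"role": "user", "content": summary}] + recent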
Visibility
For end-user products: plans and reflections are useful UX (show progress, build trust). But they can also overwhelm. Defaults that work:
- Surface plans, hide reflections.
- Show high-level steps, hide low-level reasoning.
- Provide an “explain” button for users who want depth.
Combining patterns
Many production agents stack patterns:
1. Plan upfront
2. Each step: ReAct (reason + tool)
3. Verify with a tool when possible (tests, schema)
4. On failure: reflect, replan
5. Final answer: self-critique pass
This is roughly the architecture of Claude Code on complex tasks.
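Sketched end to end (a rough shape, not any particular product's implementation; every helper here is hypothetical):

def run(task, max_rounds=5):
    plan = make_plan(task)                          # 1. plan upfront
    results = []
    for _ in range(max_rounds):
        results = [react(step) for step in plan]    # 2. ReAct per step
        if all(verify(r) for r in results):         # 3. verify when a verifier exists
            break
        reflection = reflect(task, results)         # 4. reflect on what failed...
        plan = make_plan(task, context=reflection)  #    ...and replan
    draft = synthesize(task, results)
    return revise(draft, critique(draft))           # 5. self-critique pass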
Pitfalls
- Reflection without learning: the agent reflects but ignores its own conclusions.
- Plans frozen too early: rigid plans don’t adapt; replan.
- Endless self-critique: the agent can always find something to “improve.” Cap iterations.
- Decomposition into too-small pieces: overhead exceeds benefit.
- Pretending to plan: the agent generates a plan it doesn’t follow. Tie planning to actual execution.