Planning & Reflection

The naive agent loop reacts step by step. For complex tasks, it can drift, miss obvious shortcuts, or fail without learning. Planning and reflection patterns add structure and self-correction.

ReAct (Reason + Act)

Introduced by Yao et al. (2022), this is the original agentic prompting pattern:

Thought: I need to find X. Let me search.
Action: search("X")
Observation: <result>
Thought: That's not quite right. Let me try a different query.
Action: search("Y")
Observation: <result>
Thought: Now I have enough to answer.
Final Answer: ...

Before native tool use, this was implemented by parsing "Action:" lines out of the model's text. Today, native tool calling does the same thing with cleaner mechanics, but the underlying pattern — reason, act, observe, repeat — is unchanged.
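
With native tool calling, the loop looks roughly like this. A minimal, self-contained sketch: the `model` function, the `search` tool, and the message shapes are simplified stand-ins, not a real provider SDK.

```python
# ReAct with native tool calling: reason, act, observe, repeat.
# Everything here is a stand-in; a real agent would call a provider SDK
# and read tool-call requests out of the model's response.

def search(query: str) -> str:
    # Stand-in tool.
    return {"X": "partial match", "Y": "exact match"}.get(query, "no results")

TOOLS = {"search": search}

def model(messages):
    # Scripted stand-in for the LLM: searches "X", retries with "Y", answers.
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"type": "tool_call", "name": "search", "args": {"query": "X"}}
    if tool_turns == 1:
        return {"type": "tool_call", "name": "search", "args": {"query": "Y"}}
    return {"type": "final", "content": "Answer based on: " + messages[-1]["content"]}

def react_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = model(messages)                                # reason
        if step["type"] == "final":
            return step["content"]
        result = TOOLS[step["name"]](**step["args"])          # act
        messages.append({"role": "tool", "content": result})  # observe
    raise RuntimeError("step budget exhausted")
```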

Plan-and-Execute

Before acting, generate a high-level plan:

Step 1: Plan
  Plan:
  1. Search for X
  2. Read top 3 results
  3. Compare findings
  4. Synthesize answer

Step 2: Execute each step
  Step 1: search("X") → results
  Step 2: read(results[0]), read(results[1]), read(results[2]) → contents
  Step 3: <reason about contents>
  Step 4: produce answer

Pros:

  • Forces upfront thinking; less drift.
  • Easier to display progress to users.
  • Sub-steps can be parallelized.

Cons:

  • Plans can be wrong; replanning is needed.
  • Adds an extra LLM call before action.

A common pattern in research agents and customer-support agents.
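
The control flow can be sketched in a few lines. `plan_llm` and `execute_llm` are hypothetical stand-ins for the two kinds of model call (the extra upfront call is the planning step):

```python
# Plan-and-execute sketch: one planning call, then one call per step.

def plan_llm(task):
    # Stand-in: a real agent would ask the model for a numbered plan
    # and parse it into steps.
    return ["search for X", "read top results", "compare findings",
            "synthesize answer"]

def execute_llm(step, context):
    # Stand-in: a real agent would run tools / model calls per step.
    return f"done: {step}"

def plan_and_execute(task):
    plan = plan_llm(task)          # step 1: plan upfront
    context = []
    for step in plan:              # step 2: execute each step in order
        context.append(execute_llm(step, context))
    return context[-1]             # the final step produces the answer
```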

Plan-Replan loop

Fixed plans rarely survive contact with reality. A robust agent replans when:

  • An action fails unexpectedly.
  • The current plan no longer makes sense given new information.
  • A budget cap is approached.

plan, history = make_plan(task), []
while plan:
    step = plan.pop(0)
    result = execute(step)
    history.append(result)
    if needs_replan(result, plan):
        plan = make_plan(task, history)  # replan with what has been learned

Reflection / Reflexion

Shinn et al. (2023). After failing a task, the agent generates a written reflection (“here’s what I tried, here’s what didn’t work, here’s what I’ll do differently”) and stores it in memory. Subsequent attempts use the reflection as context.

Attempt 1: <output> → fail (test fails)
Reflection: I assumed the API returns objects but it returns lists.
Attempt 2 (with reflection in context): <better output>

For tasks with feedback signals (test results, user thumbs-up/down), Reflexion meaningfully improves iterative agents.
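
The attempt-reflect loop is compact. In this sketch, `attempt` and `reflect` are scripted stand-ins for model calls; the point is the episodic memory carried across attempts:

```python
# Reflexion sketch: failed attempts produce a written reflection that is
# fed back into the next attempt's context.

def attempt(task, reflections):
    # Stand-in: succeeds only once the key insight is in context.
    if any("returns lists" in r for r in reflections):
        return "handles list response", True
    return "assumed object response", False

def reflect(task, output):
    # Stand-in: a real agent would ask the model to explain the failure.
    return "I assumed the API returns objects but it returns lists."

def reflexion_loop(task, max_attempts=3):
    reflections = []                      # memory across attempts
    for _ in range(max_attempts):
        output, ok = attempt(task, reflections)
        if ok:
            return output
        reflections.append(reflect(task, output))
    return None                           # budget exhausted
```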

Self-critique

Before submitting a final answer, the agent critiques its own draft:

Draft: <answer>
Critique: Does this address the user's question? Is anything missing? Any errors?
Final: <revised answer>

Often catches obvious mistakes. Adds 1.5–2× latency for the second pass.
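
A sketch of the draft-critique-revise shape. The three functions are stand-ins for model calls; the critique pass is what costs the extra latency:

```python
# Self-critique sketch: draft, critique, revise once.

def draft(question):
    return "Paris is the capital of France"

def critique(question, answer):
    # Stand-in: a real critique asks "does this address the question?
    # anything missing? any errors?" and returns the issues found.
    if "population" in question and "population" not in answer:
        return "missing the population figure"
    return None  # no issues found

def revise(question, answer, issue):
    return answer + " (population ~2.1M in the city proper)"

def answer_with_critique(question):
    candidate = draft(question)
    issue = critique(question, candidate)   # the second pass
    if issue:
        candidate = revise(question, candidate, issue)
    return candidate
```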

Tree of Thoughts (ToT) for agents

Yao et al. (2023). Branch on different actions, explore each, score, pick best. For agents, this is action-level search:

Current state
├── Try action A → result A → score
├── Try action B → result B → score
└── Try action C → result C → score
   → pick best path; continue from there

Expensive (multiplies API calls) but sometimes the only way to handle adversarial or high-uncertainty problems.
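
A toy sketch of the search skeleton. Here the state is a number and actions add to it (pure stand-ins); in a real agent, `simulate` is a model rollout per branch, which is where the multiplied API cost comes from:

```python
# Action-level search sketch: expand each candidate action one step,
# score the results, continue from the best branch.

def candidate_actions(state):
    return [1, 2, 3] if state < 6 else []   # stand-in action space

def simulate(state, action):
    return state + action                   # stand-in rollout

def score(state):
    return -abs(state - 6)                  # stand-in: prefer states near 6

def tree_search(state, actions, simulate, score, depth=2):
    if depth == 0 or not actions(state):
        return state
    branches = [simulate(state, a) for a in actions(state)]
    best = max(branches, key=score)         # pick best path
    return tree_search(best, actions, simulate, score, depth - 1)
```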

Verification loops

Some tasks have a verifier — a way to check if the answer is correct (compiler, tests, schema validator). Use it inside the agent:

def solve(context, max_attempts=5):
    for _ in range(max_attempts):
        answer = generate(context)
        if verify(answer):
            return answer
        # add verifier feedback to context and try again
        context.append(verifier_message(answer))
    return None  # budget exhausted

The agent gets ground-truth feedback without a human in the loop. Used heavily by code agents (Claude Code, Cursor’s agent) and math agents.

Decomposition

For multi-part tasks, decompose into independent subtasks. Each subtask is a sub-agent or a function call:

Task: "Compare the security features of products A, B, C"

Subtasks:
  - Get security features of A
  - Get security features of B
  - Get security features of C
  - Compare and summarize

Each subtask runs (potentially in parallel); results aggregated.

Critical for keeping context windows manageable on big tasks.
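
The fan-out/aggregate shape can be sketched with threads. `research` stands in for a sub-agent run with its own small context; only its result, not its full transcript, reaches the aggregation step:

```python
# Decomposition sketch: independent subtasks run in parallel,
# then one final step aggregates.
from concurrent.futures import ThreadPoolExecutor

def research(product):
    # Stand-in for a sub-agent call.
    return f"{product}: SSO, audit logs"

def compare_security(products):
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(research, products))  # parallel fan-out
    return "Comparison:\n" + "\n".join(findings)       # aggregation step
```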

Reasoning models change this

Modern reasoning models (Claude with extended thinking, o-series, R1) internalize much of this:

  • They plan before acting.
  • They reflect during reasoning.
  • They self-critique implicitly.

Asking a reasoning model to “make a plan, then execute” can be redundant — it does this naturally.

For non-reasoning models, explicit planning patterns still help.

When reasoning is overkill

For simple, narrow tool use (e.g. “look up the order status”), reasoning patterns are wasteful. Use them when:

  • Tasks have multiple plausible approaches.
  • Failure modes are subtle.
  • The agent needs to recover from errors gracefully.
  • The cost of a wrong answer is high.

Operational concerns

Loops

Reflection and planning patterns can trigger loops where the agent keeps “reflecting” without progress.

Mitigations:

  • Per-task max-iterations.
  • Detect repetitive plans/reflections.
  • Force progress: each loop iteration must do something (call a tool, output a deliverable).
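
The first two mitigations fit in a small wrapper. A sketch: `step_fn` stands in for one agent iteration, and an exact-repeat check approximates "detect repetitive plans/reflections" (real agents might use fuzzier similarity):

```python
# Loop-guard sketch: cap iterations and bail out when the agent repeats
# itself without making progress.

def run_guarded(step_fn, max_iters=10):
    seen = set()
    for i in range(max_iters):
        output, done = step_fn(i)
        if done:
            return output
        if output in seen:       # exact repeat: no progress
            raise RuntimeError("agent is looping: repeated output")
        seen.add(output)
    raise RuntimeError("max iterations reached")
```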

Context cost

Plans, reflections, critiques all consume tokens. A heavily-reasoning agent on a 50-step task can use 100k+ tokens just for reasoning.

Mitigations:

  • Compact / summarize old plans and reflections.
  • Use a smaller, cheaper model for sub-tasks (e.g. planning with Haiku, execution with Sonnet).
  • Cap reasoning budgets per step.
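
The compaction mitigation can be sketched as: keep the most recent plans/reflections verbatim and collapse older ones into a one-line summary. The summarizer here is a crude truncation stand-in; real agents typically use a cheap model call instead:

```python
# Compaction sketch: bound context growth from accumulated plans and
# reflections by summarizing everything but the most recent entries.

def compact(entries, keep_last=2):
    if len(entries) <= keep_last:
        return entries
    old, recent = entries[:-keep_last], entries[-keep_last:]
    summary = (f"[{len(old)} earlier entries compacted: "
               + "; ".join(e[:30] for e in old) + "]")
    return [summary] + recent
```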

Visibility

For end-user products: plans and reflections are useful UX (show progress, build trust). But they can also overwhelm. Defaults that work:

  • Surface plans, hide reflections.
  • Show high-level steps, hide low-level reasoning.
  • Provide an “explain” button for users who want depth.

Combining patterns

Many production agents stack patterns:

1. Plan upfront
2. Each step: ReAct (reason + tool)
3. Verify with a tool when possible (tests, schema)
4. On failure: reflect, replan
5. Final answer: self-critique pass

This is roughly the architecture of Claude Code on complex tasks.
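
The stacked control flow, reduced to its skeleton. All four callbacks are stand-ins for model/tool calls; the replan-on-failure loop is the outer structure, and a self-critique pass would slot in just before returning:

```python
# Combined-stack sketch: plan, execute, verify, and replan on failure.

def run(task, make_plan, execute, verify, reflect, max_rounds=3):
    history = []
    for _ in range(max_rounds):                     # 4. replan on failure
        plan = make_plan(task, history)             # 1. plan upfront
        results = [execute(step) for step in plan]  # 2. per-step execution
        answer = results[-1]                        #    (ReAct in a real agent)
        if verify(answer):                          # 3. verify when possible
            return answer                           # 5. self-critique goes here
        history.append(reflect(answer))             # reflect before replanning
    return None
```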

Pitfalls

  • Reflection without learning: the agent reflects but ignores its own conclusions.
  • Plans frozen too early: rigid plans don’t adapt; replan.
  • Endless self-critique: the agent can always find something to “improve.” Cap iterations.
  • Decomposition into too-small pieces: overhead exceeds benefit.
  • Pretending to plan: the agent generates a plan it doesn’t follow. Tie planning to actual execution.

See also