Advanced Prompting Techniques

Beyond zero/few-shot and CoT, there’s a zoo of prompting patterns. Most are subsumed by reasoning models in 2026, but worth knowing — they still matter when reasoning models aren’t available, when you’re using a smaller model, or when you need to combine prompts programmatically.

Self-consistency

Already covered briefly in Few-shot & CoT: sample multiple CoT traces at high temperature, take the majority vote of final answers.

from collections import Counter

def self_consistency(prompt, n=10, temp=0.8):
    # model() is your LLM call; extract() parses the final answer from a trace
    answers = [extract(model(prompt, temperature=temp)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

Strong on math/logic. Costs N× tokens.

ReAct (Reason + Act)

Yao et al. (2022). Interleave reasoning and tool use:

Question: ...
Thought: I need to find X.
Action: search("X")
Observation: <result>
Thought: Now I should ...
Action: ...
...
Final Answer: ...

The pattern that launched a thousand LangChain tutorials. Still the workhorse for agent loops, though modern approaches use native tool calling instead of textual Action: parsing.
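The loop itself is simple. Here is a minimal sketch, with a hypothetical `fake_model` and `search` standing in for a real LLM call and a real tool:

```python
import re

def search(query):
    # Hypothetical tool; a real agent would call a search API here.
    return "Paris"

def fake_model(prompt):
    # Stand-in for an LLM call, returning canned trace steps. (Hypothetical.)
    if "Observation: Paris" in prompt:
        return "Thought: I have the answer.\nFinal Answer: Paris"
    return 'Thought: I need the capital of France.\nAction: search("capital of France")'

def react_loop(question, model=fake_model, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        # Parse a textual Action: line and feed the result back as an Observation.
        match = re.search(r'Action: search\("(.+?)"\)', step)
        if match:
            transcript += f"\nObservation: {search(match.group(1))}"
    return None
```

With native tool calling, the regex parsing disappears; the loop structure stays the same.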

Plan-and-Execute

Wang et al. (2023). Two stages:

  1. Plan: model produces a structured plan of subtasks.
  2. Execute: each subtask is solved (often by sub-agents or tool calls).

Plan:
1. Find the user's order
2. Determine refund eligibility
3. Process refund or escalate

Execute step 1: ...

Often more reliable than ReAct for complex multi-step tasks. Also more rigid — re-planning after a failure is more expensive.
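The two stages can be sketched as two model calls with a parsing step in between. This is a minimal sketch assuming a `model(prompt) -> str` wrapper (hypothetical):

```python
def plan(task, model):
    # Stage 1: ask for a numbered plan, parse the numbered lines into subtasks.
    lines = model(f"Break this task into numbered steps:\n{task}").splitlines()
    return [line.split(". ", 1)[1] for line in lines if line[:1].isdigit()]

def execute(task, model):
    # Stage 2: solve each subtask in order, feeding earlier results back in.
    steps = plan(task, model)
    results = []
    for i, step in enumerate(steps, 1):
        context = "\n".join(f"Step {j}: {r}" for j, r in enumerate(results, 1))
        results.append(model(f"Task: {task}\n{context}\nNow do step {i}: {step}"))
    return results
```

The rigidity mentioned above lives in `plan`: if a step fails mid-execution, you pay for a fresh planning call.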

Tree of Thoughts (ToT)

Yao et al. (2023). Frame problem-solving as tree search over CoT branches:

Root: "What's the optimal move?"
├── Thought A: Try M1
│   ├── Outcome: ...
│   └── Score: 0.7
├── Thought B: Try M2
│   ├── Outcome: ...
│   └── Score: 0.3
└── Continue from highest-scored branch

Used for puzzle-solving and games where lookahead helps. Expensive — explores many branches.
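At its core ToT is beam search over partial reasoning chains. A minimal sketch, where `propose` and `score` would each be model calls in practice (both hypothetical hooks here):

```python
def tree_of_thoughts(root, propose, score, depth=2, beam=2):
    # propose(state) -> list of candidate next thoughts (an LLM call in practice)
    # score(state)   -> float (an evaluator prompt in practice)
    frontier = [root]
    for _ in range(depth):
        # Expand every state in the frontier, then keep the top `beam` branches.
        candidates = [state + [t] for state in frontier for t in propose(state)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```

The cost is visible in the expansion line: branching factor × beam width model calls per level.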

Reflexion / self-reflection

Shinn et al. (2023). After failing a task, the model generates a reflection (“here’s what went wrong, here’s what I’ll do differently”) and stores it in memory. Future attempts use the reflections.

Attempt 1: <output> → fail
Reflection: I assumed the API returns objects but it returns lists.
Attempt 2 (with reflection in context): <better output>

Useful when you have a feedback signal (test results, user thumbs-up/down). Also the underpinning of some agent memory designs (Stage 11).

Step-Back Prompting

Zheng et al. (2023). Before answering a specific question, ask the model to identify the general principle:

Question: At what temperature does water boil at 0.5 atm?

Step back: What is the general principle for boiling point at different pressures?

Now apply: ...

Helps the model retrieve relevant knowledge before drilling into specifics. Surprisingly effective for math, physics, fact-heavy domains.

Least-to-Most prompting

Zhou et al. (2022). Decompose hard problems into easy → hard subproblems:

Problem: <hard question>

Subproblems:
1. What's a small piece I can solve first?
2. Given the answer to (1), what's next?
...

Useful when problem decomposition is itself the hard part.

Skeleton-of-Thought

Ning et al. (2023). Generate a skeleton answer first, then expand each section in parallel:

Skeleton:
- Point 1: ...
- Point 2: ...
- Point 3: ...

Expand:
[Parallel expansions of each point]

Reduces latency for long-form output (parallel calls) and improves coherence.
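The latency win comes from fanning out the expansion calls. A minimal sketch assuming a `model(prompt) -> str` wrapper (hypothetical), using a thread pool for the parallel calls:

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, model):
    # One call for the skeleton, then parallel calls to expand each point.
    points = model(f"List the key points for: {question}").splitlines()
    with ThreadPoolExecutor() as pool:
        expansions = list(pool.map(
            lambda p: model(f"Expand this point about '{question}': {p}"), points))
    return "\n\n".join(expansions)
```

Wall-clock time is roughly one skeleton call plus the slowest expansion, instead of the sum of all of them.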

Generated Knowledge

Liu et al. (2021). Before answering, generate relevant facts; then condition on them:

Question: <Q>

First, generate relevant background knowledge:
1. ...
2. ...

Now answer using that knowledge: ...

Boosts performance on knowledge-heavy tasks at the cost of prompt length.

RAG-as-prompting

A bridge to Stage 09: retrieve relevant docs, inject as context, instruct the model to ground answers in them.

Use the following context to answer the question. If the context doesn't contain
the answer, say "I don't know."

<context>
{retrieved_chunks}
</context>

Question: {user_question}

The structure of the prompt — explicit context tags, “don’t know” instruction — matters a lot for grounding quality.

Prompt chaining

Pass the output of one prompt as input to another:

Step 1: Summarize this document

Step 2: Extract action items from the summary

Step 3: Format them as a JSON list

Often more reliable than asking for everything in one prompt. Each step has a focused job.
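The three steps above can be wired together directly. A minimal sketch assuming a `model(prompt) -> str` wrapper (hypothetical):

```python
def chain(document, model):
    # Three focused calls; each step's output feeds the next.
    summary = model(f"Summarize this document:\n{document}")
    items = model(f"Extract action items from this summary:\n{summary}")
    return model(f"Format these action items as a JSON list:\n{items}")
```

Each call can also use a different temperature or even a different (cheaper) model, which is one of the main practical reasons to chain.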

Frameworks like DSPy, Promptflow, and LangGraph automate this; raw Python works fine for most cases.

Multi-persona / debate

Have the model role-play multiple experts who debate before answering:

Persona A (skeptic): The proposed solution has these flaws...
Persona B (proponent): Counterpoints: ...
Persona C (judge): Weighing both sides, the verdict is ...

Works for ambiguous decisions and judgment calls. Sometimes helpful, sometimes theatrical.

Constitutional / principles-based

Anthropic’s approach: provide a set of principles (“be helpful, harmless, honest”) and have the model self-critique drafts.

Draft response: ...
Critique: Does this violate any of these principles: [list]?
Revised response: ...

Useful for safety and alignment. Built into Claude’s training; you can recreate the pattern in prompts for any model.
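Recreating the draft-critique-revise pattern takes three calls. A minimal sketch assuming a `model(prompt) -> str` wrapper (hypothetical):

```python
def critique_and_revise(prompt, principles, model):
    # Draft, critique the draft against the principles, then revise.
    draft = model(prompt)
    critique = model(
        f"Does this response violate any of these principles: {principles}?\n\n{draft}")
    return model(f"Revise the response to address this critique.\n"
                 f"Response: {draft}\nCritique: {critique}")
```

The principles list is the lever: keeping it short and concrete works better than a long abstract charter.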

When to use which

| Pattern | Best for | Cost |
| --- | --- | --- |
| Few-shot | Format reliability | Low |
| CoT | Multi-step reasoning | Medium |
| Self-consistency | High-confidence answers on hard problems | N× CoT |
| ReAct | Tool-using agents | Medium |
| Plan-and-execute | Complex multi-stage workflows | Medium |
| Tree of Thoughts | Search-over-reasoning problems | High |
| Reflexion | Iterative improvement with feedback | Variable |
| Step-back | Knowledge-grounded questions | Low |
| Generated knowledge | Knowledge-heavy with no RAG | Low–medium |
| Prompt chaining | Multi-stage pipelines | Sum of stages |

In the reasoning-model era

Modern reasoning models (Claude with thinking, o-series, R1) internalize most of these patterns. Asking a reasoning model to “think step by step” or use ToT is often unnecessary or counterproductive — it does its own thing.

But for non-reasoning models, smaller models, or strict cost constraints, these patterns are still your toolkit.

Practical advice

  1. Don’t over-engineer. Start with zero-shot. Add few-shot. Add CoT. Stop when it works.
  2. Measure on a real eval set. Most “prompt engineering” advice is folklore; your task is unique.
  3. Mix and match. Few-shot + CoT + self-consistency is a fine combination.
  4. Cache aggressively with multi-stage chains.
  5. Use frameworks (DSPy, Promptflow) for complex chains — easier than wiring it yourself.

See also