Advanced Prompting Techniques
Beyond zero/few-shot and CoT, there’s a zoo of prompting patterns. Most are subsumed by reasoning models in 2026, but worth knowing — they still matter when reasoning models aren’t available, when you’re using a smaller model, or when you need to combine prompts programmatically.
Self-consistency
Already covered briefly in Few-shot & CoT: sample multiple CoT traces at high temperature, take the majority vote of final answers.
from collections import Counter

def self_consistency(prompt, n=10, temp=0.8):
    # model() and extract() are assumed helpers: one LLM call, then parsing the final answer out of the trace.
    answers = [extract(model(prompt, temperature=temp)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
Strong on math/logic. Costs N× tokens.
ReAct (Reason + Act)
Yao et al. (2022). Interleave reasoning and tool use:
Question: ...
Thought: I need to find X.
Action: search("X")
Observation: <result>
Thought: Now I should ...
Action: ...
...
Final Answer: ...
The pattern that launched a thousand LangChain tutorials. Still the workhorse for agent loops, though modern approaches use native tool calling instead of textual Action: parsing.
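A minimal text-parsing version of the loop, reusing the generic model() helper from the self-consistency sketch and a caller-supplied dict of tool functions (both assumptions, not any specific library's API):

import re

def react(question, tools, max_steps=8):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # model continues with a Thought and an Action, or a Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r'Action:\s*(\w+)\("?(.*?)"?\)', step)
        if match:
            name, arg = match.groups()
            transcript += f"Observation: {tools[name](arg)}\n"  # e.g. tools = {"search": search}
    return None  # ran out of steps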
Plan-and-Execute
Wang et al. (2023). Two stages:
- Plan: model produces a structured plan of subtasks.
- Execute: each subtask is solved (often by sub-agents or tool calls).
Plan:
1. Find the user's order
2. Determine refund eligibility
3. Process refund or escalate
Execute step 1: ...
Often more reliable than ReAct for complex multi-step tasks. Also more rigid — re-planning after a failure is more expensive.
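A rough sketch of the two stages with the same assumed model() helper: the plan comes back as a numbered list, and each step is executed with the results so far in context.

def plan_and_execute(task):
    # Stage 1: ask for the plan as a short numbered list of subtasks.
    plan = model(f"Break this task into a short numbered list of subtasks:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()[:1].isdigit()]
    # Stage 2: execute each subtask, feeding earlier results back in.
    results = []
    for step in steps:
        done = "\n".join(results)
        results.append(model(f"Task: {task}\nCompleted so far:\n{done}\nNow do: {step}"))
    return results[-1] if results else plan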
Tree of Thoughts (ToT)
Yao et al. (2023). Frame problem-solving as tree search over CoT branches:
Root: "What's the optimal move?"
├── Thought A: Try M1
│ ├── Outcome: ...
│ └── Score: 0.7
├── Thought B: Try M2
│ ├── Outcome: ...
│ └── Score: 0.3
└── Continue from highest-scored branch
Used for puzzle-solving and games where lookahead helps. Expensive — explores many branches.
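A toy beam-search version; model() is the usual assumed helper and score() is an assumed evaluator (model-based voting or a heuristic), which is where most of the real design work lives:

def tree_of_thoughts(problem, branches=3, depth=3, beam=2):
    frontier = [""]  # partial chains of thought
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for _ in range(branches):
                # Propose one more reasoning step continuing this branch.
                step = model(f"{problem}\nSteps so far:{path}\nNext step:", temperature=0.8)
                candidates.append(path + "\n" + step)
        # Keep only the highest-scored branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]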
Reflexion / self-reflection
Shinn et al. (2023). After failing a task, the model generates a reflection (“here’s what went wrong, here’s what I’ll do differently”) and stores it in memory. Future attempts use the reflections.
Attempt 1: <output> → fail
Reflection: I assumed the API returns objects but it returns lists.
Attempt 2 (with reflection in context): <better output>
Useful when you have a feedback signal (test results, user thumbs-up/down). Also the underpinning of some agent memory designs (Stage 11).
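With a programmatic check, the loop is a few lines; run_tests() stands in for whatever feedback signal you have, and model() is assumed as before:

def reflexion(task, max_attempts=3):
    reflections, output = [], None
    for _ in range(max_attempts):
        notes = "\n".join(reflections)
        output = model(f"{task}\nLessons from previous attempts:\n{notes}")
        ok, feedback = run_tests(output)  # assumed feedback signal: tests, user rating, ...
        if ok:
            return output
        # Ask the model to diagnose the failure and keep the reflection for the next attempt.
        reflections.append(model(f"The attempt failed with: {feedback}\nWhat went wrong and what will you do differently?"))
    return output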
Step-Back Prompting
Zheng et al. (2023). Before answering a specific question, ask the model to identify the general principle:
Question: At what temperature does water boil at 0.5 atm?
Step back: What is the general principle for boiling point at different pressures?
Now apply: ...
Helps the model retrieve relevant knowledge before drilling into specifics. Surprisingly effective for math, physics, fact-heavy domains.
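As code it's just a two-call chain (same assumed model() helper):

def step_back(question):
    # Call 1: abstract the question into a general principle.
    principle = model(f"What general principle or concept is needed to answer: {question}")
    # Call 2: answer the original question conditioned on that principle.
    return model(f"Principle: {principle}\nApply it to answer: {question}")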
Least-to-Most prompting
Zhou et al. (2022). Decompose hard problems into easy → hard subproblems:
Problem: <hard question>
Subproblems:
1. What's a small piece I can solve first?
2. Given the answer to (1), what's next?
...
Useful when problem decomposition is itself the hard part.
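A sketch of the decompose-then-solve version, feeding each answer into the next subproblem (model() assumed as above):

def least_to_most(problem):
    # Phase 1: decompose into ordered subproblems, easiest first.
    decomposition = model(f"List the subproblems to solve, from easiest to hardest, for:\n{problem}")
    subproblems = [line.strip() for line in decomposition.splitlines() if line.strip()]
    # Phase 2: solve them in order, accumulating answers in the context.
    solved = ""
    for sub in subproblems:
        answer = model(f"{problem}\nAlready solved:\n{solved}\nNow solve: {sub}")
        solved += f"{sub}\nAnswer: {answer}\n"
    return model(f"{problem}\nUsing these intermediate answers:\n{solved}\nGive the final answer.")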
Skeleton-of-Thought
Ning et al. (2023). Generate a skeleton answer first, then expand each section in parallel:
Skeleton:
- Point 1: ...
- Point 2: ...
- Point 3: ...
Expand:
[Parallel expansions of each point]
Reduces latency for long-form output (parallel calls) and improves coherence.
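A sketch of the parallel expansion with a thread pool, assuming model() is safe to call concurrently (with an async client you'd use asyncio.gather instead):

from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, max_points=5):
    # Pass 1: a short skeleton, one bullet per point.
    skeleton = model(f"Outline an answer to '{question}' as at most {max_points} short bullet points.")
    points = [p.lstrip("-* ").strip() for p in skeleton.splitlines() if p.strip()]
    # Pass 2: expand every point in parallel.
    def expand(point):
        return model(f"Question: {question}\nExpand this outline point into a paragraph: {point}")
    with ThreadPoolExecutor() as pool:
        paragraphs = list(pool.map(expand, points))
    return "\n\n".join(paragraphs)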
Generated Knowledge
Liu et al. (2021). Before answering, generate relevant facts; then condition on them:
Question: <Q>
First, generate relevant background knowledge:
1. ...
2. ...
Now answer using that knowledge: ...
Boosts performance on knowledge-heavy tasks at the cost of prompt length.
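Same two-call shape as step-back, just generating facts instead of a principle:

def generated_knowledge(question, n_facts=3):
    facts = model(f"List {n_facts} facts relevant to answering: {question}")
    return model(f"Background knowledge:\n{facts}\nUsing this knowledge, answer: {question}")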
RAG-as-prompting
A bridge to Stage 09: retrieve relevant docs, inject as context, instruct the model to ground answers in them.
Use the following context to answer the question. If the context doesn't contain
the answer, say "I don't know."
<context>
{retrieved_chunks}
</context>
Question: {user_question}
The structure of the prompt — explicit context tags, “don’t know” instruction — matters a lot for grounding quality.
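Assembling the prompt is plain string formatting; retrieve() here stands in for whatever search layer Stage 09 gives you:

def grounded_answer(user_question, k=5):
    chunks = retrieve(user_question, k=k)  # assumed retrieval function (see Stage 09)
    context = "\n\n".join(chunks)
    prompt = f"""Use the following context to answer the question. If the context doesn't contain the answer, say "I don't know."
<context>
{context}
</context>
Question: {user_question}"""
    return model(prompt)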
Prompt chaining
Pass the output of one prompt as input to another:
Step 1: Summarize this document
↓
Step 2: Extract action items from the summary
↓
Step 3: Format them as a JSON list
Often more reliable than asking for everything in one prompt. Each step has a focused job.
Frameworks like DSPy, Promptflow, LangGraph automate this; raw Python works fine for most cases.
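The raw-Python version of the three-step chain above (model() assumed, as elsewhere):

def action_items_json(document):
    summary = model(f"Summarize this document:\n{document}")
    items = model(f"Extract the action items from this summary:\n{summary}")
    return model(f"Format these action items as a JSON list of strings:\n{items}")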
Multi-persona / debate
Have the model role-play multiple experts who debate before answering:
Persona A (skeptic): The proposed solution has these flaws...
Persona B (proponent): Counterpoints: ...
Persona C (judge): Weighing both sides, the verdict is ...
Works for ambiguous decisions, judgment calls. Sometimes helpful, sometimes theatrical.
Constitutional / principles-based
Anthropic’s approach: provide a set of principles (“be helpful, harmless, honest”) and have the model self-critique drafts.
Draft response: ...
Critique: Does this violate any of these principles: [list]?
Revised response: ...
Useful for safety and alignment. Built into Claude’s training; you can recreate the pattern in prompts for any model.
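Recreating it in prompts is a draft, critique, revise chain; the principles below are illustrative, and model() is assumed as before:

PRINCIPLES = ["be helpful", "avoid harmful content", "don't state falsehoods as fact"]

def constitutional_response(request):
    draft = model(request)
    critique = model(
        "Principles:\n- " + "\n- ".join(PRINCIPLES)
        + f"\n\nDraft response:\n{draft}\n\nDoes the draft violate any of these principles? Explain."
    )
    return model(f"Request: {request}\nDraft: {draft}\nCritique: {critique}\n"
                 "Write a revised response that addresses the critique.")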
When to use which
| Pattern | Best for | Cost |
|---|---|---|
| Few-shot | Format reliability | Low |
| CoT | Multi-step reasoning | Medium |
| Self-consistency | High-confidence answers on hard problems | N× CoT |
| ReAct | Tool-using agents | Medium |
| Plan-and-execute | Complex multi-stage workflows | Medium |
| Tree of Thoughts | Search-over-reasoning problems | High |
| Reflexion | Iterative improvement with feedback | Variable |
| Step-back | Knowledge-grounded questions | Low |
| Generated knowledge | Knowledge-heavy with no RAG | Low–medium |
| Prompt chaining | Multi-stage pipelines | Sum of stages |
In the reasoning-model era
Modern reasoning models (Claude with thinking, o-series, R1) internalize most of these patterns. Asking a reasoning model to “think step by step” or use ToT is often unnecessary or counterproductive — it does its own thing.
But for non-reasoning models, smaller models, or strict cost constraints, these patterns are still your toolkit.
Practical advice
- Don’t over-engineer. Start with zero-shot. Add few-shot. Add CoT. Stop when it works.
- Measure on a real eval set. Most “prompt engineering” advice is folklore; your task is unique.
- Mix and match. Few-shot + CoT + self-consistency is a fine combination.
- Cache aggressively with multi-stage chains.
- Use frameworks (DSPy, Promptflow) for complex chains — easier than wiring it yourself.