Hallucination Mitigation

A hallucination is a confident, fluent, plausible-looking statement that’s wrong. It’s the hardest LLM failure mode in production: hallucinations don’t trigger error handlers; they just lie.

Where hallucinations come from

  1. Training data noise: facts in training were wrong; the model memorized them.
  2. Statistical pattern over truth: the model picks the most plausible-sounding completion regardless of accuracy.
  3. Knowledge cutoff: facts changed after training.
  4. Composition errors: each fact is correct individually; the combination is wrong.
  5. Refusal failure: the model should have said “I don’t know,” but didn’t.
  6. Prompt-induced: leading questions or biased phrasing push the model toward an answer.

Different causes need different fixes.

Frequency

Frontier models hallucinate less than earlier generations, but rates remain significant and vary sharply by task:

  • Famous-figure biographies: low.
  • Recent obscure facts: medium.
  • Specialized technical details: medium-high.
  • Multi-hop reasoning: high.
  • Citations / references: notoriously high (made-up paper titles, wrong authors).

For fact-critical applications, don’t trust the model alone.

Mitigation strategies

In rough order of effectiveness:

1. Retrieval grounding (RAG)

Give the model the relevant source material; instruct it to use only that material.

Use only the following context to answer the question. If the context doesn't
contain the answer, say "I don't know."

Context: ...

Question: ...
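
A minimal sketch of assembling that prompt from retrieved chunks; the chunk list and the [doc-N] ID scheme here are assumptions, so adapt them to your retriever:

    def build_grounded_prompt(question: str, chunks: list[str]) -> str:
        # Number each retrieved chunk so later strategies (citations,
        # validation) can refer back to it by ID.
        context = "\n\n".join(f"[doc-{i}] {c}" for i, c in enumerate(chunks, 1))
        return (
            "Use only the following context to answer the question. "
            "If the context doesn't contain the answer, say \"I don't know.\"\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )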

Retrieval grounding drops hallucination dramatically for fact-grounded tasks. The challenge: the model may still ignore the instruction and fall back on its parametric prior. Combat with:

  • Strong prompting.
  • Output verification (next item).
  • Models with good faithfulness (frontier > smaller).

2. Output verification

After generation, check the answer against the context.

LLM-as-judge:

Context: ...
Answer: ...

Is every claim in the answer supported by the context? List unsupported claims.

Specialized tools:

  • Ragas faithfulness metric.
  • TruLens feedback functions.
  • Custom classifiers trained on labeled (context, claim, supported?) data.

If unsupported claims exist, regenerate or downgrade confidence.
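
A minimal sketch of the judge call; call_llm is a stand-in for whatever client you use (takes a prompt string, returns the model’s text), and the NONE convention is an assumption:

    def verify_answer(context: str, answer: str, call_llm) -> list[str]:
        # Ask a judge model to list claims not supported by the context.
        prompt = (
            f"Context: {context}\n\nAnswer: {answer}\n\n"
            "Is every claim in the answer supported by the context? "
            "List each unsupported claim on its own line, or reply NONE."
        )
        verdict = call_llm(prompt).strip()
        return [] if verdict == "NONE" else verdict.splitlines()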

3. Citation and source linking

Force the model to cite sources for each claim:

Output format: each claim ends with a source tag like [doc-3].

Then:

  • Validate that cited sources exist in retrieved context.
  • Flag uncited claims.
  • For users: show citations so they can verify.

A claim with a verified citation is much more trustworthy than one without.
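
A sketch of the first two checks, assuming the [doc-N] tag format above and approximating “claims” as sentences:

    import re

    def validate_citations(answer: str, retrieved_ids: set[str]) -> dict:
        cited = set(re.findall(r"\[(doc-\d+)\]", answer))
        # Sentence splitting is a rough proxy for claim extraction.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
        uncited = [s for s in sentences if not re.search(r"\[doc-\d+\]", s)]
        return {
            "fabricated_sources": cited - retrieved_ids,  # cited but never retrieved
            "uncited_claims": uncited,                    # no source tag at all
        }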

4. Tool-grounded answers

For tasks with tools, require the model to use them:

  • Math? Use a calculator.
  • Code? Run it.
  • Lookups? Use a search tool.
  • Time-sensitive? Use a date tool.

The model can still misuse tools or misinterpret results, but it’s grounded in something verifiable.
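
One cheap variant for arithmetic: ask the model to output the expression rather than the final number, then evaluate it yourself. A sketch with a whitelisted AST evaluator (the two-step prompt is an assumption; never pass model output to eval):

    import ast
    import operator

    # Only these operators are allowed; anything else raises.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def safe_eval(expr: str) -> float:
        def ev(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp):
                return OPS[type(node.op)](ev(node.operand))
            raise ValueError(f"disallowed expression: {expr!r}")
        return ev(ast.parse(expr, mode="eval").body)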

5. Confidence calibration

One pattern for extracting self-reported confidence:

Answer the question. Then on a new line, rate your confidence on a scale of 1-5
and explain why.

  • Be skeptical: LLMs are often poorly calibrated (overconfident).
  • More reliable: ensembling. Sample multiple answers and count agreement (sketch below).
  • Self-consistency: high agreement → high confidence.
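
A sketch of the ensemble approach; call_llm is a stand-in for your sampling client (temperature > 0 so the samples actually differ), and exact-match agreement only works for short, closed-form answers:

    from collections import Counter

    def self_consistency(prompt: str, call_llm, n: int = 5) -> tuple[str, float]:
        # Normalize aggressively; free-form answers need semantic matching instead.
        answers = [call_llm(prompt).strip().lower() for _ in range(n)]
        best, count = Counter(answers).most_common(1)[0]
        return best, count / n  # 4-of-5 agreement -> 0.8 confidence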

6. Refusal training / prompting

A well-calibrated model says “I don’t know” when appropriate. Encourage:

If you're not sure of the answer, say "I'm not certain. Possible options are
X, Y, Z, and the user may want to verify." Do not invent details.

Some fine-tuned models are explicitly trained on refusal — they say “I don’t know” more reliably.

7. Reasoning models for hard questions

Reasoning models (Stage 07) often catch their own errors during the thinking process. They re-check, retry, qualify.

For fact-critical tasks where speed isn’t the concern, a reasoning model can reduce hallucination compared to a non-reasoning model.

8. Human review for high-stakes

For medical, legal, financial, safety-critical: don’t ship unreviewed model output. Use humans in the loop:

  • Model drafts → human reviews → publish.
  • Model summarizes → human edits.
  • Model classifies → human handles disagreements.

Patterns that increase hallucination

Avoid these:

  • Vague prompts: “Tell me about X” with no scope. Constrain.
  • No “I don’t know” path: if there’s no escape valve, the model invents.
  • Pressure to be “helpful”: models trained for helpfulness err on the side of answering.
  • Long-tail entities in vague prompts: the model is unsure, but generates anyway.
  • Inferring causality from correlation: “X correlates with Y” quietly becomes “X causes Y.”
  • Date-sensitive questions without grounding.

RAG-specific hallucination

Even with retrieval, RAG can hallucinate:

  • Retrieved-but-ignored: model retrieves correct doc, then answers from prior anyway.
  • Mixed sources: model blends prior knowledge with retrieved context indistinguishably.
  • Made-up sources: cites doc IDs that don’t exist.
  • Out-of-context facts: claims things not in retrieved context.

Layered defenses:

  • Strict prompts (“only use the provided context”).
  • Citation validation (every cited source exists).
  • Faithfulness checks (every claim supported).
  • Refusal default (“I don’t have information about that”).

Domain-specific concerns

Code

Common: invented API methods, wrong import paths, made-up libraries, out-of-date APIs.

Mitigations:

  • Use docs in context (RAG over API docs).
  • Run / type-check generated code; iterate on errors (sketch below).
  • Use code-tuned models with recent training data.
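
A minimal sketch of the check-before-shipping step; this catches only syntax-level hallucinations, so invented APIs still need a type checker or a sandboxed test run:

    def syntax_check(generated: str) -> str | None:
        """Return an error message if the generated Python doesn't parse."""
        try:
            compile(generated, "<generated>", "exec")
            return None
        except SyntaxError as e:
            # Feed this back to the model and regenerate.
            return f"line {e.lineno}: {e.msg}"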

Math

Common: arithmetic errors, wrong formulas, plausible-but-wrong derivations.

Mitigations:

  • Use a calculator tool.
  • Use reasoning models for multi-step problems.
  • Verify with symbolic libraries such as SymPy when possible (sketch below).
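
A sketch of the SymPy check; the claimed derivative here is invented for illustration:

    import sympy as sp

    x = sp.symbols("x")
    # Model claims: d/dx [x**2 * sin(x)] = 2*x*sin(x) + x**2*cos(x)
    claimed = 2 * x * sp.sin(x) + x**2 * sp.cos(x)
    actual = sp.diff(x**2 * sp.sin(x), x)

    # The difference simplifies to zero iff the claim is right.
    assert sp.simplify(actual - claimed) == 0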

Medical / legal / regulated

Common: outdated standards, fabricated statutes, made-up dosages.

Mitigations:

  • Strict RAG over authoritative sources.
  • Mandatory human review.
  • Disclaimers + confidence indicators.
  • Specialized models trained on domain-licensed data.

Citations / references

Famously bad. Hallucinated paper titles, wrong authors, fake DOIs.

Mitigations:

  • Verify every citation exists (Crossref API, semanticscholar.org); see the sketch below.
  • Use models that ground citations in retrieved sources.
  • Don’t ask LLMs for “papers about X” without retrieval.
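
A sketch of the first mitigation against Crossref’s public REST API; the title comparison is deliberately strict, since Crossref returns the closest match even for invented titles:

    import requests

    def citation_exists(title: str) -> bool:
        r = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 1},
            timeout=10,
        )
        r.raise_for_status()
        items = r.json()["message"]["items"]
        if not items:
            return False
        found = items[0].get("title", [""])[0]
        return found.strip().lower() == title.strip().lower()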

Evaluating hallucination

Build into your eval (Stage 13):

  • Faithfulness rate: % of answers fully supported by sources.
  • Correctness on factual questions with known answers.
  • “I don’t know” rate on questions outside scope (should be high).
  • Citation accuracy: % of citations that resolve to real, relevant sources.
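
A counting sketch for the first and third metrics; the per-question record schema is an assumption about your eval harness:

    def hallucination_metrics(results: list[dict]) -> dict:
        # Assumed schema: {"supported": bool, "out_of_scope": bool, "refused": bool}
        oos = [r for r in results if r["out_of_scope"]]
        return {
            "faithfulness_rate": sum(r["supported"] for r in results) / len(results),
            # On out-of-scope questions, refusing is the correct behavior.
            "idk_rate_on_oos": sum(r["refused"] for r in oos) / len(oos) if oos else None,
        }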

Track over time. A model upgrade can change hallucination characteristics.

Surfacing uncertainty to users

Sometimes the right output is “I’m not sure”:

  • Confidence indicators (“This is likely correct based on…” vs “I’m certain…”).
  • Disclaimers on edge-of-knowledge topics.
  • Easy “verify” links for fact-based answers.

Educated users tolerate uncertainty. They don’t tolerate confident wrongness.

What hallucinations don’t mean

  • Not bugs: the model is doing what it was trained to do — generate plausible text.
  • Not always model failure: sometimes the prompt induces them.
  • Not always preventable: some tasks have inherent uncertainty.

The goal isn’t zero hallucinations; it’s acceptable hallucination rates with appropriate grounding and disclosure.

See also