CRITIC
Don't just trust AI outputs — verify them. CRITIC teaches AI to check its own work using external tools, catching errors that pure self-reflection misses.
Introduced: CRITIC (Large Language Models Can Self-Correct with Tool-Interactive Critiquing) was published in 2023 by Gou et al. It introduced the idea of using external tools — search engines, code interpreters, knowledge bases — to verify AI-generated claims rather than relying on self-reflection alone.
Modern LLM Status: The core principle of CRITIC — tool-augmented verification — is now a native capability of modern LLMs. Claude, GPT-4, and Gemini all support function calling and tool use through their APIs, enabling the generate-verify-revise loop without manual prompting. Understanding CRITIC remains valuable for designing effective agentic AI workflows where tool verification is built into the system architecture.
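To make the idea of native tool use concrete, here is a minimal sketch of how a verification tool might be offered to a model through a function-calling API. The field names follow the general pattern used by modern LLM APIs, and the model name is a placeholder; check your provider's documentation for the exact schema before using this shape.

```python
# Illustrative tool definition for a function-calling API. The exact
# field names vary by provider; this follows the common JSON-schema style.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web to verify a factual claim.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The claim to verify."}
        },
        "required": ["query"],
    },
}

def build_request(question: str, tools: list) -> dict:
    """Assemble a chat request that offers the model a verification tool."""
    return {
        "model": "example-model",  # placeholder; substitute your provider's model ID
        "tools": tools,
        "messages": [{"role": "user", "content": question}],
    }

request = build_request("Who created Python?", [web_search_tool])
```

With a request shaped like this, the model can decide on its own to call `web_search` before answering, which is the generate-verify-revise loop running natively.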
AI Can't Fact-Check Itself — But Tools Can
When you ask an AI a factual question, it generates an answer from patterns in training data. If that data is wrong, outdated, or incomplete, the AI confidently produces incorrect output. Asking it to "double-check" just runs the same flawed process again.
CRITIC solves this by bringing in external verification. Instead of relying on the model's internal knowledge alone, the AI actively queries search engines, runs code, or checks databases to verify its own claims — then revises based on real evidence.
Think of it like a journalist who writes an article, then fact-checks every claim against primary sources before publishing.
Standard self-correction methods (like Self-Refine) can only catch errors the model already "knows" about. CRITIC catches errors the model doesn't know it's making — factual mistakes, outdated information, and computational errors — by grounding claims in external truth.
The CRITIC Process
Five steps from draft to verified output
Generate Initial Output
The AI produces its best answer using internal knowledge. This is the draft — potentially accurate but unverified.
"Marie Curie won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911, making her the first person to win two Nobel Prizes."
Identify Claims to Verify
The AI scans its output for verifiable claims — dates, numbers, names, relationships, and factual assertions that could be checked against external sources.
1) Nobel Prize in Physics in 1903 2) Nobel Prize in Chemistry in 1911 3) First person to win two Nobel Prizes
Query External Tools
For each claim, the AI selects the appropriate tool — a search engine for facts, a calculator for math, a code interpreter for logic — and runs a verification query.
Search: "Marie Curie Nobel Prizes dates" → Confirms 1903 Physics, 1911 Chemistry. Search: "first person two Nobel Prizes" → Curie was the first, confirmed.
Critique Based on Evidence
The AI compares tool results against its original claims. It identifies any discrepancies, grades the severity of errors, and decides what needs revision.
All three claims verified as correct. No revision needed in this case. Confidence: HIGH.
Revise If Needed
If errors are found, the AI rewrites the specific incorrect portions while preserving correct content. The revision can trigger another verification cycle for complex outputs.
Unlike simple retry approaches, CRITIC preserves what's correct and only fixes what's wrong — making it efficient and targeted.
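The five steps above can be sketched in a few lines of code. This is a toy illustration, not the paper's implementation: the "search engine" is a stub lookup table standing in for a real search or retrieval API, and the revision step is a simple targeted string replacement that leaves correct content untouched.

```python
# Minimal sketch of the CRITIC loop: generate, verify each claim with a
# tool, critique against the evidence, and revise only what is wrong.
KNOWLEDGE = {  # stub standing in for external search results
    "creator of Python": "Guido van Rossum",
    "creator of Java": "James Gosling",
}

def search(query: str) -> str:
    """Stub external tool: returns ground truth for a query."""
    return KNOWLEDGE.get(query, "no result")

def critic_loop(draft: str, claims: dict) -> str:
    """claims maps a verification query to the value asserted in the draft."""
    revised = draft
    for query, asserted in claims.items():
        evidence = search(query)              # step 3: query external tools
        if evidence != asserted:              # step 4: critique vs. evidence
            revised = revised.replace(asserted, evidence)  # step 5: targeted fix
    return revised

draft = "Python was created by James Gosling."
claims = {"creator of Python": "James Gosling"}
print(critic_loop(draft, claims))  # → Python was created by Guido van Rossum.
```

Note that claims verified as correct pass through untouched, which is what makes the revision targeted rather than a full rewrite.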
The Verification Toolkit
Different claims need different tools to verify
Search Engine
Verify factual claims, historical dates, named entities, current events, and real-world statistics against authoritative web sources.
Facts, dates, names, events
Code Interpreter
Execute code to verify mathematical calculations, algorithmic correctness, data transformations, and logical operations with exact results.
Math, logic, algorithms
Knowledge Base
Query structured databases and knowledge graphs to verify entity relationships, categorical data, and domain-specific information.
Relationships, entities, categories
See the Difference
Why tool verification beats self-reflection alone
Self-Reflection Only
"Python was created by James Gosling in 1991 at Sun Microsystems."
"Let me reconsider... Actually, I'm fairly confident in this answer. Python was indeed created around 1991."
CRITIC with Tools
"Python was created by James Gosling in 1991 at Sun Microsystems."
Search: "Python creator" → Guido van Rossum. Search: "James Gosling" → Created Java. Critique: Wrong person, wrong company. Revised: "Python was created by Guido van Rossum in 1991."
Natural Language Works Too
Structured frameworks and contextual labels are powerful tools, but LLMs are also exceptionally good at understanding natural language. As long as your prompt contains the actual contextual information needed to produce the response you're looking for (the who, what, why, and constraints), the AI can deliver complete and accurate results whether you use a formal framework or plain conversational language. But even in 2026, with the best prompts, verifying AI output is always a necessary step.
CRITIC in Action
Real-world scenarios showing the full verification loop
"The Panama Canal was completed in 1914, connecting the Atlantic and Pacific Oceans across 48 miles. It was built primarily by the French under Ferdinand de Lesseps, who also built the Suez Canal."
Claim 1: "Completed in 1914" → Search confirms: Correct.
Claim 2: "48 miles long" → Search: Actually approximately 50 miles (82 km). Error found.
Claim 3: "Built primarily by the French" → Search: The French started but abandoned the project. The United States completed it (1904-1914). Significant error.
Claim 4: "de Lesseps built the Suez Canal" → Search confirms: Correct.
Revised Output: "The Panama Canal was completed in 1914, connecting the Atlantic and Pacific Oceans across approximately 50 miles. Construction was started by the French under Ferdinand de Lesseps (who also built the Suez Canal) but was completed by the United States after France abandoned the project."
"If you invest $10,000 at 7% annual compound interest for 20 years, you'll have approximately $38,700."
Tool Selected: Code Interpreter
Calculation: 10000 × (1.07)^20 = $38,696.84
Critique: The claim of "approximately $38,700" is accurate (difference of $3.16). No revision needed.
Verdict: Output verified as correct. Confidence: HIGH.
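The code-interpreter check in this scenario is a one-liner. The sketch below reproduces the compound-interest verification exactly as a tool would run it:

```python
# Verify the claim: $10,000 at 7% annual compound interest for 20 years.
principal, rate, years = 10_000, 0.07, 20
future_value = principal * (1 + rate) ** years
print(round(future_value, 2))  # 38696.84 — "approximately $38,700" holds up
```

This is the kind of claim where a calculator or interpreter is strictly better than search: the answer is computed, not looked up.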
"React was created by Facebook in 2013. It uses a virtual DOM for performance and requires JSX syntax. Angular, created by Google, was released the same year."
Claim 1: "React created by Facebook in 2013" → Search: Open-sourced May 2013. Correct.
Claim 2: "Requires JSX syntax" → Search: JSX is recommended but not required. React works with plain JavaScript. Error: "requires" is inaccurate.
Claim 3: "Angular released the same year" → Search: AngularJS was released in 2010. Angular (v2+) was released in 2016. Significant error.
Revised Output: "React was created by Facebook and open-sourced in 2013. It uses a virtual DOM for performance and commonly uses JSX syntax (though it's not required). AngularJS, created by Google, was released in 2010, predating React by three years."
When to Use CRITIC
Best for outputs where factual accuracy is critical
Perfect For
Articles, reports, or summaries where dates, names, and numbers must be accurate.
API references, configuration guides, and specs where one wrong detail causes real problems.
Financial calculations, statistical claims, or quantitative reasoning that benefits from computational verification.
Legal, medical, or compliance content where errors have real consequences.
Skip It When
Fiction, brainstorming, or creative tasks where there are no "facts" to verify.
Quick brainstorms, rough drafts, or exploratory conversations where verification adds unnecessary overhead.
When the AI system doesn't have access to search, code execution, or external databases.
Use Cases
Where CRITIC delivers the most value
Research Summaries
Verify dates, authors, findings, and citations in academic or business research before sharing with stakeholders.
Financial Modeling
Run calculations through a code interpreter to verify compound interest, projections, and statistical analysis.
Code Generation
Test generated code snippets by executing them, catching syntax errors, logic bugs, and incorrect API usage.
Educational Content
Ensure lessons, tutorials, and explanations contain accurate information that won't mislead learners.
Troubleshooting Guides
Verify that recommended solutions, version numbers, and configuration settings are current and correct.
Competitive Analysis
Cross-reference company data, market figures, and product claims against current public information.
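For the code-generation use case above, verification means actually running the generated snippet. The sketch below compiles and executes a (hypothetical) generated function together with a test assertion; `compile` catches syntax errors and the assertion catches logic bugs. In a real system, untrusted generated code should run in an isolated sandbox, not in-process as shown here.

```python
# Verify a generated snippet by executing it with a test assertion.
generated = """
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

def verify_snippet(src: str, test: str) -> bool:
    """Return True if the snippet compiles and passes the given assertion."""
    try:
        code = compile(src + "\n" + test, "<generated>", "exec")
        exec(code, {})  # runs the definition plus its test in a fresh namespace
        return True
    except Exception:   # syntax errors, assertion failures, runtime errors
        return False

print(verify_snippet(generated, "assert fib(10) == 55"))  # True
```

A failing assertion (or a snippet that doesn't compile) returns False, signaling the critique step that a revision cycle is needed.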
Where CRITIC Fits
CRITIC sits in the self-correction family alongside complementary techniques
Use Self-Refine for style and structure improvements, then CRITIC for factual accuracy. Chain-of-Verification works well as a more structured version of CRITIC's verification step. Together, they cover both subjective quality and objective correctness.
Related Techniques
Explore complementary self-correction techniques
Build Verified Prompts
Try CRITIC-style verification with our interactive tools or explore more self-correction frameworks.