CRITIC
Don't just trust AI outputs — verify them. CRITIC teaches AI to check its own work using external tools, catching errors that pure self-reflection misses.
Introduced: CRITIC (Large Language Models Can Self-Correct with Tool-Interactive Critiquing) was published in 2023 by Gou et al. It introduced the idea of using external tools — search engines, code interpreters, knowledge bases — to verify AI-generated claims rather than relying on self-reflection alone.
Modern LLM Status: The core principle of CRITIC — tool-augmented verification — is now a native capability of modern LLMs. Claude, GPT-4, and Gemini all support function calling and tool use through their APIs, enabling the generate-verify-revise loop without manual prompting. Understanding CRITIC remains valuable for designing effective agentic AI workflows where tool verification is built into the system architecture.
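To make the idea of native tool use concrete, here is a minimal sketch of how a verification tool might be offered to a model through a function-calling API. The field names follow the general pattern used by modern LLM APIs, and the model name is a placeholder; check your provider's documentation for the exact schema before using this shape.

```python
# Illustrative tool definition for a function-calling API. The exact
# field names vary by provider; this follows the common JSON-schema style.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web to verify a factual claim.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The claim to verify."}
        },
        "required": ["query"],
    },
}

def build_request(question: str, tools: list) -> dict:
    """Assemble a chat request that offers the model a verification tool."""
    return {
        "model": "example-model",  # placeholder; substitute your provider's model ID
        "tools": tools,
        "messages": [{"role": "user", "content": question}],
    }

request = build_request("Who created Python?", [web_search_tool])
```

With a request shaped like this, the model can decide on its own to call `web_search` before answering, which is the generate-verify-revise loop running natively.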
AI Can't Fact-Check Itself — But Tools Can
When you ask an AI a factual question, it generates an answer from patterns in training data. If that data is wrong, outdated, or incomplete, the AI confidently produces incorrect output. Asking it to "double-check" just runs the same flawed process again.
CRITIC solves this by bringing in external verification. Instead of relying on the model's internal knowledge alone, the AI actively queries search engines, runs code, or checks databases to verify its own claims — then revises based on real evidence.
Think of it like a journalist who writes an article, then fact-checks every claim against primary sources before publishing.
Standard self-correction methods (like Self-Refine) can only catch errors the model already "knows" about. CRITIC catches errors the model doesn't know it's making — factual mistakes, outdated information, and computational errors — by grounding claims in external truth.
The CRITIC Process
Five steps from draft to verified output
Generate Initial Output
The AI produces its best answer using internal knowledge. This is the draft — potentially accurate but unverified.
"Marie Curie won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911, making her the first person to win two Nobel Prizes."
Identify Claims to Verify
The AI scans its output for verifiable claims — dates, numbers, names, relationships, and factual assertions that could be checked against external sources.
1) Nobel Prize in Physics in 1903 2) Nobel Prize in Chemistry in 1911 3) First person to win two Nobel Prizes
Query External Tools
For each claim, the AI selects the appropriate tool — a search engine for facts, a calculator for math, a code interpreter for logic — and runs a verification query.
Search: "Marie Curie Nobel Prizes dates" → Confirms 1903 Physics, 1911 Chemistry. Search: "first person two Nobel Prizes" → Curie was the first, confirmed.
Critique Based on Evidence
The AI compares tool results against its original claims. It identifies any discrepancies, grades the severity of errors, and decides what needs revision.
All three claims verified as correct. No revision needed in this case. Confidence: HIGH.
Revise If Needed
If errors are found, the AI rewrites the specific incorrect portions while preserving correct content. The revision can trigger another verification cycle for complex outputs.
Unlike simple retry approaches, CRITIC preserves what's correct and only fixes what's wrong — making it efficient and targeted.
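The five steps above can be sketched in a few lines of code. This is a toy illustration, not the paper's implementation: the "search engine" is a stub lookup table standing in for a real search or retrieval API, and the revision step is a simple targeted string replacement that leaves correct content untouched.

```python
# Minimal sketch of the CRITIC loop: generate, verify each claim with a
# tool, critique against the evidence, and revise only what is wrong.
KNOWLEDGE = {  # stub standing in for external search results
    "creator of Python": "Guido van Rossum",
    "creator of Java": "James Gosling",
}

def search(query: str) -> str:
    """Stub external tool: returns ground truth for a query."""
    return KNOWLEDGE.get(query, "no result")

def critic_loop(draft: str, claims: dict) -> str:
    """claims maps a verification query to the value asserted in the draft."""
    revised = draft
    for query, asserted in claims.items():
        evidence = search(query)              # step 3: query external tools
        if evidence != asserted:              # step 4: critique vs. evidence
            revised = revised.replace(asserted, evidence)  # step 5: targeted fix
    return revised

draft = "Python was created by James Gosling."
claims = {"creator of Python": "James Gosling"}
print(critic_loop(draft, claims))  # → Python was created by Guido van Rossum.
```

Note that claims verified as correct pass through untouched, which is what makes the revision targeted rather than a full rewrite.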
The Verification Toolkit
Different claims need different tools to verify
Search Engine
Verify factual claims, historical dates, named entities, current events, and real-world statistics against authoritative web sources.
Facts, dates, names, events
Code Interpreter
Execute code to verify mathematical calculations, algorithmic correctness, data transformations, and logical operations with exact results.
Math, logic, algorithms
Knowledge Base
Query structured databases and knowledge graphs to verify entity relationships, categorical data, and domain-specific information.
Relationships, entities, categories
See the Difference
Why tool verification beats self-reflection alone
Self-Reflection Only
"Python was created by James Gosling in 1991 at Sun Microsystems."
"Let me reconsider... Actually, I'm fairly confident in this answer. Python was indeed created around 1991."
CRITIC with Tools
"Python was created by James Gosling in 1991 at Sun Microsystems."
Search: "Python creator" → Guido van Rossum. Search: "James Gosling" → Created Java. Critique: Wrong person, wrong company. Revised: "Python was created by Guido van Rossum in 1991."
Natural Language Works Too
Structured frameworks and contextual labels are powerful tools, but LLMs are also exceptionally good at understanding natural language. As long as your prompt contains the actual contextual information needed to produce the response you're looking for (the who, what, why, and constraints), the AI can deliver complete and accurate results whether you use a formal framework or plain conversational language. But even in 2026, with the best prompts, verifying AI output is always a necessary step.
CRITIC in Action
Real-world scenarios showing the full verification loop
"The Panama Canal was completed in 1914, connecting the Atlantic and Pacific Oceans across 48 miles. It was built primarily by the French under Ferdinand de Lesseps, who also built the Suez Canal."
Claim 1: "Completed in 1914" → Search confirms: Correct.
Claim 2: "48 miles long" → Search: Actually approximately 50 miles (82 km). Error found.
Claim 3: "Built primarily by the French" → Search: The French started but abandoned the project. The United States completed it (1904-1914). Significant error.
Claim 4: "de Lesseps built the Suez Canal" → Search confirms: Correct.
Revised Output: "The Panama Canal was completed in 1914, connecting the Atlantic and Pacific Oceans across approximately 50 miles. Construction was started by the French under Ferdinand de Lesseps (who also built the Suez Canal) but was completed by the United States after France abandoned the project."
"If you invest $10,000 at 7% annual compound interest for 20 years, you'll have approximately $38,700."
Tool Selected: Code Interpreter
Calculation: 10000 × (1.07)^20 = $38,696.84
Critique: The claim of "approximately $38,700" is accurate (difference of $3.16). No revision needed.
Verdict: Output verified as correct. Confidence: HIGH.
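The code-interpreter check in this scenario is a one-liner. The sketch below reproduces the compound-interest verification exactly as a tool would run it:

```python
# Verify the claim: $10,000 at 7% annual compound interest for 20 years.
principal, rate, years = 10_000, 0.07, 20
future_value = principal * (1 + rate) ** years
print(round(future_value, 2))  # 38696.84 — "approximately $38,700" holds up
```

This is the kind of claim where a calculator or interpreter is strictly better than search: the answer is computed, not looked up.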
"React was created by Facebook in 2013. It uses a virtual DOM for performance and requires JSX syntax. Angular, created by Google, was released the same year."
Claim 1: "React created by Facebook in 2013" → Search: Open-sourced May 2013. Correct.
Claim 2: "Requires JSX syntax" → Search: JSX is recommended but not required. React works with plain JavaScript. Error: "requires" is inaccurate.
Claim 3: "Angular released the same year" → Search: AngularJS was released in 2010. Angular (v2+) was released in 2016. Significant error.
Revised Output: "React was created by Facebook and open-sourced in 2013. It uses a virtual DOM for performance and commonly uses JSX syntax (though it's not required). AngularJS, created by Google, was released in 2010, predating React by three years."
When to Use CRITIC
Best for outputs where factual accuracy is critical
Perfect For
Articles, reports, or summaries where dates, names, and numbers must be accurate.
API references, configuration guides, and specs where one wrong detail causes real problems.
Financial calculations, statistical claims, or quantitative reasoning that benefits from computational verification.
Legal, medical, or compliance content where errors have real consequences.
Skip It When
Fiction, brainstorming, or creative tasks where there are no "facts" to verify.
Quick brainstorms, rough drafts, or exploratory conversations where verification adds unnecessary overhead.
When the AI system doesn't have access to search, code execution, or external databases.
Use Cases
Where CRITIC delivers the most value
Research Summaries
Verify dates, authors, findings, and citations in academic or business research before sharing with stakeholders.
Financial Modeling
Run calculations through a code interpreter to verify compound interest, projections, and statistical analysis.
Code Generation
Test generated code snippets by executing them, catching syntax errors, logic bugs, and incorrect API usage.
Educational Content
Ensure lessons, tutorials, and explanations contain accurate information that won't mislead learners.
Troubleshooting Guides
Verify that recommended solutions, version numbers, and configuration settings are current and correct.
Competitive Analysis
Cross-reference company data, market figures, and product claims against current public information.
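For the code-generation use case above, verification means actually running the generated snippet. The sketch below compiles and executes a (hypothetical) generated function together with a test assertion; `compile` catches syntax errors and the assertion catches logic bugs. In a real system, untrusted generated code should run in an isolated sandbox, not in-process as shown here.

```python
# Verify a generated snippet by executing it with a test assertion.
generated = """
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

def verify_snippet(src: str, test: str) -> bool:
    """Return True if the snippet compiles and passes the given assertion."""
    try:
        code = compile(src + "\n" + test, "<generated>", "exec")
        exec(code, {})  # runs the definition plus its test in a fresh namespace
        return True
    except Exception:   # syntax errors, assertion failures, runtime errors
        return False

print(verify_snippet(generated, "assert fib(10) == 55"))  # True
```

A failing assertion (or a snippet that doesn't compile) returns False, signaling the critique step that a revision cycle is needed.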
Where CRITIC Fits
CRITIC sits in the self-correction family alongside complementary techniques
Use Self-Refine for style and structure improvements, then CRITIC for factual accuracy. Chain-of-Verification works well as a more structured version of CRITIC's verification step. Together, they cover both subjective quality and objective correctness.
Related Techniques
Explore complementary self-correction techniques
Build Verified Prompts
Try CRITIC-style verification with our interactive tools or explore more self-correction frameworks.