STaR (Self-Taught Reasoner)
What if a model could teach itself to reason better? STaR (Self-Taught Reasoner) creates a self-improving loop: the model generates reasoning chains, keeps only the ones that lead to correct answers, and uses those successful chains as training data to become a better reasoner, bootstrapping stronger reasoning from its own outputs.
Introduced: STaR was published in 2022 by Zelikman et al. It addresses a fundamental bootstrapping problem: getting high-quality reasoning demonstrations to train models requires either expensive human annotation or a model that can already reason well. STaR breaks this chicken-and-egg problem by having the model generate its own training data: it attempts problems, filters for correct answers, and fine-tunes on those successes. Through iterative rounds, reasoning quality compounds, because each generation of the model produces better training data for the next.
Modern LLM Status: STaR’s self-improvement paradigm has become foundational to modern AI training. Using model-generated reasoning chains, filtered by a correctness signal, as training data underlies later methods such as rejection-sampling fine-tuning and reinforced self-training, and it anticipates the recipes behind today’s reasoning-focused models. While the original paper focused on fine-tuning, the prompt-level insight of having models evaluate and learn from their own successful reasoning applies broadly to any iterative prompting workflow where you want to accumulate and reuse effective reasoning patterns.
Bootstrap Reasoning from Scratch
Traditional training requires human-written reasoning examples. STaR eliminates this bottleneck through a clever loop: (1) Attempt many problems with reasoning chains, (2) Keep only the chains that produce correct answers, (3) Fine-tune the model on those successful chains, (4) Repeat. With each iteration, the model generates higher-quality reasoning chains, which provide better training data, which produces an even better model.
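The four-step loop can be sketched in a few lines. This is a toy simulation under stated assumptions, not real training: sample_rationale stands in for LLM sampling, the “model” is just the set of problems it can solve, and “fine-tuning” simply adds verified problems to that set.

```python
# Toy sketch of the STaR loop. All names are illustrative stand-ins,
# not a real LLM or trainer.

def sample_rationale(model, problem, hint=None):
    # Stand-in for LLM sampling. With a hint (the correct answer),
    # rationalization succeeds; without one, this toy model only
    # solves problems it already "knows" (eval stands in for reasoning).
    if hint is not None:
        return f"reasoning toward {hint}", hint
    if problem in model:
        return f"known steps for {problem}", eval(problem)
    return "stuck", None

def star_round(model, dataset):
    keep = set()
    for problem, gold in dataset.items():
        _, answer = sample_rationale(model, problem)   # (1) generate
        if answer != gold:                             # (2) filter
            _, answer = sample_rationale(model, problem, hint=gold)  # (3)
        if answer == gold:
            keep.add(problem)                          # verified chain
    return model | keep                                # (4) "fine-tune"

model = {"2+3"}                                        # starts weak
dataset = {"2+3": 5, "4*4": 16, "10-7": 3}
model = star_round(model, dataset)
print(sorted(model))
```

In a real pipeline, steps 1 and 3 are temperature-sampled LLM calls and step 4 is a supervised fine-tuning run on the kept rationales; the structure of the loop is the same.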
The secret weapon is hindsight rationalization. For problems the model initially gets wrong, STaR provides the correct answer and asks the model to work backward — generating a “hindsight” rationale explaining how to reach that answer. This teaches from mistakes rather than just discarding them, dramatically accelerating improvement.
Think of it like a student who takes a practice test, reviews only the questions they got right to understand their best reasoning patterns, then also studies the answer key for missed questions to learn how those solutions work — becoming a stronger test-taker with each round.
The key insight is selection pressure. By generating many reasoning attempts and keeping only the correct ones, STaR creates a curated dataset of successful reasoning patterns specific to the problem types the model encounters. This is more targeted than generic training data and scales without human annotation cost.
The STaR Process
Five stages from initial attempts to bootstrapped reasoning mastery
Generate Rationales
The model attempts a large set of problems, producing a reasoning chain (rationale) and a final answer for each. At this stage, many answers will be wrong — the model is reasoning with whatever ability it currently has, not yet benefiting from the improvement loop.
Given 1,000 math problems, the model generates step-by-step solutions for each, arriving at correct answers for perhaps 400 of them.
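Generation needs a prompt that elicits a rationale plus a machine-checkable final answer. A minimal sketch, assuming an "Answer: <value>" convention (an illustrative choice, not the paper's exact format):

```python
import re

def cot_prompt(question):
    # Ask for step-by-step reasoning ending in a parseable answer line.
    return (f"Q: {question}\n"
            "Think step by step, then finish with 'Answer: <value>'.\nA:")

def parse_answer(completion):
    # Extract the final answer from a sampled rationale so it can be
    # compared against the known correct answer in the filter step.
    m = re.search(r"Answer:\s*(.+)", completion)
    return m.group(1).strip() if m else None

sample = "Train 1: 60 mph. Train 2: 60 mph.\nAnswer: 0 mph"
print(parse_answer(sample))   # 0 mph
```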
Filter for Correctness
Compare each generated answer against the known correct answer. Keep only the reasoning chains that led to correct final answers. These represent the model’s best reasoning — the chains where its logic held together from start to finish.
The 400 correct solutions are kept as high-quality training data. The 600 incorrect ones are set aside for the rationalization step.
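The filter itself is a simple partition: verified chains go into the training pool, failures go on to rationalization. A sketch, where attempts maps each problem to a (rationale, answer) pair:

```python
def filter_correct(attempts, gold):
    # Split attempts into verified chains (kept for fine-tuning)
    # and failed problems (sent to the rationalization step).
    kept, failed = {}, []
    for problem, (rationale, answer) in attempts.items():
        if answer == gold[problem]:
            kept[problem] = rationale
        else:
            failed.append(problem)
    return kept, failed

attempts = {"2x+3=11": ("2x=8, x=4", 4), "3y=12": ("y=3?", 3)}
gold = {"2x+3=11": 4, "3y=12": 4}
kept, failed = filter_correct(attempts, gold)
print(len(kept), failed)   # 1 ['3y=12']
```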
Rationalization
For problems the model got wrong, provide the correct answer and ask the model to generate a new rationale that arrives at that answer. This “hindsight” rationalization creates additional training data from failures, teaching the model reasoning paths it could not initially find on its own. Crucially, the answer hint is stripped before fine-tuning, so the model learns to produce the rationale without being shown the answer.
For a problem where the model answered “42” but the correct answer is “56,” the model is told the answer is 56 and asked to explain why — generating a valid reasoning chain it could not produce without the hint.
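A hindsight prompt just adds the correct answer as a hint and asks the model to reason toward it. A hypothetical template (the exact wording is an assumption):

```python
def rationalization_prompt(question, correct_answer):
    # Reveal the answer as a hint; the model must derive the steps.
    return (f"Q: {question}\n"
            f"(The correct answer is {correct_answer}.)\n"
            "Explain step by step why this is the answer.\nA:")

prompt = rationalization_prompt("What is the product of 7 and 8?", 56)
print("56" in prompt)   # True
```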
Fine-Tune
Train the model on the combined dataset: the successful rationales from step 2 plus the hindsight rationales from step 3. This gives the model exposure to correct reasoning patterns for a much larger portion of the problem set than it could solve on its own.
The model is fine-tuned on the 400 organic correct solutions plus, say, 500 hindsight rationales (rationalization does not always reach the given answer, so some of the 600 failures are discarded), giving it 900 high-quality reasoning demonstrations from a problem set it could originally solve only 40% of.
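Before training, both pools are merged into whatever record format the fine-tuning stack expects. A sketch with illustrative prompt/completion field names (not any specific provider's schema):

```python
import json

# Verified chains from step 2 and hindsight rationales from step 3.
verified = [("If 2x + 3 = 11, what is x?",
             "Subtract 3 from both sides: 2x = 8, so x = 4.")]
hindsight = [("Which train is faster?",
              "Both travel at 60 mph, so the difference is 0 mph.")]

# One supervised record per rationale; the hindsight hint is NOT
# included in the prompt, matching the paper's setup.
records = [{"prompt": f"Q: {q}\nA:", "completion": f" {r}"}
           for q, r in verified + hindsight]

with open("star_round1.jsonl", "w") as f:   # hypothetical filename
    f.write("\n".join(json.dumps(r) for r in records))
print(len(records))   # 2
```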
Iterate
Repeat the entire process with the improved model. Because the model is now a better reasoner, it will correctly solve more problems in step 1, providing even higher-quality training data. Each iteration compounds the improvement — a true bootstrapping effect.
Round 2: the improved model now solves 600 of 1,000 problems correctly. Round 3: 750 correct. Each generation produces better training data for the next, approaching mastery through self-improvement.
See the Difference
How iterative self-improvement transforms reasoning quality
Before STaR
Q: If a train travels 120 miles in 2 hours, and another train travels 90 miles in 1.5 hours, which train is faster and by how much?
A: The first train goes 120/2 = 60 mph. The second train goes 90/1.5 = 45 mph. So the first train is faster by 15 mph.
Plausible-looking reasoning with an arithmetic slip: 90/1.5 is 60 mph, not 45. The model reaches a confident but wrong conclusion, and its sparse, unchecked steps make errors like this easy to miss.
After STaR Iterations
Q: If a train travels 120 miles in 2 hours, and another train travels 90 miles in 1.5 hours, which train is faster and by how much?
A: To compare speeds, I need to calculate each train’s rate in the same units (mph).
Train 1: 120 miles / 2 hours = 60 mph.
Train 2: 90 miles / 1.5 hours = 60 mph.
Both trains travel at the same speed: 60 mph. The difference is 0 mph.
Structured reasoning with explicit unit normalization. The STaR-trained model catches the correct calculation (90/1.5 = 60, not 45) because it learned from verified reasoning chains.
STaR in Action
See how self-taught reasoning bootstraps quality across domains
Round 1: Model attempts 500 algebra problems. Solves 180 correctly with reasoning chains like: “To find x, I subtract 3 from both sides: 2x + 3 - 3 = 11 - 3, so 2x = 8, x = 4.”
Filter: Keep the 180 correct chains. For the 320 failures, provide correct answers and generate hindsight rationales.
Round 2: After fine-tuning, the model now solves 310 of the same 500 problems. Its reasoning chains are more structured and catch more edge cases.
Round 3: 420 correct. The model has learned to check its work and handle multi-step equations it previously could not.
From 36% accuracy to 84% through self-improvement alone — no human-written solutions required. The model bootstrapped arithmetic reasoning by iteratively learning from its own successes.
Round 1: Model generates code solutions for 200 programming challenges. Only 60 pass all test cases. Those 60 include clear reasoning: “I need to iterate through the array, track the maximum, and handle the empty array edge case.”
Rationalization: For the 140 failures, provide passing solutions and have the model explain why they work — generating rationales like: “The key insight is using a hash map for O(1) lookups instead of nested loops.”
Fine-tune and repeat: Each round, more solutions pass tests, and the reasoning about algorithmic choices becomes more sophisticated.
The model learns not just to write code that works, but to reason about why certain approaches are correct — choosing appropriate data structures, handling edge cases, and explaining trade-offs. Test pass rate climbs from 30% to over 75% across iterations.
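In the code domain, the correctness check in the walkthrough above is concrete: a generated solution counts as verified only if it passes every test case. A minimal sketch with a toy generated function:

```python
def passes_all(solution, tests):
    # A solution is verified only if every test case passes.
    return all(solution(*args) == expected for args, expected in tests)

# A toy "generated" solution plus the challenge's test cases:
def find_max(xs):
    return max(xs) if xs else None   # handles the empty-array edge case

tests = [(([3, 1, 4],), 4), (([],), None), (([-5, -2],), -2)]
print(passes_all(find_max, tests))   # True: keep this chain
```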
Problem set: 300 logical deduction puzzles (e.g., “All A are B. Some B are C. Can we conclude that some A are C?”).
Round 1: Model solves 120 correctly. Correct chains show explicit premise identification and valid inference steps.
Rationalization: For incorrect attempts, the model generates hindsight explanations: “The error was assuming that because some B are C and all A are B, some A must be C. But the B that are C might not overlap with the B that are A.”
Iteration: By round 4, the model correctly identifies logical fallacies it previously committed.
The model develops genuine logical reasoning patterns: distinguishing valid from invalid inferences, identifying common fallacies, and constructing step-by-step proofs. Accuracy improves from 40% to 80%+ through self-taught logical discipline.
When to Use STaR
Best for bootstrapping reasoning without human-labeled data
Perfect For
When you need a model to get better at math, logic, or code — STaR creates targeted training data from the model’s own successful attempts.
When human-annotated reasoning demonstrations are unavailable or too expensive — STaR generates its own training signal from correct/incorrect answer filtering.
Generating curated reasoning datasets for specialized fields where expert annotation is scarce — medical reasoning, legal analysis, scientific problem-solving.
Studying how models can improve their own capabilities through iterative self-training — a foundational concept in AI alignment and capability research.
Skip It When
If you cannot fine-tune the model (API-only access), the core STaR loop of train-and-iterate cannot be applied directly.
Classification, summarization, or extraction tasks where the bottleneck is understanding, not reasoning — STaR is designed for reasoning-heavy problems.
If you already have expert-written reasoning demonstrations, supervised fine-tuning on those will likely outperform self-generated data.
STaR requires multiple rounds of generation and training — it is a training methodology, not a single-prompt technique.
Use Cases
Where STaR delivers the most value
Training Data Generation
Automatically create high-quality reasoning demonstrations for model training without expensive human annotation — the model generates and curates its own examples.
Domain Adaptation
Adapt a general-purpose model to specialized domains (medical, legal, scientific) by bootstrapping domain-specific reasoning from problem sets with known answers.
Reasoning Enhancement
Systematically improve a model’s ability to reason through complex, multi-step problems by iteratively training on its own successful reasoning chains.
Educational AI
Build tutoring systems that improve their explanations over time by learning which reasoning approaches lead students to correct understanding.
Automated Tutoring
Create adaptive learning systems where the AI generates practice problems, evaluates its own solution attempts, and continuously improves its teaching ability.
Scientific Problem Solving
Bootstrap scientific reasoning by training on verified experimental results — each correct hypothesis-to-conclusion chain becomes training data for the next iteration.
Where STaR Fits
STaR bridges manual demonstrations and fully autonomous reasoning
Even without fine-tuning access, you can apply STaR’s principles: generate multiple reasoning attempts for a problem, identify the successful ones, and use those as few-shot examples for future problems of the same type. This creates a growing library of verified reasoning patterns.
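The prompt-level variant described above can be sketched as a growing library of verified (question, rationale) pairs that get prepended as few-shot examples. All names here are illustrative:

```python
library = []   # verified reasoning patterns, accumulated over time

def record_success(question, rationale):
    # Called whenever a sampled chain is verified against the answer.
    library.append((question, rationale))

def build_prompt(new_question, k=2):
    # Reuse the k most recent verified chains as few-shot examples.
    shots = "\n\n".join(f"Q: {q}\nA: {r}" for q, r in library[-k:])
    return f"{shots}\n\nQ: {new_question}\nA:"

record_success("If 2x + 3 = 11, what is x?", "2x = 8, so x = 4.")
prompt = build_prompt("If 3y - 1 = 8, what is y?")
print("2x = 8" in prompt)   # the verified chain is reused
```

Selecting shots by similarity to the new question, rather than recency, is a natural refinement once the library grows.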