Quiet-STaR
Humans don’t just predict the next word — they think before speaking. Quiet-STaR teaches language models to generate internal rationales at every token position, creating a form of “inner speech” that improves both prediction accuracy and reasoning ability without requiring explicit reasoning prompts.
Introduced: Quiet-STaR was published in 2024 as a generalization of STaR. While STaR generates reasoning chains for specific question-answer pairs, Quiet-STaR trains the model to generate internal rationales at every token during general text prediction. The model learns when thinking helps (complex reasoning, factual claims) and when it doesn’t (simple next-word prediction). This creates a model that automatically “thinks” when thinking is useful — a form of learned metacognition.
Modern LLM Status: Quiet-STaR represents a paradigm shift in how reasoning is integrated into language models. Rather than relying on explicit “think step by step” prompts, models trained with Quiet-STaR develop the ability to reason internally when needed. This has influenced the design of modern “reasoning models” that activate chain-of-thought processing automatically for complex queries while responding quickly to simple ones — achieving the dual-process ideal without explicit prompting.
Teach the Model to Think Before Speaking
Standard language models predict the next token purely from the previous tokens. Quiet-STaR adds an internal reasoning step: at each position, the model can optionally generate a short “thought” that helps predict what comes next. During training, the model learns which positions benefit from internal reasoning (e.g., before a factual claim or logical conclusion) and which don’t.
The “quiet” in Quiet-STaR means these thoughts are internal. They improve the model’s predictions but aren’t shown to the user. The model develops its own inner monologue — thinking deeply when the situation demands it and responding immediately when the answer is straightforward.
Think of it like the difference between a student who blurts out answers instantly and one who pauses to think when a question is hard but answers quickly when it’s easy. Quiet-STaR teaches models that crucial skill of knowing when to think, not just how to think.
External CoT (prompt-based) requires the user to ask for reasoning. Internal reasoning (Quiet-STaR) happens automatically. This means every response benefits from thinking, not just the ones where the user remembered to say “think step by step.” The model develops judgment about when to think deeply and when to respond quickly.
The Quiet-STaR Process
Five stages from token prediction to internalized reasoning
Token-Level Thought Generation
At each position in the sequence, the model can generate a short internal rationale — a “thought” that exists between the current context and the next token prediction. These thoughts are generated in parallel across positions, making the process efficient despite the added computation.
Before predicting the word after “The capital of Australia is…” the model might internally generate: “Many people think Sydney, but the capital is actually Canberra” — then predict “Canberra.”
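The data flow of this stage can be sketched in a few lines. Everything below is an illustrative stand-in, not the real model: `generate_thought` is a hypothetical function (a canned lookup here, where a trained model would sample rationale tokens), and the per-position loop stands in for what the actual algorithm does in parallel.

```python
# Illustrative-only sketch of token-level thought generation. In Quiet-STaR,
# a (possibly empty) rationale can be produced at EVERY position; here we
# fake the model with a canned lookup to show the data flow.

def generate_thought(context):
    """Hypothetical stand-in for sampling a short internal rationale."""
    canned = {
        ("The", "capital", "of", "Australia", "is"):
            "Many people think Sydney, but the capital is actually Canberra.",
    }
    return canned.get(tuple(context), "")  # empty thought where none helps

tokens = ["The", "capital", "of", "Australia", "is"]
# One (possibly empty) thought per position, conceptually produced in parallel.
thoughts = [generate_thought(tokens[: i + 1]) for i in range(len(tokens))]
```

Note how most positions get an empty thought: only the position before the factual claim receives a substantive rationale.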
Thought-Augmented Prediction
The internal rationale is used alongside the original context to improve the next-token prediction. A learned mixing head blends the thought-augmented prediction with the standard prediction, allowing the model to rely on thoughts only when they actually help.
For “The cat sat on the…” the thought adds nothing useful, so the mixing weight is near zero. For “The integral of sin(x) is…” the thought substantially helps, so the mixing weight is high.
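A minimal sketch of that blending, assuming the mixing head reduces to a scalar weight w in [0, 1] per position (a simplification of the learned head in the paper; the distributions below are invented for illustration):

```python
# Toy mixing step: interpolate the base next-token distribution with the
# thought-augmented one. w near 0 ignores the thought; w near 1 trusts it.

def mix_distributions(base_probs, thought_probs, w):
    """Blend two next-token distributions with a scalar mixing weight."""
    vocab = set(base_probs) | set(thought_probs)
    return {tok: (1 - w) * base_probs.get(tok, 0.0)
                 + w * thought_probs.get(tok, 0.0)
            for tok in vocab}

# "The cat sat on the ..." -> thought adds nothing, so w is near zero.
easy = mix_distributions({"mat": 0.9, "rug": 0.1},
                         {"mat": 0.5, "rug": 0.5}, w=0.05)

# "The capital of Australia is ..." -> the thought corrects the base model.
hard = mix_distributions({"Sydney": 0.6, "Canberra": 0.4},
                         {"Sydney": 0.1, "Canberra": 0.9}, w=0.9)
```

With a high mixing weight, the thought-augmented distribution flips the prediction from "Sydney" to "Canberra"; with a low weight, the easy continuation is left essentially untouched.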
REINFORCE Training
Using the actual next tokens as ground truth, the model is trained with the REINFORCE algorithm to generate thoughts that improve prediction accuracy. Thoughts that lead to better next-token predictions are reinforced; unhelpful thoughts are gradually eliminated.
If thinking “this requires the chain rule” before a calculus token helps predict correctly, that thought pattern is strengthened. If a thought about calculus before a simple greeting adds nothing, it is weakened.
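The reward signal behind this can be sketched directly: a thought's reward is how much it improves the log-likelihood of the true next token relative to predicting without it (a simplified view of the paper's REINFORCE objective; the probabilities are invented):

```python
import math

# Hedged toy of the REINFORCE reward: reward a thought by the improvement
# in log-probability it produces for the ground-truth next token.

def thought_reward(p_next_with_thought, p_next_without_thought):
    """Positive when the thought helped prediction, negative when it hurt."""
    return math.log(p_next_with_thought) - math.log(p_next_without_thought)

# "this requires the chain rule" before a calculus token: prediction improves.
helpful = thought_reward(0.6, 0.2)      # positive -> thought reinforced

# A calculus thought before a simple greeting: no improvement.
unhelpful = thought_reward(0.19, 0.2)   # negative -> thought weakened
```

In the full algorithm this scalar reward scales the policy gradient on the thought tokens, so thought patterns with positive reward become more likely and those with negative reward fade.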
Selective Reasoning
Through training, the model learns to generate substantial thoughts at positions where thinking helps and minimal or empty thoughts where it doesn’t. This creates a natural “dual process” system: fast, automatic responses for simple predictions and slow, deliberate reasoning for complex ones.
The model generates rich internal thoughts before factual claims, mathematical operations, and logical conclusions — but essentially no thoughts before common phrases, greetings, or predictable continuations.
Generalization
The learned internal reasoning transfers to downstream tasks, improving reasoning without explicit prompting. A model trained with Quiet-STaR on general text prediction shows improved performance on math, logic, and factual accuracy benchmarks — without ever being explicitly trained on those tasks.
A Quiet-STaR model asked “What is 47 times 23?” internally reasons through the multiplication before producing the answer — no “think step by step” prompt needed. The reasoning is invisible but the accuracy is higher.
See the Difference
Why internal reasoning produces more reliable responses
Standard Prediction
Q: The element with atomic number 79 is commonly found in what type of geological formation?
A: Gold (atomic number 79) is commonly found in quartz veins and alluvial deposits.
Model predicts each token directly from context. Gets the answer right for well-known facts but struggles with complex multi-step reasoning without explicit prompting.
Quiet-STaR Prediction
Q: The element with atomic number 79 is commonly found in what type of geological formation?
[Internal: Atomic number 79 is gold. Gold forms in hydrothermal processes. Primary deposits are lode/vein deposits in quartz. Secondary deposits form through erosion into placer/alluvial deposits.]
A: Gold (atomic number 79) is primarily found in lode deposits within quartz veins formed by hydrothermal processes, and secondarily in placer deposits where erosion has concentrated gold particles in alluvial sediments along riverbeds.
Model internally reasons through the geological processes before predicting — the user sees only the improved answer, not the internal thought.
Quiet-STaR in Action
See how internal reasoning improves model predictions across domains
A user asks: “Which country has the most UNESCO World Heritage Sites?”
Standard model: Predicts based on token frequency — might answer “Italy” or “China” depending on training data patterns.
Quiet-STaR model: Internally generates: “UNESCO sites — Italy and China are close. As of recent counts, Italy leads with 59 sites, China has 57. Need to verify which is current.” Then produces a more nuanced response acknowledging the close competition.
The Quiet-STaR model internally verifies facts before stating them, producing responses that acknowledge uncertainty where it exists rather than confidently asserting potentially outdated information. The user never sees the internal deliberation — only the improved accuracy.
A user asks: “If a store offers 30% off, and you have a coupon for an additional 20% off the sale price, what is the total discount?”
Standard model: Might respond “50%” by adding percentages naively.
Quiet-STaR model: Internally generates: “30% off first means 0.70 of original. Then 20% off sale price means 0.80 times 0.70 = 0.56 of original. Total discount is 1 - 0.56 = 0.44 = 44%.” Then responds with the correct 44% total discount.
Internal computation steps happen before producing the answer token. The model performs the multiplication internally rather than falling into the common trap of adding percentages. No “show your work” prompt was needed — the reasoning happened silently.
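The internal arithmetic the Quiet-STaR model performs can be checked directly; this snippet just verifies the stacked-discount computation from the example above:

```python
# Verifying the stacked-discount arithmetic: sequential percentage discounts
# multiply, they do not add.

original = 100.0
after_sale = original * (1 - 0.30)         # 30% off -> 70.00
after_coupon = after_sale * (1 - 0.20)     # extra 20% off the SALE price -> 56.00
total_discount = 1 - after_coupon / original  # 0.44, i.e. 44%, not 50%
```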
A user is writing a story where a character described as left-handed in chapter 1 needs to perform an action in chapter 5.
Standard model: Might write “She reached for the sword with her right hand” — contradicting the established detail.
Quiet-STaR model: Before generating the action, internally checks: “Character was established as left-handed. Actions should be consistent with left-hand dominance.” Then writes the scene with the correct hand.
Internal plot consistency checks before continuing a narrative. The model maintains character details across long contexts by reasoning about established facts before each significant action — catching continuity errors that would otherwise slip through.
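The consistency check described above can be caricatured as a lookup against established facts. This is purely illustrative: the data model and function names are invented for the sketch, and a real model performs this check implicitly in its internal rationale rather than through explicit structures.

```python
# Toy sketch of an internal continuity check before generating an action.

established_facts = {"protagonist": {"dominant_hand": "left"}}  # from chapter 1

def consistent_hand(character, hand_used):
    """Flag actions that contradict an established character detail."""
    return hand_used == established_facts[character]["dominant_hand"]

ok = consistent_hand("protagonist", "left")            # matches chapter 1
contradiction = consistent_hand("protagonist", "right")  # would be caught
```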
When to Use Quiet-STaR
Best for building models with built-in reasoning capabilities
Perfect For
Training foundation models that reason automatically without requiring explicit chain-of-thought prompts from users.
When you want every user interaction to benefit from reasoning — not just the ones where someone remembers to ask for step-by-step thinking.
Enhancing a model’s overall prediction quality across all tasks — not just reasoning benchmarks but factual accuracy, consistency, and depth.
Studying how models can develop the ability to judge when to think deeply versus respond quickly — a form of computational metacognition.
Skip It When
When you can only prompt an existing model: Quiet-STaR is a training methodology that requires modifying how the model is trained, not just how it is prompted.
Quiet-STaR’s thoughts are internal and invisible. If you need transparent, auditable reasoning trails, use explicit CoT or Self-Ask instead.
For straightforward autocompletion, form filling, or template generation — the overhead of internal reasoning provides minimal benefit.
Regulated industries that require explainable AI decisions need visible reasoning chains, not hidden internal thoughts.
Use Cases
Where Quiet-STaR delivers the most value
Foundation Model Training
Train next-generation language models with built-in reasoning capabilities that activate automatically, eliminating the need for explicit reasoning prompts from users.
Reasoning Model Development
Build models that match or exceed chain-of-thought performance without requiring explicit reasoning prompts — the model decides when and how deeply to reason.
Automatic Fact-Checking
Models that internally verify claims before stating them, reducing hallucination rates without requiring external fact-checking pipelines or explicit verification prompts.
Improved Code Generation
Code models that internally reason about edge cases, type safety, and algorithmic correctness before generating each line — producing more robust code without explicit prompting.
Better Translation
Translation models that internally consider context, idioms, and cultural nuance before producing each phrase — catching subtle meaning shifts that literal translation would miss.
Scientific Writing
Models that internally verify scientific claims, check unit consistency, and validate logical arguments before producing text — reducing errors in research assistance and technical writing.
Where Quiet-STaR Fits
Quiet-STaR bridges external reasoning prompts and fully internalized thinking
You can approximate Quiet-STaR’s behavior by asking the model to “Think through your reasoning internally, then provide only the final answer.” This encourages the model to reason before responding without cluttering the output with explicit chains. While not true internal reasoning, it captures the spirit of thinking before speaking.
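A minimal sketch of that prompt-level approximation; the wrapper text and function name are illustrative choices, not from the Quiet-STaR paper:

```python
# Prompt-level approximation of "quiet" reasoning: ask the model to reason
# internally but emit only the final answer.

def quiet_style_prompt(question):
    """Wrap a question with an internal-reasoning instruction (illustrative)."""
    return ("Think through your reasoning internally, "
            "then provide only the final answer.\n\n"
            f"Question: {question}\nAnswer:")

prompt = quiet_style_prompt("What is 47 times 23?")
```

Unlike true Quiet-STaR, the reasoning here still consumes visible-context tokens and is not learned per-position, but the output stays uncluttered.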
Related Techniques
Explore complementary reasoning and self-improvement techniques