In-Context Learning Technique

Active Example Selection

Not all demonstrations are created equal. Active Example Selection watches where the model stumbles, then retrieves the specific examples most likely to resolve that uncertainty — turning static few-shot prompting into a dynamic, query-aware feedback loop.

Technique Context: 2023

Introduced: Active Example Selection builds on decades of active learning research (Settles, 2009) and adapts those principles specifically for in-context demonstration selection in large language models. Rather than treating few-shot examples as a fixed preamble, this approach emerged from the recognition that different queries benefit from fundamentally different demonstrations. By 2023, researchers had formalized methods for dynamically selecting in-context examples based on model uncertainty, query characteristics, and task-specific performance signals.

Modern LLM Status: Active Example Selection is an active technique that applies active learning principles to the selection of in-context demonstrations. Rather than statically choosing examples, the approach iteratively selects demonstrations based on the model’s current uncertainty or errors. When the model is uncertain about a particular input, the system retrieves examples most likely to resolve that uncertainty. This creates a feedback loop where example selection is tailored to each specific query, improving efficiency and accuracy compared to fixed demonstration sets. The technique is particularly relevant for production systems that handle diverse queries requiring different types of demonstrations.

The Core Insight

Let the Model’s Weakness Guide Your Examples

Active Example Selection dynamically chooses which few-shot demonstrations to show the model based on where the model is currently struggling. Instead of using a fixed set of examples for all queries, it identifies the model’s weak points and selects demonstrations that specifically address those weaknesses. The result is a prompt that is precisely calibrated to each individual query rather than generically assembled for all possible inputs.

This is analogous to a tutor who watches a student fail a particular type of problem and then provides a worked example of exactly that problem type — targeted instruction rather than generic review. A math tutor does not re-explain addition when the student is stuck on long division. Similarly, Active Example Selection does not waste context window space on demonstrations the model already handles well.

The approach transforms few-shot prompting from a static configuration step into a responsive, real-time process. Each query triggers a fresh assessment of what the model needs to see, and the demonstration set is assembled accordingly. This narrows the gap between generic prompting and fine-tuned models by making the most of every token in the context window.

Why Dynamic Selection Outperforms Fixed Demonstrations

Fixed example sets are a compromise: they try to cover the broadest range of scenarios but inevitably miss the specific nuances any given query demands. A set optimized for common cases will fail on edge cases, while a set loaded with edge cases wastes tokens on the routine majority. Active Example Selection sidesteps this trade-off by assembling a bespoke demonstration set for every query, so the context window carries the most informative examples available for that input.

The Active Example Selection Process

Four stages from uncertainty detection to tailored demonstration

1

Maintain an Example Pool

Curate a large bank of labeled input-output pairs covering the full task distribution. This pool serves as the reservoir from which demonstrations are dynamically drawn. The richer and more diverse the pool, the more precisely the system can match examples to any given query’s needs.

Example

A customer support classification system maintains 500 labeled tickets spanning billing inquiries, technical issues, cancellation requests, feature requests, and account recovery — each with its correct category label and reasoning.
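The pool described above can be represented as a simple collection of labeled records. This is a minimal Python sketch of that idea; the `PoolExample` record type and the sample tickets are illustrative inventions, not part of any particular library:

```python
from dataclasses import dataclass

@dataclass
class PoolExample:
    """One labeled demonstration in the example pool."""
    text: str       # the raw input, e.g. a support ticket
    label: str      # the correct category
    reasoning: str  # short rationale shown alongside the label

# A miniature pool; a production system might hold hundreds of entries.
EXAMPLE_POOL = [
    PoolExample("Please update the credit card on my account.",
                "billing inquiry",
                "Primary request is a payment method change."),
    PoolExample("Close my account today, I no longer need the service.",
                "cancellation request",
                "Explicit, unconditional request to end service."),
    PoolExample("The mobile app crashes whenever I open settings.",
                "technical issue",
                "Reports a software defect with no billing content."),
]

def pool_labels(pool):
    """Distinct labels covered by the pool, sorted for stable output."""
    return sorted({ex.label for ex in pool})
```

Keeping a short reasoning string with every example pays off later: when an example is retrieved, its rationale can be shown to the model alongside the label, demonstrating not just the answer but the distinguishing signal.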

2

Assess Model Uncertainty

For a given query, evaluate the model’s confidence or uncertainty. This can be measured through output probability distributions, consistency across multiple generations, or entropy of the predicted label distribution. High uncertainty signals that the model needs targeted help to resolve this particular input.

Example

The model receives “I need to change the card on file, but if this doesn’t fix the charge issue I want to close my account.” It assigns 38% probability to “billing inquiry” and 35% to “cancellation request” — high entropy indicates genuine ambiguity.
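One concrete way to implement this step is to compute the entropy of the predicted label distribution. This sketch assumes you can obtain per-label probabilities (for instance from token logprobs returned by the model API); the threshold value is an illustrative choice, not a standard:

```python
import math

def label_entropy(probs):
    """Shannon entropy (in bits) of a predicted label distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def is_uncertain(probs, threshold=1.0):
    """Flag a query as uncertain when entropy exceeds the threshold.

    `probs` maps each candidate label to the model's probability
    for it. The 1.0-bit default threshold is a tunable assumption.
    """
    return label_entropy(probs) > threshold

# The ambiguous ticket from the example above: a near-even split
# between two labels drives the entropy well above the threshold.
probs = {"billing inquiry": 0.38, "cancellation request": 0.35,
         "technical issue": 0.15, "feature request": 0.12}
```

A confident prediction such as `{"billing inquiry": 0.95, "cancellation request": 0.05}` scores well under one bit and would skip targeted retrieval entirely, saving a round trip on the easy majority of queries.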

3

Select Targeted Examples

Choose examples from the pool that are most relevant to the model’s uncertainty region. Prioritize examples that are semantically similar to the query, represent the uncertain label classes, or cover the specific edge case the model is struggling with. The selection algorithm balances similarity, diversity, and informational value.

Example

The system retrieves three examples from the pool: one where a payment method change was correctly labeled as “billing inquiry,” one where a conditional cancellation threat was labeled “cancellation request,” and one borderline case where both topics appeared but billing was primary — demonstrating the distinguishing signals.
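The selection step can be sketched as a scoring function that boosts examples from the uncertain label classes and breaks ties by similarity. Bag-of-words cosine similarity stands in here for the embedding search a production system would use, and the sample pool is invented for illustration:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity of two bag-of-words term vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(query, pool, uncertain_labels, k=3):
    """Pick the k pool examples most useful for this query.

    Examples whose label is one of the classes the model is torn
    between get a flat boost; similarity to the query breaks ties.
    `pool` holds (text, label) pairs.
    """
    def score(item):
        text, label = item
        boost = 1.0 if label in uncertain_labels else 0.0
        return boost + cosine(query, text)
    return sorted(pool, key=score, reverse=True)[:k]

POOL = [
    ("Please change the card on file for my account", "billing inquiry"),
    ("Close my account immediately", "cancellation request"),
    ("The app crashes when I open settings", "technical issue"),
]
```

The flat boost guarantees that every uncertain class is represented before similar-but-irrelevant examples are considered; adding a diversity penalty (so near-duplicate examples are not all selected) is a natural refinement.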

4

Generate with Tailored Context

Present the dynamically selected examples as in-context demonstrations, then pose the query. The model now has demonstrations specifically chosen to address its weaknesses on this particular input. The result is a response informed by the most relevant possible context rather than generic, one-size-fits-all demonstrations.

Example

With the three targeted examples in context, the model now correctly classifies the ticket as “billing inquiry” with 82% confidence — the borderline example showed that when the primary action requested is a payment method change, the billing label takes precedence even when cancellation is mentioned conditionally.
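Assembling the final prompt is then a straightforward templating step. The Ticket/Category format below is one illustrative layout, not a required one:

```python
def build_prompt(query, demos,
                 task="Classify the support ticket into one category."):
    """Assemble a few-shot prompt from dynamically selected demos.

    `demos` is a list of (ticket, label) pairs chosen for this
    specific query; the model completes the final Category line.
    """
    parts = [task, ""]
    for text, label in demos:
        parts.append(f"Ticket: {text}")
        parts.append(f"Category: {label}")
        parts.append("")
    parts.append(f"Ticket: {query}")
    parts.append("Category:")
    return "\n".join(parts)
```

Because the demonstrations change per query, it helps to keep the instruction and template fixed so that only the example slots vary; this makes prompt-level regressions easier to diagnose.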

See the Difference

Why query-aware demonstrations outperform fixed example sets

Static Example Selection

Fixed Demonstrations

Same three examples used for every query: a clear billing ticket, a clear technical issue, and a clear feature request. These were chosen once during system setup and never change regardless of input.

Result on Ambiguous Input

Model classifies the mixed billing-and-cancellation ticket as “cancellation request” with 52% confidence. The fixed examples provided no guidance on distinguishing overlapping categories.

One-size-fits-all approach wastes context on irrelevant demonstrations
VS

Active Example Selection

Dynamic Demonstrations

System detects uncertainty between “billing” and “cancellation,” then retrieves: (1) a payment-method-change ticket labeled billing, (2) a conditional cancellation threat labeled cancellation, (3) a borderline case where billing was primary despite cancellation language.

Result on Same Input

Model correctly classifies as “billing inquiry” with 82% confidence. The targeted examples resolved the exact ambiguity the model was facing.

Each query receives the most helpful demonstrations for its specific challenges

Active Example Selection in Action

See how uncertainty-driven demonstration selection resolves real challenges

The Challenge

A customer support system receives the message: “My subscription renewed but I thought I cancelled it last month. Can you check what happened and reverse the charge? If this keeps happening I’m done with the service.” The model is uncertain whether this is a billing dispute, a cancellation request, or an account inquiry.

Active Selection Response

Uncertainty detected: The model assigns 30% to “billing dispute,” 28% to “cancellation request,” and 25% to “account inquiry” — near-uniform distribution across three categories.

Examples retrieved: The system pulls three demonstrations from the pool: (1) a ticket where a customer disputed an unexpected charge but explicitly wanted to keep the service — labeled “billing dispute,” (2) a ticket where a customer mentioned a past cancellation attempt and demanded the account be closed — labeled “cancellation request,” (3) a ticket where a customer asked about a renewal error but expressed frustration without a firm exit intent — labeled “billing dispute.”

Result: With these targeted examples in context, the model correctly identifies the primary intent as “billing dispute” because the customer’s core request is to reverse a charge and investigate, while the cancellation language is conditional (“if this keeps happening”) rather than definitive.

The Challenge

A data extraction pipeline encounters the date string “3rd Quarter FY2024 (Oct-Dec 2023)” in a financial document. The model needs to extract a standardized date range, but the fiscal year offset and informal quarter notation create ambiguity that standard date-parsing examples do not cover.

Active Selection Response

Uncertainty detected: The model generates inconsistent outputs across three attempts — “2024-07 to 2024-09,” “2023-10 to 2023-12,” and “2024-10 to 2024-12” — revealing confusion between calendar quarters and fiscal year offsets.

Examples retrieved: The system retrieves demonstrations showing: (1) a fiscal year date with explicit calendar mapping where the correct output used the parenthetical calendar dates, (2) a quarter notation with a fiscal year offset where “Q3 FY2025” mapped to January through March 2025, (3) a case where parenthetical clarification overrode the fiscal year label as the authoritative date range.

Result: The model correctly extracts “2023-10-01 to 2023-12-31” by following the pattern established in the retrieved examples: when explicit calendar dates appear in parentheses, those take precedence over fiscal year quarter calculations.

The Challenge

An automated code review system needs to categorize a pull request that renames a function from processData to validateAndTransformInput, updates its internal logic to add input validation, and fixes a null-pointer exception in the process. The model is uncertain whether this is a “refactor,” a “bug fix,” or a “feature enhancement.”

Active Selection Response

Uncertainty detected: The model assigns 34% to “refactor,” 33% to “bug fix,” and 28% to “feature enhancement” — a three-way split with no clear winner.

Examples retrieved: The system selects: (1) a PR that only renamed functions and reorganized code with no behavior change — labeled “refactor,” (2) a PR that renamed a function but also fixed an edge case crash in the same function — labeled “bug fix” because the behavioral fix was the primary motivation, (3) a PR that added input validation to an existing function without fixing any known bug — labeled “feature enhancement.”

Result: The model correctly categorizes the PR as “bug fix” based on the pattern from the retrieved examples: when a rename and logic change accompany a fix for a known defect (the null-pointer exception), the bug fix label takes priority because the defect resolution is the motivating change.

When to Use Active Example Selection

Best for systems where query diversity demands adaptive demonstrations

Perfect For

Production Systems with Diverse Queries

When the same system handles billing questions, technical issues, and account management — each query type benefits from fundamentally different demonstrations.

Tasks with Many Edge Cases

Domains where unusual inputs appear frequently — date formats, code patterns, legal language — and a fixed example set cannot anticipate every variant.

High-Accuracy Requirements

When generic examples leave performance gaps that matter — medical triage, financial classification, or content moderation where errors have real consequences.

Large Labeled Data Pools

Systems with hundreds or thousands of labeled examples available for retrieval — the larger the pool, the more precisely the system can match demonstrations to each query.

Skip It When

Simple, Uniform Tasks

When the task is straightforward enough that the same fixed examples work consistently across all queries — sentiment analysis on product reviews, for instance.

No Retrieval Infrastructure

When there is no infrastructure for real-time example retrieval and uncertainty estimation — the technique requires embedding search, confidence scoring, and dynamic prompt assembly.

Small Example Pools

When fewer than 50 labeled examples are available, dynamic selection offers little advantage over simply including all of them or manually curating a representative subset.

Use Cases

Where Active Example Selection delivers the most value

Production Classification Systems

Route incoming tickets, documents, or messages to the correct category by selecting demonstrations that address the specific ambiguity each input presents, rather than relying on generic examples.

Adaptive Tutoring

When a student submits an answer, identify the specific misconception and retrieve worked examples that address that exact misunderstanding, creating a personalized learning experience.

Real-Time Content Moderation

When the model is uncertain whether content violates a policy, retrieve examples of borderline cases in the same category to sharpen the distinction between acceptable and prohibited content.

Dynamic Code Review

Categorize pull requests by retrieving examples of similar code changes that were correctly labeled, especially for borderline cases that blend refactoring, bug fixes, and feature additions.

Medical Triage

When symptom descriptions are ambiguous, retrieve case examples with similar presentations but different diagnoses to help the model distinguish between conditions that share overlapping symptoms.

Multilingual Query Handling

When processing queries that mix languages or use culturally specific idioms, retrieve demonstrations in the relevant language combination to help the model handle code-switching and cultural context accurately.

Where Active Example Selection Fits

Active Example Selection bridges static curation and full knowledge-base integration

Few-Shot Learning (Fixed Examples): Same demonstrations for every query.
Example Selection (Curated Examples): Manually chosen per task type.
Active Example Selection (Uncertainty-Driven Examples): Dynamically selected per query.
RAG (Full Knowledge Base): Retrieval from entire document stores.

The Uncertainty Feedback Loop

Active Example Selection is most powerful when deployed as a continuous loop rather than a one-shot process. As the system handles more queries, it can log which examples most effectively resolved uncertainty for similar inputs. Over time, the selection algorithm becomes increasingly precise — learning not just which examples are semantically similar, but which examples actually change the model’s behavior in the desired direction. This creates a self-improving pipeline where example selection quality compounds with usage.
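The logging half of that loop can start very simply: record which examples were shown for each query and whether the model's post-demonstration confidence cleared a target, then use the running resolution rate to re-rank future retrievals. The `SelectionLog` class below is a hypothetical sketch of that bookkeeping, not a reference implementation:

```python
from collections import defaultdict

class SelectionLog:
    """Track how often each pool example resolved model uncertainty."""

    def __init__(self):
        self.shown = defaultdict(int)     # times each example was used
        self.resolved = defaultdict(int)  # times uncertainty cleared

    def record(self, example_ids, was_resolved):
        """Log one query's outcome for every example shown with it."""
        for ex_id in example_ids:
            self.shown[ex_id] += 1
            if was_resolved:
                self.resolved[ex_id] += 1

    def resolution_rate(self, ex_id):
        """Fraction of uses in which this example resolved uncertainty."""
        shown = self.shown[ex_id]
        return self.resolved[ex_id] / shown if shown else 0.0
```

Blending this behavioral signal with semantic similarity (for example, as a weighted sum in the selection score) is what lets the pipeline learn which examples actually move the model, not just which ones look alike.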

Target Your Demonstrations

Build query-aware demonstration pipelines or explore our prompting tools to design more effective in-context learning strategies.