In-Context Learning

Few-Shot Learning

Instead of describing the task you want done, show it. Provide a handful of input-output examples directly in the prompt, and the model learns the pattern on the fly — no fine-tuning, no training data, no code required.

Technique Context: 2020

Introduced: Few-Shot Learning was popularized by Brown et al. in the landmark 2020 GPT-3 paper “Language Models are Few-Shot Learners.” The paper demonstrated that sufficiently large language models could learn entirely new tasks from just a handful of examples placed in the prompt — without any gradient updates or fine-tuning. This was a paradigm shift: previously, adapting a model to a new task required collecting labeled datasets and retraining. GPT-3 showed that a small number of in-context demonstrations (the paper typically used 10 to 100, as many as fit in the context window) could approach, and in some cases exceed, fine-tuned baselines across a broad range of NLP benchmarks.

Modern LLM Status: Few-Shot Learning remains the single most widely used prompting technique across all LLM applications. Every major model — Claude, GPT-4, Gemini, Llama — excels at learning from in-context examples. While modern models are increasingly capable at zero-shot tasks (following instructions without examples), few-shot prompting consistently delivers more precise, format-compliant, and reliable outputs. It is the default strategy recommended by virtually every LLM provider’s documentation, and the foundation upon which more advanced techniques like Example Selection and Chain-of-Thought are built.

The Core Insight

Show, Don’t Tell

Describing exactly what you want from a language model is surprisingly hard. You might specify the format, the tone, and the level of detail — and still get output that misses the mark. The problem is that natural language instructions are inherently ambiguous. “Be concise” means different things to different people, and to different models.

Few-Shot Learning sidesteps this ambiguity entirely. Instead of describing the desired output, you demonstrate it. You provide 2–5 concrete examples of input-output pairs, and the model reverse-engineers the pattern: the format, the reasoning style, the level of detail, the edge-case handling — all communicated implicitly through examples rather than explicit instructions.

Think of it like training a new employee. You could hand them a 20-page style guide, or you could show them three finished pieces and say “like this.” The examples communicate thousands of implicit decisions that would be exhausting to spell out in words.

Why Examples Beat Instructions

A single example implicitly encodes dozens of decisions: output length, vocabulary level, formatting choices, what to include, what to omit, how to handle ambiguity. When you write “Summarize this article in a professional tone,” the model must interpret every word. When you show three summaries you’ve already written, the model can see exactly what “professional” and “summary” mean to you. This is why few-shot prompting consistently outperforms zero-shot instructions for tasks requiring specific formatting or style.

How Few-Shot Learning Works

Three steps from examples to accurate output

1

Craft Your Demonstrations

Select 2–5 representative input-output pairs that illustrate the task you want the model to perform. Each example should show a clear mapping from input to desired output. Choose examples that cover the range of variations the model might encounter — different categories, edge cases, or difficulty levels.

Example

For a sentiment classification task, include one clearly positive review, one clearly negative review, and one nuanced or mixed review to show how you handle ambiguity.

2

Structure the Prompt

Place your examples in the prompt using a consistent format. Each example follows the same input-output structure, clearly labeled so the model can distinguish between the demonstration pairs and the actual query. Consistency is critical — the model uses the repeated pattern to understand what it should produce.

Example

Review: “The battery life is incredible and the screen is gorgeous.”
Sentiment: Positive

Review: “Broke after two weeks. Complete waste of money.”
Sentiment: Negative

Review: [your actual input here]
Sentiment:
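The template above can be assembled programmatically. A minimal sketch in Python; the helper name and the “Review”/“Sentiment” labels are illustrative choices, not a standard API — any consistently repeated format works:

```python
def build_few_shot_prompt(examples, query, input_label="Review", output_label="Sentiment"):
    """Assemble a few-shot prompt from (input, output) pairs plus a new query."""
    blocks = [f"{input_label}: {inp}\n{output_label}: {out}" for inp, out in examples]
    # The final block repeats the pattern but leaves the output empty,
    # cueing the model to complete it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)

examples = [
    ("The battery life is incredible and the screen is gorgeous.", "Positive"),
    ("Broke after two weeks. Complete waste of money.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "Decent phone, but the camera disappoints.")
print(prompt)
```

Because every demonstration goes through the same formatter, the repeated pattern stays exactly consistent, which is the property the model relies on.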

3

Model Completes the Pattern

The model reads your examples, infers the underlying task and format, and applies the same pattern to your new input. It generates output that matches the style, format, and reasoning demonstrated in your examples. The more consistent and clear your examples, the more accurately the model reproduces the pattern.

Example

Given the structured examples above, the model outputs a single-word sentiment label that matches the format of the demonstrations — no extra explanation, no hedging, just the classification you showed it how to produce.
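In an application, the assembled prompt then goes to a completions-style endpoint. A sketch of that step; `client.complete` is a hypothetical stand-in for whatever provider SDK you use, and the deterministic `FakeClient` exists only so the example runs offline:

```python
def classify(client, prompt):
    # Low temperature keeps the completion deterministic, and a newline
    # stop sequence keeps the model from inventing further example pairs
    # after it emits the label.
    raw = client.complete(prompt, max_tokens=5, temperature=0.0, stop=["\n"])
    return raw.strip()

class FakeClient:
    # Offline stand-in for a real LLM client: always answers "Positive".
    def complete(self, prompt, **kwargs):
        return " Positive\n"

label = classify(FakeClient(), "Review: Great phone!\nSentiment:")
print(label)  # Positive
```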

See the Difference

Why a few examples dramatically improve output quality

Zero-Shot

Prompt

Classify the following customer feedback as “Bug Report,” “Feature Request,” or “General Feedback.”

Feedback: “It would be great if you added dark mode to the mobile app.”

Response

This feedback is a Feature Request. The customer is expressing a desire for a new feature (dark mode) to be added to the mobile application. This isn’t reporting a bug or providing general feedback, but rather suggesting a specific enhancement they’d like to see implemented.

Correct label but verbose — format unpredictable, hard to parse programmatically
VS

Few-Shot

Prompt with Examples

Feedback: “The app crashes every time I try to upload a photo.”
Category: Bug Report

Feedback: “Love the new update! Everything feels so much smoother.”
Category: General Feedback

Feedback: “Can you add support for CSV file exports?”
Category: Feature Request

Feedback: “It would be great if you added dark mode to the mobile app.”
Category:

Response

Feature Request

Clean, consistent format — parseable output, matches demonstrated pattern exactly
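The parsing difference shows up immediately in code. A sketch assuming the three labels above: the few-shot output reduces to a strip-and-validate step, while the verbose zero-shot output forces a substring search that breaks whenever the explanation happens to mention more than one label:

```python
VALID_LABELS = {"Bug Report", "Feature Request", "General Feedback"}

def parse_few_shot(output):
    # Few-shot output mirrors the demonstrated one-line format.
    label = output.strip()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label

def parse_zero_shot(output):
    # Verbose prose requires searching for label mentions, which is
    # ambiguous when the explanation names several categories.
    found = [lbl for lbl in sorted(VALID_LABELS) if lbl.lower() in output.lower()]
    if len(found) != 1:
        raise ValueError(f"ambiguous output, matched {found}")
    return found[0]
```

On the verbose zero-shot response shown above, `parse_zero_shot` matches both “Feature Request” and the phrase “general feedback” and has to give up; `parse_few_shot("Feature Request")` returns the label directly.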

Natural Language Works Too

While few-shot examples and structured templates are powerful tools, LLMs are also exceptionally good at understanding plain conversational language. As long as your prompt contains the actual contextual information needed to produce the response you’re looking for (the who, what, why, and constraints), the model can deliver complete and accurate results whether you use a formal format or natural prose. Whichever style you choose, verifying the model’s output remains a necessary step.

Few-Shot Learning in Action

See how examples teach the model different tasks

Few-Shot Prompt

Email: “Hi, I noticed I was charged twice for my last order #4521. Could you please refund the duplicate?”
Intent: Billing Issue

Email: “Just wanted to say your customer service rep Sarah was absolutely fantastic! She went above and beyond.”
Intent: Positive Feedback

Email: “I’ve been waiting 3 weeks for my delivery. Your tracking page says it shipped but nothing has arrived.”
Intent: Shipping Complaint

Email: “Can I change my subscription from monthly to annual billing?”
Intent:

Model Output

Account Management

Few-Shot Prompt

Technical: “The API endpoint returns a 429 status code when the rate limit is exceeded.”
User-Friendly: “You’ve made too many requests too quickly. Wait a moment and try again.”

Technical: “The SSL/TLS handshake failed due to an expired certificate on the origin server.”
User-Friendly: “We can’t establish a secure connection right now. Our team has been notified and is working on it.”

Technical: “A race condition in the session management module caused intermittent authentication failures.”
User-Friendly:

Model Output

“You may have been logged out unexpectedly. Please sign in again — we’ve fixed the issue that was causing this.”

Few-Shot Prompt

Question: A store sells apples for $2 each and oranges for $3 each. If Maria buys 4 apples and 2 oranges, how much does she spend?
Reasoning: Apples: 4 x $2 = $8. Oranges: 2 x $3 = $6. Total: $8 + $6 = $14.
Answer: $14

Question: A train travels at 60 mph for 2.5 hours, then at 80 mph for 1.5 hours. What is the total distance?
Reasoning: First leg: 60 x 2.5 = 150 miles. Second leg: 80 x 1.5 = 120 miles. Total: 150 + 120 = 270 miles.
Answer: 270 miles

Question: A rectangle has a perimeter of 36 cm. If the length is twice the width, what is the area?
Reasoning:

Model Output

Reasoning: Let width = w. Length = 2w. Perimeter: 2(2w + w) = 36, so 6w = 36, w = 6 cm. Length = 12 cm. Area: 12 x 6 = 72 sq cm.
Answer: 72 square centimeters
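Because the demonstrations include explicit reasoning, the model’s output can be checked mechanically. A quick numeric sanity check of the rectangle answer:

```python
# Perimeter constraint: 2 * (length + width) = 36, with length = 2 * width.
perimeter = 36
width = perimeter / 6    # 2 * (2w + w) = 6w = 36, so w = 6
length = 2 * width       # 12
area = length * width    # 72
assert 2 * (length + width) == perimeter
print(width, length, area)  # 6.0 12.0 72.0
```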

When to Use Few-Shot Learning

Best for tasks where showing is easier than telling

Perfect For

Format-Sensitive Tasks

When output must follow a precise structure — JSON schemas, table formats, labeling conventions — examples communicate the exact format more reliably than descriptions.

Classification and Labeling

Sentiment analysis, intent detection, content moderation — any task where you need the model to assign items to predefined categories consistently.

Style and Tone Matching

When you need output in a specific voice — brand copy, academic writing, casual tone — showing examples conveys nuance that instructions miss.

Data Transformation

Converting between formats — technical jargon to plain language, raw data to structured output, one schema to another — where the mapping is best shown by example.

Skip It When

Simple, Well-Defined Tasks

If a clear instruction like “Translate this to Spanish” or “Summarize in one paragraph” gets reliable results, adding examples wastes tokens without improving output.

Token-Constrained Environments

Each example consumes prompt tokens. When working with strict token limits or high-volume pipelines where cost per call matters, zero-shot may be more economical.

Highly Creative or Open-Ended Tasks

Brainstorming, creative writing, or open-ended exploration can be overly constrained by examples — the model may mimic rather than innovate.

Use Cases

Where Few-Shot Learning delivers the most value

Customer Support Triage

Classify incoming tickets into categories like billing, technical, shipping, or account issues by showing a few labeled examples of each type.

Content Standardization

Rewrite product descriptions, blog excerpts, or marketing copy to match a specific brand voice by demonstrating the desired style through examples.

Code Generation Patterns

Show the model a few examples of your codebase’s conventions — naming patterns, error handling, documentation style — so generated code matches your project standards.

Localization and Translation

Provide translated pairs that capture your preferred terminology, formality level, and regional conventions so all translations stay consistent.

Data Extraction

Extract structured fields from unstructured text — names, dates, amounts, addresses — by showing a few examples of source text mapped to extracted fields.

Content Moderation

Teach the model your specific moderation policies by showing examples of content that passes, gets flagged, or gets removed — calibrating the boundary through demonstrations.

Where Few-Shot Learning Fits

Few-Shot Learning is the foundation of in-context learning

Zero-Shot (Instructions Only): task described in words, no examples
One-Shot (Single Example): one demonstration to set the pattern
Few-Shot (Multiple Examples): 2–5 demonstrations for robust pattern learning
Example Selection (Optimized Examples): dynamically chosen examples for each input
Quality Over Quantity

In practice, 2–5 well-chosen examples routinely outperform ten or more poorly chosen ones. Focus on diversity (covering different cases), consistency (same format every time), and representativeness (examples that match the actual inputs the model will see). When combined with Example Selection techniques, you can dynamically choose the most relevant demonstrations for each new query, getting the best of both worlds.
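Dynamic example selection can be sketched with nothing more than embeddings and cosine similarity. The 2-d vectors below are hand-made toy values standing in for real embeddings; in practice you would compute them with an embedding model:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_examples(query_vec, pool, k=2):
    """pool: (embedding, input_text, output_text) tuples; returns the k
    demonstrations most similar to the query embedding."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return [(inp, out) for _, inp, out in ranked[:k]]

# Toy pool with hand-made 2-d "embeddings".
pool = [
    ((1.0, 0.0), "The app crashes every time I upload a photo.", "Bug Report"),
    ((0.0, 1.0), "Can you add support for CSV exports?", "Feature Request"),
    ((0.7, 0.7), "Love the new update!", "General Feedback"),
]
# A bug-like query vector pulls the bug-report demonstration to the front.
chosen = select_examples((0.9, 0.1), pool, k=2)
```

The selected pairs then feed straight into the same prompt-assembly step used for a static few-shot prompt.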
