Query Language Architecture

LMQL (Language Model Query Language)

What if you could query a language model the way you query a database? LMQL brings SQL-like structure to LLM interaction — combining natural language prompts with Python scripting and declarative constraints to produce structured, type-safe, cost-efficient outputs.

Technique Context: 2022

Introduced: LMQL was developed in 2022 by Beurer-Kellner, Fischer, and Vechev at ETH Zurich. The language was designed to address a fundamental limitation of natural language prompting: the lack of programmatic control over model outputs. By introducing SQL-like query syntax with Python interoperability, LMQL enabled developers to specify output constraints (type, length, format), control decoding strategies, and integrate external tool calls — all within a single query. The original implementation demonstrated inference cost reductions of up to 80% through efficient constraint-guided decoding.

Modern LLM Status: LMQL pioneered the concept of programmatic LLM interaction with constraints. While many models in 2026 natively support structured output (JSON mode, function calling), LMQL’s approach of combining natural language with programmatic constraints influenced tools like DSPy, Outlines, and Guidance. The query language paradigm remains relevant for complex orchestration tasks where developers need fine-grained control over model behavior, multi-step pipelines with type-safe intermediate results, and cost optimization through constraint-guided decoding. The core insight — that LLM interaction benefits from the same structured query paradigms used in databases — has become a foundational idea in the LLM tooling ecosystem.

The Core Insight

Query Languages for Language Models

Natural language prompting is flexible but imprecise. You can ask a model to “return a JSON object with three fields,” but there is no guarantee it will comply. The output might include markdown formatting, extra commentary, missing fields, or malformed syntax. Every downstream system that consumes model output must handle these failures — and often does so poorly.

LMQL treats this as a query problem. Just as SQL lets you declare what data you want from a database without specifying how to retrieve it, LMQL lets you declare what output structure you need from a model without manually engineering the prompt to coerce compliance. You write constraints — type requirements, length limits, value ranges, format specifications — and the LMQL runtime handles the decoding strategy to satisfy them.

Think of it as the difference between asking a librarian to “find me something about history” versus submitting a structured catalog query with subject codes, date ranges, and format requirements. Both get results, but only one guarantees the shape of what comes back.

Why Constraints Beat Hope

When you prompt a model with “respond in JSON,” you are hoping for compliance. When you use LMQL’s constraint system, you are enforcing compliance at the decoding level: tokens that would violate your constraints are never generated in the first place. This eliminates an entire class of parsing errors, retry loops, and defensive code that plagues natural language prompt pipelines. The result: more reliable systems with fewer failure modes and lower inference costs.
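The contrast can be sketched in a few lines of plain Python (a toy illustration, not the LMQL runtime): `parse_and_hope` and `constrained_choice` are hypothetical helpers showing the difference between validating free-form output after the fact and restricting the decoder to allowed values up front.

```python
# Toy illustration (not the LMQL runtime): "hoping" means parsing
# free-form output and retrying on failure; "enforcing" means the
# generator can only ever emit values from the allowed set.
import json

ALLOWED = {"positive", "negative", "neutral"}

def parse_and_hope(raw_output: str):
    """Defensive parsing a free-form pipeline needs: may still fail."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # malformed JSON -> caller must retry
    return data if data.get("sentiment") in ALLOWED else None

def constrained_choice(scores: dict) -> str:
    """Decode-time enforcement: pick the highest-scoring *allowed* value."""
    return max((v for v in scores if v in ALLOWED), key=scores.get)

# Free-form output with commentary breaks the parser...
print(parse_and_hope('Here is my analysis: {"sentiment": "mixed"}'))  # None
# ...while the constrained path cannot produce an invalid label.
print(constrained_choice({"positive": 0.7, "mixed": 0.9, "negative": 0.1}))
```

Note how the invalid label “mixed” can win on raw score yet still never be emitted, because it is filtered out before selection rather than rejected after.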

The LMQL Process

Four stages from query definition to constrained output

1. Define the Query Template

Write a prompt template that combines natural language with placeholder variables. These variables represent the parts of the output you want the model to generate. The template reads like a conversation with blanks — familiar to anyone who has written SQL SELECT statements or Python f-strings.

Example

“Classify the following review: [REVIEW_TEXT]. Sentiment: [SENTIMENT]. Confidence: [CONFIDENCE].” — The bracketed variables are what the model will fill in.
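A minimal sketch of what a runtime might do with such a template (the `holes` helper is hypothetical, not LMQL’s actual parser): extract the bracketed variable names the model must fill.

```python
import re

def holes(template: str) -> list:
    """Find the bracketed placeholder variables the model must fill."""
    return re.findall(r"\[([A-Z_]+)\]", template)

template = ("Classify the following review: [REVIEW_TEXT]. "
            "Sentiment: [SENTIMENT]. Confidence: [CONFIDENCE].")
print(holes(template))  # ['REVIEW_TEXT', 'SENTIMENT', 'CONFIDENCE']
```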

2. Declare Output Constraints

Specify constraints on each variable using a WHERE clause. Constraints can enforce type (string, int, float), limit value ranges (SENTIMENT in ["positive", "negative", "neutral"]), restrict length (len(SUMMARY) < 100), or apply custom validation functions. These constraints are not suggestions — they are enforced during token generation.

Example

“WHERE SENTIMENT in ['positive', 'negative', 'neutral'] AND CONFIDENCE > 0 AND CONFIDENCE <= 1.0” — SENTIMENT can only take one of these exact values, and CONFIDENCE must fall within the declared range.
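These declarations map directly onto ordinary predicates. A hedged sketch in plain Python (`check_constraints` is an invented helper, not part of LMQL) of the WHERE clause above:

```python
def check_constraints(sentiment: str, confidence: float) -> bool:
    """Mirror of the WHERE clause above, as plain Python predicates."""
    return (sentiment in ("positive", "negative", "neutral")
            and 0 < confidence <= 1.0)

print(check_constraints("positive", 0.92))  # True
print(check_constraints("mixed", 0.92))     # False: invalid enum value
print(check_constraints("neutral", 1.5))    # False: out-of-range confidence
```

The difference is that LMQL applies such predicates during generation, not as an after-the-fact check.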

3. Execute with Constraint-Guided Decoding

The LMQL runtime compiles your query into an optimized decoding plan. Instead of generating all tokens freely and checking constraints afterward, it masks invalid tokens during generation. This means the model never wastes compute producing outputs that will be rejected, resulting in significant cost savings and guaranteed constraint satisfaction.

Example

When generating SENTIMENT, the decoder only allows tokens that can form “positive,” “negative,” or “neutral” — all other vocabulary tokens are masked to probability zero.
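A toy character-level version of this masking idea (real implementations operate on the model’s token vocabulary; `allowed_next` is an invented helper):

```python
# Toy character-level version of constraint-guided masking: at each
# step, only characters that keep the output a prefix of some allowed
# value survive; everything else is masked to probability zero.
ALLOWED = ["positive", "negative", "neutral"]

def allowed_next(partial: str) -> set:
    """Characters that can legally follow `partial`."""
    return {v[len(partial)] for v in ALLOWED
            if v.startswith(partial) and len(v) > len(partial)}

print(sorted(allowed_next("")))   # ['n', 'p'] - only valid starting characters
print(sorted(allowed_next("ne"))) # ['g', 'u'] - "negative" or "neutral"
print(allowed_next("positive"))   # set() - value complete, generation stops
```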

4. Receive Structured, Validated Output

The query returns structured results with each variable populated according to its constraints. No post-processing parsing is needed — the output is already in the correct format, type, and range. You can pipe these results directly into downstream systems, chain them into multi-step pipelines, or use them as inputs to subsequent LMQL queries.

Example

Returns: SENTIMENT = "positive", CONFIDENCE = 0.92 — Both values are guaranteed to satisfy the declared constraints without any parsing or validation code.

See the Difference

Why declarative constraints produce more reliable outputs than prompt engineering alone

Natural Language Prompt

Prompt

Analyze this customer review and return a JSON object with fields: sentiment (positive, negative, or neutral), confidence (0 to 1), and summary (under 50 words). Review: “The product arrived late but the quality exceeded my expectations.”

Response

Here’s my analysis:

```json
{"sentiment": "mixed/positive", "confidence": "high", "summary": "The customer experienced a shipping delay but was pleasantly surprised by the product quality, resulting in an overall positive impression despite the logistics issue."}
```

Note: I classified this as mixed/positive because...

Invalid enum value, wrong confidence type, and surrounding commentary that breaks JSON parsing
VS

LMQL Query

Constrained Query

Template: Analyze this review: "The product arrived late but the quality exceeded my expectations."
Sentiment: [SENTIMENT]
Confidence: [CONFIDENCE]
Summary: [SUMMARY]

WHERE SENTIMENT in ["positive", "negative", "neutral"] AND CONFIDENCE is float AND CONFIDENCE > 0 AND CONFIDENCE <= 1.0 AND len(SUMMARY) < 200

Structured Output

SENTIMENT = "positive"
CONFIDENCE = 0.78
SUMMARY = "Late delivery offset by product quality exceeding expectations."

Valid enum, correct type, length-compliant, no extra text — machine-parseable

Natural Language Works Too

While structured query languages and declarative constraints are powerful tools, LLMs are exceptionally good at understanding natural language. As long as your prompt contains the actual contextual information needed to produce the response you’re looking for (the who, what, why, and constraints), the model can deliver complete and accurate results whether you use a formal framework or plain conversational language. But even in 2026, with the best prompts, verifying AI output remains a necessary step.

LMQL in Action

See how query-based prompting enables structured, constrained LLM interaction

LMQL Query

Extract structured information from this job posting:

“We are hiring a Senior Data Engineer with 5+ years of experience in Python and Spark. Remote-friendly, based in Austin, TX. Salary range $140K-$180K.”

Title: [TITLE]
Level: [LEVEL]
Skills: [SKILLS]
Location: [LOCATION]
Remote: [REMOTE]

WHERE LEVEL in ["Junior", "Mid", "Senior", "Lead", "Principal"] AND REMOTE in ["Yes", "No", "Hybrid"] AND len(TITLE) < 100

Constrained Output

TITLE = "Senior Data Engineer"
LEVEL = "Senior"
SKILLS = "Python, Spark"
LOCATION = "Austin, TX"
REMOTE = "Yes"

Always verify AI-extracted data against the original source before using it in hiring decisions or automated systems.
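For illustration, the WHERE clause of this query can be mirrored as a plain Python check (`satisfies_where` is a hypothetical helper; in LMQL the constraints are enforced during decoding rather than verified afterward):

```python
# Hypothetical post-hoc check of the job-posting constraints above;
# in LMQL these hold by construction, not by inspection.
LEVELS = {"Junior", "Mid", "Senior", "Lead", "Principal"}
REMOTE_VALUES = {"Yes", "No", "Hybrid"}

def satisfies_where(result: dict) -> bool:
    return (result["LEVEL"] in LEVELS
            and result["REMOTE"] in REMOTE_VALUES
            and len(result["TITLE"]) < 100)

result = {"TITLE": "Senior Data Engineer", "LEVEL": "Senior",
          "SKILLS": "Python, Spark", "LOCATION": "Austin, TX",
          "REMOTE": "Yes"}
print(satisfies_where(result))  # True
```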

LMQL Query

A patient presents with fatigue, weight gain, and cold sensitivity.

Step 1 — Identify possible conditions: [CONDITIONS]
Step 2 — Most likely diagnosis: [DIAGNOSIS]
Step 3 — Recommended initial test: [TEST]
Step 4 — Urgency level: [URGENCY]

WHERE URGENCY in ["Routine", "Urgent", "Emergency"] AND len(CONDITIONS) < 300 AND len(DIAGNOSIS) < 100 AND len(TEST) < 100

Constrained Output

CONDITIONS = "Hypothyroidism, iron deficiency anemia, chronic fatigue syndrome, depression, adrenal insufficiency"
DIAGNOSIS = "Hypothyroidism (underactive thyroid)"
TEST = "TSH (thyroid-stimulating hormone) blood test"
URGENCY = "Routine"

This is an AI-generated analysis for educational purposes only. Never use AI output for medical diagnosis — always consult a licensed healthcare professional. Verify all suggestions independently.

LMQL Query

Classify the following customer message for our support routing system:

“I was charged twice for my subscription renewal last month and I want a refund immediately.”

Category: [CATEGORY]
Priority: [PRIORITY]
Sentiment: [SENTIMENT]
Requires Human: [HUMAN_NEEDED]
Suggested Action: [ACTION]

WHERE CATEGORY in ["Billing", "Technical", "Account", "Feature Request", "General"] AND PRIORITY in ["Low", "Medium", "High", "Critical"] AND SENTIMENT in ["Positive", "Neutral", "Negative", "Frustrated"] AND HUMAN_NEEDED in ["Yes", "No"] AND len(ACTION) < 200

Constrained Output

CATEGORY = "Billing"
PRIORITY = "High"
SENTIMENT = "Frustrated"
HUMAN_NEEDED = "Yes"
ACTION = "Escalate to billing team for duplicate charge investigation and refund processing"

AI classification should supplement, not replace, human judgment in customer service. Always have a human review escalated cases before taking financial actions like refunds.
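Because every field is drawn from a fixed enum, downstream routing can be a deterministic lookup rather than fragile string matching. A sketch with invented queue names:

```python
# Routing on constrained fields: every input is from a fixed enum,
# so branching is exhaustive and predictable. Queue names below are
# invented for illustration.
def route(category: str, priority: str, human_needed: str) -> str:
    if human_needed == "Yes" and priority in ("High", "Critical"):
        return f"human-escalation/{category.lower()}"
    if human_needed == "Yes":
        return f"human-review/{category.lower()}"
    return f"auto/{category.lower()}"

print(route("Billing", "High", "Yes"))  # human-escalation/billing
print(route("General", "Low", "No"))    # auto/general
```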

When to Use LMQL

Best for structured, constrained LLM interactions in production systems

Perfect For

Production Data Pipelines

When model output feeds directly into databases, APIs, or downstream systems that require strict type and format guarantees.

Classification and Extraction

Tasks that require outputs from a fixed set of categories, labels, or structured fields — where free-form responses create parsing headaches.

Cost Optimization

High-volume inference where constraint-guided decoding eliminates wasted tokens, reducing API costs by preventing invalid generations.

Multi-Step Orchestration

Complex pipelines where each step’s output must meet specific constraints before feeding into the next step — type-safe chaining of LLM calls.
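One way to picture type-safe chaining, sketched in plain Python with stand-in functions (`classify` and `summarize` are hypothetical placeholders for constrained LMQL calls):

```python
# Sketch of type-safe chaining: each step declares the shape of its
# output, so the next step can rely on it without defensive parsing.
from dataclasses import dataclass

@dataclass
class Classification:
    sentiment: str     # guaranteed in {"positive", "negative", "neutral"}
    confidence: float  # guaranteed in (0, 1]

def classify(text: str) -> Classification:
    # Stand-in for a constrained LMQL call returning validated fields.
    return Classification("positive", 0.78)

def summarize(text: str, c: Classification) -> str:
    # Safe to branch on c.sentiment: the enum is enforced upstream.
    return f"{c.sentiment} review ({c.confidence:.0%} confidence)"

print(summarize("...", classify("The quality exceeded my expectations.")))
```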

Skip It When

Free-Form Creative Tasks

Writing, brainstorming, or conversational tasks where rigid output constraints would stifle the model’s creative or exploratory capabilities.

Simple One-Shot Interactions

Quick, conversational queries where the overhead of defining a query schema outweighs the benefit — just ask the model directly.

Models with Native Structured Output

When using APIs that already support JSON mode, function calling, or tool use natively — the built-in constraints may be sufficient without LMQL’s layer.

Use Cases

Where LMQL delivers the most value

Document Processing

Extract structured fields from invoices, contracts, or forms with guaranteed output schemas that integrate directly into document management systems.

Content Moderation

Classify user-generated content into fixed policy categories with enforced confidence scores, enabling automated routing and human review workflows.

API Response Generation

Generate API-compatible responses with guaranteed JSON schema compliance, eliminating the parsing failures that plague LLM-powered endpoints.

Chatbot Routing

Classify user intents into predefined categories with constrained confidence scores, enabling deterministic routing to specialized handlers or human agents.

Compliance Automation

Evaluate regulatory documents against fixed compliance criteria with structured pass/fail/review outputs that feed directly into audit trails.

Batch Data Labeling

Label thousands of data points with constrained taxonomies at scale, ensuring consistent category assignments across entire datasets for ML training pipelines.

Where LMQL Fits

LMQL bridges natural language prompting and programmatic LLM control

Natural Language Prompts (Unstructured Input): Free-form text, no output guarantees
LMQL (Query Language): SQL-like constraints with NL templates
DSPy (Compiled Pipelines): Optimized prompt programs with signatures
Native Structured Output (Built-In Constraints): JSON mode, function calling, tool use

The Legacy That Matters

Even if you never write an LMQL query directly, understanding its paradigm shift is valuable: the idea that LLM interaction can be declarative rather than imperative — specifying what you want rather than hoping the model produces it — fundamentally changed how the industry thinks about prompt engineering. The JSON modes, function-calling APIs, and structured output features of modern LLMs build directly on the constraint-guided approach that LMQL helped pioneer.

The key takeaway for prompt engineers: think in terms of constraints, not instructions. Instead of telling the model “please format your response as JSON,” define what the output structure must be. Whether you use LMQL, a native JSON mode, or a structured output library, the mental model is the same — declare the shape of what you need, and let the system enforce it.

Structure Your LLM Interactions

Explore how constraint-based prompting can make your AI workflows more reliable, or build structured prompts with our tools.