guidance

Control LLM output with regex and grammars: guaranteed JSON/XML/code structure, structured generation

Orchestra Research
MIT

Guidance: Constrained LLM Generation

When to Use This Skill

Use Guidance when you need to:

  • Control LLM output syntax with regex or grammars
  • Guarantee valid JSON/XML/code generation
  • Reduce latency vs traditional prompting approaches
  • Enforce structured formats (dates, emails, IDs, etc.)
  • Build multi-step workflows with Pythonic control flow
  • Prevent invalid outputs through grammatical constraints

GitHub Stars: 18,000+ | From: Microsoft Research

Installation

# Base installation
pip install guidance

# With specific backends
pip install guidance[transformers] # Hugging Face models
pip install guidance[llama_cpp] # llama.cpp models

Quick Start

Basic Example: Structured Generation

from guidance import models, gen

# Load model (supports OpenAI, Transformers, llama.cpp)
lm = models.OpenAI("gpt-4")

# Generate with constraints
result = lm + "The capital of France is " + gen("capital", max_tokens=5)

print(result["capital"]) # "Paris"

With Anthropic Claude

from guidance import models, gen, system, user, assistant

# Configure Claude
lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Use context managers for chat format
with system():
    lm += "You are a helpful assistant."

with user():
    lm += "What is the capital of France?"

with assistant():
    lm += gen(max_tokens=20)

Core Concepts

1. Context Managers

Guidance uses Pythonic context managers for chat-style interactions.

from guidance import models, system, user, assistant, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# System message
with system():
    lm += "You are a JSON generation expert."

# User message
with user():
    lm += "Generate a person object with name and age."

# Assistant response
with assistant():
    lm += gen("response", max_tokens=100)

print(lm["response"])

Benefits:

  • Natural chat flow
  • Clear role separation
  • Easy to read and maintain

2. Constrained Generation

Guidance ensures outputs match specified patterns using regex or grammars.

Regex Constraints

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to valid email format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Constrain to date format (YYYY-MM-DD)
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")

# Constrain to phone number
lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")

print(lm["email"]) # Guaranteed valid email
print(lm["date"]) # Guaranteed YYYY-MM-DD format

How it works:

  • Regex converted to grammar at token level
  • Invalid tokens filtered during generation
  • Model can only produce matching outputs
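The filtering step described above can be illustrated without any model. The following is a minimal standard-library sketch, not Guidance's implementation: it hard-codes the date pattern `\d{4}-\d{2}-\d{2}` as a template and keeps only the tokens that leave the output a valid prefix of that pattern. The vocabulary and the helper names `is_valid_prefix` and `allowed_tokens` are hypothetical. Filtering of this kind needs access to the model's next-token scores, so how strictly a hosted API backend can enforce a constraint may differ from a local backend; treat that as an assumption to check against the library's documentation.

```python
import string

# Template for the pattern \d{4}-\d{2}-\d{2}: "d" = any ASCII digit, "-" = literal dash.
DATE_TEMPLATE = "dddd-dd-dd"


def is_valid_prefix(text: str, template: str = DATE_TEMPLATE) -> bool:
    """Return True if `text` could still be extended into a full match."""
    if len(text) > len(template):
        return False
    for ch, slot in zip(text, template):
        if slot == "d" and ch not in string.digits:
            return False
        if slot == "-" and ch != "-":
            return False
    return True


def allowed_tokens(generated: str, vocab: list[str]) -> list[str]:
    """Keep only the tokens that leave the output a valid prefix of the pattern."""
    return [tok for tok in vocab if tok and is_valid_prefix(generated + tok)]


# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["20", "24", "-", "-0", "09", "9-", "15", "Sept", " ", "2024-"]

print(allowed_tokens("", VOCAB))      # ['20', '24', '09', '15', '2024-']
print(allowed_tokens("2024", VOCAB))  # ['-', '-0']
```

At each step a real decoder would sample only from the surviving tokens, which is why the finished string cannot violate the pattern.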

Selection Constraints

from guidance import models, gen, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to specific choices
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")

# Multiple-choice selection
lm += "Best answer: " + select(
    ["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
    name="answer"
)

print(lm["sentiment"]) # One of: positive, negative, neutral
print(lm["answer"]) # One of the four full option strings, e.g. "A) Paris"

3. Token Healing

Guidance automatically "heals" token boundaries between prompt and generation.

Problem: Tokenization creates unnatural boundaries.

# Without token healing
prompt = "The capital of France is "
# Last token: " is "
# First generated token might be " Par" (with leading space)
# Result: "The capital of France is  Paris" (double space!)

Solution: Guidance backs up one token and regenerates.

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Token healing enabled by default
lm += "The capital of France is " + gen("capital", max_tokens=5)
# Result: "The capital of France is Paris" (correct spacing)

Benefits:

  • Natural text boundaries
  • No awkward spacing issues
  • Better model performance (sees natural token sequences)
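The back-up-and-regenerate idea can be sketched with a toy tokenizer. This is an illustration under stated assumptions rather than the library's code: the vocabulary, the greedy longest-match `tokenize`, and the `heal` helper are all hypothetical. The key point is that after dropping the last prompt token, generation is restricted to tokens that begin with the dropped text, so the healed output still starts with the original prompt.

```python
def tokenize(text: str, vocab: list[str]) -> list[str]:
    """Greedy longest-match tokenizer over a toy vocabulary."""
    tokens, i = [], 0
    by_length = sorted(vocab, key=len, reverse=True)
    while i < len(text):
        match = next((t for t in by_length if text.startswith(t, i)), None)
        if match is None:
            raise ValueError(f"cannot tokenize at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt: str, vocab: list[str]) -> tuple[str, list[str]]:
    """Drop the last prompt token and list the tokens that may replace it.

    A replacement must start with the dropped text, so the healed
    output still begins with the original prompt.
    """
    tokens = tokenize(prompt, vocab)
    dropped = tokens[-1]
    trimmed = "".join(tokens[:-1])
    candidates = [t for t in vocab if t.startswith(dropped)]
    return trimmed, candidates


# Hypothetical vocabulary in which "://" is a single token.
VOCAB = ["The", " link", " is", " ", "http", ":", "://", "//"]

trimmed, candidates = heal("The link is http:", VOCAB)
print(repr(trimmed))  # 'The link is http'
print(candidates)     # [':', '://']
```

Without healing, the model would be forced to continue after a lone `:` token and could not emit the more natural `://` token; with healing, both remain available.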

4. Grammar-Based Generation

Define complex structures using context-free grammars.

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# JSON grammar (simplified)
json_grammar = """
{
    "name": ,
    "age": ,
    "email":
}
"""

# Generate valid JSON
lm += gen("person", grammar=json_grammar)

print(lm["person"]) # Guaranteed valid JSON structure

Use cases:

  • Complex structured outputs
  • Nested data structures
  • Programming language syntax
  • Domain-specific languages

5. Guidance Functions

Create reusable generation patterns with the @guidance decorator.

from guidance import guidance, gen, models

@guidance
def generate_person(lm):
    """Generate a person with name and age."""
    lm += "Name: " + gen("name", max_tokens=20, stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
    return lm

# Use the function
lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm = generate_person(lm)

print(lm["name"])
print(lm["age"])

Stateful Functions:

from guidance import guidance, gen, select

@guidance(stateless=False)
def react_agent(lm, question, tools, max_rounds=5):
    """ReAct agent with tool use."""
    lm += f"Question: {question}\n\n"

    for i in range(max_rounds):
        # Thought
        lm += f"Thought {i+1}: " + gen("thought", stop="\n")

        # Action
        lm += "\nAction: " + select(list(tools.keys()), name="action")

        # Execute tool (tools maps each name to a zero-argument callable)
        tool_result = tools[lm["action"]]()
        lm += f"\nObservation: {tool_result}\n\n"

        # Check if done
        lm += "Done? " + select(["Yes", "No"], name="done")
        if lm["done"] == "Yes":
            break

    # Final answer
    lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
    return lm

Backend Configuration

Anthropic Claude

from guidance import models

lm = models.Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key" # Or set ANTHROPIC_API_KEY env var
)

OpenAI

lm = models.OpenAI(
    model="gpt-4o-mini",
    api_key="your-api-key" # Or set OPENAI_API_KEY env var
)

Local Models (Transformers)

from guidance.models import Transformers

lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda" # Or "cpu"
)

Local Models (llama.cpp)

from guidance.models import LlamaCpp

lm = LlamaCpp(
    model_path="/path/to/model.gguf",
    n_ctx=4096,
    n_gpu_layers=35
)

Common Patterns

Pattern 1: JSON Generation

from guidance import models, gen, system, user, assistant

lm = models.Anthropic("claude-sonnet-4-5-20250929")

with system():
    lm += "You generate valid JSON."

with user():
    lm += "Generate a user profile with name, age, and email."

with assistant():
    lm += """{
    "name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
    "age": """ + gen("age", regex=r"[0-9]+", max_tokens=3) + """,
    "email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
}"""

print(lm) # Prints the full transcript; the assistant turn follows the JSON template
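One detail in the template above is worth checking: the age pattern `[0-9]+` also admits values such as `007`, and JSON does not allow leading zeros in numbers. The following standard-library sketch renders the same template with hypothetical captured values and parses the result with `json.loads`; the helper names and the stricter pattern `0|[1-9][0-9]*` are suggestions, not something the original example uses.

```python
import json
import re

# Same shape as the template in the pattern above; doubled braces are literal.
TEMPLATE = '{{\n    "name": {name},\n    "age": {age},\n    "email": {email}\n}}'


def render(name: str, age: str, email: str) -> str:
    """Fill the template with already-quoted captures."""
    return TEMPLATE.format(name=name, age=age, email=email)


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical captures; both age values match the original regex [0-9]+.
ok = render('"Ada Lovelace"', "36", '"ada@example.com"')
bad = render('"Ada Lovelace"', "007", '"ada@example.com"')

print(is_valid_json(ok))   # True
print(is_valid_json(bad))  # False

# A stricter age pattern that rules out leading zeros.
STRICT_AGE = r"0|[1-9][0-9]*"
print(re.fullmatch(STRICT_AGE, "007") is not None)  # False
print(re.fullmatch(STRICT_AGE, "36") is not None)   # True
```

Parsing the assembled text once, as here, is a cheap way to confirm that the regexes chosen for each field really do imply valid JSON.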

Pattern 2: Classification

from guidance import models, gen, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

text = "This product is amazing! I love it."

lm += f"Text: {text}\n"
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"

print(f"Sentiment: {lm['sentiment']}")
print(f"Confidence: {lm['confidence']}%")

Pattern 3: Multi-Step Reasoning

from guidance import models, gen, guidance

@guidance
def chain_of_thought(lm, question):
    """Generate answer with step-by-step reasoning."""
    lm += f"Question: {question}\n\n"

    # Generate multiple reasoning steps
    for i in range(3):
        lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"

    # Final answer
    lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)

    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm = chain_of_thought(lm, "What is 15% of 200?")

print(lm["answer"])

Pattern 4: ReAct Agent

from guidance import models, gen, select, guidance

@guidance(stateless=False)
def react_agent(lm, question):
    """ReAct agent with tool use."""
    tools = {
        # WARNING: eval() runs arbitrary code; do not use it on untrusted model output
        "calculator": lambda expr: eval(expr),
        "search": lambda query: f"Search results for: {query}",
    }

    lm += f"Question: {question}\n\n"

    for _ in range(5):
        # Thought
        lm += "Thought: " + gen("thought", stop="\n") + "\n"

        # Action selection
        lm += "Action: " + select(["calculator", "search", "answer"], name="action")

        if lm["action"] == "answer":
            lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
            break

        # Action input
        lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"

        # Execute tool
        if lm["action"] in tools:
            result = tools[lm["action"]](lm["action_input"])
            lm += f"Observation: {result}\n\n"

    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm = react_agent(lm, "What is 25 * 4 + 10?")
# "answer" is captured only if the agent chose the "answer" action within 5 rounds
print(lm["answer"])
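The calculator tool above passes model-generated text to `eval`, which will execute any Python expression the model produces. A possible replacement, sketched here with only the standard library, walks the parsed expression and allows nothing but numeric literals and basic arithmetic. The name `safe_calculate` and the set of permitted operators are assumptions chosen for illustration.

```python
import ast
import operator

# Operators the calculator is willing to evaluate.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def evaluate(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    # strip() avoids an indentation error on leading whitespace in model output
    tree = ast.parse(expression.strip(), mode="eval")
    return evaluate(tree.body)


print(safe_calculate("25 * 4 + 10"))  # 110
```

Swapping `lambda expr: eval(expr)` for `safe_calculate` in the `tools` dictionary keeps the agent loop unchanged while refusing function calls, attribute access, and names.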

Pattern 5: Data Extraction

from guidance import models, gen, guidance

@guidance
def extract_entities(lm, text):
    """Extract structured entities from text."""
    lm += f"Text: {text}\n\n"

    # Extract person
    lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"

    # Extract organization
    lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"

    # Extract date
    lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"

    # Extract location
    lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"

    return lm

text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm = extract_entities(lm, text)

print(f"Person: {lm['person']}")
print(f"Organization: {lm['organization']}")
print(f"Date: {lm['date']}")
print(f"Location: {lm['location']}")

Best Practices

1. Use Regex for Format Validation

# ✅ Good: Regex ensures valid format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# ❌ Bad: Free generation may produce invalid emails
lm += "Email: " + gen("email", max_tokens=50)
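A regex constraint guarantees the shape of the output, not that the value is meaningful. The short standard-library sketch below checks the date pattern used elsewhere in this document against sample strings before it is handed to `gen`, and shows that a well-formed string can still be an impossible date. The helper names are hypothetical.

```python
import re
from datetime import date

DATE = r"\d{4}-\d{2}-\d{2}"


def matches(pattern: str, text: str) -> bool:
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def is_real_date(text: str) -> bool:
    """True if the string is an actual calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


for sample in ["2024-09-15", "2024-13-45", "15/09/2024"]:
    print(sample, matches(DATE, sample), is_real_date(sample))
# 2024-09-15 True True
# 2024-13-45 True False
# 15/09/2024 False False
```

When semantic validity matters, either tighten the pattern or validate the captured value after generation.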

2. Use select() for Fixed Categories

# ✅ Good: Guaranteed valid category
lm += "Status: " + select(["pending", "approved", "rejected"], name="status")

# ❌ Bad: May generate typos or invalid values
lm += "Status: " + gen("status", max_tokens=20)

3. Leverage Token Healing

# Token healing is enabled by default

# No special action needed - just concatenate naturally
lm += "The capital is " + gen("capital") # Automatic healing

4. Use stop Sequences

# ✅ Good: Stop at newline for single-line outputs
lm += "Name: " + gen("name", stop="\n")

# ❌ Bad: May generate multiple lines
lm += "Name: " + gen("name", max_tokens=50)

5. Create Reusable Functions

# ✅ Good: Reusable pattern
@guidance
def generate_person(lm):
    lm += "Name: " + gen("name", stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+")
    return lm

# Use multiple times
lm = generate_person(lm)
lm += "\n\n"
lm = generate_person(lm)

6. Balance Constraints

# ✅ Good: Reasonable constraints
lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)

# ❌ Too strict: May fail or be very slow
lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)

Comparison to Alternatives

| Feature | Guidance | Instructor | Outlines | LMQL |
|---------|----------|------------|----------|------|
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Grammar Support | ✅ CFG | ❌ No | ✅ CFG | ✅ CFG |
| Pydantic Validation | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| Token Healing | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Local Models | ✅ Yes | ⚠️ Limited | ✅ Yes | ✅ Yes |
| API Models | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Pythonic Syntax | ✅ Yes | ✅ Yes | ✅ Yes | ❌ SQL-like |
| Learning Curve | Low | Low | Medium | High |

When to choose Guidance:

  • Need regex/grammar constraints
  • Want token healing
  • Building complex workflows with control flow
  • Using local models (Transformers, llama.cpp)
  • Prefer Pythonic syntax

When to choose alternatives:

  • Instructor: Need Pydantic validation with automatic retrying
  • Outlines: Need JSON schema validation
  • LMQL: Prefer declarative query syntax

Performance Characteristics

Latency Reduction:

  • 30-50% faster than traditional prompting for constrained outputs
  • Token healing reduces unnecessary regeneration
  • Grammar constraints prevent invalid token generation

Memory Usage:

  • Minimal overhead vs unconstrained generation
  • Grammar compilation cached after first use
  • Efficient token filtering at inference time

Token Efficiency:

  • Prevents wasted tokens on invalid outputs
  • No need for retry loops
  • Direct path to valid outputs

Resources

  • Documentation: https://guidance.readthedocs.io
  • GitHub: https://github.com/guidance-ai/guidance (18k+ stars)
  • Notebooks: https://github.com/guidance-ai/guidance/tree/main/notebooks
  • Discord: Community support available

See Also

  • references/constraints.md - Comprehensive regex and grammar patterns
  • references/backends.md - Backend-specific configuration
  • references/examples.md - Production-ready examples
