

Guarantee valid JSON/XML/code structure during generation, with Pydantic models and type-safe outputs.

Orchestra Research | MIT License

Outlines: Structured Text Generation

When to Use This Skill

Use Outlines when you need to:

  • Guarantee valid JSON/XML/code structure during generation
  • Use Pydantic models for type-safe outputs
  • Support local models (Transformers, llama.cpp, vLLM)
  • Maximize inference speed with zero-overhead structured generation
  • Generate against JSON schemas automatically
  • Control token sampling at the grammar level

GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)

Installation

# Base installation
pip install outlines

# With specific backends
pip install outlines transformers        # Hugging Face models
pip install outlines llama-cpp-python    # llama.cpp
pip install outlines vllm                # vLLM for high-throughput serving

Quick Start

Basic Example: Classification

import outlines

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed to be one of the three choices)

With Pydantic Models

from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"

Core Concepts

1. Constrained Token Sampling

Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.

How it works:

  • Convert the schema (Pydantic model, JSON Schema, or regex) into a regular expression
  • Compile that regex into a finite state machine (FSM) indexed against the tokenizer vocabulary
  • Mask invalid tokens at each decoding step so only schema-conforming tokens can be sampled (an illustrative logits-masking sketch appears at the end of this subsection)
  • Fast-forward when only one valid token exists

Benefits:

  • Zero overhead: Filtering happens at token level
  • Speed improvement: Fast-forward through deterministic paths
  • Guaranteed validity: Invalid outputs impossible

import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> regex -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> regular expression
# 3. Regex -> FSM
# 4. FSM filters tokens during generation
generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
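For intuition, here is a minimal, hypothetical sketch of logit-level masking using a Hugging Face LogitsProcessor. It only allows digit tokens, which illustrates the same principle an FSM applies state by state for a full schema; it is not how Outlines is implemented internally, and the model choice ("gpt2") is just a small placeholder.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class DigitsOnly(LogitsProcessor):
    """Mask every token that is not purely digits (illustrative constraint)."""
    def __init__(self, tokenizer):
        self.allowed = torch.tensor(
            [tid for tok, tid in tokenizer.get_vocab().items() if tok.strip().isdigit()]
        )

    def __call__(self, input_ids, scores):
        # Set all logits to -inf, then restore the scores of allowed tokens
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = scores[:, self.allowed]
        return mask

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The answer is ", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=3,
    logits_processor=LogitsProcessorList([DigitsOnly(tokenizer)]),
)
print(tokenizer.decode(out[0]))  # Prompt followed by digit-only tokens

Outlines generalizes this idea: instead of a single hand-written mask, the compiled FSM determines the allowed token set for every generation step.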

2. Structured Generators

Outlines provides specialized generators for different output types.

Choice Generator

# Multiple-choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: one of the three choices


JSON Generator

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching the schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class 'Product'>

Regex Generator

# Generate text matching a regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match the pattern)


Integer/Float Generators

# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:") # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:") # Guaranteed float

3. Model Backends

Outlines supports multiple local and API-based backends.

Transformers (Hugging Face)

import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)

vLLM (High Throughput)

# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)

OpenAI (Limited Support)

# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: some features are limited with API models
generator = outlines.generate.json(model, YourModel)

4. Pydantic Integration

Outlines has first-class Pydantic support with automatic schema translation.
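To see what the schema translation starts from, you can inspect the JSON Schema that Pydantic (v2) derives from a model; Outlines compiles this schema into its token-level constraints. A minimal sketch (output shown abbreviated):

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Pydantic v2: the JSON Schema that Outlines consumes
print(User.model_json_schema())
# {'properties': {'name': {..., 'type': 'string'}, 'age': {..., 'type': 'integer'}},
#  'required': ['name', 'age'], 'title': 'User', 'type': 'object'}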

Basic Models

from pydantic import BaseModel, Field
import outlines

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0

Nested Models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"

Enums and Literals

from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of the enum values
    priority: Literal["low", "medium", "high"]  # Must be one of the literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)

Common Patterns

Pattern 1: Data Extraction

from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")

Pattern 2: Classification

from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With a confidence score
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")

Pattern 3: Structured Forms

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]

Pattern 4: Multi-Entity Extraction

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")

Pattern 5: Code Generation

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")

Pattern 6: Batch Processing

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old",
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")

Backend Configuration

Transformers

import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)

vLLM (Production)

# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)

Best Practices

1. Use Specific Types

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as a string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int

2. Add Constraints

from pydantic import Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str

3. Use Enums for Categories

# ✅ Good: Enum for a fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything

4. Provide Context in Prompts

# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."

5. Handle Optional Fields

from typing import Optional

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Generation can succeed even if author/date are missing
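A short usage sketch (the model name and prompt are illustrative placeholders):

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Extract article info: 'Notes on FSM decoding', no author listed")
print(article.title)
print(article.author)  # May be None when the source text has no author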


Comparison to Alternatives

| Feature | Outlines | Instructor | Guidance | LMQL |
|---------|----------|------------|----------|------|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |

When to choose Outlines:

  • You are using local models (Transformers, llama.cpp, vLLM)
  • You need maximum inference speed
  • You want Pydantic model support
  • You require zero-overhead structured generation
  • You need control over the token sampling process

When to choose alternatives:

  • Instructor: Need API models with automatic retrying
  • Guidance: Need token healing and complex workflows
  • LMQL: Prefer declarative query syntax

Performance Characteristics

Speed:

  • Zero overhead: Structured generation as fast as unconstrained
  • Fast-forward optimization: Skips deterministic tokens
  • Typically 1.2-2x faster than post-generation validation approaches (see the timing sketch at the end of this section)

Memory:

  • FSM compiled once per schema (cached)
  • Minimal runtime overhead
  • Efficient with vLLM for high throughput

Accuracy:

  • 100% valid outputs (guaranteed by FSM)
  • No retry loops needed
  • Deterministic token filtering
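A rough way to sanity-check these characteristics on your own hardware is to time a structured generator against unconstrained text generation. The snippet below is an illustrative harness, not a benchmark from the Outlines project; the model, prompt, and token budget are placeholders. Note that each generator is built once, so the compiled FSM for the schema is reused across calls.

import time
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Build generators once; the FSM for Person is compiled and cached here
structured = outlines.generate.json(model, Person)
unconstrained = outlines.generate.text(model)

prompt = "Generate a person: Alice, 25."

start = time.perf_counter()
structured(prompt)
print(f"structured:    {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
unconstrained(prompt, max_tokens=50)
print(f"unconstrained: {time.perf_counter() - start:.2f}s")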

Resources

  • Documentation: https://outlines-dev.github.io/outlines
  • GitHub: https://github.com/outlines-dev/outlines (8k+ stars)
  • Discord: https://discord.gg/R9DSu34mGd
  • Blog: https://blog.dottxt.co

See Also

  • references/json_generation.md - Comprehensive JSON and Pydantic patterns
  • references/backends.md - Backend-specific configuration
  • references/examples.md - Production-ready examples

Related Skills

  • modal-serverless-gpu - Serverless GPU cloud: on-demand GPUs for ML workloads, model API deployment, autoscaling
  • evaluating-llms-harness - Evaluate LLMs on 60+ academic benchmarks: MMLU, HumanEval, GSM8K, TruthfulQA, and more
  • weights-and-biases - Track ML experiments with W&B: automatic logging, real-time visualization, hyperparameter sweeps, model registry
  • huggingface-hub - Hugging Face Hub CLI (hf): search, download, and upload models and datasets, manage Spaces