Structured Outputs
For most production use, you don’t want prose — you want machine-parseable output. JSON, function arguments, schemas. Getting this reliably is often the difference between a working product and a working demo.
The problem
Ask an LLM “give me a JSON object with fields A and B,” and:
- It might add prose: “Sure! Here’s the JSON: {…}”
- It might wrap the JSON in markdown code fences.
- It might add trailing commas, comments, or single quotes that break JSON.
- It might fabricate fields or omit required ones.
Getting valid JSON 100% of the time requires more than asking nicely.
Strategy 1 — JSON mode / structured output APIs
Modern providers offer enforced structured outputs:
OpenAI structured outputs
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)
```
With strict: true, the API guarantees the output matches the schema. This is achieved by constraining the decoding step: at each position, only tokens consistent with the schema's grammar can be generated.
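Parsing is then trivial; the message content is a JSON string that already matches the schema. A minimal sketch, assuming the call above is bound to response:

```python
import json

# Content is a schema-conformant JSON string
data = json.loads(response.choices[0].message.content)
print(data["name"], data["age"])
```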
Anthropic tool use as structured output
Anthropic doesn’t have JSON mode but offers tool use: define a “fake” tool whose arguments are your desired schema:
```python
tool = {
    "name": "extract_user",
    "description": "Extract user information.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[tool],
    tool_choice={"type": "tool", "name": "extract_user"},
    messages=[...],
)
```
The “tool input” in the response is your schema-conformant JSON.
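A minimal sketch of pulling it out, assuming the Anthropic Python SDK's response shape (a list of content blocks):

```python
# Find the tool_use block; its input is a dict conforming to input_schema
tool_use = next(block for block in response.content if block.type == "tool_use")
user = tool_use.input  # e.g. {"name": "John", "age": 30}
```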
Outlines, Instructor, LMQL
Open-source libraries fill the same niche: Outlines and LMQL constrain decoding to match a schema or grammar, while Instructor validates the output against a Pydantic model and retries on failure:
```python
from openai import OpenAI
from instructor import patch
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = patch(OpenAI())
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[...],
)
```
This works across LLM providers and includes retry logic when validation fails.
Strategy 2 — Tool calling
When the model needs to invoke a function, providers support function/tool calling:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    tools=tools,
)
```
If the model decides to use a tool, the response contains a structured tool_calls array. You execute the tool, return the result in a follow-up message, and the model continues.
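A sketch of that loop with the OpenAI Python SDK; get_weather is your own function, and messages is assumed to be the concrete list sent in the request above:

```python
import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # schema-conformant arguments
    result = get_weather(**args)                # your own implementation

    # Return the tool result so the model can finish its answer
    followup = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            *messages,
            message,
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
        tools=tools,
    )
```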
This is the foundation of agents (Stage 11).
Strategy 3 — Prefilling
For Anthropic, prefill the assistant message with {:
```python
messages = [
    {"role": "user", "content": "Output a JSON object with name and age for John, 30."},
    {"role": "assistant", "content": "{"},
]
```
The model continues from there with "name": .... You re-add the { yourself before parsing. This is reliable for simple cases, but not as bulletproof as schema-enforced JSON.
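A minimal end-to-end sketch (the max_tokens value is arbitrary):

```python
import json

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    messages=messages,
)
# The completion starts mid-object, so prepend the prefilled "{" before parsing
data = json.loads("{" + response.content[0].text)
```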
Strategy 4 — XML for structured but flexible output
When you need structure but not rigid JSON (e.g. you want some prose plus structured fields):
```xml
<analysis>
  <summary>The user wants to cancel their subscription.</summary>
  <intent>cancel_subscription</intent>
  <urgency>high</urgency>
  <next_action>Confirm cancellation reason</next_action>
</analysis>
```
XML is forgiving (small malformations don't necessarily break extraction), readable alongside prose, and works well with Anthropic models, which were trained heavily on XML-tagged data.
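Because the format is loose, regex extraction is often enough. A small sketch, where model_output is assumed to hold the raw response text:

```python
import re

def extract_tag(text: str, tag: str) -> str | None:
    """Pull the contents of a <tag>...</tag> pair, tolerating surrounding prose."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

intent = extract_tag(model_output, "intent")    # "cancel_subscription"
urgency = extract_tag(model_output, "urgency")  # "high"
```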
Schema design tips
- Use enums wherever possible. "sentiment": "positive" | "negative" | "neutral" is much more reliable than a free-form string.
- Required vs optional: be explicit. Don't rely on the model to default optional fields sanely.
- Add descriptions to schema fields. "age": {"type": "integer", "description": "Age in years; null if unknown"} helps the model.
- Keep schemas flat when you can. Deeply nested optionals confuse models.
- Put examples in the schema description: "description": "ISO 8601 date, e.g. '2026-01-15'".
- Don't ask for too much at once. A schema with 30 fields fails more often than one with 5. Break it into smaller calls (see the sketch below).
Validation and retry
Even with strict mode, validate downstream:
```python
from pydantic import ValidationError

def extract_user(prompt: str) -> User:
    for attempt in range(3):
        response = call_llm(prompt)
        try:
            return User.model_validate(response.parsed)
        except ValidationError as e:
            # Feed the validation error back so the next attempt can self-correct
            prompt += f"\n\nThe previous response was invalid: {e}. Try again."
    raise RuntimeError("No valid output after 3 attempts")
```
For non-strict APIs, this loop is essential. With strict mode, validation is mostly for runtime safety.
Common bugs
- Extra fields: the model adds fields not in the schema. Strict mode rejects them; otherwise, log and ignore them, or fail loudly.
- Wrong types: "age": "30" instead of 30. Pydantic will coerce the string to an int; plain JSON parsing leaves it as a string.
- Empty strings instead of nulls: which one you get depends on schema design.
- Markdown code fences leaking into the output: strip them before parsing if you're not using strict mode (see the helper below).
- Hallucinated nested fields: the model invents structure not in the schema. Validate strictly.
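A small helper for the fence-stripping case (a sketch, not tied to any particular provider):

```python
import json
import re

def parse_model_json(text: str) -> dict:
    """Parse model output as JSON, stripping a leading/trailing markdown fence if present."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    return json.loads(cleaned)
```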
When to NOT force structured output
- Open-ended creative tasks: forcing structure constrains creativity.
- Long-form generation: chat responses, articles, summaries.
- Reasoning steps: the chain of thought itself shouldn't be JSON; only the final answer should be.
For these, generate freely and parse separately, or use a “reason then output” pattern with a separator.
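A sketch of the separator pattern; the prompt wording is illustrative, ticket is an assumed input, and call_llm / parse_model_json are the placeholders used earlier:

```python
prompt = (
    "Analyse the ticket below. First, reason freely about the user's intent. "
    "Then print a line containing only '---', followed by ONLY a JSON object "
    'with keys "intent" and "urgency".\n\n'
    f"Ticket: {ticket}"
)

raw = call_llm(prompt)
# Free-form reasoning before the separator, machine-parseable JSON after it
reasoning, _, json_part = raw.partition("\n---\n")
result = parse_model_json(json_part)
```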
Cost considerations
- Structured-output APIs have similar token cost to normal calls.
- Tool definitions count as input tokens on every call, so long tool definitions are expensive.
- Validation retries multiply cost — keep them rare.
Practical advice
- Default to provider-native structured outputs (OpenAI strict mode, Anthropic tool use).
- For provider-agnostic code, use Instructor or Outlines.
- Validate every output, even from “guaranteed” APIs.
- Log violations for offline analysis.
- For agents, treat tool calling as your primary structured-output mechanism — see Stage 11.