# Structured outputs

> JSON schema validation and controlled text generation across all engines

Structured outputs let you generate text that conforms to specific JSON schemas, providing reliable data extraction and controlled text generation. [Model APIs](/inference/model-apis/overview) support structured outputs. For self-deployed models, Baseten engines like [BIS-LLM](/engines/bis-llm/overview) and [Engine-Builder-LLM](/engines/engine-builder-llm/overview) support them, as do other inference frameworks like [vLLM](/examples/vllm) and [SGLang](/examples/sglang).

## Quick start

Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.

### Define a schema

Define a Pydantic model whose fields describe the output you want:

```python schema.py theme={"system"}
from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str
```

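Each field requires a type annotation, and the model's response will contain these fields with these types. A plain `str` does not restrict the value itself, so the comment on `priority` documents the intended values but does not enforce them.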
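To check what the class actually asks for, you can dump the JSON schema that Pydantic derives from it. This is a minimal sketch that assumes Pydantic v2, where `model_json_schema()` is available:

```python
import json
from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str

# Assumes Pydantic v2: derive the JSON schema from the class definition.
schema = Task.model_json_schema()

# Pretty-print the schema to inspect field names, types, and required fields.
print(json.dumps(schema, indent=2))
```

Every field appears under `properties` with its JSON type, and fields without a default are listed under `required`.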

### Generate structured output

Pass the schema to `client.beta.chat.completions.parse` and read the typed result. The code is the same for Model APIs and dedicated deployments; only the base URL and model name differ.

<Tabs>
  <Tab title="Model APIs">
    Point the client at `https://inference.baseten.co/v1` and pass a model slug from the [supported models table](/inference/model-apis/overview#supported-models):

    ```python structured_output.py theme={"system"}
    import os
    from pydantic import BaseModel
    from openai import OpenAI

    class Task(BaseModel):
        title: str
        priority: str
        due_date: str
        description: str

    client = OpenAI(
        api_key=os.environ['BASETEN_API_KEY'],
        base_url="https://inference.baseten.co/v1"
    )

    response = client.beta.chat.completions.parse(
        model="moonshotai/Kimi-K2.6",
        messages=[
            {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
        ],
        response_format=Task
    )

    task = response.choices[0].message.parsed
    print(f"Task: {task.title}")
    print(f"Priority: {task.priority}")
    ```
  </Tab>

  <Tab title="Dedicated deployment">
    Point the client at your deployment's production endpoint; the `model` field is required by the SDK but ignored by the deployment:

    ```python structured_output.py theme={"system"}
    import os
    from pydantic import BaseModel
    from openai import OpenAI

    class Task(BaseModel):
        title: str
        priority: str
        due_date: str
        description: str

    client = OpenAI(
        api_key=os.environ['BASETEN_API_KEY'],
        base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
    )

    response = client.beta.chat.completions.parse(
        model="not-required",
        messages=[
            {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
        ],
        response_format=Task
    )

    task = response.choices[0].message.parsed
    print(f"Task: {task.title}")
    print(f"Priority: {task.priority}")
    ```
  </Tab>
</Tabs>

Running either version prints the parsed fields. The exact values depend on the model; a typical run looks like this:

```text Output theme={"system"}
Task: Review the quarterly report
Priority: high
```

Pass your Pydantic class to `response_format` and use `beta.chat.completions.parse` instead of the regular `create` method.

The response includes a `parsed` attribute with your data already converted to a `Task` object, so no JSON parsing is needed.
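For comparison, the sketch below shows roughly what the manual path involves if you do not use `parse`: build a `response_format` payload from the schema, then validate the returned JSON string yourself. Several things here are assumptions rather than documented behavior: the `json_schema` payload shape follows the OpenAI Chat Completions convention, `raw_content` is a made-up stand-in for `response.choices[0].message.content`, and the code relies on Pydantic v2.

```python
from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str

# Assumption: an OpenAI-style json_schema payload that could be passed
# as response_format to the regular create method.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "Task",
        "schema": Task.model_json_schema(),
    },
}

# Hypothetical stand-in for response.choices[0].message.content.
raw_content = (
    '{"title": "Review the quarterly report", "priority": "high", '
    '"due_date": "next Friday", "description": "Review the quarterly report."}'
)

# Validate and convert the JSON string into a typed Task object.
task = Task.model_validate_json(raw_content)
print(f"Task: {task.title}")
print(f"Priority: {task.priority}")
```

With `parse`, both the payload construction and the validation step happen inside the SDK, which is why `message.parsed` is already a `Task`.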

## Engine support

Structured outputs are compatible with:

* **Engine-Builder-LLM**, except when Lookahead speculative decoding is configured.
* **BIS-LLM**, except in some configurations, such as when the overlap scheduler is enabled.

### Model support

Apart from the engine configurations listed above, all models deployed with Engine-Builder-LLM and BIS-LLM support structured outputs with no extra configuration.

## Best practices

### Schema design

* **Keep schemas simple**: Limit nesting to two or three levels for best results.
* **Use basic types**: Prefer `str`, `int`, `float`, and `bool` when possible.
* **Set defaults**: Provide reasonable default values for optional fields.
* **Use descriptive names**: Choose clear, descriptive field names.
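As an illustration of these guidelines, the sketch below defines a schema with one level of nesting, basic types, defaults for optional fields, and descriptive names, then validates sample JSON locally. The field names and the `Assignee` model are hypothetical, and the use of `Literal` to restrict `priority` is an addition not covered above. The example assumes Pydantic v2 and only demonstrates local validation; it makes no claim about how a given engine treats defaults or `Literal` during generation.

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError

class Assignee(BaseModel):
    # Hypothetical nested model: one level of nesting only.
    name: str
    email: Optional[str] = None

class Task(BaseModel):
    title: str = Field(description="Short summary of the work")
    # Literal restricts the value to a fixed set of strings.
    priority: Literal["low", "medium", "high"] = "medium"
    estimated_hours: float = 1.0
    is_blocked: bool = False
    assignee: Optional[Assignee] = None

# Missing optional fields fall back to their defaults.
task = Task.model_validate_json(
    '{"title": "Review the quarterly report", "priority": "high"}'
)
print(task.priority, task.estimated_hours, task.is_blocked)

# A value outside the allowed set fails validation.
try:
    Task.model_validate_json('{"title": "x", "priority": "urgent"}')
    rejected = False
except ValidationError:
    rejected = True
print(f"Invalid priority rejected: {rejected}")
```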

### Prompt engineering

* **Use a low temperature**: Set the temperature between 0.1 and 0.3 for consistent outputs.
* **Provide the schema**: Include the Pydantic model's JSON schema and few-shot examples in the context.
* **Provide context**: Give background information for complex schemas.
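One way to apply these tips is to assemble the request arguments in a small helper. The sketch below is an assumption about how you might structure it, not a Baseten API: `build_request`, the prompt wording, and the sample schema and example are all hypothetical. The schema is written by hand here so the snippet has no dependencies; in practice it could come from `Task.model_json_schema()`.

```python
import json

def build_request(schema: dict, examples: list, user_prompt: str,
                  temperature: float = 0.2) -> dict:
    """Assemble chat-completion keyword arguments with schema and examples in context."""
    example_text = "\n".join(json.dumps(example) for example in examples)
    system_prompt = (
        "Extract a task from the user's request.\n"
        f"Respond with JSON matching this schema:\n{json.dumps(schema, indent=2)}\n"
        f"Examples of valid output:\n{example_text}"
    )
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,  # low temperature for consistent outputs
    }

# Hypothetical hand-written schema and few-shot example.
task_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "priority": {"type": "string"}},
    "required": ["title", "priority"],
}
few_shot = [{"title": "File expense report", "priority": "low"}]

request = build_request(
    task_schema, few_shot,
    "Create a task for: Review the quarterly report by next Friday",
)
print(request["messages"][0]["content"])
```

The resulting dictionary can be unpacked into the call alongside the schema, for example `client.beta.chat.completions.parse(model=..., response_format=Task, **request)`.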

## Related

* [Engine-Builder-LLM overview](/engines/engine-builder-llm/overview): Dense model documentation.
* [BIS-LLM overview](/engines/bis-llm/overview): MoE model documentation.
* [Quantization guide](/engines/performance-concepts/quantization-guide): `FP8`/`FP4` trade-offs.
