Example response
{
  "id": "<string>",
  "type": "<string>",
  "role": "<string>",
  "content": [
    {
      "type": "<string>",
      "text": "<string>"
    }
  ],
  "model": "<string>",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123
  },
  "stop_sequence": "<string>"
}

Model APIs accept requests in the Anthropic Messages API format at https://inference.baseten.co/v1/messages.

Call with the Anthropic SDK

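The Anthropic SDK sends the API key in the x-api-key header by default. Baseten reads the Authorization header instead, so override default_headers when creating the client. Set base_url to the host only, because the SDK appends /v1/messages to it: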
import anthropic
import os

API_KEY = os.environ["BASETEN_API_KEY"]

client = anthropic.Anthropic(
    base_url="https://inference.baseten.co",
    api_key=API_KEY,
    default_headers={"Authorization": f"Bearer {API_KEY}"},
)

response = client.messages.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.content[0].text)

Authorizations

Authorization
string
header
required

Pass your Baseten API key using either the Api-Key or Bearer scheme: Authorization: Api-Key YOUR_API_KEY or Authorization: Bearer YOUR_API_KEY. The Anthropic SDK's default x-api-key header is not accepted; override default_headers to send Authorization instead.
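
To illustrate the two accepted schemes without the SDK, here is a minimal sketch that builds the request with Python's standard library. The helper names build_request and send are hypothetical, not part of any Baseten client; the URL, model slug, and header formats are the ones given on this page.

```python
import json
import urllib.request

MESSAGES_URL = "https://inference.baseten.co/v1/messages"


def build_request(api_key: str, body: dict, scheme: str = "Api-Key") -> urllib.request.Request:
    """Build a POST request that authenticates through the Authorization header."""
    if scheme not in ("Api-Key", "Bearer"):
        raise ValueError("scheme must be 'Api-Key' or 'Bearer'")
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Both documented forms: "Api-Key <key>" or "Bearer <key>".
            "Authorization": f"{scheme} {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like send(build_request(os.environ["BASETEN_API_KEY"], {"model": "deepseek-ai/DeepSeek-V4-Pro", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]})).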

Body

application/json

Request body for creating a message.

model
string
required

The model slug to use. Find available models at Model APIs.

messages
InputMessage · object[]
required

The conversation history as an ordered list of input messages. Alternating user and assistant roles are expected; the final message must be from the user.
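
The ordering rules above can be checked on the client before a request is sent. This is a sketch only: validate_messages is a hypothetical helper, and the server may be more lenient or stricter than this check.

```python
from typing import List


def validate_messages(messages: List[dict]) -> None:
    """Raise ValueError if the history breaks the documented ordering rules."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ("user", "assistant"):
            raise ValueError(f"message {index} has unsupported role {role!r}")
        # Consecutive messages are expected to alternate between the two roles.
        if index > 0 and role == messages[index - 1].get("role"):
            raise ValueError(f"message {index} repeats the role {role!r}")
    # The final message must be from the user.
    if messages[-1]["role"] != "user":
        raise ValueError("the final message must be from the user")


history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "Summarize our chat so far."},
]
validate_messages(history)  # passes silently
```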

max_tokens
integer
required

The maximum number of tokens to generate in the response. Required by the Messages API. The response may be shorter if it finishes naturally or hits a stop sequence.

Required range: x >= 1

system

A system prompt that sets the model's behavior. Pass either a single string or an array of text content blocks.
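
As a sketch of the two forms, the helper below passes a string through unchanged and turns a list of strings into text content blocks. The block shape {"type": "text", "text": ...} is an assumption taken from the text blocks in the response example, and system_param and build_body are hypothetical names.

```python
from typing import List, Union


def system_param(system: Union[str, List[str]]) -> Union[str, List[dict]]:
    """Return a value for the system field in either documented form."""
    if isinstance(system, str):
        return system  # single-string form
    # Array form: one text content block per part (assumed block shape).
    return [{"type": "text", "text": part} for part in system]


def build_body(model: str, user_text: str, system: Union[str, List[str]], max_tokens: int = 1024) -> dict:
    """Assemble a request body with the three required fields plus system."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_param(system),
        "messages": [{"role": "user", "content": user_text}],
    }


body = build_body(
    "deepseek-ai/DeepSeek-V4-Pro",
    "Hello!",
    ["You are a concise assistant.", "Answer in plain text."],
)
```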

temperature
number
default:1

Controls randomness. Lower values are more deterministic. Range: 0 to 1.

Required range: 0 <= x <= 1

top_p
number

Nucleus sampling: only consider tokens with cumulative probability up to this value.

Required range: x <= 1

top_k
integer

Limits token selection to the top K most probable tokens at each step.

Required range: x >= 0
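
The documented ranges for the three sampling parameters can be enforced before a request leaves the client. This is a minimal sketch with a hypothetical helper name, sampling_params; it checks only the bounds stated on this page and omits any parameter left unset.

```python
from typing import Optional


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> dict:
    """Return the sampling fields to merge into a request body, after range checks."""
    params = {}
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must satisfy 0 <= x <= 1")
        params["temperature"] = temperature
    if top_p is not None:
        if top_p > 1:
            raise ValueError("top_p must satisfy x <= 1")
        params["top_p"] = top_p
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must satisfy x >= 0")
        params["top_k"] = top_k
    return params


body = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
    **sampling_params(temperature=0.2, top_k=40),
}
```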

stop_sequences
string[]

Custom text sequences that will stop generation. When a stop sequence is hit, stop_reason is stop_sequence and stop_sequence contains the matched string.

stream
boolean
default:false

If true, the response is streamed as server-sent events. Each event has a type such as message_start, content_block_delta, or message_stop.
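
A streamed response can be reassembled by parsing the server-sent events and joining the text carried by content_block_delta events. The sketch below assumes each delta payload follows the Anthropic streaming shape, {"delta": {"type": "text_delta", "text": ...}}; this page names the event types but does not spell out their payloads. The sample lines are invented for illustration.

```python
import json


def iter_sse_events(lines):
    """Yield (event_type, data) pairs from an iterable of decoded SSE lines."""
    event_type, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = None, []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    if data_lines:
        yield event_type, json.loads("\n".join(data_lines))


def collect_text(lines) -> str:
    """Join the text deltas of a stream, stopping at message_stop."""
    parts = []
    for event_type, data in iter_sse_events(lines):
        if event_type == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":  # assumed payload shape
                parts.append(delta.get("text", ""))
        elif event_type == "message_stop":
            break
    return "".join(parts)


SAMPLE_STREAM = [
    "event: message_start",
    'data: {"type": "message_start"}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]

print(collect_text(SAMPLE_STREAM))
```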

tools
ToolDefinition · object[]

A list of tools the model may call. Each tool has a name, description, and input_schema (a JSON Schema object).

tool_choice
ToolChoice · object

Controls which tool (if any) the model must call.
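
For illustration, a tool definition with the three documented fields might look like the sketch below. The get_weather tool is hypothetical, and the tool_choice value {"type": "tool", "name": ...} follows the Anthropic Messages convention for forcing one tool; this page does not list the accepted ToolChoice shapes, so treat that part as an assumption.

```python
def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build a tool definition: name, description, and a JSON Schema input_schema."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical tool used only for this example.
weather_tool = make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string", "description": "City name, e.g. Paris"}},
    ["city"],
)

body = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    # Assumed shape: force a call to the named tool.
    "tool_choice": {"type": "tool", "name": "get_weather"},
}
```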

metadata
Metadata · object

An object describing metadata about the request. Supports user_id for abuse detection.

Response

Successful response

The message response returned by the model.

id
string
required

A unique identifier for this message, such as msg_abc123.

type
string
required

The object type, always message.

Allowed value: "message"

role
string
required

The role of the generated message, always assistant.

Allowed value: "assistant"

content
(TextBlock · object | ToolUseBlock · object)[]
required

An array of content blocks generated by the model. Text responses contain a single text block; responses that invoke tools contain tool_use blocks.

A text content block.
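
Because content is an array of mixed block types, client code usually separates text from tool calls by checking each block's type. This is a sketch over a plain dict response; split_content is a hypothetical helper, and the id, name, and input fields in the sample tool_use block are assumed from the Anthropic format rather than documented on this page.

```python
from typing import List, Tuple


def split_content(content: List[dict]) -> Tuple[str, List[dict]]:
    """Return the joined text and the list of tool_use blocks from a response."""
    texts, tool_calls = [], []
    for block in content:
        if block.get("type") == "text":
            texts.append(block["text"])
        elif block.get("type") == "tool_use":
            tool_calls.append(block)  # kept whole; fields are not interpreted here
    return "".join(texts), tool_calls


# Invented response content for illustration.
sample_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"city": "Paris"}},
]

text, tool_calls = split_content(sample_content)
```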

model
string
required

The model slug that produced the response.

stop_reason
enum<string>
required

Why the model stopped generating: end_turn (natural stop), max_tokens (hit the max_tokens limit), stop_sequence (matched a stop_sequences entry), or tool_use (model invoked a tool).

Available options: end_turn, max_tokens, stop_sequence, tool_use

usage
Usage · object
required

Token usage statistics for the request.

stop_sequence
string | null

The stop sequence that was matched, if stop_reason is stop_sequence. Otherwise null.
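
The four stop_reason values, together with stop_sequence and usage, can be handled with a small dispatcher. This is a sketch over a response decoded to a dict; describe_stop is a hypothetical helper and the sample response is invented.

```python
def describe_stop(message: dict) -> str:
    """Summarize why generation stopped, using the documented response fields."""
    reason = message["stop_reason"]
    if reason == "end_turn":
        return "finished naturally"
    if reason == "max_tokens":
        # The reply was cut off; usage reports how many tokens were generated.
        return f"truncated after {message['usage']['output_tokens']} output tokens"
    if reason == "stop_sequence":
        # stop_sequence holds the matched string only for this stop_reason.
        return f"stopped at stop sequence {message['stop_sequence']!r}"
    if reason == "tool_use":
        return "model invoked a tool"
    raise ValueError(f"unexpected stop_reason: {reason!r}")


sample = {
    "stop_reason": "stop_sequence",
    "stop_sequence": "END",
    "usage": {"input_tokens": 5, "output_tokens": 7},
}
print(describe_stop(sample))
```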