baseten auth login or the BASETEN_API_KEY environment variable.
describe
Options
Filter JSON output with a jq expression; implies —output json (or jsonl for streamed commands)
Name of the Model API to describe.
Output formatOne of:
text, json, jsonl, noneUse a specific stored profile for this command, overriding BASETEN_PROFILE and the current profile
Enable verbose logging
Examples
Describe a Model API by nameFilter output with --jq
Print the Model API’s invoke URL
Output
Text mode (--output text): Field-per-line summary of the Model API.
JSON mode (--output json): payload type managementapi.ModelAPI.
list
--all to browse the full visible catalog instead of just the added ones.
Options
Browse the full visible catalog instead of only the Model APIs the workspace has added.
Filter JSON output with a jq expression; implies —output json (or jsonl for streamed commands)
Output formatOne of:
text, json, jsonl, noneUse a specific stored profile for this command, overriding BASETEN_PROFILE and the current profile
Enable verbose logging
Examples
List the Model APIs the workspace has addedFilter output with --jq
Print just the Model API names
Output
Text mode (--output text): Table with columns: NAME, CONTEXT, /1M OUT, ADDED. When no Model APIs match, prints “No Model APIs found.” to stderr.
JSON mode (--output json): payload type cmd.ModelAPIList.
predict
--url, which defaults to the OpenAI chat-completions endpoint on the shared inference host. Override it for other shapes (e.g. /v1/messages, /v1/embeddings) or different hosts.
--content is the simple path: it builds an OpenAI chat-completions body with a single user message and --model as the model, and prints just the assistant’s reply. It is only valid for OpenAI chat URLs and requires --model.
--data and --file send a request body verbatim, so any format the endpoint accepts works (OpenAI, Anthropic, embeddings, custom). The response is written as-is: JSON is pretty-printed, streams and binary bodies are passed through.
Options
Single user message; builds an OpenAI chat-completions request and prints the assistant’s reply. Only valid for OpenAI chat URLs and requires —model.Mutually exclusive with other flags in group
predict-input.Inline request body, sent verbatim.Mutually exclusive with other flags in group
predict-input.Path to a file containing the request body, sent verbatim. Use ’-’ for stdin.Mutually exclusive with other flags in group
predict-input.Filter JSON output with a jq expression; implies —output json (or jsonl for streamed commands)
Name of the Model API. Required with —content, where it sets the request’s model.
Output formatOne of:
text, json, jsonl, noneUse a specific stored profile for this command, overriding BASETEN_PROFILE and the current profile
Endpoint to POST the request to. Defaults to https://inference.baseten.co/v1/chat/completions.
Enable verbose logging
Examples
Send a single user messageFilter output with --jq
Extract the assistant’s message content
Output
Text mode (--output text): With --content, the assistant message text. With --data/--file, the response body as-is (pretty-printed JSON, or a raw stream/binary body).
JSON mode (--output json): payload type cmd.JSONUndefined.
Under --output json, --content emits the full chat-completions response. For --data/--file, a streamed response becomes one JSON record per chunk under --output jsonl, and a binary body is base64-encoded under a ‘body’ key.