Prerequisites
- A Dedicated deployment of your model on Baseten.
- A Baseten workspace API key with management scope, exported as
BASETEN_API_KEY. - Completed Frontier Gateway onboarding with your Baseten team.
/v1/gateway/ endpoints used here return 403 to workspaces that aren’t onboarded.
Create an endpoint
An endpoint maps a routing slug to one of your deployments. Your customers call the slug, and the gateway routes the request to the target you set here. The slug has the form{org_prefix}/{name}, where org_prefix is a prefix your organization owns. The Baseten team registers your prefixes during onboarding; registering or updating a prefix isn’t a self-service action.
Create an endpoint with POST /v1/gateway/endpoints. The body takes the slug and a targets list with one Baseten target: the model_id of the deployment that should serve the slug. The response includes the endpoint id and confirms the slug now routes to your deployment; you’ll reference this same slug when you create a group.
Create a group
A group is the resource you create per customer, plan, project, or whichever unit of your organizational hierarchy maps to a billing or access boundary. The group owns an external identifier (your stable ID for this entity), the model slugs it’s allowed to call, and the rate and usage limits enforced on every call. List the slug from the endpoint you created earlier so the group can call it. API keys are minted under the group next. Create a group withPOST /v1/gateway/groups. The request takes a metadata block (display name plus the external identifier), a non-empty models list pairing each model slug with its rate and usage limits, and a hierarchy block declaring the inheritance mode and an optional parent. This example creates a top-level (root) group with independent enforcement. The response is the new group, including the internal id you’ll use as the path parameter when minting keys.
id. You’ll need it when you mint a key. The effective_models block shows the limits the runtime enforces after inheritance; for a root group it matches models exactly. See Rate and usage limits for how this changes once you add a parent.
Mint an API key for the group
Issue a new API key under the group withPOST /v1/gateway/groups/{group_id}/api_keys. The key inherits the group’s effective model set and limits; you don’t configure either on the key itself. The response contains the plaintext key, returned exactly once.
. (here, sky_sCqhBwEy4kPd) is the prefix. You’ll use the prefix, not the full key, when fetching or revoking the key later.
Call your model through the gateway
Use the API key you minted to call your model. Frontier Gateway is OpenAI-compatible, so the OpenAI SDK works with the gateway base URL. ReplaceYOUR_API_KEY in the examples below with the value you saved from the mint-key response.
- Python
- curl
Install the OpenAI SDK:Make a chat completion request:
chat.py
Output
https://inference.baseten.co/v1 today. Once white-label routing is provisioned for your workspace, the base URL becomes the branded domain you configure with your Baseten team, and your downstream customers call your domain instead.
Next steps
- Manage groups and API keys: Build a multi-level hierarchy, mint and revoke keys, and delete groups.
- Rate and usage limits: Tune per-group, per-model thresholds and pick an inheritance mode.
- Billing webhooks: Stream signed per-request usage events into your billing pipeline.