You have a model deployed on Baseten and want to give your own customers access through your branded domain, with credentials you control and usage you meter. Baseten Frontier Gateway is the managed API gateway that makes this possible. It adds a hierarchical group resource model, per-group rate and usage limits with inheritance, billing webhooks, and white-label routing on top of your Dedicated deployment, so your customers call your model through your domain with keys you mint and revoke through the Baseten REST API.
Frontier Gateway is enabled for your workspace by a Baseten engineer. To turn it on, talk to us.
How Frontier Gateway works
Frontier Gateway sits on top of an existing Dedicated deployment. You model your customers, plans, and projects as a tree of groups. Each group owns an external identifier (metadata.external_entity_id), the set of model slugs it’s allowed to call, and the rate and usage limits enforced on every call. Groups can nest under a parent group, and limits flow down the tree according to the group’s limit_enforcement mode. You then mint one or more API keys under any group; those keys are what your customer uses. Every key inherits the effective config of its group, so rotating credentials never changes what the customer can spend.
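As a rough mental model of the group tree, the sketch below resolves a group's effective limits by merging limits from the root down, so that a child's value replaces its parent's for the same model slug and limit type. It is an illustration only, not Baseten's implementation: the `Group` and `ApiKey` classes, their field names, and the `effective_limits` helper are hypothetical, and the override behaviour assumes an independent hierarchy. Only `external_entity_id` and the TOKEN and REQUEST limit types come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Group:
    # Hypothetical in-memory stand-in for a gateway group.
    external_entity_id: str                      # mirrors metadata.external_entity_id
    parent: Optional["Group"] = None
    model_slugs: Set[str] = field(default_factory=set)
    # limits[model_slug][limit_type] -> ceiling, limit_type is "TOKEN" or "REQUEST"
    limits: Dict[str, Dict[str, int]] = field(default_factory=dict)


@dataclass
class ApiKey:
    key_id: str
    group: Group                                 # a key carries no limits of its own


def effective_limits(group: Group, model_slug: str) -> Dict[str, int]:
    """Merge limits from the root down; values set on a child replace the parent's."""
    chain = []
    node: Optional[Group] = group
    while node is not None:                      # walk up to the root
        chain.append(node)
        node = node.parent
    merged: Dict[str, int] = {}
    for ancestor in reversed(chain):             # apply root first, leaf last
        merged.update(ancestor.limits.get(model_slug, {}))
    return merged


org = Group(
    "org-acme",
    model_slugs={"my-model"},
    limits={"my-model": {"TOKEN": 1_000_000, "REQUEST": 1_000}},
)
project = Group(
    "proj-search",
    parent=org,
    model_slugs={"my-model"},
    limits={"my-model": {"TOKEN": 200_000}},     # overrides only the TOKEN ceiling
)

old_key = ApiKey("key-old", project)
new_key = ApiKey("key-new", project)             # rotated credential, same group

print(effective_limits(new_key.group, "my-model"))
```

Because limits are resolved from the group rather than stored on the key, `old_key` and `new_key` resolve to the same ceilings, which is the property the paragraph above describes for credential rotation.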
When a request hits the gateway with one of your federated keys, Baseten validates the key, walks up the owning group’s hierarchy to compute effective limits, and enforces them per model slug. Valid requests route to your Dedicated deployment, and the response returns to the caller. For each request, Baseten emits a signed billing event out-of-band to your webhook endpoint with token counts and request metadata, so your billing pipeline runs independently of the inference path.
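On the receiving side, a webhook handler would typically verify the signature before trusting the token counts. This page does not specify the signing scheme, so the sketch below assumes HMAC-SHA256 over the raw request body with a shared secret, which is a common pattern but an assumption here. The secret, the payload field names, and both helper functions are hypothetical; check the Billing webhooks page for the actual header and format.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, not confirmed by the docs)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_and_parse(secret: bytes, body: bytes, signature_hex: str) -> dict:
    """Reject the event unless the signature matches, then decode the JSON payload."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("signature mismatch")
    return json.loads(body)


# Hypothetical event; the real payload schema may differ.
secret = b"example-shared-secret"
body = json.dumps(
    {
        "external_entity_id": "proj-search",
        "model": "my-model",
        "input_tokens": 120,
        "output_tokens": 48,
    }
).encode()

event = verify_and_parse(secret, body, sign(secret, body))
print(event["input_tokens"] + event["output_tokens"])
```

Verifying against the raw bytes, rather than a re-serialized copy of the parsed JSON, keeps the check stable regardless of key order or whitespace.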
Key features
- Hierarchical groups: Model your organization however your billing structure fits, whether that’s orgs and projects, plans and customers, or tenants and seats. Groups carry the model set and the limits; keys hang off groups and inherit them. For more information, see Manage groups and API keys.
- Two inheritance modes: Pick an enforcement mode per hierarchy. An independent hierarchy lets children override their parents and meters each group’s usage separately; a cascading hierarchy makes a group’s usage count against every ancestor at once. For more information, see Inheritance modes.
- Per-group, per-model rate and usage limits: Configure TOKEN or REQUEST limits on each group, scoped per model slug. Every key minted under the group inherits the group’s effective limits.
- Billing webhooks: Receive signed per-request token usage events you can pipe into Stripe, Orb, or your own billing system. For more information, see Billing webhooks.
- White-label routing (coming soon): Serve inference traffic from your branded domain so downstream customers never see the Baseten URL. Contact your onboarding engineer for current availability.
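The difference between the two inheritance modes can be sketched as a question of which groups a request's usage is charged to. The example below is a simplified illustration under stated assumptions: the `parents` map, the group names, and the mode strings `"independent"` and `"cascading"` are hypothetical and may not match the actual values of limit_enforcement.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical three-level tree: child -> parent (None marks the root).
parents: Dict[str, Optional[str]] = {
    "org": None,
    "project": "org",
    "seat": "project",
}


def groups_charged(group_id: str, mode: str) -> List[str]:
    """Return the groups whose usage counters a request should increment."""
    if mode == "independent":
        return [group_id]                 # each group is metered separately
    if mode != "cascading":
        raise ValueError(f"unknown mode: {mode}")
    charged = []
    node: Optional[str] = group_id
    while node is not None:               # the group itself plus every ancestor
        charged.append(node)
        node = parents[node]
    return charged


def record_usage(usage: Dict[str, int], group_id: str, tokens: int, mode: str) -> None:
    for gid in groups_charged(group_id, mode):
        usage[gid] += tokens


independent_usage: Dict[str, int] = defaultdict(int)
cascading_usage: Dict[str, int] = defaultdict(int)

record_usage(independent_usage, "seat", 500, "independent")
record_usage(cascading_usage, "seat", 500, "cascading")

print(dict(independent_usage))
print(dict(cascading_usage))
```

In the cascading case, a limit set on the root acts as a shared budget for the whole subtree, since every descendant's usage draws it down.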
Frontier Gateway versus Model APIs
Frontier Gateway and Model APIs are distinct products with separate endpoints. Frontier Gateway management lives under /v1/gateway/ and is gated to Frontier Gateway customers; public Model APIs customers authenticate with their workspace API key and call inference at /v1/chat/completions directly. Use the table below to confirm which product you need.
| | Frontier Gateway | Model APIs |
|---|---|---|
| Who it’s for | AI labs serving their own hosted model to downstream customers | App developers calling a Baseten-hosted open model |
| Authentication | Federated API keys you mint per group | Your workspace API key |
| Compute | Your Dedicated deployment | Shared Baseten infrastructure |
| Documentation | Frontier Gateway | Model APIs |
Next steps
- Get started: Walk through your first group, API key, and inference call.
- Manage groups and API keys: Create groups, build a hierarchy, and mint or revoke keys.
- Rate and usage limits: Control per-group, per-model usage and pick an inheritance mode.
- Billing webhooks: Meter usage by consuming signed per-request events.