# Baseten

## Docs

- [AI tools](https://docs.baseten.co/ai-tools.md): Connect AI tools to Baseten documentation for context-aware assistance with deploying and serving models.
- [Cancel a queued async request.](https://docs.baseten.co/api-reference/cancel-a-queued-async-request.md): Cancels an async request. Only requests with `QUEUED` status may be canceled. Rate limited to 20 requests per second.
- [Get the status of an async request.](https://docs.baseten.co/api-reference/get-the-status-of-an-async-request.md): Returns the current status of an async model or chain request. Rate limited to 20 requests per second.
- [Asynchronously call a named environment of a chain.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-a-named-environment-of-a-chain.md)
- [Asynchronously call a named environment of a model.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-a-named-environment-of-a-model.md)
- [Asynchronously call a specific deployment of a chain.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-a-specific-deployment-of-a-chain.md)
- [Asynchronously call a specific deployment of a model.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-a-specific-deployment-of-a-model.md)
- [Asynchronously call the development deployment of a chain.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-the-development-deployment-of-a-chain.md)
- [Asynchronously call the development deployment of a model.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-the-development-deployment-of-a-model.md)
- [Asynchronously call the production environment of a chain.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-the-production-environment-of-a-chain.md): Enqueues an asynchronous request for the chain deployment promoted to the production environment.
- [Asynchronously call the production environment of a model.](https://docs.baseten.co/api-reference/non-regional/asynchronously-call-the-production-environment-of-a-model.md): Enqueues an asynchronous predict request for the deployment promoted to the production environment. Returns a request ID that can be used to poll for status or cancel the request.
- [Call a specific chain deployment by deployment ID.](https://docs.baseten.co/api-reference/non-regional/call-a-specific-chain-deployment-by-deployment-id.md)
- [Call a specific deployment of a model by deployment ID.](https://docs.baseten.co/api-reference/non-regional/call-a-specific-deployment-of-a-model-by-deployment-id.md): Sends a synchronous predict request to the specified deployment.
- [Call the chain deployment associated with a specified environment.](https://docs.baseten.co/api-reference/non-regional/call-the-chain-deployment-associated-with-a-specified-environment.md)
- [Call the development deployment of a chain.](https://docs.baseten.co/api-reference/non-regional/call-the-development-deployment-of-a-chain.md)
- [Call the development deployment of a model.](https://docs.baseten.co/api-reference/non-regional/call-the-development-deployment-of-a-model.md): Sends a synchronous predict request to the development deployment.
- [Call the model deployment associated with a specified environment.](https://docs.baseten.co/api-reference/non-regional/call-the-model-deployment-associated-with-a-specified-environment.md): Sends a synchronous predict request to the deployment promoted to the specified environment.
- [Call the production environment of a chain.](https://docs.baseten.co/api-reference/non-regional/call-the-production-environment-of-a-chain.md): Sends a synchronous request to the chain deployment promoted to the production environment. The request body is forwarded to the chain's `run_remote` entrypoint.
- [Call the production environment of a model.](https://docs.baseten.co/api-reference/non-regional/call-the-production-environment-of-a-model.md): Sends a synchronous predict request to the deployment promoted to the production environment. The request body is forwarded directly to the model's `predict` function.
- [Get async queue status for a named environment.](https://docs.baseten.co/api-reference/non-regional/get-async-queue-status-for-a-named-environment.md)
- [Get async queue status for a specific deployment.](https://docs.baseten.co/api-reference/non-regional/get-async-queue-status-for-a-specific-deployment.md)
- [Get async queue status for the development deployment.](https://docs.baseten.co/api-reference/non-regional/get-async-queue-status-for-the-development-deployment.md)
- [Get async queue status for the production environment.](https://docs.baseten.co/api-reference/non-regional/get-async-queue-status-for-the-production-environment.md): Returns the number of queued and in-progress async requests for the deployment promoted to the production environment. Rate limited to 20 requests per second.
- [Wake a named environment of a model.](https://docs.baseten.co/api-reference/non-regional/wake-a-named-environment-of-a-model.md)
- [Wake a specific deployment of a model by deployment ID.](https://docs.baseten.co/api-reference/non-regional/wake-a-specific-deployment-of-a-model-by-deployment-id.md)
- [Wake the development deployment of a model.](https://docs.baseten.co/api-reference/non-regional/wake-the-development-deployment-of-a-model.md)
- [Wake the production environment of a model.](https://docs.baseten.co/api-reference/non-regional/wake-the-production-environment-of-a-model.md): Triggers a wake for the deployment promoted to the production environment. Returns immediately with 202 Accepted.
- [Asynchronously call a regional environment of a chain.](https://docs.baseten.co/api-reference/regional/asynchronously-call-a-regional-environment-of-a-chain.md): Enqueues an asynchronous `run_remote` request via a regional hostname. The environment is determined by the hostname, not the path.
- [Asynchronously call a regional environment of a model.](https://docs.baseten.co/api-reference/regional/asynchronously-call-a-regional-environment-of-a-model.md): Enqueues an asynchronous predict request via a regional hostname. The environment is determined by the hostname, not the path.
- [Call a regional environment of a chain.](https://docs.baseten.co/api-reference/regional/call-a-regional-environment-of-a-chain.md): Sends a synchronous `run_remote` request via a regional hostname. The environment is determined by the hostname, not the path.
- [Call a regional environment of a model.](https://docs.baseten.co/api-reference/regional/call-a-regional-environment-of-a-model.md): Sends a synchronous predict request via a regional hostname. The environment is determined by the hostname, not the path.
- [Get async queue status for a regional environment.](https://docs.baseten.co/api-reference/regional/get-async-queue-status-for-a-regional-environment.md)
- [Wake a regional environment of a model.](https://docs.baseten.co/api-reference/regional/wake-a-regional-environment-of-a-model.md)
- [How Baseten works](https://docs.baseten.co/concepts/howbasetenworks.md): The moving parts behind training, deployment, request routing, autoscaling, and environment promotion on Baseten.
- [Why Baseten](https://docs.baseten.co/concepts/whybaseten.md): Production training and inference on dedicated infrastructure, for teams that have outgrown shared API endpoints.
- [Cold starts](https://docs.baseten.co/deployment/autoscaling/cold-starts.md): Learn what makes a cold start slow and how to shrink it for your model.
- [Autoscaling](https://docs.baseten.co/deployment/autoscaling/overview.md): Configure autoscaling to dynamically adjust replicas based on traffic while minimizing idle compute costs.
- [Request lifecycle](https://docs.baseten.co/deployment/autoscaling/request-lifecycle.md): What happens to a request from submission to response, including routing, queuing, the 1200-second sync predict timeout, and error handling.
- [Traffic patterns](https://docs.baseten.co/deployment/autoscaling/traffic-patterns.md): Identify your traffic pattern and configure autoscaling settings to match.
- [CI/CD](https://docs.baseten.co/deployment/ci-cd.md): Automate Truss deployments with GitHub Actions.
- [Concepts](https://docs.baseten.co/deployment/concepts.md): Deployments, environments, resources, autoscaling, and CI/CD on Baseten.
- [Deployments](https://docs.baseten.co/deployment/deployments.md): Deploy, manage, and scale machine learning models with Baseten
- [Environments](https://docs.baseten.co/deployment/environments.md): Manage your model's release cycles with environments.
- [Regional environments](https://docs.baseten.co/deployment/regional-environments.md): Guarantee inference data stays in a specific geographic region with regional environments.
- [Resources](https://docs.baseten.co/deployment/resources.md): Manage and configure model resources
- [Rolling deployments](https://docs.baseten.co/deployment/rolling-deployments.md): Gradually shift traffic to a new deployment with replica-based rolling deployments.
- [Binary I/O](https://docs.baseten.co/development/chain/binaryio.md): Performant serialization of numeric data
- [Concepts](https://docs.baseten.co/development/chain/concepts.md): Glossary of Chains concepts and terminology
- [Deploy](https://docs.baseten.co/development/chain/deploy.md): Deploy your Chain on Baseten
- [Architecture and design](https://docs.baseten.co/development/chain/design.md): How to structure your Chainlets
- [Engine Builder LLM models](https://docs.baseten.co/development/chain/engine-builder-models.md): Engine-Builder-LLM models are pre-trained models optimized for specific inference tasks.
- [Error handling](https://docs.baseten.co/development/chain/errorhandling.md): Understanding and handling Chains errors
- [Your first Chain](https://docs.baseten.co/development/chain/getting-started.md): Build and deploy two example Chains
- [Invocation](https://docs.baseten.co/development/chain/invocation.md): Call your deployed Chain
- [Local development](https://docs.baseten.co/development/chain/localdev.md): Iterating, debugging, testing, and mocking
- [Overview](https://docs.baseten.co/development/chain/overview.md)
- [Streaming](https://docs.baseten.co/development/chain/streaming.md): Streaming outputs, reducing latency, SSEs
- [Truss integration](https://docs.baseten.co/development/chain/stub.md): Integrate deployed Truss models with stubs
- [Subclassing](https://docs.baseten.co/development/chain/subclassing.md): Modularize and re-use Chainlet implementations
- [Watch](https://docs.baseten.co/development/chain/watch.md): Live-patch deployed code
- [Baseten Delivery Network](https://docs.baseten.co/development/model/bdn.md): Optimize cold starts with multi-tier caching and data delivery
- [Build your model](https://docs.baseten.co/development/model/build-your-first-model.md): Deploy a model to Baseten with just a config file. Pick an open-source model from Hugging Face, choose a GPU, and get an endpoint in minutes.
- [Configuration](https://docs.baseten.co/development/model/configuration.md): Configure model dependencies, resources, and build environment in config.yaml
- [Custom Docker containers](https://docs.baseten.co/development/model/custom-server.md): Deploy custom Docker containers to run inference servers like vLLM, SGLang, Triton, or any containerized application.
- [Dependencies](https://docs.baseten.co/development/model/dependencies.md): Declare everything your model needs to build and run: Python packages, build commands, base images, and private registries.
- [Deploy and iterate](https://docs.baseten.co/development/model/deploy-and-iterate.md): Use development deployments with live patching for rapid iteration, then promote to production.
- [Access model environments](https://docs.baseten.co/development/model/environments.md): Configure model behavior based on environment
- [gRPC](https://docs.baseten.co/development/model/grpc.md): Invoke your model over gRPC.
- [Health checks](https://docs.baseten.co/development/model/health-checks.md): Customize the health of your deployments.
- [Cached weights](https://docs.baseten.co/development/model/model-cache.md): Accelerate cold starts and availability by prefetching and caching your weights.
- [The Model class](https://docs.baseten.co/development/model/model-class.md): Write custom Python in model/model.py to control how your model loads, runs inference, and shapes responses.
- [Developing a model on Baseten](https://docs.baseten.co/development/model/overview.md): Package, configure, and iterate on a model with Truss at whatever level of control your model needs.
- [Performance optimization](https://docs.baseten.co/development/model/performance-optimization.md): Optimize model latency, throughput, and cost with Baseten engines
- [Runtime caching](https://docs.baseten.co/development/model/runtime-caching.md): Cache files your model writes at runtime so other replicas reuse them
- [Secrets](https://docs.baseten.co/development/model/secrets.md): Use secrets securely in your models
- [Streaming and endpoints](https://docs.baseten.co/development/model/streaming-and-endpoints.md): Stream model output, expose /v1 HTTP endpoints, and handle raw requests in custom Truss model code.
- [WebSockets](https://docs.baseten.co/development/model/websockets.md): Enable real-time, streaming, bidirectional communication using WebSockets for Truss models and Chains.
- [BEI-Bert](https://docs.baseten.co/engines/bei/bei-bert.md): Bidirectional encoder embeddings with cold-start optimization
- [Configuration reference](https://docs.baseten.co/engines/bei/bei-reference.md): Complete reference config for BEI and BEI-Bert engines
- [Named entity recognition](https://docs.baseten.co/engines/bei/ner.md): Token-level entity classification on BEI-Bert with /predict_tokens
- [Overview](https://docs.baseten.co/engines/bei/overview.md): Production-grade embeddings, reranking, and classification models
- [Advanced features for BIS-LLM](https://docs.baseten.co/engines/bis-llm/advanced-features.md): KV-aware routing, disaggregated serving, and speculative decoding
- [Configuration reference](https://docs.baseten.co/engines/bis-llm/bis-llm-config.md): Complete reference config for v2 inference stack and MoE models
- [Migrate from Engine-Builder-LLM](https://docs.baseten.co/engines/bis-llm/migrate-from-v1.md): Translate a v1 Engine-Builder-LLM configuration to BIS-LLM (v2), including the autoscaling, speculation, and routing changes that aren't just renames
- [Overview](https://docs.baseten.co/engines/bis-llm/overview.md): Token-based autoscaling, KV-aware routing, disaggregated serving, and speculative decoding for MoE and large dense models
- [Custom engine builder](https://docs.baseten.co/engines/engine-builder-llm/custom-engine-builder.md): Implement custom model.py for business logic, logging, and advanced inference patterns
- [Configuration reference](https://docs.baseten.co/engines/engine-builder-llm/engine-builder-config.md): Complete reference config for dense text generation models
- [Speculative decoding](https://docs.baseten.co/engines/engine-builder-llm/lookahead-decoding.md): Lookahead decoding on Engine-Builder-LLM (v1) for code generation and predictable content
- [LoRA support](https://docs.baseten.co/engines/engine-builder-llm/lora-support.md): Multi-LoRA adapters for the Engine-Builder-LLM engine
- [Overview](https://docs.baseten.co/engines/engine-builder-llm/overview.md): Dense LLM text generation with lookahead decoding and structured outputs
- [Overview](https://docs.baseten.co/engines/index.md): Inference engines for embeddings, dense LLMs, MoE models, and Enterprise serving
- [Autoscaling engines](https://docs.baseten.co/engines/performance-concepts/autoscaling-engines.md): Engine-specific autoscaling settings for BEI, Engine-Builder-LLM, and BIS-LLM
- [Deploy from cloud storage](https://docs.baseten.co/engines/performance-concepts/cloud-storage-deployment.md): Connect your S3 bucket, GCS bucket, Azure container, or Hugging Face repository to Baseten's TRT-LLM inference engines and deploy without re-uploading weights.
- [Quantization guide](https://docs.baseten.co/engines/performance-concepts/quantization-guide.md): FP8 and FP4 trade-offs and hardware requirements for all engines
- [Serve embeddings with BEI](https://docs.baseten.co/examples/bei.md): Deploy embedding, reranking, and classification models on Baseten Embeddings Inference.
- [Transcribe audio with Chains](https://docs.baseten.co/examples/chains-audio-transcription.md): Process hours of audio in seconds using efficient chunking, distributed inference, and optimized GPU resources.
- [Build a RAG pipeline with Chains](https://docs.baseten.co/examples/chains-build-rag.md): Combine retrieval and generation into a single compound workflow.
- [Create a model with the REST API](https://docs.baseten.co/examples/create-a-model-with-rest.md): Deploy a model archive programmatically using the management API, without the Truss CLI.
- [Customize a model](https://docs.baseten.co/examples/customize-a-model.md): Deploy a model with custom Python code using the Truss Model class.
- [Deploy a Hugging Face model](https://docs.baseten.co/examples/deploy-a-hugging-face-model.md): Deploy Gemma 4 26B on Baseten with vLLM, BDN-cached weights, EAGLE3 speculative decoding, and prefix caching.
- [Build and deploy an LLM](https://docs.baseten.co/examples/deploy-a-llm.md): Package and deploy an LLM with Truss, from model setup to inference.
- [Deploy your first model](https://docs.baseten.co/examples/deploy-your-first-model.md): Deploy an open-source LLM to Baseten with just a config file and get an OpenAI-compatible API endpoint.
- [Deploy a Dockerized model](https://docs.baseten.co/examples/docker.md): Deploy any model in a pre-built Docker container.
- [Generate images with Flux](https://docs.baseten.co/examples/image-generation.md): Deploy Flux Schnell as a text-to-image endpoint.
- [Qwen3 Embedding](https://docs.baseten.co/examples/models/embedding/qwen3-embedding.md): Alibaba's Qwen3 Embedding is an 8B text embedding model that maps text into dense vectors for semantic search, retrieval-augmented generation, clustering, and classification.
- [Qwen3 Reranker](https://docs.baseten.co/examples/models/embedding/qwen3-reranker.md): Alibaba's Qwen3 Reranker is an 8B cross-encoder for high-quality passage reranking in retrieval-augmented generation pipelines.
- [FLUX.1](https://docs.baseten.co/examples/models/image-gen/flux1.md): FLUX.1 recipes: 2 variants (dev, schnell), diffusion-transformer architecture.
- [Gemma 4](https://docs.baseten.co/examples/models/llm/gemma-4.md): Gemma 4 recipes: 4 variants (E2B, E4B, 26B A4B, 31B), Dense and MoE architectures.
- [GLM-4.7](https://docs.baseten.co/examples/models/llm/glm-4.7.md): GLM-4.7 recipes: 2 variants (Standard, Flash), MoE architecture.
- [GLM-5](https://docs.baseten.co/examples/models/llm/glm-5.md): Z.ai's GLM-5 reasoning model, served from an FP8 checkpoint on B200:8.
- [GPT-OSS](https://docs.baseten.co/examples/models/llm/gpt-oss.md): GPT-OSS recipes: 2 variants (20B, 120B), Dense and MoE architectures.
- [Llama 3.1](https://docs.baseten.co/examples/models/llm/llama-3.1.md): Meta's Llama 3.1 8B instruction-tuned model. Runs on a single B200 from NVIDIA's FP8 checkpoint with EAGLE3 speculative decoding for high concurrent throughput.
- [Llama 3.2](https://docs.baseten.co/examples/models/llm/llama-3.2.md): Meta's compact Llama 3.2 instruction-tuned model. Runs on a single H100 40GB for low-cost chat and edge-adjacent workloads.
- [Llama 3.3](https://docs.baseten.co/examples/models/llm/llama-3.3.md): Meta's Llama 3.3 70B instruction-tuned model. Runs on H100:4 through Baseten Inference Stack from NVIDIA's FP8 checkpoint, tuned for low time-to-first-token.
- [Llama 4](https://docs.baseten.co/examples/models/llm/llama-4.md): Meta's Llama 4 Scout is a 17B-active MoE with native multimodal support and a 10M token context window.
- [MiniMax M2.5](https://docs.baseten.co/examples/models/llm/minimax-m2.5.md): Large MoE model with native reasoning and tool calling. Uses the MiniMax-specific append-think reasoning format.
- [Nemotron 3](https://docs.baseten.co/examples/models/llm/nemotron-3.md): NVIDIA's Nemotron 3 Super 120B A12B Mixture-of-Experts model. Runs on B200:2 through Baseten Inference Stack with MTP speculative decoding and the NVFP4-quantized checkpoint, tuned for high-throughput reasoning.
- [Qwen3](https://docs.baseten.co/examples/models/llm/qwen3.md): Sparse MoE model with 235B total parameters (22B active per token). FP8-quantized checkpoint for production-scale reasoning and agentic workflows.
- [Qwen3.5](https://docs.baseten.co/examples/models/llm/qwen3.5.md): Qwen3.5 recipes: 4 variants (4B, 9B, 35B, 122B), Dense, Hybrid MoE, and MoE architectures.
- [Qwen3.6](https://docs.baseten.co/examples/models/llm/qwen3.6.md): Qwen3.6 recipes: 2 variants (27B, 35B-A3B), Dense and Hybrid MoE architectures.
- [Qwen3-ASR](https://docs.baseten.co/examples/models/transcription/qwen3-asr.md): Alibaba's Qwen3-ASR is a compact 1.7B speech-to-text model with multilingual transcription support.
- [Voxtral](https://docs.baseten.co/examples/models/transcription/voxtral.md): Mistral's Voxtral Mini Realtime is a 4B speech-to-text model tuned for real-time streaming transcription.
- [Deploy LLMs with Ollama](https://docs.baseten.co/examples/ollama.md): Run LLMs on Ollama as a custom Docker server.
- [Building with Baseten](https://docs.baseten.co/examples/overview.md)
- [Deploy LLMs with SGLang](https://docs.baseten.co/examples/sglang.md): Run LLMs on SGLang's high-performance serving framework.
- [Stream LLM responses](https://docs.baseten.co/examples/streaming.md): Stream LLM output token by token.
- [Add system packages](https://docs.baseten.co/examples/system-packages.md): Deploy a model with both Python and system dependencies.
- [Deploy LLMs with TensorRT-LLM](https://docs.baseten.co/examples/tensorrt-llm.md): Optimize LLMs for low latency and high throughput.
- [Generate speech with Kokoro](https://docs.baseten.co/examples/text-to-speech.md): Deploy Kokoro as a text-to-speech endpoint.
- [Deploy LLMs with vLLM](https://docs.baseten.co/examples/vllm.md): Run any open-source LLM on vLLM's serving framework.
- [Manage groups and API keys](https://docs.baseten.co/frontier-gateway/api-keys.md): Walk through the full lifecycle: create groups, build a hierarchy, mint and revoke API keys, and delete groups when a customer churns.
- [Billing webhooks](https://docs.baseten.co/frontier-gateway/billing-webhooks.md): Receive signed per-request usage events from Frontier Gateway and pipe them into your billing provider out-of-band from the inference path.
- [Calling your model](https://docs.baseten.co/frontier-gateway/calling-your-model.md): Make your first inference call through Baseten Frontier Gateway with a federated API key issued by your AI lab.
- [Manage endpoints](https://docs.baseten.co/frontier-gateway/endpoints.md): Create and manage the endpoints that route Frontier Gateway traffic to your Baseten deployments.
- [Get started](https://docs.baseten.co/frontier-gateway/get-started.md): Create an endpoint, create a group, mint an API key, and call your model through the gateway.
- [Baseten Frontier Gateway](https://docs.baseten.co/frontier-gateway/overview.md): A managed API gateway for AI labs to serve hosted models under a branded URL with hierarchical groups, inherited rate and usage limits, and billing webhooks.
- [Rate and usage limits](https://docs.baseten.co/frontier-gateway/rate-limits.md): Per-group, per-model token and request limits, two inheritance modes, and how Frontier Gateway computes the effective limits the runtime enforces.
- [Async inference](https://docs.baseten.co/inference/async.md): Run asynchronous inference on deployed models
- [Call your model](https://docs.baseten.co/inference/calling-your-model.md): Run inference on deployed models
- [Inference errors](https://docs.baseten.co/inference/errors.md): What each inference error means and where to look next.
- [Function calling](https://docs.baseten.co/inference/function-calling.md): Tool selection and structured function calls with LLMs
- [Configure HTTP clients](https://docs.baseten.co/inference/http-client-configuration.md): Configure connection pooling, retries, and timeouts for reliable inference requests at scale.
- [Integrations](https://docs.baseten.co/inference/integrations.md): Integrate your models with tools and use Baseten anywhere
- [JSON mode](https://docs.baseten.co/inference/json-mode.md): Constrain model output to syntactically valid JSON
- [Deprecation](https://docs.baseten.co/inference/model-apis/deprecation.md): Baseten's deprecation policy for Model APIs
- [Model APIs](https://docs.baseten.co/inference/model-apis/overview.md): OpenAI-compatible endpoints for high-performance LLMs
- [Rate limits and budgets](https://docs.baseten.co/inference/model-apis/rate-limits-and-budgets.md): Rate limits and usage budgets for Model APIs
- [Reasoning](https://docs.baseten.co/inference/model-apis/reasoning.md): Control extended thinking for reasoning-capable models
- [Vision](https://docs.baseten.co/inference/model-apis/vision.md): Send images and videos alongside text to vision-capable models
- [Model I/O in binary](https://docs.baseten.co/inference/output-format/binary.md): Decode and save binary model output
- [Model I/O with files](https://docs.baseten.co/inference/output-format/files.md): Call models by passing a file or URL
- [Overview](https://docs.baseten.co/inference/overview.md): Inference on Baseten: Model APIs, self-deployed models, how responses are delivered, structured outputs, tool calling, and client configuration.
- [Performance client](https://docs.baseten.co/inference/performance-client.md): High-performance client library for embeddings, reranking, classification, and generic batch requests
- [SSH access](https://docs.baseten.co/inference/ssh.md): Connect to running model deployments directly from your terminal with standard SSH.
- [Streaming](https://docs.baseten.co/inference/streaming.md): Return model output token by token as it is generated.
- [Structured outputs](https://docs.baseten.co/inference/structured-outputs.md): JSON schema validation and controlled text generation across all engines
- [Concepts](https://docs.baseten.co/loops/concepts.md): How Loops sessions, trainer servers, sampling servers, and checkpoints fit together.
- [Loops](https://docs.baseten.co/loops/overview.md): A training SDK that supports long sequence length, async RL, and one-click checkpoint deploys on the Baseten Inference Stack.
- [Quickstart](https://docs.baseten.co/loops/quickstart.md): Create a checkpoint with Loops and list it.
- [Supported base models](https://docs.baseten.co/loops/supported-models.md): Hugging Face base models Loops accepts, with sequence-length limits.
- [Tinker compatibility](https://docs.baseten.co/loops/tinker-compatibility.md): Most Tinker code runs on Loops with one install change, apart from paginated checkpoints, auth, and cluster routing.
- [Export to Datadog](https://docs.baseten.co/observability/export-metrics/datadog.md): Export metrics from Baseten to Datadog
- [Export to Grafana Cloud](https://docs.baseten.co/observability/export-metrics/grafana.md): Export metrics from Baseten to Grafana Cloud
- [Export to New Relic](https://docs.baseten.co/observability/export-metrics/new-relic.md): Export metrics from Baseten to New Relic
- [Overview](https://docs.baseten.co/observability/export-metrics/overview.md): Export metrics from Baseten to your observability stack
- [Export to Prometheus](https://docs.baseten.co/observability/export-metrics/prometheus.md): Export metrics from Baseten to Prometheus
- [Metrics support matrix](https://docs.baseten.co/observability/export-metrics/supported-metrics.md): Every metric you can export from Baseten, with its type and labels
- [Status and health](https://docs.baseten.co/observability/health.md): Every model deployment in your Baseten workspace has a status that represents its activity and health.
- [Logs](https://docs.baseten.co/observability/logs.md): Scope logs by environment or deployment, then filter by request ID for individual predictions.
- [Metrics](https://docs.baseten.co/observability/metrics.md): Understand the load and performance of your model
- [Secure model inference](https://docs.baseten.co/observability/security.md): Keeping your models safe and private
- [Tracing](https://docs.baseten.co/observability/tracing.md): Investigate the prediction flow in detail
- [Access control](https://docs.baseten.co/organization/access.md): Manage access to your Baseten organization with role-based access control.
- [API keys](https://docs.baseten.co/organization/api-keys.md): Authenticate requests to Baseten for deployment, inference, and management.
- [Audit logs](https://docs.baseten.co/organization/audit-logs.md): Track configuration and access changes across your Baseten organization, and export audit events to your SIEM.
- [Billing and usage](https://docs.baseten.co/organization/billing.md): How Baseten meters per-minute usage, and how to manage payment, credits, and invoices for your workspace.
- [OpenID Connect (OIDC) authentication](https://docs.baseten.co/organization/oidc.md): Use short-lived OIDC tokens to securely authenticate to cloud resources
- [Organization settings](https://docs.baseten.co/organization/overview.md): Manage your Baseten organization's access, security, and resources.
- [Restricted environments](https://docs.baseten.co/organization/restricted-environments.md): Control access to sensitive environments like production with environment-level permissions.
- [Secrets](https://docs.baseten.co/organization/secrets.md): Store and access sensitive credentials in your deployed models.
- [SSO and SCIM](https://docs.baseten.co/organization/sso-and-scim.md): Authenticate Baseten users through your identity provider and automatically provision accounts, directory groups, and roles.
- [Teams](https://docs.baseten.co/organization/teams.md): Organize your organization into multiple teams with isolated resources and granular access control.
- [Baseten overview](https://docs.baseten.co/overview.md): Baseten helps you train, deploy, and serve AI models at scale with high performance and cost efficiency.
- [Quickstart](https://docs.baseten.co/quickstart.md): Start running inference on Baseten.
- [Truss Push GitHub Action](https://docs.baseten.co/reference/ci/github-action.md): Deploy and validate a Truss model or chain on Baseten from GitHub Actions.
- [baseten api](https://docs.baseten.co/reference/cli/baseten/api.md): Make raw API requests
- [baseten auth](https://docs.baseten.co/reference/cli/baseten/auth.md): Manage authentication
- [baseten model](https://docs.baseten.co/reference/cli/baseten/model.md): Manage Baseten models
- [baseten model-api](https://docs.baseten.co/reference/cli/baseten/model-api.md): Manage Model APIs
- [baseten model deployment](https://docs.baseten.co/reference/cli/baseten/model-deployment.md): Manage deployments of a model
- [baseten model deployment replica](https://docs.baseten.co/reference/cli/baseten/model-deployment-replica.md): Manage replicas of a deployment
- [baseten model environment](https://docs.baseten.co/reference/cli/baseten/model-environment.md): Manage environments of a model
- [baseten org api-key](https://docs.baseten.co/reference/cli/baseten/org-api-key.md): Manage API keys
- [baseten org billing](https://docs.baseten.co/reference/cli/baseten/org-billing.md): View billing information
- [baseten org secret](https://docs.baseten.co/reference/cli/baseten/org-secret.md): Manage secrets
- [Baseten CLI overview](https://docs.baseten.co/reference/cli/baseten/overview.md): Manage your Baseten workspace from the command line: organizations, API keys, secrets, deployment lifecycle, replicas, and raw API access. Designed for automation.
- [baseten truss](https://docs.baseten.co/reference/cli/baseten/truss.md): Run truss commands
- [baseten version](https://docs.baseten.co/reference/cli/baseten/version.md): Print the baseten CLI version
- [Chains CLI reference](https://docs.baseten.co/reference/cli/chains/chains-cli.md): Deploy, manage, and develop Chains using the Truss CLI.
- [Baseten command-line tools](https://docs.baseten.co/reference/cli/index.md): Baseten ships two open-source CLIs: Truss for authoring model code and the Baseten CLI for managing your workspace. This page covers what each one is for, when they overlap, and how to use them together.
- [Loops CLI reference](https://docs.baseten.co/reference/cli/loops/loops-cli.md): Deploy and inspect Loops sessions, runs, samplers, and checkpoints using the Truss CLI.
- [Training CLI reference](https://docs.baseten.co/reference/cli/training/training-cli.md): Deploy, manage, and monitor training jobs using the Truss CLI.
- [truss auth](https://docs.baseten.co/reference/cli/truss/auth.md): Manage authentication with Baseten remotes.
- [truss cleanup](https://docs.baseten.co/reference/cli/truss/cleanup.md): Clean up Truss data.
- [truss configure](https://docs.baseten.co/reference/cli/truss/configure.md): Configure Truss settings.
- [truss container](https://docs.baseten.co/reference/cli/truss/container.md): Run and manage Truss containers locally.
- [truss download](https://docs.baseten.co/reference/cli/truss/download.md): Download the Truss for a deployed model.
- [truss image](https://docs.baseten.co/reference/cli/truss/image.md): Build and manage Truss Docker images.
- [truss init](https://docs.baseten.co/reference/cli/truss/init.md): Create a new Truss project.
- [truss login](https://docs.baseten.co/reference/cli/truss/login.md): Authenticate with Baseten.
- [truss migrate](https://docs.baseten.co/reference/cli/truss/migrate.md): Migrate model_cache and external_data to the unified weights API.
- [truss model-config](https://docs.baseten.co/reference/cli/truss/model-config.md): Fetch the config of a deployed model.
- [truss model-logs](https://docs.baseten.co/reference/cli/truss/model-logs.md): Fetch logs for a deployed model.
- [Truss CLI reference](https://docs.baseten.co/reference/cli/truss/overview.md): Deploy, manage, and develop models using the Truss CLI.
- [truss push](https://docs.baseten.co/reference/cli/truss/push.md): Deploy a model to Baseten.
- [truss run-python](https://docs.baseten.co/reference/cli/truss/run-python.md): Run a Python script in the Truss environment.
- [truss ssh](https://docs.baseten.co/reference/cli/truss/ssh.md): SSH access to Baseten workloads.
- [truss upgrade](https://docs.baseten.co/reference/cli/truss/upgrade.md): Upgrade the truss package to the latest or a specified version.
- [truss watch](https://docs.baseten.co/reference/cli/truss/watch.md): Live reload during development.
- [truss whoami](https://docs.baseten.co/reference/cli/truss/whoami.md): Show user information.
- [Create an API key](https://docs.baseten.co/reference/gateway/api-keys/create-an-api-key.md): Mint a federated API key under a Frontier Gateway group. The plaintext key is returned exactly once.
- [Get an API key](https://docs.baseten.co/reference/gateway/api-keys/get-an-api-key.md): Fetch metadata for one federated API key by its prefix. The plaintext key is never returned after creation.
- [List API keys for a group](https://docs.baseten.co/reference/gateway/api-keys/list-api-keys-for-a-group.md): List the federated API keys minted under a Frontier Gateway group. Cursor-paginated.
- [Register an API key](https://docs.baseten.co/reference/gateway/api-keys/register-an-api-key.md): Attach a caller-supplied API key to a Frontier Gateway group so downstream consumers can continue using a key they already issued.
- [Revoke an API key](https://docs.baseten.co/reference/gateway/api-keys/revoke-an-api-key.md): Revoke a federated API key by its prefix. Other keys under the same group are unaffected.
- [Billing webhooks](https://docs.baseten.co/reference/gateway/billing-webhooks.md): Payload, header, and signature reference for Frontier Gateway billing webhooks.
- [Create an endpoint](https://docs.baseten.co/reference/gateway/endpoints/create-an-endpoint.md): Create a Frontier Gateway endpoint: a routing slug and the Baseten deployment it points to.
- [Delete an endpoint](https://docs.baseten.co/reference/gateway/endpoints/delete-an-endpoint.md): Delete a Frontier Gateway endpoint and stop routing its slug.
- [Get an endpoint](https://docs.baseten.co/reference/gateway/endpoints/get-an-endpoint.md): Retrieve a single Frontier Gateway endpoint by its ID.
- [List endpoints](https://docs.baseten.co/reference/gateway/endpoints/list-endpoints.md): List every Frontier Gateway endpoint in your workspace.
- [Update an endpoint](https://docs.baseten.co/reference/gateway/endpoints/replace-endpoint-targets.md): Update a Frontier Gateway endpoint's slug or targets.
- [Create a group](https://docs.baseten.co/reference/gateway/groups/create-a-group.md): Create a Frontier Gateway group with its model set, per-model limits, and a place in the hierarchy.
- [Delete a group](https://docs.baseten.co/reference/gateway/groups/delete-a-group.md): Delete a Frontier Gateway group, recursively remove its descendants, and revoke every key in the subtree.
- [Get a group](https://docs.baseten.co/reference/gateway/groups/get-a-group.md): Fetch a single Frontier Gateway group by its internal ID, including its effective limits after inheritance.
- [Get group usage](https://docs.baseten.co/reference/gateway/groups/get-group-usage.md): Read current-window consumption against the usage limits configured on a Frontier Gateway group.
- [List groups](https://docs.baseten.co/reference/gateway/groups/list-groups.md): List Frontier Gateway groups in your workspace. Cursor-paginated, with optional lookup by external identifier.
- [Update a group](https://docs.baseten.co/reference/gateway/groups/update-a-group.md): Update a Frontier Gateway group's display name or model configuration. Hierarchy and enforcement mode are immutable.
- [Overview](https://docs.baseten.co/reference/gateway/overview.md): Manage Frontier Gateway endpoints, groups, and federated API keys through the Baseten REST API.
- [Chat Completions](https://docs.baseten.co/reference/inference-api/chat-completions.md): Create chat completions using Baseten Model APIs, an OpenAI-compatible endpoint for managed LLMs.
- [Messages](https://docs.baseten.co/reference/inference-api/messages.md): Create Anthropic Messages API requests against Baseten Model APIs.
- [Overview](https://docs.baseten.co/reference/inference-api/overview.md): Baseten provides two ways to call models: Model APIs for managed LLMs and deployed model endpoints for custom models and chains.
- [WebSocket deployment](https://docs.baseten.co/reference/inference-api/predict-endpoints/deployment-websocket.md): Connect over WebSocket to a specific deployment.
- [WebSocket development](https://docs.baseten.co/reference/inference-api/predict-endpoints/development-websocket.md): Connect over WebSocket to the development deployment of a model or chain.
- [WebSocket environment](https://docs.baseten.co/reference/inference-api/predict-endpoints/environments-websocket.md): Connect over WebSocket to the deployment associated with an environment.
- [Transcribe Streaming Audio](https://docs.baseten.co/reference/inference-api/predict-endpoints/streaming-transcription-api.md): Transcribe audio in real time over a WebSocket connection.
- [Transcribe Pre-Recorded Audio](https://docs.baseten.co/reference/inference-api/predict-endpoints/transcription-api.md): Transcribe a pre-recorded audio file using a deployed transcription model.
- [Get checkpoint files](https://docs.baseten.co/reference/loops-api/checkpoints/get-checkpoint-files.md): Get presigned URLs for the files under a Loops checkpoint. Returns a paginated list.
- [List checkpoints](https://docs.baseten.co/reference/loops-api/checkpoints/list-checkpoints.md): List Loops checkpoints filtered by run ID, base model, or `bt://` URI. Provide exactly one filter.
- [Validate a checkpoint](https://docs.baseten.co/reference/loops-api/checkpoints/validate-a-checkpoint.md): Check whether the caller can manage and use this checkpoint.
- [Deactivate a deployment](https://docs.baseten.co/reference/loops-api/deployments/deactivate-a-deployment.md): Shut down a Loops deployment by ID. Saved checkpoints remain accessible. The caller is responsible for resolving a `base_model` to a `deployment_id`: list deployments and pick the active one.
- [Get a deployment](https://docs.baseten.co/reference/loops-api/deployments/get-a-deployment.md): Fetch a Loops deployment by ID, including its latest status.
- [Get deployment logs](https://docs.baseten.co/reference/loops-api/deployments/get-deployment-logs.md): Fetch logs from the trainer pods of a Loops deployment. Visible to any member of the deployment's team.
- [Get deployment metrics](https://docs.baseten.co/reference/loops-api/deployments/get-deployment-metrics.md): Returns per-node GPU, CPU, and memory utilization, plus Knative queue-proxy request rate, concurrency, and latency, for the trainer pods. The sampler half of a Loops deployment is an OracleVersion, so its metrics come from the existing model metrics endpoint.
- [List deployments](https://docs.baseten.co/reference/loops-api/deployments/list-deployments.md): List the caller's Loops deployments. Returns every deployment regardless of status; filter out terminal states on the client side.
- [Loops API reference](https://docs.baseten.co/reference/loops-api/overview.md): HTTP routes for Loops sessions, runs, samplers, checkpoints, and deployments.
- [Create a run](https://docs.baseten.co/reference/loops-api/runs/create-a-run.md): Creates a Loops run with an associated sampler in the given session.
- [Get a run](https://docs.baseten.co/reference/loops-api/runs/get-a-run.md): Fetch a Loops run by ID.
- [List runs](https://docs.baseten.co/reference/loops-api/runs/list-runs.md): List Loops runs visible to the requesting user, optionally filtered by run ID, base model, or both.
- [Create a sampler](https://docs.baseten.co/reference/loops-api/samplers/create-a-sampler.md): Creates a standalone Loops sampler not linked to a run.
- [Get a sampler](https://docs.baseten.co/reference/loops-api/samplers/get-a-sampler.md): Fetch a Loops sampler by ID.
- [List samplers](https://docs.baseten.co/reference/loops-api/samplers/list-samplers.md): List Loops samplers visible to the requesting user.
- [Get server capabilities](https://docs.baseten.co/reference/loops-api/server/get-capabilities.md): Returns the list of models supported by the Loops server, including each model's maximum context length.
- [Create a session](https://docs.baseten.co/reference/loops-api/sessions/create-a-session.md): Creates a Loops session scoped to the calling org.
- [Get a session](https://docs.baseten.co/reference/loops-api/sessions/get-a-session.md): Fetch a Loops session by ID.
- [Create an API key](https://docs.baseten.co/reference/management-api/api-keys/creates-an-api-key.md): Creates an API key with the provided name and type. The API key is returned in the response.
- [Delete an API key](https://docs.baseten.co/reference/management-api/api-keys/delete-an-api-key.md): Deletes an API key by prefix and returns info about the API key.
- [Get all API keys](https://docs.baseten.co/reference/management-api/api-keys/lists-the-users-api-keys.md): Lists all API keys your account has access to.
- [Get billing usage summary](https://docs.baseten.co/reference/management-api/billing/gets-billing-usage-summary-for-a-date-range.md): Returns billing usage data within the specified date range. Includes dedicated model serving, training, and model APIs usage. The date range must not exceed 31 days.
- [Delete chains](https://docs.baseten.co/reference/management-api/chains/deletes-a-chain-by-id.md)
- [By ID](https://docs.baseten.co/reference/management-api/chains/gets-a-chain-by-id.md)
- [All chains](https://docs.baseten.co/reference/management-api/chains/gets-all-chains.md)
- [Any deployment by ID](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-deployment.md): Activates an inactive deployment and returns the activation status.
- [Activate environment deployment](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-deployment-associated-with-an-environment.md): Activates an inactive deployment associated with an environment and returns the activation status.
- [Development deployment](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-development-deployment.md): Activates an inactive development deployment and returns the activation status.
- [Activate production deployment](https://docs.baseten.co/reference/management-api/deployments/activate/activates-production-deployment.md): Activates an inactive production deployment and returns the activation status.
- [Add a deployment to a model](https://docs.baseten.co/reference/management-api/deployments/adds-a-deployment-to-a-model.md)
- [Update chainlet environment's autoscaling settings](https://docs.baseten.co/reference/management-api/deployments/autoscaling/update-a-chainlet-environments-autoscaling-settings.md): Updates a chainlet environment's autoscaling settings and returns the updated chainlet environment settings.
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-a-deployments-autoscaling-settings.md): Updates a deployment's autoscaling settings and returns the update status.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-a-development-deployments-autoscaling-settings.md): Updates a development deployment's autoscaling settings and returns the update status.
- [Update production deployment autoscaling settings](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-production-deployment-autoscaling-settings.md): Updates a production deployment's autoscaling settings and returns the update status.
- [Chain deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-chain-deployment.md): Deactivates a chain deployment and returns the deactivation status.
- [Any deployment by ID](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-deployment.md): Deactivates a deployment and returns the deactivation status.
- [Deactivate environment deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-deployment-associated-with-an-environment.md): Deactivates a deployment associated with an environment and returns the deactivation status.
- [Development deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-development-deployment.md): Deactivates a development deployment and returns the deactivation status.
- [Deactivate production deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-production-deployment.md): Deactivates a production deployment and returns the deactivation status.
- [Delete chain deployment](https://docs.baseten.co/reference/management-api/deployments/deletes-a-chain-deployment-by-id.md)
- [Delete model deployments](https://docs.baseten.co/reference/management-api/deployments/deletes-a-models-deployment-by-id.md): Deletes a model's deployment by ID and returns the tombstone of the deployment.
- [Get model deployment config](https://docs.baseten.co/reference/management-api/deployments/get-deployment-config.md): Returns the deployment's config. `output_format` query param picks the shape: 'raw' (config.yaml text), 'parsed' (dict with defaults), or 'both' (default).
- [Get model deployment download URL](https://docs.baseten.co/reference/management-api/deployments/get-deployment-download-url.md): Gets a presigned URL to download the truss tar file for a deployment.
- [Get model deployment logs](https://docs.baseten.co/reference/management-api/deployments/get-deployment-logs.md): Gets all the logs for a model deployment in the given time range.
- [Get model deployment metrics](https://docs.baseten.co/reference/management-api/deployments/get-deployment-metrics.md): Gets the metrics for a model deployment in the given time range.
- [Any chain deployment by ID](https://docs.baseten.co/reference/management-api/deployments/gets-a-chain-deployment-by-id.md)
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-deployment-by-id.md): Gets a model's deployment by ID and returns the deployment.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-development-deployment.md): Gets a model's development deployment and returns the deployment.
- [Production model deployment](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-production-deployment.md): Gets a model's production deployment and returns the deployment.
- [Get all chain deployments](https://docs.baseten.co/reference/management-api/deployments/gets-all-chain-deployments.md)
- [Get all model deployments](https://docs.baseten.co/reference/management-api/deployments/gets-all-deployments-of-a-model.md)
- [Cancel model promotion](https://docs.baseten.co/reference/management-api/deployments/promote/cancel-promotion.md): Cancels an ongoing promotion to an environment and returns the cancellation status.
- [Force cancel rolling deployment](https://docs.baseten.co/reference/management-api/deployments/promote/force-cancel-promotion.md): Immediately cancels an in-progress rolling promotion and triggers rollback to the previous version.
- [Force roll forward promotion](https://docs.baseten.co/reference/management-api/deployments/promote/force-roll-forward-promotion.md): Immediately completes the rolling promotion, shifting all traffic to the new version. This works even if the promotion is in the process of rolling back.
- [Pause rolling deployment](https://docs.baseten.co/reference/management-api/deployments/promote/pause-promotion.md): Pauses an in-progress rolling promotion after the current step completes. No further scaling changes are made until resumed.
- [Promote to chain environment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-chain-deployment-to-an-environment.md): Promotes an existing chain deployment to an environment and returns the promoted chain deployment.
- [Promote to model environment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-deployment-to-an-environment.md): Promotes an existing deployment to an environment and returns the promoted deployment.
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-deployment-to-production.md): Promotes an existing deployment to production and returns the same deployment.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-development-deployment-to-production.md): Creates a new production deployment from the development deployment and returns the deployment that is currently building.
- [Resume rolling deployment](https://docs.baseten.co/reference/management-api/deployments/promote/resume-promotion.md): Resumes a paused rolling promotion, continuing from where it was paused.
- [Any deployment by ID](https://docs.baseten.co/reference/management-api/deployments/retry/retries-a-deployment.md): Retries a failed deployment and returns the retry status and updated deployment.
- [Development deployment](https://docs.baseten.co/reference/management-api/deployments/retry/retries-a-development-deployment.md): Retries a failed development deployment and returns the retry status and updated deployment.
- [Production deployment](https://docs.baseten.co/reference/management-api/deployments/retry/retries-production-deployment.md): Retries a failed production deployment and returns the retry status and updated deployment.
- [Terminate deployment replica](https://docs.baseten.co/reference/management-api/deployments/terminates-deployment-replica.md): Terminates a deployment replica and returns the termination status.
- [Create Chain environment](https://docs.baseten.co/reference/management-api/environments/create-a-chain-environment.md): Create a chain environment. Returns the resulting environment.
- [Create environment](https://docs.baseten.co/reference/management-api/environments/create-an-environment.md): Creates an environment for the specified model and returns the environment.
- [Get Chain environment](https://docs.baseten.co/reference/management-api/environments/get-a-chain-environments-details.md): Gets a chain environment's details and returns the chain environment.
- [Get all Chain environments](https://docs.baseten.co/reference/management-api/environments/get-all-chain-environments.md): Gets all chain environments for a given chain.
- [Get all environments](https://docs.baseten.co/reference/management-api/environments/get-all-environments.md): Gets all environments for a given model.
- [Get environment](https://docs.baseten.co/reference/management-api/environments/get-an-environments-details.md): Gets an environment's details and returns the environment.
- [Update Chain environment](https://docs.baseten.co/reference/management-api/environments/update-a-chain-environments-settings.md): Update a chain environment's settings and returns the chain environment.
- [Update chainlet environment's instance type](https://docs.baseten.co/reference/management-api/environments/update-a-chainlet-environments-instance-type-settings.md): Updates a chainlet environment's instance type settings. The chainlet environment setting must exist. When updated, a new chain deployment is created and deployed. It is promoted to the chain environment according to promotion settings on the environment.
- [Update model environment](https://docs.baseten.co/reference/management-api/environments/update-an-environments-settings.md): Asynchronously updates an environment's settings. Poll the GET endpoint for the applied state.
- [All instance types](https://docs.baseten.co/reference/management-api/instance-types/gets-all-instance-types.md)
- [Instance type prices](https://docs.baseten.co/reference/management-api/instance-types/gets-instance-type-prices.md)
- [By name](https://docs.baseten.co/reference/management-api/model-apis/gets-a-model-api-by-name.md): Fetch a Model API by name, with workspace overlay when added.
- [All Model APIs](https://docs.baseten.co/reference/management-api/model-apis/gets-all-model-apis.md): List Model APIs visible to the caller. By default returns the full catalog; pass `added_only=true` to restrict to Model APIs the workspace has added.
- [Create a model from a source](https://docs.baseten.co/reference/management-api/models/creates-a-model-from-a-source.md): Creates a new model in the caller's organization. The `source` field selects how the model is constructed; currently the only option is `library_listing`, which forks an accessible listing from `GET /v1/library_models`. The deployment isn't ready immediately, so poll the GET endpoint until its status is `ACTIVE`.
- [Delete models](https://docs.baseten.co/reference/management-api/models/deletes-a-model-by-id.md)
- [By ID](https://docs.baseten.co/reference/management-api/models/gets-a-model-by-id.md)
- [All models](https://docs.baseten.co/reference/management-api/models/gets-all-models.md)
- [Prepare a model upload](https://docs.baseten.co/reference/management-api/models/prepare-model-upload.md)
- [Overview](https://docs.baseten.co/reference/management-api/overview.md): Manage models and deployments with the Baseten management API. It supports monitoring, CI/CD, and automation at both the model and workspace levels.
- [Rate limits](https://docs.baseten.co/reference/management-api/rate-limits.md): Rate limits, response shape, and retry handling for the Baseten management API.
- [Get all secrets](https://docs.baseten.co/reference/management-api/secrets/gets-all-secrets.md)
- [Upsert a secret](https://docs.baseten.co/reference/management-api/secrets/upserts-a-secret.md): Creates or updates a secret by name. Scoped to the caller's primary team; use the team-scoped variant to target a specific team.
- [Create a team API key](https://docs.baseten.co/reference/management-api/teams/creates-a-team-api-key.md): Creates a team API key with the provided name and type. The API key is returned in the response.
- [Create a team training project](https://docs.baseten.co/reference/management-api/teams/creates-a-team-training-project.md): Upserts a training project with the specified metadata for a team.
- [Get all team secrets](https://docs.baseten.co/reference/management-api/teams/gets-all-team-secrets.md)
- [List all teams](https://docs.baseten.co/reference/management-api/teams/lists-all-teams.md): Returns a list of all teams the authenticated user has access to.
- [Upsert a team secret](https://docs.baseten.co/reference/management-api/teams/upserts-a-team-secret.md): Creates a new secret, or updates the existing secret if one with the provided name already exists, and returns the secret's name and creation date. The secret belongs to the specified team.
- [Reference documentation](https://docs.baseten.co/reference/overview.md): For deploying, managing, and interacting with machine learning models on Baseten.
- [Chains SDK Reference](https://docs.baseten.co/reference/sdk/chains.md): Python SDK Reference for Chains
- [Errors](https://docs.baseten.co/reference/sdk/loops/errors.md): The Loops SDK exception types and when each is raised.
- [Loops SDK reference](https://docs.baseten.co/reference/sdk/loops/overview.md): Python client for Loops: ServiceClient, TrainingClient, SamplingClient, types, and errors.
- [SamplingClient](https://docs.baseten.co/reference/sdk/loops/sampling-client.md): Generate completions from current or version-pinned Loops weights.
- [ServiceClient](https://docs.baseten.co/reference/sdk/loops/service-client.md): Provision trainer and sampling servers, manage the Loops session, and list checkpoints.
- [TrainingClient](https://docs.baseten.co/reference/sdk/loops/training-client.md): Run forward and backward passes, optimizer steps, and publish weights against a live Loops trainer.
- [Types](https://docs.baseten.co/reference/sdk/loops/types.md): Training inputs, configuration, and result handles passed to and from the Loops clients.
- [Training SDK](https://docs.baseten.co/reference/sdk/training.md): API reference for the Baseten training SDK.
- [truss.load](https://docs.baseten.co/reference/sdk/truss/load.md)
- [truss.login](https://docs.baseten.co/reference/sdk/truss/login.md)
- [ModelDeployment](https://docs.baseten.co/reference/sdk/truss/model-deployment.md): The object returned by truss.push().
- [Truss SDK Reference](https://docs.baseten.co/reference/sdk/truss/overview.md): Python SDK for deploying and managing models with Truss.
- [truss.push](https://docs.baseten.co/reference/sdk/truss/push.md)
- [truss.whoami](https://docs.baseten.co/reference/sdk/truss/whoami.md)
- [Create training job](https://docs.baseten.co/reference/training-api/create-training-job.md): Creates a training job with the specified configuration.
- [Create training project](https://docs.baseten.co/reference/training-api/create-training-project.md): Upserts a training project with the specified metadata.
- [Delete training job](https://docs.baseten.co/reference/training-api/delete-training-job.md): Deletes a training job. Stops it first if still running.
- [Delete training project](https://docs.baseten.co/reference/training-api/delete-training-project.md): Deletes a training project and all associated training jobs.
- [Download training job source code](https://docs.baseten.co/reference/training-api/download-training-job.md): Get the uploaded training job source code as an S3 artifact.
- [Get auth codes for training job](https://docs.baseten.co/reference/training-api/get-auth-codes-for-training-job.md): Get authentication codes for all nodes of a training job's interactive sessions.
- [Get training job](https://docs.baseten.co/reference/training-api/get-training-job.md): Get the details of an existing training job.
- [Get training job checkpoint files](https://docs.baseten.co/reference/training-api/get-training-job-checkpoint-files.md): Get presigned URLs for all checkpoint files for a training job.
- [List training job checkpoints](https://docs.baseten.co/reference/training-api/get-training-job-checkpoints.md): Get the checkpoints for a training job.
- [Get training job logs](https://docs.baseten.co/reference/training-api/get-training-job-logs.md): Get the logs for a training job with the provided filters.
- [Get training job metrics](https://docs.baseten.co/reference/training-api/get-training-job-metrics.md): Get the metrics for a training job.
- [Get training project](https://docs.baseten.co/reference/training-api/get-training-project.md): Get the details of an existing training project.
- [Get training project cache summary](https://docs.baseten.co/reference/training-api/get-training-project-cache-summary.md): Get the cache summary for the most recent training job in the project.
- [List training projects](https://docs.baseten.co/reference/training-api/get-training-projects.md): List all training projects for the organization.
- [List training jobs](https://docs.baseten.co/reference/training-api/list-training-jobs.md): List all training jobs for the training project.
- [Overview](https://docs.baseten.co/reference/training-api/overview.md): Programmatically manage Baseten Training resources.
- [Recreate training job](https://docs.baseten.co/reference/training-api/recreate-training-job.md): Create a new training job with the same configuration as an existing training job.
- [Search training jobs](https://docs.baseten.co/reference/training-api/search-training-jobs.md): Search training jobs for the organization.
- [Stop training job](https://docs.baseten.co/reference/training-api/stop-training-job.md): Stops a training job.
- [Truss configuration](https://docs.baseten.co/reference/truss-configuration.md): Set your model resources, dependencies, and more.
- [Baseten platform status](https://docs.baseten.co/status/status.md): Current operational status of Baseten's services.
- [Building blocks](https://docs.baseten.co/training/concepts/basics.md): Learn how to get up and running on Baseten Training.
- [Cache](https://docs.baseten.co/training/concepts/cache.md): Learn how to use the training cache to speed up your training iterations by persisting data between jobs.
- [Checkpoints](https://docs.baseten.co/training/concepts/checkpoints.md): Learn how to use Baseten's checkpointing feature to manage model checkpoints and avoid disk errors during training.
- [Multinode training](https://docs.baseten.co/training/concepts/multinode.md): Learn how to configure and run multinode training jobs with Baseten Training.
- [Storage and data ingestion](https://docs.baseten.co/training/concepts/storage.md): Load model weights and training data into Baseten training containers through BDN, S3, Hugging Face, and GCS.
- [Deploy with optimized inference engines](https://docs.baseten.co/training/deploy-with-engine-builder.md): Deploy model checkpoints from Baseten Training directly to an inference engine without downloading or re-uploading weights.
- [Serving your trained model](https://docs.baseten.co/training/deployment.md): How to deploy checkpoints from Baseten Training jobs as usable models.
- [Get started](https://docs.baseten.co/training/getting-started.md): Run your first training job and deploy it to production.
- [VS Code and Cursor remote tunnels](https://docs.baseten.co/training/interactive-sessions.md): Connect to training containers for remote debugging and development through VS Code or Cursor Remote Tunnels.
- [Lifecycle](https://docs.baseten.co/training/lifecycle.md): Understanding the different states and transitions in a Baseten training job's lifecycle.
- [Loading checkpoints](https://docs.baseten.co/training/loading.md): Resume training from existing checkpoints to continue where you left off.
- [Management](https://docs.baseten.co/training/management.md): How to monitor, manage, and interact with your Baseten Training projects and jobs.
- [Training on Baseten](https://docs.baseten.co/training/overview.md): Train custom models with developer-first training infrastructure on Baseten.
- [Remote access](https://docs.baseten.co/training/remote-access.md): Connect to running training jobs from your local machine to debug, inspect state, and develop interactively.
- [Slurm workstations](https://docs.baseten.co/training/slurm.md): Launch a multi-node Slurm cluster on Baseten training infrastructure with a single CLI command.
- [SSH access](https://docs.baseten.co/training/ssh.md): Connect to training containers directly from your terminal with standard SSH.
- [Deployments](https://docs.baseten.co/troubleshooting/deployments.md): Troubleshoot common problems during model deployment.
- [Inference](https://docs.baseten.co/troubleshooting/inference.md): Troubleshoot common problems during model inference.

## OpenAPI Specs

- [management-api-spec](https://docs.baseten.co/reference/management-api-spec.json)
- [messages-openapi-spec](https://docs.baseten.co/reference/inference-api/messages-openapi-spec.json)
- [llm-openapi-spec](https://docs.baseten.co/reference/inference-api/llm-openapi-spec.json)
- [inference-api-spec](https://docs.baseten.co/reference/inference-api/inference-api-spec.json)
- [spec](https://docs.baseten.co/spec)