Secure model inference

Baseten maintains SOC 2 Type II certification and HIPAA compliance, and applies security measures that go beyond what those frameworks require.

Data privacy

Baseten does not store model inputs, outputs, or weights by default. This zero data retention (ZDR) posture applies to synchronous inference out of the box.

Model inputs/outputs: Inputs for async inference are stored temporarily, only until the request is processed. Outputs are never stored.
Model weights: Loaded dynamically from sources like Hugging Face, GCS, or S3, moving directly to GPU memory.
- You can enable weight caching through Truss. Cached weights are permanently deleted on request.
KV cache: The attention KV cache is an in-memory, GPU-resident structure used during inference. It is not persisted to disk and is discarded when a replica restarts or scales down.
Postgres data tables: Existing users may store data in Baseten’s hosted Postgres tables, which can be deleted anytime.
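
The defaults in the list above can be summarized as a small lookup table. This is only an illustrative sketch: the `RetentionRule` type, the `RETENTION` table, and the `is_stored` helper are hypothetical names introduced here, not part of any Baseten API, and the entries simply restate the defaults described above. Data that users place in hosted Postgres tables is left out because it is stored at the user's own choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionRule:
    stored_by_default: bool  # is this data stored without any opt-in?
    note: str                # short restatement of the rule from the list


# Hypothetical table restating the defaults listed above; not a Baseten API.
RETENTION = {
    "sync_inputs": RetentionRule(False, "not stored"),
    "async_inputs": RetentionRule(True, "stored temporarily until processed"),
    "outputs": RetentionRule(False, "never stored"),
    "weights": RetentionRule(False, "loaded to GPU memory; cached only if enabled"),
    "kv_cache": RetentionRule(False, "GPU memory only; discarded on restart or scale-down"),
}


def is_stored(kind: str) -> bool:
    """Return True if this kind of data is stored by default."""
    return RETENTION[kind].stored_by_default


for kind, rule in RETENTION.items():
    print(f"{kind}: {rule.note}")
```

Only async inputs are stored by default in this summary, and only until they are processed.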

Baseten’s network accelerator optimizes model downloads. Contact support to disable it.

To learn more and access official policies and certifications, visit the Baseten Trust Center.

View your compliance policy

If Baseten has set a compliance policy for your account, the policy appears in your Organization and Team settings under the General tab, and on the model environment detail view. The policy shows the boundaries your inference workloads run within:

Framework: the compliance programs your workloads are restricted to.
Region: the geographic regions where your workloads can run.

Compliance policies are read-only and managed by Baseten. To set or change a policy, contact support. For Baseten’s certifications and official compliance posture, visit the Baseten Trust Center.
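
The Framework and Region boundaries described above can be pictured as two allow-lists that a workload placement must satisfy at the same time. The sketch below is a minimal illustration under that assumption. The `CompliancePolicy` class, its field names, and the example values are hypothetical and do not reflect Baseten's API or any actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompliancePolicy:
    # Hypothetical model of the two fields shown in settings; not a Baseten API.
    frameworks: frozenset  # compliance programs workloads are restricted to
    regions: frozenset     # geographic regions where workloads can run

    def allows(self, framework: str, region: str) -> bool:
        """A placement is inside the policy only if both boundaries permit it."""
        return framework in self.frameworks and region in self.regions


# Example values are placeholders chosen for illustration.
policy = CompliancePolicy(
    frameworks=frozenset({"HIPAA"}),
    regions=frozenset({"us"}),
)

print(policy.allows("HIPAA", "us"))  # True
print(policy.allows("HIPAA", "eu"))  # False
```

Frozen dataclasses and frozensets mirror the read-only nature of the policy: once constructed, the object cannot be modified by the code that consults it.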

Workload security

Baseten isolates inference workloads to protect users and Baseten’s infrastructure.

Container security:
- Baseten never shares GPUs across users.
- Security tooling: Falco (originally developed by Sysdig) for runtime threat detection and Gatekeeper for enforcing pod security policies.
- Minimal privileges for workloads and nodes to limit incident impact.
Network security:
- Each customer has a dedicated Kubernetes namespace.
- Isolation is enforced through Calico and Cilium network policies.
- Nodes run in a private subnet with firewall protections.
Pentesting:
- Extended pentesting by RunSybil (ex-OpenAI and CrowdStrike experts).
- Malicious model deployments are tested in a dedicated production-like environment.
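
As a rough picture of the namespace-level isolation described under Network security, the sketch below builds a standard Kubernetes NetworkPolicy manifest that admits ingress only from pods in the same namespace. It is an assumption-laden illustration, not Baseten's actual configuration: the helper name, the policy name, and the `customer-a` namespace are hypothetical. Calico and Cilium can both enforce policies of this standard kind.

```python
import json


def namespace_isolation_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that only admits traffic from the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in this namespace only,
            # so ingress from pods in other namespaces is not allowed.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


# "customer-a" is a placeholder namespace name.
manifest = namespace_isolation_policy("customer-a")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f` on a test cluster to see the effect.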

Self-hosted model inference

Baseten offers single-tenant environments and self-hosted deployments. The cloud version is recommended for ease of setup, cost efficiency, and elastic GPU access. For self-hosting, contact support.

