Data privacy
Baseten does not store model inputs, outputs, or weights by default. This zero data retention (ZDR) posture applies to synchronous inference out of the box.
- Model inputs/outputs: Inputs for async inference are temporarily stored until processed. Outputs are never stored.
- Model weights: Loaded dynamically from sources like Hugging Face, GCS, or S3, moving directly to GPU memory.
  - You can enable weight caching through Truss. Cached weights can be permanently deleted on request.
- KV cache: The attention KV cache is an in-memory, GPU-resident structure used during inference. It is not persisted to disk and is discarded when a replica restarts or scales down.
- Postgres data tables: Existing users may store data in Baseten’s hosted Postgres tables, which can be deleted at any time.
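The retention rule for async inference in the list above (inputs held only until processed, outputs never stored) can be illustrated with a minimal sketch. This is not Baseten's implementation: `AsyncRequestBuffer`, `run_model`, and `deliver` are hypothetical names, and the in-memory dictionary stands in for whatever temporary store a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import uuid


@dataclass
class AsyncRequestBuffer:
    """Hypothetical buffer illustrating the retention rule only."""

    run_model: Callable[[str], str]
    # Inputs awaiting processing, keyed by request ID (assumed storage shape).
    _pending: Dict[str, str] = field(default_factory=dict)

    def enqueue(self, model_input: str) -> str:
        """Hold the input temporarily and return an ID for later processing."""
        request_id = str(uuid.uuid4())
        self._pending[request_id] = model_input
        return request_id

    def process(self, request_id: str, deliver: Callable[[str], None]) -> None:
        """Remove the stored input, run the model, and hand off the output."""
        # pop() deletes the input from the buffer as soon as it is picked up.
        model_input = self._pending.pop(request_id)
        output = self.run_model(model_input)
        # The output goes straight to the caller's handler; nothing is kept here.
        deliver(output)

    def pending_count(self) -> int:
        return len(self._pending)


# Usage: a stand-in "model" that upper-cases its input.
buffer = AsyncRequestBuffer(run_model=str.upper)
request_id = buffer.enqueue("hello")

received: List[str] = []
buffer.process(request_id, received.append)
```

After `process` returns, the buffer holds no input for that request and has never held the output; the only copy of the result is the one delivered to the caller.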
View your compliance policy
If Baseten has set a compliance policy for your account, the policy appears in your Organization and Team settings under the General tab, and on the model environment detail view. The policy shows the boundaries your inference workloads run within:
- Framework: the compliance programs your workloads are restricted to.
- Region: the geographic regions where your workloads can run.
Workload security
Baseten isolates inference workloads to protect users and Baseten’s infrastructure.
- Container security:
  - Baseten never shares GPUs across users.
  - Security tooling: Falco (Sysdig), Gatekeeper (Pod Security Policies).
  - Minimal privileges for workloads and nodes to limit incident impact.
- Network security:
- Pentesting:
  - Extended pentesting by RunSybil (ex-OpenAI and CrowdStrike experts).
  - Malicious model deployments tested in a dedicated prod-like environment.