Start a development deployment
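Create a development deployment and start watching for changes with truss push --watch. At the LOADING_MODEL stage, Truss enters watch mode early so you can start iterating while the model finishes loading.
To hot-reload model code changes during the watch session, add the --watch-hot-reload flag to the push command.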
Re-attach to a development deployment
If you stop the watch session (Ctrl+C), re-attach to the existing development deployment with truss watch. The command syncs any changes made while disconnected, then resumes watching. It requires an existing development deployment. If you don’t have one, use truss push --watch to create it.
To apply model code changes without restarting, add the --hot-reload flag: truss watch --hot-reload.
What gets live-patched
Truss monitors your project directory (respecting .trussignore patterns) and applies patches for the following changes without a full rebuild:
| Change type | Examples |
|---|---|
| Model code | Files in the model/ directory: model.py, helper modules, utilities, and binary files (like .so, .png). |
| Bundled packages | Files in the packages/ directory, including binary files (like .pyd, .so). |
| Python requirements | Adding, removing, or updating packages in requirements or a requirements file. |
| Environment variables | Adding, removing, or updating values in environment_variables. |
| External data | Adding or removing entries in external_data. |
| Config values | Most config.yaml changes (except those listed below). |
With the --watch-hot-reload or --hot-reload flag, Truss hot-reloads model code changes by swapping the model class in-process without restarting the inference server. This preserves in-memory state like loaded weights and caches. If a patch includes non-model changes (such as requirements or config), Truss falls back to a standard restart.
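The fallback rule described above can be sketched as a small decision function. This is an illustrative sketch only, not Truss's implementation: the `ChangeKind` labels and the `apply_strategy` helper are hypothetical names introduced here, and the real patch system may categorize changes differently.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical labels for the kinds of change a patch can contain."""

    MODEL_CODE = "model_code"
    REQUIREMENTS = "requirements"
    CONFIG = "config"


def apply_strategy(changes, hot_reload_enabled):
    """Pick how a patch is applied (hypothetical helper, not a Truss API).

    Hot reload is chosen only when the flag is set and every change in the
    patch is model code; anything else falls back to a standard restart.
    """
    changes = list(changes)
    only_model_code = bool(changes) and all(
        change is ChangeKind.MODEL_CODE for change in changes
    )
    if hot_reload_enabled and only_model_code:
        return "hot_reload"
    return "restart"
```

Under these assumptions, a patch that touches only model.py is hot-reloaded, while a patch that also edits requirements triggers a restart even when the hot-reload flag is set.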
What requires a full redeploy
The patch system doesn’t support some changes. When you make these changes, stop the watch session and run truss push (or truss push --watch to start a new development deployment):
| Change type | Why |
|---|---|
| resources (GPU type, count) | Requires a new instance. |
| python_version | Requires a new base image. |
| system_packages | Requires apt installation in the container. |
| live_reload | Changes the deployment mode. |
| Data directory (data/) | The patch system doesn’t track file changes in data/. |
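As a quick way to reason about the table above, the following sketch checks whether a set of changes includes anything that needs a full redeploy. It is a hypothetical helper written for illustration, not part of the Truss API: the function name and its parameters are assumptions, and it encodes only the entries listed in this section.

```python
from pathlib import PurePosixPath

# Config keys this section lists as requiring a full redeploy.
REDEPLOY_CONFIG_KEYS = frozenset(
    {"resources", "python_version", "system_packages", "live_reload"}
)


def needs_full_redeploy(changed_config_keys=(), changed_paths=()):
    """Return True if any change falls outside what live patching supports.

    Hypothetical helper for illustration; paths are relative to the
    project directory and use forward slashes.
    """
    if REDEPLOY_CONFIG_KEYS.intersection(changed_config_keys):
        return True
    # File changes under data/ are not tracked by the patch system.
    return any(
        PurePosixPath(path).parts[:1] == ("data",) for path in changed_paths
    )
```

For example, `needs_full_redeploy(changed_config_keys=["resources"])` reports that a redeploy is needed, while a change limited to `model/model.py` does not.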
Limitations
Development deployments optimize for iteration, not production traffic:
- Single replica: Fixed at 0 minimum, 1 maximum. No autoscaling beyond one replica.
- No gRPC: Trusses with gRPC transport require a published deployment.
- No TRT-LLM engine builds: TRT-LLM build flow requires a published deployment.
Deploy to production
When you’re done iterating, deploy a published version with truss push. This creates a published deployment with full autoscaling support. Published deployments can scale to multiple replicas and are suitable for production traffic.
You can also deploy and promote directly to the production environment; see the truss push CLI reference for the exact flag.
CLI reference: truss push
Full list of options for the push command.
CLI reference: truss watch
Full list of options for the watch command.
Autoscaling
Configure replicas, concurrency targets, and scale-to-zero for production.
Environments
Manage staging, production, and custom environments.