Set your model resources, dependencies, and more
model_name
The name of your model.
description
A description of your model.
model_class_name (default: Model)
The name of the class that defines your Truss model. Note that this class must implement at least a predict method.
model_module_dir (default: model)
Folder in the Truss where the model class is found.
data_dir (default: data/)
Folder for the data files bundled in your Truss. Note that you can access this within your model like so:
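A minimal sketch of a model.py that reads from the data directory; Truss passes the resolved data directory to the model's constructor, and the file name weights.bin is hypothetical:

```python
from pathlib import Path

class Model:
    def __init__(self, **kwargs):
        # Truss provides the resolved data directory via keyword arguments.
        self._data_dir = Path(kwargs["data_dir"])
        self._weights = None

    def load(self):
        # Read a file bundled under data/ (hypothetical file name).
        self._weights = (self._data_dir / "weights.bin").read_bytes()

    def predict(self, model_input):
        return {"weight_bytes": len(self._weights)}
```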
packages (default: packages/)
Folder in the Truss to put your custom packages.
Inside the packages folder, you can place your own code that you want to reference inside model.py. Here is an example:
Imagine you have the project setup below:
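For illustration, assume a hypothetical package named my_package (the file names below are illustrative):

```
your-truss/
├── config.yaml
├── model/
│   └── model.py
└── packages/
    └── my_package/
        ├── __init__.py
        └── my_module.py
```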
In model.py, the package can be imported like this:
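Continuing the hypothetical layout above:

```python
from my_package import my_module
```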
external_package_dirs
A list of folders outside the Truss that contain packages you want to reference in your model. In the example below, super_cool_awesome_plugin/ is outside the truss:
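One possible layout (file names inside the packages are illustrative):

```
.
├── stable-diffusion/
│   ├── config.yaml
│   └── model/
│       └── model.py
└── super_cool_awesome_plugin/
    ├── __init__.py
    └── plugin.py
```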
In stable-diffusion/config.yaml, the path to your external package needs to be specified. For the example above, the config.yaml would look like this:
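A sketch of the corresponding config (the relative path assumes the layout above):

```yaml
external_package_dirs:
  - ../super_cool_awesome_plugin/
```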
In stable-diffusion/model/model.py, the super_cool_awesome_plugin/ package can be imported like so:
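Continuing the sketch above (plugin is a hypothetical module name):

```python
from super_cool_awesome_plugin import plugin
```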
environment_variables
Do not store secret values in environment variables. See the secrets arg for information on properly managing secrets.
model_metadata
A catch-all field for custom metadata about your model, which is available to your model code at runtime.
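For instance, a sketch with an illustrative key:

```yaml
model_metadata:
  example_model_input:
    prompt: a photo of an astronaut riding a horse
```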
requirements_file
Path to a requirements file containing your model's Python dependencies, relative to the Truss root directory.
requirements
A list of Python dependencies, specified inline as an alternative to requirements_file.
We strongly recommend pinning versions in your requirements.
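For example, with pinned versions (the packages are illustrative):

```yaml
requirements:
  - torch==2.3.0
  - transformers==4.41.2
```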
resources
The resources section is where you specify the compute resources that your model needs: CPU, memory, and GPU.
If you need a GPU, you must also set resources.use_gpu to true.
resources.cpu
The number of CPUs your model needs; 1000m and 1 are equivalent. Fractional CPU amounts can be requested using millicpus. For example, 500m is half of a CPU core.
resources.memory
The amount of memory your model needs; 1Gi and 1024Mi are equivalent.
resources.use_gpu (default: false)
Whether the model requires a GPU.
resources.accelerator
The GPU accelerator your model needs. Supported values: T4, L4, A10G, V100, A100, H100, H100_40GB.
You can use the : operator to request multiple GPUs on your instance, e.g.:
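For example, to request two A100s (a sketch):

```yaml
resources:
  use_gpu: true
  accelerator: A100:2
```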
secrets
A mapping of secret names to placeholder values. Only ever put placeholders here; set the actual values in your Baseten workspace.
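For example (the secret name is illustrative):

```yaml
secrets:
  hf_access_token: null
```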
system_packages
Specify any system packages that you would typically install using apt on a Debian operating system.
python_version
The Python version your model environment runs on, e.g. py39.
base_image
The base_image option is used if you need to bring your own custom base image. Custom base images are useful if there are scripts that need to run at build time, or dependencies that are complicated to install. After creating a custom base image, you can specify it in this field.
See Custom Base Images for more detail on how to use these.
base_image.image
A reference to your custom base image, e.g. nvcr.io/nvidia/nemo:23.03.
base_image.python_executable_path
The path to the Python executable inside the image, e.g. /usr/bin/python.
Tying it together, a custom base image configuration might look like this:
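Using the example values above:

```yaml
base_image:
  image: nvcr.io/nvidia/nemo:23.03
  python_executable_path: /usr/bin/python
```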
base_image.docker_auth
If your custom base image lives in a private container registry, use docker_auth to specify how to authenticate when pulling it.
base_image.docker_auth.auth_method
The method to use for authenticating with your container registry. Supported values:
GCP_SERVICE_ACCOUNT_JSON - authenticate with a GCP service account. To use this, make sure you add your service account JSON blob as a Truss secret.
AWS_IAM - authenticate with an AWS IAM service account. To use this, make sure that aws_access_key_id and aws_secret_access_key have been added to your Baseten secrets.
For GCP_SERVICE_ACCOUNT_JSON, you would specify the docker_auth settings like so:
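A sketch, assuming a hypothetical image and a secret named gcp-service-account-json:

```yaml
base_image:
  image: us-east4-docker.pkg.dev/my-project/my-repo/my-image:latest
  docker_auth:
    auth_method: GCP_SERVICE_ACCOUNT_JSON
    secret_name: gcp-service-account-json
    registry: us-east4-docker.pkg.dev
```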
Here, secret_name references the secret that you added your service account JSON to.
In the case of AWS_IAM, you would use the following:
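Again a sketch with a hypothetical ECR image; this assumes the credentials are read from the aws_access_key_id and aws_secret_access_key secrets mentioned above:

```yaml
base_image:
  image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest
  docker_auth:
    auth_method: AWS_IAM
    registry: 123456789012.dkr.ecr.us-east-1.amazonaws.com
```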
base_image.docker_auth.secret_name
The name of the secret that holds your credentials, declared in the secrets section of your Truss.
base_image.docker_auth.registry
The container registry to authenticate against (e.g., us-east4-docker.pkg.dev).
runtime
Options for the runtime behavior of your model.
runtime.predict_concurrency (default: 1)
This field governs how many requests can run concurrently in the predict method of your model. This is useful if your model supports parallelism and you'd like to take advantage of it.
By default, this value is set to 1, meaning predict can only run for one request at a time. This protects the GPU from being over-utilized, and is a good default for many models.
See How to configure concurrency for more detail on how to set this value.
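For instance, to allow up to eight requests in predict at once:

```yaml
runtime:
  predict_concurrency: 8
```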
runtime.enable_tracing_data (default: False)
Enables trace data export with built-in OTEL instrumentation. If not further specified, this data is only collected internally at Baseten and can help us troubleshoot. You can additionally export it to your own systems; refer to the tracing guide for details.
Turning this on could add performance overhead.
runtime.enable_debug_logs (default: False)
If turned on, the log level for the Truss server is changed from INFO to DEBUG.
runtime.health_checks
Configuration for your model's health checks. For details on setup and customization, see Configuring health checks.
external_data
Use external_data if you have data that you want bundled into your image at build time. This is useful if you have a large amount of data that you want available to your model. By including it at build time, you reduce the cold-start time of your instance, as the data is already available in the image. You can use it like so:
external_data.<list_item>.url
The URL to download the data from.
external_data.<list_item>.local_data_path
The path in the image where the data will be downloaded to.
external_data.<list_item>.name
An optional name for the data, useful for readability.
build_commands
A list of shell commands to run while your image is being built, for example to clone a repository or download files ahead of time.
build
The build section is used to define options for builds.
build.secret_to_path_mapping
A mapping from secret names to the paths where those secrets are mounted at build time. You can then read a secret in your build_commands section by running cat on the file. For instance, to install a pip package from a private GitHub repository, you could do the following:
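A sketch; the repository URL is illustrative, while the secret name and mount path match the explanation below:

```yaml
build:
  secret_to_path_mapping:
    my-github-access-token: /root/my-github-access-token
build_commands:
  - pip install git+https://$(cat /root/my-github-access-token)@github.com/my-org/my-private-repo.git
```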
This assumes you have a secret named my-github-access-token, and that it can be accessed at the path /root/my-github-access-token. In our build_commands section, we then use cat to access the contents of that secret.
Under the hood, this option mounts your secret as a build secret. This means that the value of your secret will be secure and
will not be exposed via Docker history or logs.
model_cache
Use model_cache to cache model weights when your image is built, so they don't need to be downloaded on every cold start. With model_cache, there are multiple backends supported, not just Hugging Face. You can also cache weights stored on GCS, for instance.
model_cache.<list_item>.repo_id
The Hugging Face repo ID, e.g. madebyollin/sdxl-vae-fp16-fix, or a bucket endpoint such as gcs://path-to-my-bucket for a GCS bucket. Note that GCS buckets are only supported with use_volume: False.
model_cache.<list_item>.revision
The revision to download, e.g. a branch name, tag, or commit hash.
model_cache.<list_item>.use_volume
Whether to mount the cached weights into the container at runtime (under /app/model_cache) rather than baking them into the image at build time.
model_cache.<list_item>.volume_folder
The folder name under which the cached repo is made available at runtime; only relevant when use_volume is set to True. For instance, volume_folder: myrepo will make the model available under /app/model_cache/myrepo at runtime.
model_cache.<list_item>.allow_patterns
Only cache files matching these patterns. By default, all files are cached.
model_cache.<list_item>.ignore_patterns
Skip caching files matching these patterns, e.g. ["*.onnx", "Readme.md"]. By default, nothing is ignored.
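Tying the fields together, a sketch of a model_cache entry (the revision and volume_folder values are illustrative):

```yaml
model_cache:
  - repo_id: madebyollin/sdxl-vae-fp16-fix
    revision: main
    use_volume: true
    volume_folder: sdxl-vae
    ignore_patterns:
      - "*.onnx"
```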