Get started - Baseten

Baseten Training runs your training code on managed cloud GPUs. You bring your own framework, point it at a GPU type, and submit. Baseten handles provisioning, syncs checkpoints as they’re saved, and deploys any checkpoint as a production endpoint in one command.

This tutorial fine-tunes Qwen3-4B with LoRA on a single H100, from job submission to calling the deployed model. You’ll set up a project directory, define your infrastructure in a configuration file, and write the training scripts that run on an H100.

You’re billed per minute of GPU time while the job runs and while the deployed model serves traffic; see Baseten pricing for H100 rates. The training job is capped at 50 steps and ends on its own within minutes.

Prerequisites

- Baseten account: Sign up for Baseten.
- API key: Generate an API key from Settings > API keys.
- Hugging Face token: Store a Hugging Face access token as a Baseten secret named hf_access_token. The deploy step at the end of this tutorial needs it to download the base model.
- uv: This guide uses uvx to run Truss commands without a separate install step.

Log in to Baseten:
```
uvx truss login
```

Create your training project

```
mkdir my-training-project && cd my-training-project
```

Write your configuration file

Your configuration file uses the truss_train library to define your training infrastructure as Python objects:

TrainingProject: the top-level container for your project.
TrainingJob: a single job within a project, combining:
- Image: what container to run.
- Compute: what hardware to provision.
- Runtime: how to start training and what to persist.

This is the file Baseten reads when you submit a job. It tells the platform which GPU to provision, which container image to use, and where to sync checkpoints. Create config.py:

config.py

```
from truss_train import (
    TrainingProject,
    TrainingJob,
    Image,
    Compute,
    Runtime,
    CacheConfig,
    CheckpointingConfig,
)
from truss.base.truss_config import AcceleratorSpec

# Container image the job runs in.
BASE_IMAGE = "pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime"

# How to start training and what to persist.
training_runtime = Runtime(
    start_commands=[
        "chmod +x ./run.sh && ./run.sh",
    ],
    cache_config=CacheConfig(enabled=True),
    checkpointing_config=CheckpointingConfig(enabled=True),
)

# Hardware to provision: a single H100.
training_compute = Compute(
    accelerator=AcceleratorSpec(accelerator="H100", count=1),
)

# A job combines the image, the compute, and the runtime.
training_job = TrainingJob(
    image=Image(base_image=BASE_IMAGE),
    compute=training_compute,
    runtime=training_runtime,
)

# Top-level container for the job.
training_project = TrainingProject(
    name="qwen3-4b-lora-sft",
    job=training_job,
)
```

CacheConfig avoids re-downloading models and datasets between jobs. CheckpointingConfig tells Baseten to sync your saved checkpoints so you can deploy them later.

Write your training scripts

Create run.sh to install dependencies and launch training. This tutorial uses pip install in the start command, but you can also pre-install dependencies in a custom base image.

run.sh

```
#!/bin/bash
set -eux

pip install "trl>=0.20.0" "peft>=0.17.0" "transformers>=4.55.0" "datasets"

python train.py
```

Your train.py is your own training code. Baseten runs it as-is, so you can use any framework or training loop that works locally. This example fine-tunes Qwen3-4B on the pirate-ultrachat-10k dataset using LoRA with TRL (Transformer Reinforcement Learning). The dataset teaches the model to respond in pirate dialect, so you’ll know fine-tuning worked when the deployed model starts saying “Ahoy, matey!”

train.py

```
import os
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

MODEL_ID = "Qwen/Qwen3-4B"
DATASET_ID = "winglian/pirate-ultrachat-10k"

dataset = load_dataset(DATASET_ID, split="train")

# Fall back to the EOS token if the tokenizer defines no padding token.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model in bfloat16; the KV cache is not needed during training.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    use_cache=False,
)

# LoRA adapters on every linear layer.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# max_steps caps the run at 50 optimizer steps and takes precedence
# over num_train_epochs.
training_args = SFTConfig(
    learning_rate=2e-4,
    num_train_epochs=1,
    max_steps=50,
    logging_steps=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_length=1024,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    save_steps=25,
    bf16=True,
    # Use Baseten's checkpoint directory when set; otherwise a local folder.
    output_dir=os.getenv("BT_CHECKPOINT_DIR", "./checkpoints"),
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()

# Write the final adapter to the root of the checkpoint directory.
trainer.save_model(training_args.output_dir)
print(f"Training complete. Model saved to {training_args.output_dir}")
```

Save checkpoints to $BT_CHECKPOINT_DIR so Baseten can sync and deploy them. Baseten sets this variable automatically when checkpointing is enabled.
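To check what the trainer actually wrote, a script can resolve the same directory and list its checkpoint subfolders. The sketch below is illustrative and not part of the tutorial's files: `resolve_checkpoint_dir` and `list_checkpoints` are hypothetical helpers, and the code assumes the `checkpoint-<step>` folder naming that appears in the deploy prompt later in this guide.

```python
import os
import re
from pathlib import Path


def resolve_checkpoint_dir() -> Path:
    """Same lookup as train.py: Baseten's directory if set, else a local folder."""
    return Path(os.getenv("BT_CHECKPOINT_DIR", "./checkpoints"))


def list_checkpoints(root: Path) -> list:
    """Return (step, path) pairs for checkpoint-<step> subfolders, sorted by step."""
    found = []
    if not root.is_dir():
        return found
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return sorted(found)


if __name__ == "__main__":
    for step, path in list_checkpoints(resolve_checkpoint_dir()):
        print(step, path)
```

Run locally, this reads ./checkpoints; inside a job with checkpointing enabled, it reads whatever directory Baseten assigned.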

With save_steps=25 and max_steps=50, the trainer saves LoRA checkpoints at steps 25 and 50.
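The arithmetic behind those numbers can be worked out from the SFTConfig values. The following is a small sketch with hypothetical helper names (`checkpoint_steps`, `samples_per_step`); it assumes periodic saves land on every multiple of save_steps, which matches the two checkpoints this tutorial produces.

```python
# Values copied from the SFTConfig in train.py.
MAX_STEPS = 50
SAVE_STEPS = 25
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_GPUS = 1  # single H100 in this tutorial


def checkpoint_steps(max_steps: int, save_steps: int) -> list:
    """Steps at which a periodic save lands, assuming one save every save_steps steps."""
    return list(range(save_steps, max_steps + 1, save_steps))


def samples_per_step(per_device_batch: int, grad_accum: int, num_gpus: int) -> int:
    """Effective batch size: samples consumed per optimizer step."""
    return per_device_batch * grad_accum * num_gpus


effective_batch = samples_per_step(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_GPUS)
print(checkpoint_steps(MAX_STEPS, SAVE_STEPS))  # [25, 50]
print(effective_batch)                          # 16
print(effective_batch * MAX_STEPS)              # 800 samples seen in total
```

So the 50-step run sees 800 training samples, a small slice of the dataset, which is why it finishes within minutes.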

Submit your training job

Now that your project is set up, submit your training job. The CLI packages your files, creates the training project, and starts the job on your specified GPU.

```
uvx truss train push config.py
```

You’ll see:

```
✨ Training job successfully created!
🪵 View logs for your job via 'truss train logs --job-id q4ok7dw --tail'
🔍 View metrics for your job via 'truss train metrics --job-id q4ok7dw'
📁 View cache summary via 'truss train cache summarize "qwen3-4b-lora-sft"'
🌐 View job in the UI: https://app.baseten.co/training/4w51mm3/logs/q4ok7dw
```

Copy the job ID (q4ok7dw in this example) to use in the next steps.
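If you script the submission, the job ID can be pulled out of the captured CLI output instead of copied by hand. This is a minimal sketch that assumes the output keeps the `--job-id <id>` wording shown above and that IDs contain only letters, digits, underscores, or hyphens; `extract_job_id` is a hypothetical helper, and the output format is not a stable interface.

```python
import re
from typing import Optional

# Sample lines copied from the CLI output shown above.
SAMPLE_OUTPUT = """\
View logs for your job via 'truss train logs --job-id q4ok7dw --tail'
View metrics for your job via 'truss train metrics --job-id q4ok7dw'
"""


def extract_job_id(cli_output: str) -> Optional[str]:
    """Return the first value that follows --job-id, or None if absent."""
    match = re.search(r"--job-id\s+([\w-]+)", cli_output)
    return match.group(1) if match else None


print(extract_job_id(SAMPLE_OUTPUT))  # q4ok7dw
```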

Monitor your training job

Tail logs in real time with the job ID from the previous step.

```
uvx truss train logs --job-id <job_id> --tail
```

You can also view logs, metrics, and job status in the Baseten dashboard.

Deploy your trained model

Checkpoints sync while training runs: the logs show each one move from SYNCING to COMPLETE as it’s saved. When the job finishes, you’ll see:

```
{'train_runtime': '133.4', 'train_samples_per_second': '5.997', 'train_steps_per_second': '0.375', 'train_loss': '1.911', 'epoch': '0.0801'}
Training complete. Model saved to /mnt/ckpts
Job already completed successfully, terminating job...
```
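The summary line can be cross-checked against the training configuration. The sketch below parses the logged dictionary (whose values are strings) and recovers the step and sample counts from the runtime and throughput figures; `parse_summary` is a hypothetical helper, and the sample line is the one shown above.

```python
import ast

# Summary line copied from the job logs above; values are logged as strings.
SUMMARY_LINE = (
    "{'train_runtime': '133.4', 'train_samples_per_second': '5.997', "
    "'train_steps_per_second': '0.375', 'train_loss': '1.911', 'epoch': '0.0801'}"
)


def parse_summary(line: str) -> dict:
    """Parse the logged dict literal and convert every value to float."""
    raw = ast.literal_eval(line)
    return {key: float(value) for key, value in raw.items()}


summary = parse_summary(SUMMARY_LINE)
steps = summary["train_runtime"] * summary["train_steps_per_second"]
samples = summary["train_runtime"] * summary["train_samples_per_second"]

print(round(steps))            # 50, matching max_steps
print(round(samples))          # 800
print(round(samples / steps))  # 16 = batch size 4 x 4 accumulation steps
```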

Deploy your checkpoint to Baseten’s inference platform. The deployment downloads the base model weights and serves them with your LoRA adapter using vLLM. This step uses the hf_access_token secret from the prerequisites because the serving layer downloads the base model separately.

```
uvx truss train deploy_checkpoints
```

Follow the interactive prompts to select a checkpoint, name your model, and choose a GPU.

```
Fetching checkpoints for training job <job_id>...
? Use spacebar to select/deselect checkpoints to deploy.
  ○ .
  ○ checkpoint-50
❯ ○ checkpoint-25

? Enter the model name for your deployment: my-fine-tuned-model
? Select the GPU type to use for deployment: H100
? Select the number of H100 GPUs to use for deployment: 1
? Enter the huggingface secret name: hf_access_token

Writing truss config to truss_configs/deployment-1_<deployment_id>/config.yaml
To call the deployment, set your `model` parameter to one of ['checkpoint-25']
```

To script this step instead of answering prompts, pass --config with a DeployCheckpointsConfig and --non-interactive; see serve your trained model.

Test your deployment

Call your deployed model using the OpenAI-compatible chat format. The model field matches the checkpoint name you selected during deployment.

cURL

```
export BASETEN_API_KEY="paste-your-api-key-here"

curl -X POST https://model-<model_id>.api.baseten.co/environments/production/sync/v1/chat/completions \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "checkpoint-25", "messages": [{"role": "user", "content": "What is the best way to learn Python programming?"}]}'
```

Python

```
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-<model_id>.api.baseten.co/environments/production/sync/v1"
)

response = client.chat.completions.create(
    model="checkpoint-25",
    messages=[{"role": "user", "content": "What is the best way to learn Python programming?"}],
)

print(response.choices[0].message.content)
```

CLI

```
uvx truss predict --model <model_id> --data '{"model": "checkpoint-25", "messages": [{"role": "user", "content": "What is the best way to learn Python programming?"}]}'
```
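If you would rather not install the openai package, the same request can be assembled with Python’s standard library. The sketch below mirrors the cURL call; `build_chat_request` and `send` are hypothetical helpers, and the response parsing assumes the OpenAI-compatible shape used in the Python example (`choices[0].message.content`).

```python
import json
import os
import urllib.request


def build_chat_request(model_id: str, checkpoint: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request equivalent to the cURL call, without sending it."""
    url = (
        f"https://model-{model_id}.api.baseten.co"
        "/environments/production/sync/v1/chat/completions"
    )
    body = {"model": checkpoint, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__" and "BASETEN_MODEL_ID" in os.environ:
    # BASETEN_MODEL_ID is a hypothetical variable name used only by this sketch.
    request = build_chat_request(
        os.environ["BASETEN_MODEL_ID"],
        "checkpoint-25",
        "What is the best way to learn Python programming?",
        os.environ["BASETEN_API_KEY"],
    )
    print(send(request))
```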

The fine-tuned model responds in pirate dialect, confirming that the LoRA adapter is active. Qwen3 opens its reply with an (often empty) <think> block from its reasoning mode:

```
<think>

</think>

Ahoy there, matey!  Seeking to learn Python, ye be in for a grand adventure!
Here be some tips to guide ye:

**1. Start with the basics!**  Aye, like a pirate needs a ship to sail the
seven seas, a programmer needs to know the basics...
```
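If your application should show only the answer, the <think> block can be removed on the client side. The following is a minimal sketch using a regular expression; `strip_think` is a hypothetical helper, and it assumes the block is delimited by literal <think> and </think> tags as in the reply above.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately;
# DOTALL lets the pattern span the newlines inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


reply = "<think>\n\n</think>\n\nAhoy there, matey!"
print(strip_think(reply))  # Ahoy there, matey!
```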

Next steps

- Train on your own data: swap the dataset, mount weights through BDN, and scale the hardware.
- Monitor and manage training jobs: logs, metrics, and job lifecycle commands.
- Training SDK reference: all configuration options, including base images, secrets, private registries, and .truss_ignore syntax.
- Browse the ML Cookbook: framework examples and advanced recipes.
