In this example, we'll train a PixArt Sigma model using the SimpleTuner toolkit. We'll use the full
model type because PixArt Sigma is a smaller model and will likely fit in VRAM.
Make sure that you have Python installed; SimpleTuner works well with 3.10 or 3.11. Python 3.12 should not be used.
You can check this by running:
python --version
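The same check can be done from inside the interpreter. This is a minimal sketch, assuming the supported range is exactly the 3.10 and 3.11 releases named above; the helper name is_supported_python is hypothetical and is not part of SimpleTuner.

```python
import sys

# Assumption: only the two minor versions named in this guide are accepted.
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair is 3.10 or 3.11."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "not recommended"
    print(f"Python {current}: {status}")
```

Running it inside the activated venv confirms that the interpreter the venv was built from is the one you intended.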
If you don't have Python 3.11 installed on Ubuntu, you can try the following:
apt -y install python3.11 python3.11-venv
For Vast, RunPod, and TensorDock (among others), the following will work on a CUDA 12.2-12.4 image:
apt -y install nvidia-cuda-toolkit libgl1-mesa-glx
If libgl1-mesa-glx is not found, you might need to use libgl1-mesa-dri instead. Your mileage may vary.
Clone the SimpleTuner repository and set up the python venv:
git clone --branch=release https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
python -m venv .venv
source .venv/bin/activate
pip install -U poetry pip
# Necessary on some systems to prevent it from deciding it knows better than us.
poetry config virtualenvs.create false
Depending on your system, you will run one of 3 commands:
# MacOS
poetry install -C install/apple
# Linux
poetry install
# Linux with ROCM
poetry install -C install/rocm
The following must be executed for an AMD MI300X to be useable:
apt install amd-smi-lib
pushd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install .
popd
DeepSpeed and bitsandbytes cause numerous issues for container hosts such as RunPod and Vast.
To remove them after poetry has installed them, run the following command in the same terminal:
pip uninstall -y deepspeed bitsandbytes
To run SimpleTuner, you will need to set up a configuration file, the dataset and model directories, and a dataloader configuration file.
An experimental script, configure.py, may allow you to skip this section entirely through an interactive step-by-step configuration. It contains some safety features that help avoid common pitfalls.
Note: This doesn't configure your dataloader. You will still have to do that manually, later.
To run it:
python configure.py
⚠️ For users located in countries where Hugging Face Hub is not readily accessible, you should add HF_ENDPOINT=https://hf-mirror.com to your ~/.bashrc or ~/.zshrc, depending on which $SHELL your system uses.
If you prefer to manually configure:
Copy config/config.json.example to config/config.json:
cp config/config.json.example config/config.json
There, you will need to modify the following variables:
{
  "model_type": "full",
  "use_bitfit": false,
  "pretrained_model_name_or_path": "pixart-alpha/pixart-sigma-xl-2-1024-ms",
  "model_family": "pixart_sigma",
  "output_dir": "/home/user/output/models",
  "validation_resolution": "1024x1024,1280x768",
  "validation_guidance": 3.5
}
- pretrained_model_name_or_path - Set this to PixArt-alpha/PixArt-Sigma-XL-2-1024-MS.
- model_type - Set this to full.
- use_bitfit - Set this to false.
- model_family - Set this to pixart_sigma.
- output_dir - Set this to the directory where you want to store your checkpoints and validation images. It's recommended to use a full path here.
- validation_resolution - As PixArt Sigma comes in a 1024px or 2048px model format, you should set this to 1024x1024 for this example.
  - Additionally, PixArt was fine-tuned on multi-aspect buckets, and other resolutions may be specified using commas to separate them: 1024x1024,1280x768
- validation_guidance - PixArt benefits from a very low value. Set this between 3.6 and 4.4.
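To make the comma-separated resolution format concrete, here is a minimal sketch of how such a string could be split into pairs. It assumes each item has the WIDTHxHEIGHT form shown above, with width first; the function parse_validation_resolution is hypothetical and does not reflect how SimpleTuner parses the option internally.

```python
def parse_validation_resolution(value):
    """Split a string like '1024x1024,1280x768' into (width, height) tuples.

    Assumption: every item is two integers joined by 'x', width first.
    """
    resolutions = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        width, height = item.lower().split("x")
        resolutions.append((int(width), int(height)))
    return resolutions


if __name__ == "__main__":
    for width, height in parse_validation_resolution("1024x1024,1280x768"):
        print(f"{width}x{height}: {width * height} pixels")
```

A malformed item such as 1024 by itself raises a ValueError in this sketch, which can be a quick way to catch typos before starting a long run.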
There are a few more if using a Mac M-series machine:
- mixed_precision should be set to no.
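If you would rather script these edits than make them by hand, something like the following could work. It is a sketch only: apply_overrides is a hypothetical helper, and it assumes the keys in your config/config.json are spelled exactly as in the snippet above. If your file spells an option differently, this adds a second entry instead of replacing the existing one, so inspect the result.

```python
import json
from pathlib import Path


def apply_overrides(config_path, overrides):
    """Merge overrides into the JSON file at config_path and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(overrides)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Values copied from the snippet in this guide; output_dir is the example
# path and should be changed to a directory on your machine.
PIXART_OVERRIDES = {
    "model_type": "full",
    "use_bitfit": False,
    "pretrained_model_name_or_path": "pixart-alpha/pixart-sigma-xl-2-1024-ms",
    "model_family": "pixart_sigma",
    "output_dir": "/home/user/output/models",
    "validation_resolution": "1024x1024,1280x768",
    "validation_guidance": 3.5,
}

# Extra setting for Mac M-series machines, per the note above.
MAC_OVERRIDES = {"mixed_precision": "no"}
```

For example, apply_overrides("config/config.json", PIXART_OVERRIDES) updates the copied file in place, and on a Mac a second call with MAC_OVERRIDES adds the mixed_precision setting.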
It's crucial to have a substantial dataset to train your model on. There are limitations on the dataset size, and you will need to ensure that your dataset is large enough to train your model effectively. Note that the bare minimum dataset size is TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS. The dataset will not be discoverable by the trainer if it is too small.
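As a worked example of the minimum-size rule, the arithmetic can be written out as below. The function names are hypothetical, and the sketch covers only the product stated above; any other factors your setup may introduce are not modelled.

```python
def minimum_dataset_size(train_batch_size, gradient_accumulation_steps):
    """Bare minimum number of samples, per the rule stated in this guide."""
    return train_batch_size * gradient_accumulation_steps


def dataset_is_large_enough(num_images, train_batch_size, gradient_accumulation_steps):
    """Return True if num_images meets the bare minimum."""
    return num_images >= minimum_dataset_size(
        train_batch_size, gradient_accumulation_steps
    )


if __name__ == "__main__":
    # With a batch size of 4 and 2 accumulation steps, at least 8 images are needed.
    print(minimum_dataset_size(4, 2))
```

Meeting the bare minimum only makes the dataset discoverable; it says nothing about whether there is enough data to train effectively.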
Depending on the dataset you have, you will need to set up your dataset directory and dataloader configuration file differently. In this example, we will be using pseudo-camera-10k as the dataset.
In your /home/user/simpletuner/config directory, create a multidatabackend.json:
[
  {
    "id": "pseudo-camera-10k-pixart",
    "type": "local",
    "crop": true,
    "crop_aspect": "square",
    "crop_style": "random",
    "resolution": 1.0,
    "minimum_image_size": 0.25,
    "maximum_image_size": 1.0,
    "target_downsample_size": 1.0,
    "resolution_type": "area",
    "cache_dir_vae": "cache/vae/pixart/pseudo-camera-10k",
    "instance_data_dir": "/home/user/simpletuner/datasets/pseudo-camera-10k",
    "disabled": false,
    "skip_file_discovery": "",
    "caption_strategy": "filename",
    "metadata_backend": "discovery"
  },
  {
    "id": "text-embeds",
    "type": "local",
    "dataset_type": "text_embeds",
    "default": true,
    "cache_dir": "cache/text/pixart/pseudo-camera-10k",
    "disabled": false,
    "write_batch_size": 128
  }
]
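Before training, it can help to sanity-check this file. The sketch below encodes a few checks inferred only from the example above: ids are unique, local image entries have an instance_data_dir, and one text_embeds entry is marked as default. These are assumptions for illustration, not SimpleTuner's own validation rules, and check_dataloader_config is a hypothetical helper.

```python
import json


def check_dataloader_config(entries):
    """Return a list of human-readable problems found in the backend list."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        entry_id = entry.get("id")
        if not entry_id:
            problems.append(f"entry {index} has no id")
        elif entry_id in seen_ids:
            problems.append(f"duplicate id: {entry_id}")
        else:
            seen_ids.add(entry_id)
        is_text_embeds = entry.get("dataset_type") == "text_embeds"
        is_local = entry.get("type") == "local"
        # Assumption: a local image dataset needs a directory to read from.
        if is_local and not is_text_embeds and not entry.get("instance_data_dir"):
            problems.append(f"entry {index} has no instance_data_dir")
    has_default_text = any(
        e.get("dataset_type") == "text_embeds" and e.get("default") for e in entries
    )
    if not has_default_text:
        problems.append("no text_embeds entry is marked as default")
    return problems


def check_dataloader_file(path):
    """Load a JSON file such as config/multidatabackend.json and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_dataloader_config(json.load(handle))
```

An empty list from check_dataloader_file("config/multidatabackend.json") means none of these particular checks failed; it does not guarantee the trainer will accept the file.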
Then, create a datasets directory:
mkdir -p datasets
pushd datasets
huggingface-cli download --repo-type=dataset bghira/pseudo-camera-10k --local-dir=pseudo-camera-10k
popd
This will download about 10k photograph samples to your datasets/pseudo-camera-10k directory, which will be automatically created for you.
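To confirm the download landed where the dataloader expects it, you could count the image files on disk. This sketch assumes the dataset is stored as individual image files and that the suffixes listed in IMAGE_SUFFIXES cover them; both are assumptions, and count_images is a hypothetical helper.

```python
from pathlib import Path

# Assumption: these suffixes cover the image files in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def count_images(dataset_dir):
    """Count files under dataset_dir (recursively) with an image suffix."""
    root = Path(dataset_dir)
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    print(count_images("datasets/pseudo-camera-10k"))
```

A count near zero usually points at a wrong path or an interrupted download rather than a problem with the trainer.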
You'll want to log in to WandB and HF Hub before beginning training, especially if you're using push_to_hub: true and --report_to=wandb.
If you're going to be pushing items to a Git LFS repository manually, you should also run: git config --global credential.helper store
Run the following commands:
wandb login
and
huggingface-cli login
Follow the instructions to log in to both services.
From the SimpleTuner directory, one simply has to run:
bash train.sh
This will begin the text embed and VAE output caching to disk.
For more information, see the dataloader and tutorial documents.
If you wish to enable evaluations to score the model's performance, see this document for information on configuring and interpreting CLIP scores.