This project builds upon key concepts from the following research papers:
- Peebles & Xie (2023) introduce Diffusion Transformers (DiT), replacing the U-Net backbone of diffusion models with a transformer and achieving state-of-the-art performance on class-conditional image generation;
- Karras et al. (2024) introduce magnitude-preserving network layers that keep weight and activation magnitudes under control throughout the diffusion training process, improving the stability and quality of generated outputs (a minimal sketch of such a layer follows this list).
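To make the second idea concrete, here is a minimal PyTorch sketch of a magnitude-preserving linear layer in the spirit of Karras et al. (2024). It is an illustration under stated assumptions, not this repo's actual implementation; the names `MPLinear` and `normalize` are hypothetical:

```python
import torch
import torch.nn as nn

def normalize(w, eps=1e-4):
    # Scale each output unit's weight vector to unit L2 norm.
    norm = w.norm(dim=list(range(1, w.dim())), keepdim=True)
    return w / norm.clamp(min=eps)

class MPLinear(nn.Module):
    """Magnitude-preserving linear layer (sketch after Karras et al., 2024)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Unit-variance Gaussian init; no bias, as biases would shift magnitudes.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                # Forced weight normalization: keep the raw weights on the
                # unit hypersphere so gradient updates cannot grow them.
                self.weight.copy_(normalize(self.weight))
        # Normalize again (with gradients) and scale by 1/sqrt(fan_in) so that
        # unit-variance inputs produce roughly unit-variance outputs.
        w = normalize(self.weight) / (x.shape[-1] ** 0.5)
        return x @ w.t()
```

The forced renormalization during training and the 1/sqrt(fan_in) scaling together keep feature magnitudes roughly constant from layer to layer, which is the property the magnitude-preserving DiT variant below relies on.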
Below are preliminary results comparing DiT-S/2 trained on ImageNet-128 without magnitude preservation (left) and with it (right). DiT-S/2 is a very small model, so absolute sample quality is limited; even so, MaP-DiT produces noticeably higher-quality and more consistent samples than vanilla DiT.
Fig 1. DiT-S/2 samples of Jay without (left) and with (right) magnitude preserving layers.
Fig 2. DiT-S/2 samples of Macaw without (left) and with (right) magnitude preserving layers.
Fig 3. DiT-S/2 samples of St. Bernard without (left) and with (right) magnitude preserving layers.
Fig 4. DiT-S/2 samples of Mushroom without (left) and with (right) magnitude preserving layers.
To train DiT-S/2 with magnitude-preserving layers enabled:

```bash
# Train DiT-S/2 for 400k steps; pass the desired magnitude-preserving feature flags.
python train.py --data-path /path/to/data --results-dir /path/to/results --model DiT-S/2 --num-steps 400_000 <map feature flags>
```

To sample from a trained run:

```bash
# Generate samples for a given ImageNet class from a finished run directory.
python sample.py --result-dir /path/to/results/<dir> --class-label <class label>
```