I was seeing about 7-8 seconds per step with 3x A100-80G under DeepSpeed ZeRO 2, with around 58,000 MB of VRAM in use on each GPU, when training on mixed 512px and 1024px data.
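For reference, a minimal sketch of what an `accelerate` config for this kind of 3-GPU DeepSpeed ZeRO-2 run might look like. The specific values here (`num_processes`, `mixed_precision`, offload settings) are assumptions for illustration, not the settings actually used in this thread:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2                    # shard optimizer state and gradients across GPUs
  gradient_accumulation_steps: 1
  offload_optimizer_device: none   # keep optimizer state on GPU
mixed_precision: bf16
num_machines: 1
num_processes: 3                   # one process per A100
```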
Hi all,
I'm trying full fine-tuning of FLUX and managed to get it working at ~20 sec/iter with batch size 1 and 512x512 resolution on a single 80G A100. This takes about 71 GB of VRAM. I'd really like to run at higher resolution and a larger batch size (batch size 1 is honestly a bit of a stretch for good results, I guess), so any advice would be very appreciated. Here is my accelerate config:
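One generic workaround when batch size 1 only just fits in memory (independent of the FLUX-specific setup above) is gradient accumulation: run several micro-batches, scale each loss, and step once. A minimal PyTorch sketch, using a tiny linear model purely for illustration:

```python
# Sketch: emulate a larger effective batch size with gradient accumulation
# when per-GPU memory only fits small micro-batches. The model and data here
# are placeholders, not the actual FLUX training setup.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
data = torch.randn(8, 4)
target = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()

# Full-batch gradients, for comparison.
model.zero_grad()
loss_fn(model(data), target).backward()
full_grad = model.weight.grad.clone()

# Accumulate over 4 micro-batches of 2. Dividing each micro-batch loss by the
# number of micro-batches makes the summed gradients match the full-batch
# mean loss, so only the peak activation memory shrinks, not the math.
model.zero_grad()
num_micro = 4
for x_chunk, y_chunk in zip(data.chunk(num_micro), target.chunk(num_micro)):
    loss = loss_fn(model(x_chunk), y_chunk) / num_micro
    loss.backward()  # gradients add up across micro-batches

# optimizer.step() would go here, once per accumulation cycle
assert torch.allclose(full_grad, model.weight.grad, atol=1e-6)
```

With `accelerate`, the same effect is exposed via `gradient_accumulation_steps` in the DeepSpeed config, so the training loop itself does not need to change.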