Replies: 9 comments 1 reply
-
sorry, my copy-paste went to hell there. i have fixed all the images. here is another comparison. however, it's worth noting that i'm doing a major overhaul of the model, and that the incoherence isn't the fault of training the early timesteps; it is how the model already was. just pointing out that training on …
-
Looks like a very useful technique! I was guessing this must somehow be related to the refiner. So you were saying the refiner was basically trained with the later 800 steps frozen, or in other words, infinitely biased towards the earlier 200 steps? What about batch size? Does a small batch size still work?
-
the refiner seems to be "capable" of doing the early noise schedule, but it doesn't do it very well. i don't know if that's because it "never" saw it, or because it was fine-tuned on just the final inference steps. small batch sizes have always been terrible, but if you have a small dataset with a lot of visually-similar images, a large batch size might lead to overfitting.
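to make that "infinitely biased towards the earlier 200 steps" idea concrete, here's a rough sketch (the numbers and names are just illustrative, not taken from any particular script) of the difference between a soft bias and a hard restriction over a 1000-step schedule:

```python
import torch

NUM_TRAIN_TIMESTEPS = 1000  # typical length of the DDPM noise schedule

# soft bias: timesteps 0-199 (the low-noise end) are 5x more likely to be sampled
soft = torch.ones(NUM_TRAIN_TIMESTEPS)
soft[0:200] *= 5.0
soft /= soft.sum()

# "infinite" bias: timesteps outside 0-199 are never sampled at all,
# which is roughly what training only on the final inference steps amounts to
hard = torch.zeros(NUM_TRAIN_TIMESTEPS)
hard[0:200] = 1.0
hard /= hard.sum()

# either weight vector can then drive the per-batch timestep draw
timesteps = torch.multinomial(hard, 4, replacement=True)
```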
-
can I change the bias when continuing from the last checkpoint?
-
yes, changing it has no impact on resuming from a checkpoint, unlike batch size or learning rate.
-
the arguments "timestep_bias_begin" and "timestep_bias_end" require …
-
it's there, it's just missing in the help output |
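for anyone grepping for them, this is roughly what the declarations would look like in an argparse-based training script; the defaults and help strings here are placeholders i'm assuming, not the script's actual text:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--timestep_bias_begin", type=int, default=0,
                    help="first timestep (inclusive) of the range to up-weight")
parser.add_argument("--timestep_bias_end", type=int, default=1000,
                    help="last timestep (exclusive) of the range to up-weight")

# once declared, the flags parse normally even if the surrounding docs never mention them
args = parser.parse_args(["--timestep_bias_begin", "0", "--timestep_bias_end", "200"])
print(args.timestep_bias_begin, args.timestep_bias_end)
```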
-
ok, finally I found this old discussion. I am in a training run and the model shows signs of converging. The validation images' composition is almost fixed, most things look correct at first glance, and saturation is increasing. Some small details need to be fixed, and I am wondering if this is the right moment to use timestep bias for earlier steps. If yes, should the lr be decreased, or should I just keep it as it is? I would try and find out myself, but would like to know more about the thing I am about to do. Thanks.
-
I have read the explanation in the source, but couldn't track down where these arguments are used. Could you please provide more information on where and how they are used, and some recommendations/examples of what values achieve what results?
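For context, my current understanding of where they typically plug in is something like the sketch below: the arguments shape a per-timestep weight vector, and the training loop draws each batch's timesteps from that vector instead of uniformly. Only timestep_bias_begin/timestep_bias_end come from this thread; the other names (the multiplier, the scheduler call) are my guesses, so please correct me if the actual script differs.

```python
import torch

def generate_timestep_weights(num_timesteps, bias_begin, bias_end, bias_multiplier):
    # up-weight the [bias_begin, bias_end) range; every other timestep keeps weight 1
    weights = torch.ones(num_timesteps)
    weights[bias_begin:bias_end] *= bias_multiplier
    return weights / weights.sum()

# inside the training loop, the weights replace uniform timestep sampling:
weights = generate_timestep_weights(
    num_timesteps=1000,     # noise_scheduler.config.num_train_timesteps
    bias_begin=0,           # --timestep_bias_begin
    bias_end=200,           # --timestep_bias_end
    bias_multiplier=3.0,    # assumed companion flag controlling how strong the bias is
)
bsz = 4
timesteps = torch.multinomial(weights, bsz, replacement=True).long()
# the biased timesteps are then used exactly like uniformly sampled ones, e.g.
# noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
```

My guess (not something stated in this thread) is that multipliers in the low single digits give a gentle nudge toward the range, while zeroing the weights outside the range reproduces the "only ever train on those steps" behaviour discussed above.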