-
I am using the Parquet Dataloader to load a large amount of data. I think this feature is really cool, but there are some aspects I don’t understand...
My training dataset uses tags, and I would prefer to store them as an array, since that makes better use of Parquet and results in smaller files. However, the values in caption_column do not seem to support arrays. I could add code to the loader that converts np.array captions into strings, but it would be better if arrays were supported natively, and better still if the array order could be partially shuffled during training, as sd-scripts does.
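To make the request concrete, here is a minimal sketch of the conversion I have in mind. It is not code from the loader: `tags_to_caption`, `keep_tokens`, and `sep` are hypothetical names I made up for illustration, and the "keep the first few tags fixed, shuffle the rest" behavior is only my approximation of what sd-scripts does.

```python
import random

def tags_to_caption(tags, keep_tokens=1, shuffle=True, rng=None, sep=", "):
    """Join an array of tags into one caption string (hypothetical helper).

    The first `keep_tokens` tags stay in place; the remaining tags are
    shuffled, so the order is only partially randomized.
    """
    rng = rng or random
    # Plain string captions pass through unchanged.
    if isinstance(tags, str):
        return tags
    # np.ndarray (and similar array types) expose tolist(); lists and tuples do not.
    if hasattr(tags, "tolist"):
        tags = tags.tolist()
    tags = [str(t) for t in tags]
    head, tail = tags[:keep_tokens], tags[keep_tokens:]
    if shuffle:
        rng.shuffle(tail)  # in-place shuffle of the non-fixed tags only
    return sep.join(head + tail)

if __name__ == "__main__":
    print(tags_to_caption(["1girl", "solo", "smile", "outdoors"], keep_tokens=1))
```

Calling this once per sample at load time would give a different tag order on each epoch while keeping the leading tag stable.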
Replies: 1 comment 2 replies
-
well, an array of captions isn't going to behave like tags. each entry will be its own caption. i despise tag-based prompts, really..
the same image gets one of its captions picked at random each time it is loaded. to sample all captions you would probably need to run N epochs/repeats.
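a rough sketch of that behavior, not the actual loader code: `pick_caption` and `epochs_until_all_seen` are hypothetical names used only for illustration. it assumes one uniformly random pick per image per epoch, so N epochs for N captions is a lower bound and the real number will usually be higher.

```python
import random

def pick_caption(captions, rng):
    """Return one caption chosen uniformly at random (hypothetical helper)."""
    return rng.choice(list(captions))

def epochs_until_all_seen(captions, rng, max_epochs=10_000):
    """Count epochs until every distinct caption has been picked at least once.

    Returns None if that does not happen within max_epochs.
    """
    target = len(set(captions))
    seen = set()
    for epoch in range(1, max_epochs + 1):
        seen.add(pick_caption(captions, rng))
        if len(seen) == target:
            return epoch
    return None

if __name__ == "__main__":
    captions = ["a photo of a cat", "a cat on a sofa", "an orange tabby"]
    print(epochs_until_all_seen(captions, random.Random()))
```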