I am trying to run the photo-concept-bucket example to train SD3 with parquet files. I cloned the repository into my datasets folder, and the parquet file is 150 MB. The tree looks like: /Users/boom/SimpleTuner/datasets/photo-concept-bucket/
System: M2 Pro, 16 GB, macOS
The config is set to the following:
Following the guide: https://github.com/bghira/SimpleTuner/blob/main/documentation/DATALOADER.md#parquet-caption-strategy--json-lines-datasets
Dataloader config looks like:
On running bash train.sh I get:
2024-08-27 11:51:24,932 [ERROR] (main) 'str' object has no attribute 'get', traceback: Traceback (most recent call last):
File "/Users/boom/SimpleTuner/train.py", line 449, in main
configure_multi_databackend(
File "/Users/boom/SimpleTuner/helpers/data_backend/factory.py", line 348, in configure_multi_databackend
dataset_type = backend.get("dataset_type", None)
AttributeError: 'str' object has no attribute 'get'
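A minimal sketch of how this error can arise, assuming the top-level value of the dataloader JSON is a single object rather than a list. Iterating over a Python dict yields its string keys, so calling .get on each item fails the same way as line 348 of factory.py. The helper name `dataset_types` and the loop are illustrative and are not taken from SimpleTuner.

```python
import json


def dataset_types(raw_json: str) -> list:
    """Read dataset_type from each backend entry, like the failing call."""
    backends = json.loads(raw_json)
    # If `backends` is a dict, this loop yields its keys (plain strings),
    # and str has no .get method.
    return [backend.get("dataset_type", None) for backend in backends]


single_object = '{"id": "photo-concept-bucket", "type": "local"}'
wrapped_in_list = '[{"id": "photo-concept-bucket", "type": "local"}]'

try:
    dataset_types(single_object)
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'get'

print(dataset_types(wrapped_in_list))  # [None]
```

If this is the cause, wrapping the config in an array removes the AttributeError, which matches what happens in the next step.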
I tried looking at existing discussion and found this question: #415
which specifies the dataloader config as an array. I am not sure what I should put in the text-embeds entry, since the guide above doesn't specify anything for the parquet strategy in its example, but I tried wrapping the above config in an array and it gives:
2024-08-27 11:57:57,681 [ERROR] (main) Your dataloader config must contain at least one image dataset AND at least one text_embed dataset. See this link for more information about dataset_type: https://github.com/bghira/SimpleTuner/blob/main/documentation/DATALOADER.md#configuration-options, traceback: Traceback (most recent call last):
File "/Users/boom/SimpleTuner/train.py", line 449, in main
configure_multi_databackend(
File "/Users/boom/SimpleTuner/helpers/data_backend/factory.py", line 442, in configure_multi_databackend
raise ValueError(
ValueError: Your dataloader config must contain at least one image dataset AND at least one text_embed dataset. See this link for more information about dataset_type: https://github.com/bghira/SimpleTuner/blob/main/documentation/DATALOADER.md#configuration-options
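A hedged sketch of a dataloader config that would satisfy this check: a list with one image entry and one text embed entry. The image entry reuses values from the "Configured backend" log line in this post. The text embed entry is an assumption based on my reading of DATALOADER.md (`dataset_type` of "text_embeds", `default`, `type`, `cache_dir`), and its `cache_dir` path is hypothetical. As far as I understand, that directory is where the encoded captions get cached. The `check_dataset_types` helper is illustrative only, including its assumption that an entry without `dataset_type` counts as an image dataset.

```python
import json

dataloader_config = [
    {
        "id": "photo-concept-bucket",
        "type": "local",  # assumption: local filesystem backend
        "dataset_type": "image",
        "instance_data_dir": "/Users/boom/SimpleTuner/datasets/photo-concept-bucket/",
        "caption_strategy": "parquet",
        "parquet": {
            "path": "/Users/boom/SimpleTuner/datasets/photo-concept-bucket/photo-concept-bucket.parquet",
            "filename_column": "id",
            "caption_column": "cogvlm_caption",
            "fallback_caption_column": "tags",
            "width_column": "width",
            "height_column": "height",
            "identifier_includes_extension": False,
        },
    },
    {
        "id": "alt-embed-cache",
        "type": "local",
        "dataset_type": "text_embeds",  # assumption: value expected by the factory
        "default": True,
        "cache_dir": "/path/to/cache/text",  # hypothetical path
    },
]


def check_dataset_types(backends: list) -> list:
    """Illustrative stand-in for the check that raised the ValueError."""
    kinds = [b.get("dataset_type", "image") for b in backends]
    if "image" not in kinds or "text_embeds" not in kinds:
        raise ValueError("need at least one image and one text embed dataset")
    return kinds


check_dataset_types(dataloader_config)
# Serialise to JSON, the format the dataloader config file uses.
print(json.dumps(dataloader_config, indent=2))
```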
So I tried to provide a text_embed following the existing discussion example and the config looks like:
Note: I do not know what to put in cache_dir, so I just used my OUTPUT_DIR. It would be great to know how to configure text_embeds and what their purpose is. Running the above config gives me:
(Rank: 0) | Bucket | Image Count (per-GPU)
2024-08-27 12:09:56,318 [ERROR] (main) No images were discovered by the bucket manager in the dataset: photo-concept-bucket., traceback: Traceback (most recent call last):
File "/Users/boom/SimpleTuner/train.py", line 449, in main
configure_multi_databackend(
File "/Users/boom/SimpleTuner/helpers/data_backend/factory.py", line 812, in configure_multi_databackend
raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: photo-concept-bucket.
Some extra logs generated by the above execution:
2024-08-27 12:09:43,977 [INFO] (DataBackendFactory) Configuring text embed backend: alt-embed-cache
2024-08-27 12:09:43,995 [INFO] (TextEmbeddingCache) (Rank: 0) (id=alt-embed-cache) Listing all text embed cache entries
2024-08-27 12:09:46,263 [WARNING] (DataBackendFactory) No default text embed was defined, using alt-embed-cache as the default. See this page for information about the default text embed backend: https://github.com/bghira/SimpleTuner/blob/main/documentation/DATALOADER.md#configuration-options
2024-08-27 12:09:46,263 [INFO] (DataBackendFactory) Completed loading text embed services.
2024-08-27 12:09:46,263 [INFO] (DataBackendFactory) Configuring data backend: photo-concept-bucket
2024-08-27 12:09:46,264 [INFO] (DataBackendFactory) (id=photo-concept-bucket) Loading bucket manager.
2024-08-27 12:09:56,297 [INFO] (DataBackendFactory) (id=photo-concept-bucket) Refreshing aspect buckets on main process.
2024-08-27 12:09:56,298 [INFO] (ParquetMetadataBackend) Discovering new files...
2024-08-27 12:09:56,315 [WARNING] (DataBackendFactory) Key disable_validation not found in the current backend config, using the existing value 'False'.
2024-08-27 12:09:56,315 [INFO] (DataBackendFactory) Configured backend: {'id': 'photo-concept-bucket', 'config': {'vae_cache_clear_each_epoch': True, 'probability': 1.0, 'repeats': 1, 'crop': True, 'crop_aspect': 'closest', 'crop_aspect_buckets': [1.0, 0.75, 1.23], 'crop_style': 'random', 'disable_validation': False, 'resolution': 1.0, 'resolution_type': 'area', 'parquet': {'path': '/Users/boom/SimpleTuner/datasets/photo-concept-bucket/photo-concept-bucket.parquet', 'filename_column': 'id', 'caption_column': 'cogvlm_caption', 'fallback_caption_column': 'tags', 'width_column': 'width', 'height_column': 'height', 'identifier_includes_extension': False}, 'caption_strategy': 'parquet', 'instance_data_dir': '/Users/boom/SimpleTuner/datasets/photo-concept-bucket/', 'maximum_image_size': 2.0, 'target_downsample_size': 1.5, 'config_version': 1}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x13f2fc670>, 'instance_data_dir': '/Users/boom/SimpleTuner/datasets/photo-concept-bucket', 'metadata_backend': <helpers.metadata.backends.parquet.ParquetMetadataBackend object at 0x13f2fc640>}
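One possible cause, which the logs above do not confirm, is that instance_data_dir holds only the parquet metadata and no image files. The following diagnostic sketch counts image files under that directory. The suffix list and the helper name `count_images` are assumptions for illustration and are not part of SimpleTuner.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def count_images(instance_data_dir: str) -> int:
    """Count files with an image suffix anywhere under instance_data_dir."""
    root = Path(instance_data_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    print(count_images("/Users/boom/SimpleTuner/datasets/photo-concept-bucket/"))
```

A count of zero would suggest the images still need to be downloaded into that directory before the bucket manager can find anything.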