Handled mixed modality #14

mano3-1 · 2024-10-30T17:04:01Z

This PR introduces support for datasets containing mixed modalities by implementing the following changes:

Conditional Image Data Inclusion: Checks whether at least one image is associated with a given datapoint. If no images are attached, all_pixel_values and all_image_grid_thw are excluded from the final data dictionary, data_dict.
Homogeneous Batch Sampling: Implements a custom sampler, HomogeneousBatchSampler, to ensure each batch is homogeneous, containing either only text samples or only text-image pairs.
Sampler Methods for Training and Evaluation: Adds _get_eval_sampler and _get_train_sampler methods to QwenTrainer so that the custom sampler is used for both training and evaluation.
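
The batching idea described above can be sketched roughly as follows. This is not the code from this PR: the class name HomogeneousBatchSamplerSketch, the has_image flag list, and the seed argument are hypothetical and used only for illustration. The sketch is plain Python so it runs without PyTorch; a real sampler would presumably subclass torch.utils.data.Sampler.

```python
import random
from typing import Iterator, List, Sequence


class HomogeneousBatchSamplerSketch:
    """Yield batches of dataset indices in which every batch is single-modality.

    Hypothetical illustration only; `has_image[i]` is assumed to say whether
    datapoint i has at least one image attached.
    """

    def __init__(self, has_image: Sequence[bool], batch_size: int,
                 shuffle: bool = True, seed: int = 0) -> None:
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        # Split indices into the two modality groups once, up front.
        self.text_only = [i for i, flag in enumerate(has_image) if not flag]
        self.with_image = [i for i, flag in enumerate(has_image) if flag]

    def _chunks(self, indices: List[int]) -> List[List[int]]:
        # Cut one modality group into batches; the last batch may be smaller.
        step = self.batch_size
        return [indices[k:k + step] for k in range(0, len(indices), step)]

    def __iter__(self) -> Iterator[List[int]]:
        groups = [list(self.text_only), list(self.with_image)]
        if self.shuffle:
            for group in groups:
                self.rng.shuffle(group)
        # Batches never cross a group boundary, so each stays homogeneous.
        batches = self._chunks(groups[0]) + self._chunks(groups[1])
        if self.shuffle:
            # Interleave text-only and text-image batches across the epoch.
            self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self) -> int:
        # Ceiling division per group, since groups are batched separately.
        return sum(-(-len(group) // self.batch_size)
                   for group in (self.text_only, self.with_image))


if __name__ == "__main__":
    flags = [True, False, True, False, False]
    for batch in HomogeneousBatchSamplerSketch(flags, batch_size=2):
        print(batch, [flags[i] for i in batch])
```

Because each iteration yields a list of indices rather than a single index, an object like this has to be consumed as a batch sampler, not as a per-sample sampler.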

2U1

This raises the following error:

[rank2]:     sources = self.list_data_dict[i]
[rank2]: TypeError: list indices must be integers or slices, not list

Also, to the best of my knowledge, the sampler should only speed up training; it is not critical for running the code. I have some other issues with my environment, so I can't really dig into the code right now.
I'll try to benchmark the modality-grouping code from LLaVA.
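
One plausible cause of the TypeError above, not confirmed anywhere in this thread, is that a sampler yielding lists of indices is being consumed as a per-sample sampler, so the dataset's __getitem__ receives a whole list as i. In PyTorch's DataLoader, the sampler argument is expected to yield single indices, while batch_sampler is expected to yield lists of indices. The minimal sketch below reproduces the same message without PyTorch; ToyDataset and both fetch helpers are hypothetical stand-ins, not code from this repository.

```python
from typing import Any, Dict, Iterable, List


class ToyDataset:
    """Hypothetical stand-in for a dataset that indexes a plain list."""

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self.list_data_dict = records

    def __len__(self) -> int:
        return len(self.list_data_dict)

    def __getitem__(self, i: int) -> Dict[str, Any]:
        # Mirrors the failing line in the traceback: i must be an int.
        sources = self.list_data_dict[i]
        return sources


def fetch_as_sampler(dataset: ToyDataset, sampler: Iterable) -> list:
    # Per-sample consumption: each item from the sampler is used as ONE index.
    return [dataset[i] for i in sampler]


def fetch_as_batch_sampler(dataset: ToyDataset,
                           batch_sampler: Iterable[List[int]]) -> list:
    # Per-batch consumption: each item is a list of indices for one batch.
    return [[dataset[i] for i in batch] for batch in batch_sampler]


if __name__ == "__main__":
    data = ToyDataset([{"id": 0}, {"id": 1}, {"id": 2}])
    index_batches = [[0, 2], [1]]

    print(fetch_as_batch_sampler(data, index_batches))
    try:
        fetch_as_sampler(data, index_batches)
    except TypeError as err:
        print("TypeError:", err)
```

If this is indeed the cause, the fix would be to hand the homogeneous sampler to the data loader as a batch sampler, or to have it yield one index at a time in a modality-grouped order.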

modified: scripts/finetune_lora.sh

modified: src/training/trainer.py

mano3-1 and others added 2 commits October 30, 2024 17:02

handled mixed modality

41847d6

Merge branch 'master' into enh/mixed_data_handle

891f9cf

mano3-1 changed the title ~~handled mixed modality~~ Handled mixed modality Oct 30, 2024

removed tokenizer parallelism argument

5c4b6be

mano3-1 mentioned this pull request Oct 30, 2024

Shape Mismtach #12

Open

modified: src/training/data.py

e65843f

2U1 reviewed Oct 31, 2024


mano3-1 added 5 commits November 1, 2024 19:53

added trainer for handling multi modality

ab14f03

modified: src/training/data.py

309216a

new file: notebooks/test.ipynb

2776ab7

modified: scripts/finetune_lora.sh

modified: src/training/data.py

8143cfc

modified: src/training/trainer.py

Merge branch 'master' into enh/mixed_data_handle

acb55c3
