Merge wilds mllm #266

Draft: wants to merge 36 commits into base branch mllm
Conversation

@i-gao (Collaborator) commented Sep 21, 2023

Modular eval code

TODOs:

  • test an eval for each dataset

@liyongqi67 commented Oct 4, 2023

Has the evaluate code on this branch been tested? I ran it with the --eval_flickr30 flag, and it reported an error:

  File "/home/share/yongqi/project/open_flamingo/open_flamingo/src/helpers.py", line 240, in forward
    assert (
AssertionError: current text cannot be longer than conditioned media locations

My script is:

CUDA_VISIBLE_DEVICES=3,4,6,7 torchrun --nnodes=1 --nproc_per_node=4 --master_port=1997 ./open_flamingo/eval/evaluate.py \
    --model_family flamingo \
    --vision_encoder_path ViT-L-14 \
    --vision_encoder_pretrained openai \
    --lm_path anas-awadalla/mpt-1b-redpajama-200b-hf-style \
    --tokenizer_path anas-awadalla/mpt-1b-redpajama-200b-hf-style \
    --cross_attn_every_n_layers 1 \
    --results_file results.json \
    --precision fp32 \
    --batch_size 1 \
    --eval_flickr30 \
    --shots 0

I printed the two corresponding lengths by adding `print(x.shape[1], media_locations.shape[1])` in helpers.py just before line 240:

47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
47 47
48 47

At the last call, 48 > 47, which trips the assertion.
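
For context, the failing check in helpers.py presumably boils down to a length guard along these lines (a minimal sketch; the function name and tensor shapes here are illustrative, not the actual OpenFlamingo code):

    import torch

    def check_media_locations(x, media_locations):
        # Sketch of the guard that fires: the text sequence may not extend
        # past the cached media-location mask.
        assert x.shape[1] <= media_locations.shape[1], (
            "current text cannot be longer than conditioned media locations"
        )

    x = torch.zeros(1, 48, 512)                  # 48 text positions after a decode step
    media_locations = torch.zeros(1, 47).bool()  # mask still covers only 47 positions
    check_media_locations(x, media_locations)    # raises AssertionError, matching 48 > 47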

And if I set batch_size=2, it reports another error:

  File "/home/share/yongqi/project/open_flamingo/open_flamingo/src/helpers.py", line 273, in forward
    sim = sim.masked_fill(~text_to_media_mask, -torch.finfo(sim.dtype).max)
RuntimeError: The size of tensor a (2) must match the size of tensor b (6) at non-singleton dimension 0
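
That second error looks like a broadcasting mismatch between sim and text_to_media_mask: masked_fill needs the mask to broadcast against sim, and a leading dimension of 2 (the batch) cannot broadcast to 6 (plausibly batch * heads). A minimal reproduction, with guessed, illustrative shapes rather than the actual ones in helpers.py:

    import torch

    sim = torch.randn(6, 5, 5)                       # e.g. (batch * n_heads, T_text, T_media)
    text_to_media_mask = torch.ones(2, 5, 5).bool()  # (batch, T_text, T_media): leading dim too small

    try:
        # masked_fill requires the mask to broadcast to sim's shape;
        # size 2 cannot broadcast to size 6, so this raises RuntimeError.
        sim.masked_fill(~text_to_media_mask, -torch.finfo(sim.dtype).max)
    except RuntimeError as e:
        print(e)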

@liyongqi67

In evaluate.py at line 747, the code should be revised from

        outputs = eval_model.get_outputs(
            batch_images=batch_images,
            batch_text=batch_text,
            min_generation_length=min_generation_length,
            max_generation_length=max_generation_length,
            num_beams=num_beams,
            length_penalty=length_penalty,
        )

to

        outputs = eval_model.get_outputs(
            batch_images=batch_images,
            batch_text=batch_text,
            min_new_tokens=min_generation_length,
            max_new_tokens=max_generation_length,
            num_beams=num_beams,
            length_penalty=length_penalty,
        )

This is because min_new_tokens and max_new_tokens are the keyword argument names accepted by the underlying LM's generate().
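
For reference, a standalone Hugging Face generate() call with those keyword arguments looks like this (gpt2 is just a stand-in model for illustration):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Two cats are sitting on", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        min_new_tokens=8,    # lower bound on newly generated tokens (prompt excluded)
        max_new_tokens=32,   # upper bound on newly generated tokens
        num_beams=3,
        length_penalty=1.0,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))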

@i-gao (Collaborator, Author) commented Oct 4, 2023

Hi @liyongqi67, thanks for pointing out these issues! Sorry, I have not finished cleaning up this gnarly merge yet -- will get to it in the next few days.

@liyongqi67

> Hi @liyongqi67, thanks for pointing out these issues! Sorry, I have not finished cleaning up this gnarly merge yet -- will get to it in the next few days.

Many thanks for your effort.
