ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect.
#31 · Open · basteran opened this issue on May 28, 2024 · 1 comment
Hello everyone, thank you for the great job!
I am trying to further fine-tune the LLaVA architecture using your implementation with LLaMA 3 Instruct 8B. I can already fine-tune the Vicuna model with the original LLaVA code, and now I am looking for an implementation that supports LLaMA 3.
I found your repo and followed the instructions in your README.md for each step. I am able to train the model using the bash script below, and it looks like it is saved correctly. NOTE: I downloaded the model from your huggingface repo.
TRAINING CODE
I then tried to merge the resulting adapters with the original model LLaVA-Meta-Llama-3-8B-Instruct-FT (using this script from LLaVA), and I got the following error.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 22, in <module>
merge_lora(args)
File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 8, in merge_lora
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
File "/user/mm-iglu-it/llava/model/builder.py", line 64, in load_pretrained_model
model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3682, in from_pretrained
) = cls._load_pretrained_model(
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4109, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect.
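For what it's worth, the two shapes differ by exactly one row (128257 vs. 128256), which I suspect is the extra special token mentioned in the warning above about special tokens being added to the vocabulary. This is a minimal sanity check I used to confirm the mismatch; the paths are placeholders for my local checkpoints:

```python
# Hypothetical sanity check (paths are placeholders for my local checkpoints):
# compare the tokenizer saved with the LoRA adapters against the base model's
# vocab size read straight from its config.json.
import json
from transformers import AutoTokenizer

lora_path = "./checkpoints/llava-llama-3-8b-lora"    # my fine-tuned adapters
base_path = "./LLaVA-Meta-Llama-3-8B-Instruct-FT"    # local copy of the base model

tokenizer = AutoTokenizer.from_pretrained(lora_path)
with open(f"{base_path}/config.json") as f:
    base_config = json.load(f)

print("tokenizer length:", len(tokenizer))              # 128257 in my case
print("base vocab_size :", base_config["vocab_size"])   # 128256
```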
Finally, I even tried using the adapters directly (without merging) with the following script, but I get the exact same error. The file llava/eval/test_llava.py is very similar to the inference script from the original LLaVA repo; I only made small changes for my convenience (such as --prompt-version, --input-file-path, etc.).
TESTING CODE
Do you have any idea what I am doing wrong? I can't find anything online.
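In case it is useful, the only workaround I could think of (I am not sure it is the correct fix) would be to resize the base model's embeddings to the tokenizer length before the LoRA weights are applied, roughly like this:

```python
# Sketch of a possible workaround, not a confirmed fix: grow the base model's
# embedding matrix from 128256 to 128257 rows so it matches the fine-tuned
# tokenizer before the adapter weights are loaded. Paths are placeholders.
import torch
from transformers import AutoTokenizer
from llava.model import LlavaLlamaForCausalLM

base_path = "./LLaVA-Meta-Llama-3-8B-Instruct-FT"    # base model
lora_path = "./checkpoints/llava-llama-3-8b-lora"    # my adapters

tokenizer = AutoTokenizer.from_pretrained(lora_path)
model = LlavaLlamaForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)

# resize_token_embeddings pads the input/output embeddings with newly
# initialized rows, so weights with 128257 rows could then be loaded on top.
model.resize_token_embeddings(len(tokenizer))
```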