
Error when running multimodal_understanding.py; the only change was downloading the model from ModelScope #36

Open
zhrli opened this issue Oct 24, 2024 · 11 comments


@zhrli

zhrli commented Oct 24, 2024

```
Exception has occurred: ValueError
Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
RuntimeError: Could not infer dtype of numpy.float32

During handling of the above exception, another exception occurred:

  File "/home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py", line 349, in preprocess
    return BatchFeature(data=data, tensor_type=return_tensors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 274, in tokenize_image
    image_inputs = self.image_processor(image, return_tensors="pt")["pixel_values"]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 159, in __call__
    image_tokens = self.tokenize_image(image, padding_image=padding_image)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/DL/Emu3/multimodal_understanding.py", line 35, in <module>
    inputs = processor(
             ^^^^^^^^^^
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
```

The image is the sample image from the project.

@ryanzhangfan
Collaborator

Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in neither is line 159 of processing_emu3.py a valid line of code.

@zhrli
Author

zhrli commented Oct 24, 2024

> Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in neither is line 159 of processing_emu3.py a valid line of code.

It does run now, but it ran out of memory. Could the project consider splitting the model into chunks across multiple GPUs? Running on a single GPU requires too many resources. Emu3 really is the only model our industry can rely on; thanks to BAAI (Beijing Academy of Artificial Intelligence).

@ryanzhangfan
Collaborator

The model is fully compatible with the various optimizations in transformers. You can directly use the automatic multi-GPU device mapping supported by transformers or accelerate (for the multimodal understanding model only); see the Emu2 demo code for reference, or use the int4 quantization built into transformers. If only the KV cache is blowing up memory, you can also try the KV-cache offloading supported by the transformers library.
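
For illustration, a minimal sketch of that setup (assuming bitsandbytes and accelerate are installed; "BAAI/Emu3-Chat" is the hub ID used later in this thread):

```python
# Minimal sketch (not from the Emu3 repo) of loading the understanding model
# with transformers' built-in int4 quantization and automatic device mapping.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu3-Chat",
    quantization_config=quant,   # int4 weights via bitsandbytes
    device_map="auto",           # shard layers across all visible GPUs
    trust_remote_code=True,
)

# If only the KV cache overflows, recent transformers versions can offload it:
# outputs = model.generate(**inputs, cache_implementation="offloaded")
```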

@zhrli
Author

zhrli commented Oct 24, 2024

> Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in neither is line 159 of processing_emu3.py a valid line of code.

```
Traceback (most recent call last):
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 186, in convert_to_tensors
    tensor = as_tensor(value)
             ^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 142, in as_tensor
    return torch.tensor(value)
           ^^^^^^^^^^^^^^^^^^^
RuntimeError: Could not infer dtype of numpy.float32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lizhaorui/DL/Emu3/multimodal_understanding.py", line 34, in <module>
    inputs = processor(
             ^^^^^^^^^^
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 156, in __call__
    image_tokens = self.tokenize_image(image, padding_image=padding_image)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 271, in tokenize_image
    image_inputs = self.image_processor(image, return_tensors="pt")["pixel_values"]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py", line 349, in preprocess
    return BatchFeature(data=data, tensor_type=return_tensors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 79, in __init__
    self.convert_to_tensors(tensor_type=tensor_type)
  File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 192, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
```

I pulled the latest code, but the error still occurs when running multimodal_understanding.py.

@ryanzhangfan
Collaborator

Please check your numpy and torch versions; at a glance, this error is raised while converting an np.array to a torch.tensor.
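
A quick standalone way to test that conversion path (an illustrative snippet, not from the repo):

```python
# Reproduce the np.array -> torch.tensor path the traceback points at.
import numpy as np
import torch

print("torch", torch.__version__, "| numpy", np.__version__)
x = np.zeros((2, 3), dtype=np.float32)
# In a broken environment this raises: RuntimeError: Could not infer dtype of numpy.float32
print(torch.tensor(x).dtype)  # expected: torch.float32
```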

@zhrli
Copy link
Author

zhrli commented Oct 24, 2024

> Please check your numpy and torch versions; at a glance, this error is raised while converting an np.array to a torch.tensor.

```
Name: torch
Version: 2.2.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, typing-extensions
Required-by: accelerate, bitsandbytes, flash_attn, torchaudio, torchvision

Name: numpy
Version: 1.26.4
```

@ryanzhangfan
Collaborator

Could you try a different numpy version? The failure looks like the numpy-to-tensor conversion, even though the numpy dtype it reports looks fine. Our environment runs normally with the same versions (torch 2.2.1, numpy 1.26.4, transformers 4.44.0). If switching versions still doesn't help, add a print just before line 349 of /home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py to confirm the dtype and shape of pixel_values.
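
Something like the following just before the return on that line (illustrative; it assumes the result dict in preprocess() is named data, as shown in the traceback):

```python
# Temporary debug print before the BatchFeature construction (line 349).
# getattr() hedges in case pixel_values is a list rather than an ndarray.
pixel_values = data["pixel_values"]
print(getattr(pixel_values, "dtype", None), getattr(pixel_values, "shape", None))
return BatchFeature(data=data, tensor_type=return_tensors)
```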

@zhrli
Author

zhrli commented Oct 24, 2024

> the int4 quantization built into transformers

```
print(pixel_values.shape)
(2, 3, 512, 512)

print(pixel_values.dtype)
float32
```

@ryanzhangfan
Collaborator

Please double-check your environment. Judging only from the information provided so far, this doesn't look like a problem in our code; the numpy.array-to-torch.tensor conversion itself is what fails.

@zhrli
Author

zhrli commented Oct 25, 2024

> Please double-check your environment. Judging only from the information provided so far, this doesn't look like a problem in our code; the numpy.array-to-torch.tensor conversion itself is what fails.

Solved it: pytorch 2.2.1 paired with numpy 1.25.2 works.

@zhrli
Author

zhrli commented Oct 25, 2024

> The model is fully compatible with the various optimizations in transformers. You can directly use the automatic multi-GPU device mapping supported by transformers or accelerate (for the multimodal understanding model only); see the Emu2 demo code for reference, or use the int4 quantization built into transformers. If only the KV cache is blowing up memory, you can also try the KV-cache offloading supported by the transformers library.

Got it running successfully on two RTX 4090s.

```python
# -*- coding: utf-8 -*-

from PIL import Image
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor, AutoModelForCausalLM
from transformers import BitsAndBytesConfig  # quantization config
from transformers.generation.configuration_utils import GenerationConfig
import torch

from emu3.mllm.processing_emu3 import Emu3Processor

from modelscope import snapshot_download

# model path
EMU_HUB = snapshot_download("BAAI/Emu3-Chat")
VQ_HUB = snapshot_download("BAAI/Emu3-VisionTokenizer")

# quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use int4 quantization
    bnb_4bit_quant_type='nf4',              # quantization type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute precision
)

# prepare model and processor
model = AutoModelForCausalLM.from_pretrained(
    EMU_HUB,
    quantization_config=quantization_config,  # apply the quantization config
    device_map="auto",                        # spread the model across all available GPUs
    trust_remote_code=True,
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(EMU_HUB, trust_remote_code=True, padding_side="left")
image_processor = AutoImageProcessor.from_pretrained(VQ_HUB, trust_remote_code=True)
image_tokenizer = AutoModel.from_pretrained(VQ_HUB, device_map="auto", trust_remote_code=True).eval()  # auto-place the vision tokenizer as well
processor = Emu3Processor(image_processor, image_tokenizer, tokenizer)

# prepare input
text = ["Please describe the image", "Please describe the image"]
image = Image.open("assets/demo.png")
image = [image, image]

inputs = processor(
    text=text,
    image=image,
    mode='U',
    padding_image=True,
    padding="longest",
    return_tensors="pt",
)

# prepare hyper parameters
GENERATION_CONFIG = GenerationConfig(
    pad_token_id=tokenizer.pad_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# generate
outputs = model.generate(
    inputs.input_ids.to("cuda:0"),  # note: use inputs.input_ids here
    generation_config=GENERATION_CONFIG,
    max_new_tokens=1024,
    attention_mask=inputs.attention_mask.to("cuda:0"),
)

outputs = outputs[:, inputs.input_ids.shape[-1]:]
answers = processor.batch_decode(outputs, skip_special_tokens=True)
for ans in answers:
    print(ans)
```
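
One note on this script: with device_map="auto", the inputs must live on the device holding the first model shard (cuda:0 here), which is why input_ids and attention_mask are moved there explicitly before generate.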
