RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 #9
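For context: this error is typically raised by `torch.multinomial` during sample-based generation when the probability distribution computed from the logits contains `inf`/`NaN` values or negative entries, often a symptom of numerical problems in the merged weights or in 8-bit inference. A minimal Python sketch of the failing check (an illustration of the standard PyTorch sampling path, not code from this repository):

```python
import torch

# Sampling from a distribution that contains NaN reproduces the same error
# message that sampling-based generation surfaces:
probs = torch.tensor([0.5, float("nan"), 0.5])
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```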
Comments
I ran the Colab notebook without any problems.
Still getting the error; the transformers package I'm using is 4.30.2.
Start Inference with instruction mode.
change image:[image_path]    load the image from [image_path]
Image: pics/examples/food.jpg
$ pip list|grep trans
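As a quick sanity check independent of `pip list`, the versions the running interpreter actually imports can be printed directly (a trivial sketch):

```python
import torch
import transformers

# Print the versions the running interpreter actually resolves, in case the
# active environment differs from the one pip lists:
print(transformers.__version__, torch.__version__)
```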
Was the model merged under transformers 4.30.2? You can check whether the sha256 values of the model files match.
$ sha256sum pytorch_model-00001-of-00002.bin
The model was merged under transformers 4.30.2, so the sha256 values should be the same.
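For reference, the same check can be done from Python when `sha256sum` is not available (a minimal sketch using `hashlib`; the file is read in chunks so the multi-GB shard does not need to fit in memory):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    # Stream the checkpoint shard in 1 MiB chunks and accumulate the digest.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of_file("pytorch_model-00001-of-00002.bin"))
```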
Try running without load_in_8bit and see if it works normally?
Not sure why, but after switching to a different machine and starting over, it worked.
$ python scripts/inference/inference.py --visualcla_model visualcla --image_file pics/examples/food.jpg --load_in_8bit
[INFO|tokenization_utils_base.py:1837] 2023-07-24 16:05:27,669 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1837] 2023-07-24 16:05:27,669 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1837] 2023-07-24 16:05:27,669 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1837] 2023-07-24 16:05:27,669 >> loading file tokenizer_config.json
[WARNING|logging.py:295] 2023-07-24 16:05:27,670 >> You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
[INFO|tokenization_utils.py:426] 2023-07-24 16:05:27,697 >> Adding to the vocabulary
[INFO|tokenization_utils.py:426] 2023-07-24 16:05:27,697 >> Adding to the vocabulary
[INFO|tokenization_utils.py:426] 2023-07-24 16:05:27,697 >> Adding to the vocabulary
[INFO|tokenization_utils.py:426] 2023-07-24 16:05:27,697 >> Adding <img_token> to the vocabulary
2023-07-24 16:05:27,698 - INFO - visualcla.modeling_utils - Init VisualCLA model from pretrained
[INFO|configuration_utils.py:710] 2023-07-24 16:05:27,698 >> loading configuration file visualcla/config.json
[INFO|configuration_utils.py:768] 2023-07-24 16:05:27,699 >> Model config VisualCLAConfig {
"image_size": 224,
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"model_type": "visualcla",
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": [
"LlamaForCausalLM"
],
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": 1,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "silu",
"hidden_size": 4096,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 11008,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 2048,
"min_length": 0,
"model_type": "llama",
"no_repeat_ngram_size": 0,
"num_attention_heads": 32,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 0,
"prefix": null,
"pretraining_tp": 1,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": false,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": "float16",
"torchscript": false,
"transformers_version": "4.31.0",
"typical_p": 1.0,
"use_bfloat16": false,
"use_cache": true,
"vocab_size": 49954
},
"tie_word_embeddings": false,
"transformers_version": "4.31.0",
"use_visual_resampler": true,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 24,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"projection_dim": 768,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.31.0",
"typical_p": 1.0,
"use_bfloat16": false
},
"visual_resampler_config": {
"hidden_size": 1024,
"intermediate_size": 4096,
"num_attention_heads": 16,
"num_hidden_layers": 6,
"num_query_tokens": 64
},
"vocab_size": 49958
}
[INFO|configuration_utils.py:710] 2023-07-24 16:05:27,844 >> loading configuration file visualcla/text_encoder/config.json
[INFO|configuration_utils.py:768] 2023-07-24 16:05:27,845 >> Model config LlamaConfig {
"_name_or_path": "chinese-alpaca-plus-7b",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.31.0",
"use_cache": true,
"vocab_size": 49954
}
[INFO|modeling_utils.py:2600] 2023-07-24 16:05:27,845 >> loading weights file visualcla/text_encoder/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1172] 2023-07-24 16:05:27,845 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:599] 2023-07-24 16:05:27,846 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.31.0"
}
[INFO|modeling_utils.py:2715] 2023-07-24 16:05:28,053 >> Detected 8-bit loading: activating 8-bit loading for this model
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.24s/it]
[INFO|modeling_utils.py:3329] 2023-07-24 16:05:44,119 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:3337] 2023-07-24 16:05:44,119 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at visualcla/text_encoder.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:559] 2023-07-24 16:05:44,122 >> loading configuration file visualcla/text_encoder/generation_config.json
[INFO|configuration_utils.py:599] 2023-07-24 16:05:44,122 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.31.0"
}
[INFO|configuration_utils.py:710] 2023-07-24 16:05:44,188 >> loading configuration file visualcla/vision_encoder/config.json
[INFO|configuration_utils.py:768] 2023-07-24 16:05:44,188 >> Model config CLIPVisionConfig {
"_name_or_path": "clip-vit-large-patch14",
"architectures": [
"CLIPVisionModel"
],
"attention_dropout": 0.0,
"dropout": 0.0,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_channels": 3,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"torch_dtype": "float16",
"transformers_version": "4.31.0"
}
[INFO|modeling_utils.py:2600] 2023-07-24 16:05:44,188 >> loading weights file visualcla/vision_encoder/pytorch_model.bin
[INFO|modeling_utils.py:1172] 2023-07-24 16:05:44,483 >> Instantiating CLIPVisionModel model under default dtype torch.float16.
[INFO|modeling_utils.py:3329] 2023-07-24 16:05:45,066 >> All model checkpoint weights were used when initializing CLIPVisionModel.
[INFO|modeling_utils.py:3337] 2023-07-24 16:05:45,066 >> All the weights of CLIPVisionModel were initialized from the model checkpoint at visualcla/vision_encoder.
If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPVisionModel for predictions without further training.
[INFO|image_processing_utils.py:337] 2023-07-24 16:05:46,059 >> loading configuration file visualcla/preprocessor_config.json
[INFO|image_processing_utils.py:389] 2023-07-24 16:05:46,059 >> Image processor CLIPImageProcessor {
"crop_size": {
"height": 224,
"width": 224
},
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"feature_extractor_type": "CLIPFeatureExtractor",
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "CLIPImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"shortest_edge": 224
}
}
2023-07-24 16:05:46,062 - INFO - main - *** Start Inference ***
========== Usage ==========
Start Inference with instruction mode.
You can enter instruction or special control commands after '>'. Below are the usage of the control commands
change image:[image_path] load the image from [image_path]
clear Clear chat history. This command will not change the image.
exit Exit Inference
Image: pics/examples/food.jpg