
RuntimeError: probability tensor contains either inf, nan or element < 0 #4

Open · Phoebe-ovo opened this issue Oct 18, 2023 · 10 comments

@Phoebe-ovo

Hello, when I run the evaluation for Perception and Action Prediction, I get this error with decapoda-research/llama-7b-hf.
How can I fix this? Thanks!

@xzebin775

Hi, have you solved it? I've run into the same problem.

@melights
Collaborator

Hi @Phoebe-ovo @xzebin775, thanks for reporting this issue! We are not able to reproduce this error on the GPUs we have. Could you please let me know which GPUs you were using?
Also, can you try setting load_in_8bit to False to see if that resolves the issue?
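(For reference, a minimal sketch of what toggling that flag looks like when loading the model with transformers; the dtype and device_map below are assumptions, not necessarily this repo's actual loading code:)

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: the repo's real loading code may differ.
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=False,         # True quantizes weights via bitsandbytes; False loads them in full
    torch_dtype=torch.float16,  # assumption: fp16 to roughly halve memory once 8-bit is off
    device_map="auto",          # let accelerate place weights on available devices
)
```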

@Phoebe-ovo
Author

The GPU I used is a V100. What GPUs were you using?

@xzebin775

It is a GTX 1080 Ti.

@melights
Collaborator

Thanks for confirming. Can you check whether setting load_in_8bit to False here solves the problem?

@xzebin775

I set load_in_8bit to False, but I get the error below. It seems I can't fit the model on the GPU:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.92 GiB total capacity; 10.44 GiB already allocated; 22.62 MiB free; 10.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
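(Two notes on that trace. First, llama-7b needs roughly 13–14 GB just for fp16 weights, so without 8-bit quantization it cannot fit in a 1080 Ti's ~11 GB no matter what. Second, the allocator setting the message mentions is controlled by an environment variable that must be set before CUDA is initialized; the 128 MiB split size below is an arbitrary example, not a value recommended in this thread:)

```python
import os

# Must be set before torch initializes CUDA for it to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (imported after setting the allocator config on purpose)
```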

@melights
Collaborator

Upon thorough investigation, we are not able to reproduce the error on the GPUs we have (NVIDIA A100 and 3090), but it might be related to other issues. I suggest you try these:

  • Clean up the virtualenv (rm -rf env), pull the latest main, and set up the environment following the Setup section in the README
  • Use Python 3.9+

Also, we noticed that the base model we used, decapoda-research/llama-7b-hf, was removed by its author from the Hugging Face model hub; we are testing workarounds.

@xjturjc

xjturjc commented Jan 5, 2024

I met the same problem, and setting do_sample to False made it work. I don't know what impact this will have on results. (Also on a V100.)
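(That lines up with where the error comes from: with do_sample=True, generate draws the next token with torch.multinomial, which raises this RuntimeError when the softmax probabilities contain NaN/inf values, often a fp16/8-bit numerical issue. do_sample=False switches to greedy decoding and never calls multinomial. A minimal sketch, with generation arguments that are assumptions rather than this repo's settings:)

```python
# Greedy decoding avoids the torch.multinomial sampling step that raises
# "probability tensor contains either inf, nan or element < 0".
output_ids = model.generate(
    input_ids,           # token ids from the tokenizer
    do_sample=False,     # greedy: take the argmax token instead of sampling
    max_new_tokens=128,  # assumption: any reasonable generation length
)
```

The trade-off is that greedy decoding removes sampling randomness, so evaluation numbers may shift.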

@kevinchiu19

Change 'decapoda-research/llama-7b-hf' to 'huggyllama/llama-7b' and set load_in_8bit=False.

It works for me. (My env: V100)
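(A sketch of both changes together; huggyllama/llama-7b is a community re-upload of the same LLaMA-7B weights, and the fp16 dtype is an assumption so the model fits in a V100's 16 GB:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "huggyllama/llama-7b"  # replacement for the removed decapoda-research repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    load_in_8bit=False,         # as suggested above
    torch_dtype=torch.float16,  # assumption: fp16 fits on a V100
    device_map="auto",
)
```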

@uniquezhengjie

> Change 'decapoda-research/llama-7b-hf' to 'huggyllama/llama-7b' and set load_in_8bit=False.
> It works for me. (My env: V100)

I tried this, but I get the following error:
ValueError: The device_map provided does not give any device for the following parameters: base_model.model.weighted_mask
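(That ValueError comes from accelerate's weight dispatch: an explicit device_map must assign a device to every parameter, and base_model.model.weighted_mask, a parameter this repo appears to add on top of the base LLaMA weights, is missing from the map. One possible workaround, untested against this repo, is to place the whole model on a single device so no parameter is left unassigned:)

```python
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map={"": 0},  # map the root module, and hence every parameter, to GPU 0
)
```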
