In GPU mode generated image is all black with NaN tensor values (no problems in CPU mode) #31
Comments
This seems to be related to pytorch/pytorch#58123, and specifically to pytorch/pytorch#58123 (comment). However, I don't understand how this is possible: the bug documented at that link affects cuDNN releases before v8.2.2, while I'm using cuDNN v8.3.0.2, which I verified by doing:
Any ideas?
I don't have much technical insight to add, but I have occasionally seen such samples (everything black) being generated while doing transfer learning on Colab using the implementation below.
OK, I finally got it working by installing a version of torch and torchvision coming with [...]. Specifically, I downloaded [...]. In theory, newer versions of Torch should work too, provided they come with CUDA 10.2, e.g. [...]; at least this was a requirement in my case...
I hope this workaround works for my user who is suffering from the same issue, but it is very cumbersome: the PyTorch download page says CUDA 10.2 is no longer supported, and CUDA 10.2 won't work on newer GPUs such as the 2070 and above, meaning users have to download a separate build. At least one other source suggests that downgrading should work: https://discuss.pytorch.org/t/half-precision-convolution-cause-nan-in-forward-pass/117358/3 I'm wondering if anyone knows of a better workaround that doesn't involve downgrading the CUDA version. (Edit: wait a sec, I just realized my torch version is only 8200. Will update with more comments.)
@monsieurpooh I observed that in some other cases the following newer version also works for me:
Does it work in your case?
Thanks for your input! Updating to a cuDNN above 8.2.2 was sufficient to fix my issue, even with CUDA toolkit 11.3. It was not necessary to downgrade the CUDA toolkit. I have not tried installing with CUDA toolkit 11.5, but presumably that may also work; maybe the torch build that ships with CUDA 11.5 also has a cuDNN higher than 8.2.2? I noticed the default torch 1.11 only had cuDNN 8.2 or so.
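For anyone who wants to check whether their install falls below the 8.2.2 threshold discussed in this thread, here is a minimal sketch. It assumes the integer comes from `torch.backends.cudnn.version()` (e.g. `8200`) and that it is encoded as major*1000 + minor*100 + patch for cuDNN 8 and earlier, and major*10000 + minor*100 + patch from cuDNN 9 on. The helper names `decode_cudnn_version`, `predates_fix` and `installed_cudnn_version` are made up for illustration and are not part of PyTorch.

```python
from typing import Optional, Tuple


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    """Split an integer such as 8200 into (major, minor, patch)."""
    # Assumption: cuDNN <= 8 uses major*1000 + minor*100 + patch,
    # cuDNN >= 9 uses major*10000 + minor*100 + patch.
    base = 10000 if v >= 90000 else 1000
    major, rest = divmod(v, base)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def predates_fix(v: int, fixed: Tuple[int, int, int] = (8, 2, 2)) -> bool:
    """True if the decoded version is older than the release named in this thread."""
    return decode_cudnn_version(v) < fixed


def installed_cudnn_version() -> Optional[int]:
    """Ask PyTorch which cuDNN it was built with (None if cuDNN is unavailable)."""
    import torch  # imported lazily so the helpers above work without torch

    return torch.backends.cudnn.version()
```

With torch installed, `predates_fix(installed_cudnn_version())` would tell you whether the bundled cuDNN is older than 8.2.2. It does not prove that the linked bug is what you are hitting.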
Same problem here (also same hardware).
The same happens when trying Stable Diffusion with autocast/fp16.
Hello,
For both "text2im.ipynb" and "clip_guided.ipynb" I'm seeing that the generated image is all black.
This only happens in GPU mode (NVIDIA GTX 1660 Ti, 6 GB), while in CPU mode the image is generated correctly.
I'm on Windows 10 using Python 3.8 and
torch-1.11.0+cu115 pypi_0 pypi
torchvision-0.12.0+cu115 pypi_0 pypi
and this environment works fine for all other ML projects I'm running.
In "text2im.ipynb" I saw that tensor values become NaN in the model_fn function, when model() is called:
As I tried to track down the problem a bit further, I found that the values start going wrong in the forward function of "text2im_model.py":
glide-text2im/glide_text2im/text2im_model.py
Lines 123 to 142 in 69b5307
specifically at line 133, where module is called:
glide-text2im/glide_text2im/text2im_model.py
Lines 133 to 135 in 69b5307
Here, at iteration #2 some values become NaN, and at iteration #6 all values become NaN.
Please take a look:
As you can see at this point only some values have become NaN.
This remains so until iteration #6, where, after the module call, ALL values become NaN:
With my limited knowledge of this field, this is all I could find.
Please let me know if there is some other info I can provide.
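The per-iteration inspection described above can be automated. Below is a rough, framework-agnostic sketch rather than the code actually used in this report: it runs a list of blocks in order and records what fraction of the output is NaN after each one. The names `nan_fraction`, `trace_nans` and `first_nan_block` are hypothetical, and the blocks are assumed to be single-argument callables (wrap them in a lambda if they need extra inputs such as an embedding).

```python
import math
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple


def nan_fraction(values: Iterable[float]) -> float:
    """Fraction of entries that are NaN (0.0 for an empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if math.isnan(v)) / len(values)


def trace_nans(
    blocks: Sequence[Callable[[Any], Any]],
    h: Any,
    flatten: Callable[[Any], Iterable[float]],
) -> List[Tuple[int, float]]:
    """Apply each block in turn and report (index, NaN fraction of its output)."""
    report = []
    for i, block in enumerate(blocks):
        h = block(h)
        report.append((i, nan_fraction(flatten(h))))
    return report


def first_nan_block(report: Sequence[Tuple[int, float]]) -> Optional[int]:
    """Index of the first block whose output contains any NaN, else None."""
    for i, frac in report:
        if frac > 0:
            return i
    return None
```

For PyTorch tensors, `flatten` could be something like `lambda t: t.detach().float().flatten().tolist()`. This is slow on large activations but is enough to locate the first block that produces NaN.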