Hello! You are doing a very cool project that helps ordinary users work around the lack of memory on their video cards. Unfortunately, my neural network training ends with an error.
Sometimes this error is:
F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:136] Check failed: this->H2D_stream_->ok()
Aborted (core dumped)
Sometimes:
F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:175] Check failed: this->D2H_stream_->ok()
Aborted (core dumped)
Sometimes the errors appear immediately, sometimes in the middle of an epoch.
I am using LMS with TensorFlow 2.2.0.
Thanks for trying out LMS. LMS is not being actively maintained, hence the 2.2.0 version. It has been nearly two years since I last ran a job with LMS, but I recall that these kinds of errors would appear during certain types of out-of-memory conditions.
Even with LMS enabled, it is possible to run out of memory on the GPU in certain cases, such as when the model requires too many active tensors, or when individual operations have input and output tensors so large that they cannot fit.
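For intuition, here is a rough back-of-the-envelope calculation (plain Python; the layer shape is an invented assumption, not taken from the reporter's model) showing how quickly a single operation's input and output tensors grow with batch size. Even with swapping, the tensors an op is actively reading and writing must all be resident on the GPU at the same time.

# Rough size of one activation tensor and its gradient for an assumed conv-like
# layer output of shape (batch, 256, 256, 64) in float32. These shapes are
# illustrative only; real models differ.
def tensor_bytes(batch, height, width, channels, dtype_bytes=4):
    return batch * height * width * channels * dtype_bytes

for batch in (8, 32, 128):
    activation = tensor_bytes(batch, 256, 256, 64)
    gradient = activation  # the gradient of the activation has the same shape
    print(f"batch={batch:4d}: activation ≈ {activation / 2**30:.2f} GiB, "
          f"activation + gradient ≈ {(activation + gradient) / 2**30:.2f} GiB")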
If you provide more error messages or output from your run, I may be able to help more.
Thank you for the clarification! I am using a GeForce GTX 1060 3GB, and this card probably has very little video memory. I noticed that if I reduce the batch size, the error disappears. That is all I can tell you; apart from the output above, nothing else appears in the console.
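If it helps, here is a minimal sketch using standard TensorFlow 2.2 APIs that combines the two mitigations discussed above: letting TensorFlow allocate GPU memory on demand and using a smaller batch size. The model, data, and batch size below are placeholders made up for illustration, not the reporter's actual setup.

import numpy as np
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving the whole card at startup.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

BATCH_SIZE = 8  # the largest batch that fits is model-dependent; 8 is only an assumption

# Dummy data standing in for the real dataset.
features = np.random.rand(256, 64).astype('float32')
labels = np.random.randint(0, 10, size=(256,))

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(256)
           .batch(BATCH_SIZE)
           .prefetch(tf.data.experimental.AUTOTUNE))

# Tiny stand-in model; the real model would be much larger.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(dataset, epochs=1)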
Were you able to figure out why the error sometimes appears at the very beginning of the process? Correct me if I am wrong: with a large batch size, the computation starts on the GPU, and when a tensor is no longer in use it should be sent back to the host machine for storage, much like a paging algorithm, so handling large batch sizes should not be a problem.
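To illustrate why swapping alone does not remove the limit: even with perfect paging of inactive tensors to host memory, the tensors read and written by the currently executing operation must all be on the GPU at once, and their size scales with the batch size. A toy simulation (plain Python, with invented per-op working-set sizes) of that best-case lower bound on a 3 GB card:

# Toy model of LMS-style swapping: inactive tensors can be paged out to the host,
# but the currently running op's inputs and outputs must be resident on the GPU,
# so peak device memory can never drop below the largest single-op working set.
MIB, GIB = 2**20, 2**30
CARD_MEMORY = 3 * GIB  # rough capacity of a GTX 1060 3GB

# Invented per-sample working sets (inputs + outputs) for three hypothetical ops.
per_sample_working_set = [16 * MIB, 32 * MIB, 64 * MIB]

for batch in (8, 32, 128):
    # With ideal swapping, only one op's working set is resident at a time.
    peak = max(ws * batch for ws in per_sample_working_set)
    fits = "fits" if peak <= CARD_MEMORY else "does not fit"
    print(f"batch={batch:4d}: best-case peak ≈ {peak / GIB:.1f} GiB ({fits})")

This would be consistent with the error sometimes appearing immediately: with a large enough batch, the very first operation's working set can already exceed the card.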