This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

Aborted (core dumped) #56

Open
vshkurin opened this issue Jan 3, 2022 · 3 comments

Comments

@vshkurin

vshkurin commented Jan 3, 2022

Hello! You are doing a very cool project that helps ordinary users work around limited GPU memory. Unfortunately, my neural network training ends with an error.

Sometimes this error is:

F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:136] Check failed: this->H2D_stream_->ok()
Aborted (core dumped)

Sometimes:

F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:175] Check failed: this->D2H_stream_->ok()
Aborted (core dumped)

Sometimes the errors appear immediately, sometimes in the middle of an epoch.

I am using LMS with TensorFlow 2.2.0.
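In case it matters, this is roughly how I enable it. A minimal sketch: I am assuming the IBM TF 2.2 build where LMS is switched on through `tf.config.experimental.set_lms_enabled`; that call may not exist in other builds, and the model here is a placeholder, not my real one.

```python
import tensorflow as tf

# Assumption: the IBM TF 2.2 build exposes this global LMS switch.
# It must be called before any GPU work starts.
tf.config.experimental.set_lms_enabled(True)

# Placeholder model, just to show the switch precedes model construction.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(256, 256, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```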

@smatzek
Collaborator

smatzek commented Jan 4, 2022

Thanks for trying out LMS. LMS is not being actively maintained, hence the 2.2.0 version. It has been nearly two years since I ran a job with LMS, but I recall that those checks would fail during certain kinds of out-of-memory conditions.

Even with LMS enabled, it is possible to run out of memory on the GPU in certain cases, such as when the model requires too many active tensors, or when individual operations have input and output tensors so large they cannot fit.
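As a rough sketch of the second case (the layer sizes here are invented, not taken from your model): a single matmul whose weights and activations alone exceed the card's memory cannot be helped by swapping, because all of its inputs and outputs must be resident while the op runs.

```python
# Back-of-the-envelope memory for one matmul. Swapping cannot shrink the
# working set of a single op: its inputs and outputs are all on the GPU at once.
batch, in_dim, out_dim = 1024, 32768, 32768
bytes_per_float = 4

weights = in_dim * out_dim * bytes_per_float   # 4.00 GiB
inputs = batch * in_dim * bytes_per_float      # 0.125 GiB
outputs = batch * out_dim * bytes_per_float    # 0.125 GiB

total_gib = (weights + inputs + outputs) / 2**30
print(f"resident bytes for one matmul: {total_gib:.2f} GiB")  # ~4.25 GiB
```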

If you provide more error messages or output from your run, I may be able to help more.

@vshkurin
Author

vshkurin commented Jan 5, 2022

Thank you for the clarification! I am using a GeForce GTX 1060 3 GB, and this card probably has very little video memory. I found that if I reduce the batch size, the error disappears. That is all I can tell you; apart from the above, nothing appears in the console.
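For anyone who hits this: a sketch of the workaround. The model and data here are placeholders, not my real code; the point is only where the smaller batch size is passed to Keras.

```python
import numpy as np
import tensorflow as tf

# Placeholder model and synthetic data, just to show where batch_size goes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

train_x = np.random.rand(1024, 32).astype("float32")
train_y = np.random.rand(1024, 1).astype("float32")

# Halving batch_size roughly halves per-step activation memory, which must
# fit on the GPU during each op even when LMS is swapping inactive tensors.
model.fit(train_x, train_y, batch_size=8, epochs=2)
```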

@HarshaanNiles010

Were you able to work out why the error sometimes appears at the very beginning of the process? Correct me if I am wrong: with a large batch size, the computation starts on the GPU, and when a tensor is no longer in use it should be swapped back to the host machine for storage, much like a paging algorithm, so large batch sizes should not be a problem to handle.
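A rough sketch of the part I might be missing (the layer shape is invented for illustration): the tensors consumed or produced by the op that is currently executing must stay resident on the GPU no matter what gets swapped out, and those activations scale linearly with batch size, so a 3 GB card still has a hard per-op floor.

```python
# Activation memory of a single conv layer output, which must be on the
# GPU while that conv executes, regardless of any swapping of idle tensors.
def conv_activation_bytes(batch, h, w, channels, bytes_per_float=4):
    return batch * h * w * channels * bytes_per_float

for batch in (8, 32, 128):
    gib = conv_activation_bytes(batch, 256, 256, 64) / 2**30
    print(f"batch {batch:>3}: {gib:.2f} GiB for one activation tensor")
# batch   8: 0.12 GiB
# batch  32: 0.50 GiB
# batch 128: 2.00 GiB  -- most of a 3 GB card for one tensor
```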
