OOM Error when running evaluation step #2

Open

wtheune opened this issue Jan 9, 2024 · 0 comments

wtheune commented Jan 9, 2024

Hi there! I'm getting an out-of-memory error when running the evaluation code 'python main.py --inference_option Evaluation'. I tried it on an A10 20GB and an A100 80GB, but I get the same error either way.

2024-01-09 15:35:39.453000: W external/local_tsl/tsl/framework/bfc_allocator.cc:497] ******************************************************************************__________________
2024-01-09 15:35:39.453020: W tensorflow/core/framework/op_kernel.cc:1827] RESOURCE_EXHAUSTED: failed to allocate memory
Traceback (most recent call last):
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/main.py", line 694, in <module>
    run_eval_model(FLAGS)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/main.py", line 664, in run_eval_model
    aff_preds = model([affinity_data_val[0], affinity_data_val[1]], training=False)[1]
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/transformer_encoder.py", line 174, in call
    x, attn_enc_w = layer(x, mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/transformer_encoder.py", line 83, in call
    attn_out, attn_w = self.mha_layer([x, x, x], mask=mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/mha_layer.py", line 154, in call
    attention_output, attention_weights = self.attention([query, key, value], mask=mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/mha_layer.py", line 62, in call
    scaled_attention_scores += (mask * -1e9)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Exception encountered when calling layer 'scaled_dot_product_attention' (type scaled_dot_product_attention).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2] name:

Call arguments received by layer 'scaled_dot_product_attention' (type scaled_dot_product_attention):
  • inputs=['tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)', 'tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)', 'tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)']
  • mask=tf.Tensor(shape=(4867, 1, 1, 576), dtype=float32)
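For reference: the call arguments above show the entire validation set (batch dimension 4867) going through the transformer encoder in a single forward pass, so the attention score tensor alone is roughly 4867 × 4 × 576 × 576 float32 values (on the order of 26 GB), which can easily exhaust even an 80 GB card once several such intermediates coexist. Below is a minimal sketch of a batched-evaluation workaround, not code from the TAG-DTA repository; `predict_affinity_in_batches` and `batch_size` are hypothetical names, and `model` / `affinity_data_val` are assumed to be the objects used in `run_eval_model` as shown in the traceback.

# Hypothetical workaround sketch (not from the TAG-DTA repository): evaluate in
# mini-batches instead of one forward pass over all 4867 validation samples.
import tensorflow as tf


def predict_affinity_in_batches(model, affinity_data_val, batch_size=32):
    """Run the affinity prediction over the validation set in chunks of `batch_size`."""
    # The two input tensors passed to the model in run_eval_model.
    input_a, input_b = affinity_data_val[0], affinity_data_val[1]
    n_samples = input_a.shape[0]
    preds = []
    for start in range(0, n_samples, batch_size):
        end = start + batch_size
        # The trailing [1] mirrors the original call, which selects the
        # binding-affinity output of the model.
        batch_preds = model([input_a[start:end], input_b[start:end]], training=False)[1]
        preds.append(batch_preds)
    return tf.concat(preds, axis=0)


# Usage (assuming model and affinity_data_val are already loaded as in run_eval_model):
# aff_preds = predict_affinity_in_batches(model, affinity_data_val, batch_size=32)

With a batch size of 32 the per-batch attention tensors shrink to (32, 4, 576, 576), which fits comfortably on either GPU; the choice of 32 is arbitrary and can be tuned to the available memory.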
