When attempting to profile a custom training loop in tensorflow following the documentation in tensorflow's trace.py,
tf.profiler.experimental.start('logdir')
for step in range(num_steps):
    # Creates a trace event for each training step with the
    # step number.
    with tf.profiler.experimental.Trace("Train", step_num=step):
        train_fn()
tf.profiler.experimental.stop()
the error "aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred." is thrown and the program halts. I have hsa-amd-aqlprofile-bin, rocprofiler and roctracer installed, built from source from the PKGBUILDs provided in rocm-arch. I use a gfx900 card and tensorflow 2.6.0, also built from source with a customized PKGBUILD based on the tensorflow-rocm package from rocm-arch.
Could someone try profiling with tensorflow+rocm and check whether they encounter the same error?
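For anyone willing to test, here is a minimal sketch of a self-contained script built around the same calls as the snippet above. The names `train_fn` and `profile_training` are hypothetical stand-ins and not part of TensorFlow, and the matrix multiply is only a placeholder workload, not my actual training step. TensorFlow is imported inside the functions so the file can be loaded without it installed.

```python
def train_fn():
    """Hypothetical stand-in for a real training step: one small matmul."""
    import tensorflow as tf  # deferred import, see note above

    x = tf.random.normal((256, 256))
    return tf.reduce_sum(tf.matmul(x, x))


def profile_training(step_fn, num_steps, logdir):
    """Run step_fn num_steps times under the TensorFlow profiler.

    Uses the same start / Trace / stop calls as the snippet from trace.py.
    Returns the number of steps that were traced.
    """
    import tensorflow as tf  # deferred import, see note above

    tf.profiler.experimental.start(logdir)
    try:
        for step in range(num_steps):
            # One trace event per training step, tagged with the step number.
            with tf.profiler.experimental.Trace("Train", step_num=step):
                step_fn()
    finally:
        # Stop the profiler even if a step raises, so the session is closed.
        tf.profiler.experimental.stop()
    return num_steps
```

Running `profile_training(train_fn, 5, "logdir")` on a tensorflow+rocm setup should be enough to exercise the profiler. If the aqlprofile error also shows up with this placeholder workload, that would suggest it is not caused by anything specific to my training code.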
This discussion was converted from issue #628 on February 17, 2022 16:38.