deploy: 5a40444
zingale committed Nov 21, 2024
1 parent 8e9acdf commit 6d8eee7
Showing 3 changed files with 22 additions and 8 deletions.
15 changes: 12 additions & 3 deletions _sources/olcf-workflow.rst.txt
@@ -208,7 +208,16 @@ If it doesn't crash with the trace, then try:
   interrupt
   bt

It might report that the memory location is not precise. To enable
precise memory reporting, run the following in the debugger:

.. prompt::
   :prompts: (gdb)

   set amdgpu precise-memory on
   show amdgpu precise-memory

and rerun.
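
Rather than typing these commands interactively, they can be passed at startup with gdb's standard ``-ex`` and ``--args`` options. This is a minimal sketch, assuming the debugger is ``rocgdb`` (the ROCm build of gdb, which provides the ``amdgpu`` commands) and that it is on your ``PATH``; ``./my_app.ex`` and ``inputs`` are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical executable and inputs file; substitute your own.
EXE=./my_app.ex
INPUTS=inputs

# Assumes rocgdb is the debugger. Each -ex runs one gdb command at startup,
# so precise memory is enabled (and then shown) before "run" launches the
# program; --args passes the remaining words to the program as its arguments.
cmd=(rocgdb
     -ex "set amdgpu precise-memory on"
     -ex "show amdgpu precise-memory"
     -ex run
     --args "$EXE" "$INPUTS")

# Dry run by default: print the command (quoting is not shown).
# Set RUN=1 to actually launch the debugger.
if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
else
    echo "${cmd[@]}"
fi
```

Keeping the command in a bash array preserves the multi-word gdb commands as single arguments when it is launched.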



@@ -222,11 +231,11 @@ Workaround to prevent hangs for collectives:
   export FI_MR_CACHE_MONITOR=memhooks


According to some AMReX reports, the code can hang if the initial Arena
size is too big. The workaround is to set:

::

   amrex.the_arena_init_size=0

With this setting, the arena starts with no preallocated memory and
grows as needed during the run.
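
The parameter can also be set for a single run without editing the inputs file. This is a minimal sketch, assuming AMReX's ParmParse handling of ``name=value`` pairs given on the command line after the inputs file; ``./my_app.ex``, ``inputs``, and the bare ``srun`` line (add your usual launch options) are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical executable and inputs file; substitute your own.
EXE=./my_app.ex
INPUTS=inputs

# AMReX's ParmParse also reads name=value pairs that follow the inputs file
# on the command line, so the initial arena size can be set to 0 here.
args=("$INPUTS" amrex.the_arena_init_size=0)

# Dry run by default: print the launch line. Set RUN=1 to launch with srun
# (add your usual srun options, e.g. task and GPU counts).
if [[ "${RUN:-0}" == 1 ]]; then
    srun "$EXE" "${args[@]}"
else
    echo srun "$EXE" "${args[@]}"
fi
```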
13 changes: 9 additions & 4 deletions olcf-workflow.html
@@ -229,19 +229,24 @@ <h3>Debugging<a class="headerlink" href="#debugging" title="Link to this heading
</pre></div></div><p>If it doesn’t crash with the trace, then try:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span class="prompt2">interrupt</span>
<span class="prompt2">bt</span>
</pre></div></div><p>It might report that the memory location is not precise. To enable
precise memory reporting, run the following in the debugger:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span class="prompt2">set amdgpu precise-memory on</span>
<span class="prompt2">show amdgpu precise-memory</span>
</pre></div></div><p>and rerun.</p>
</section>
<section id="troubleshooting">
<h3>Troubleshooting<a class="headerlink" href="#troubleshooting" title="Link to this heading"></a></h3>
<p>Workaround to prevent hangs for collectives:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">FI_MR_CACHE_MONITOR</span><span class="o">=</span>memhooks
</pre></div>
</div>
<p>According to some AMReX reports, the code can hang if the initial Arena
size is too big. The workaround is to set:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>amrex.the_arena_init_size<span class="o">=</span><span class="m">0</span>
</pre></div>
</div>
<p>With this setting, the arena starts with no preallocated memory and
grows as needed during the run.</p>
</section>
</section>
</section>