We’re interested in acquiring a machine with several GPUs in order to begin benchmarking JAX/TensorFlow's sharded HMC framework. As part of our choice of machine, we’d like to know whether GPU interconnect topology can significantly reduce communication overhead. For example, is the jax.lax.psum() backbone of the sharded log density built on a reduce operation, and can it therefore leverage direct GPU-to-GPU communication via, e.g., NVLink?
More generally, is there reason to believe that the current implementation of sharded HMC can take advantage of a toroidal or fully connected GPU network?
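For concreteness, here is a minimal sketch (not the actual sharded HMC implementation) of the kind of sharded log-density pattern we have in mind; the toy Gaussian model and the `sharded_log_density` function are purely illustrative. Our understanding is that `jax.lax.psum` lowers to an XLA AllReduce, which on GPU backends is typically executed by NCCL, but we'd like confirmation of how topology-aware that path is in practice.

```python
# Minimal sketch of a data-sharded log density combined with jax.lax.psum.
# Assumption: each device holds one shard of the data and a replicated copy
# of the parameters; psum all-reduces the per-shard partial log densities.
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

def sharded_log_density(theta, x_shard):
    # Per-device partial log density for this shard (toy Gaussian model).
    local = jnp.sum(-0.5 * (x_shard - theta) ** 2)
    # Cross-device all-reduce over the named axis; every device receives
    # the full-data log density. On GPUs this is where the interconnect
    # (NVLink, PCIe, ...) would matter.
    return jax.lax.psum(local, axis_name="data")

# Toy data split evenly across the available devices.
x = jnp.arange(n_devices * 8, dtype=jnp.float32).reshape(n_devices, 8)
theta = jnp.zeros(n_devices)  # one (identical) parameter copy per device

log_dens = jax.pmap(sharded_log_density, axis_name="data")(theta, x)
print(log_dens)  # same value replicated on every device
```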