We’re interested in acquiring a machine with several GPUs in order to begin benchmarking JAX/TensorFlow's sharded HMC framework. As part of our choice of machine, we’d like to know whether GPU interconnect topology can significantly reduce communication overhead. For example, is the jax.lax.psum() backbone of the sharded log density built on a reduce operation, and can it therefore leverage direct GPU-to-GPU communication via, e.g., NVLink?
More generally, is there reason to believe that the current implementation of sharded HMC can take advantage of a toroidal or fully connected GPU network?
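For concreteness, here is a minimal sketch (not the actual sharded HMC implementation) of the kind of sharded log-density pattern we have in mind; the toy Gaussian model and the `sharded_log_density` function are purely illustrative. Our understanding is that `jax.lax.psum` lowers to an XLA AllReduce, which on GPU backends is typically executed by NCCL, but we'd like confirmation of how topology-aware that path is in practice.

```python
# Minimal sketch of a data-sharded log density combined with jax.lax.psum.
# Assumption: each device holds one shard of the data and a replicated copy
# of the parameters; psum all-reduces the per-shard partial log densities.
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

def sharded_log_density(theta, x_shard):
    # Per-device partial log density for this shard (toy Gaussian model).
    local = jnp.sum(-0.5 * (x_shard - theta) ** 2)
    # Cross-device all-reduce over the named axis; every device receives
    # the full-data log density. On GPUs this is where the interconnect
    # (NVLink, PCIe, ...) would matter.
    return jax.lax.psum(local, axis_name="data")

# Toy data split evenly across the available devices.
x = jnp.arange(n_devices * 8, dtype=jnp.float32).reshape(n_devices, 8)
theta = jnp.zeros(n_devices)  # one (identical) parameter copy per device

log_dens = jax.pmap(sharded_log_density, axis_name="data")(theta, x)
print(log_dens)  # same value replicated on every device
```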