Can allreduce (and other multi-GPU communications) work via SLI? #1
Comments
For fast allreduce, you want all four GPUs on the same PCI-E root complex. Motherboards that offer this are not always cheap; a typical 4-GPU motherboard will probably split the four GPUs across two root complexes. We haven't tried SLI; we use OpenMPI as our transport layer. If OpenMPI supports SLI in its Byte Transfer Layer (BTL), then allreduce will automatically work across it.
Update, after skimming the SLI docs: the communication is handled by the driver so that it can do alternate frame rendering. So SLI is very much a graphics-only feature, and (I am assuming) only the driver knows the communication protocol used across the link. I think you are out of luck trying to use SLI for allreduce.
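For readers unfamiliar with the operation being discussed, here is a minimal pure-Python sketch of what an allreduce computes, using the ring algorithm as an illustration. It is an assumption for teaching purposes only: it does not use MPI, it is not how OpenMPI or this project necessarily implements allreduce, and the function name `ring_allreduce` is hypothetical. Each "rank" stands in for one GPU, and every transfer goes only to the next neighbour in the ring, which is why the speed of the link between neighbouring GPUs matters so much.

```python
def ring_allreduce(buffers):
    """Simulate a ring allreduce (sum) over equal-length lists, one per rank.

    Illustrative sketch only: real implementations move data over PCIe or a
    network; here each "transfer" is just a list copy.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "vector length must be divisible by the rank count"
    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies

    def span(c):
        # Index range covered by chunk number c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the complete
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so the step is "simultaneous".
        sends = []
        for r in range(n):
            c = (r - step) % n
            sends.append((c, [data[r][i] for i in span(c)]))
        for r in range(n):
            dst = (r + 1) % n
            c, payload = sends[r]
            for k, i in enumerate(span(c)):
                data[dst][i] += payload[k]

    # Phase 2, allgather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            sends.append((c, [data[r][i] for i in span(c)]))
        for r in range(n):
            dst = (r + 1) % n
            c, payload = sends[r]
            for k, i in enumerate(span(c)):
                data[dst][i] = payload[k]

    return data


# Four simulated GPUs, each holding an 8-element gradient vector.
grads = [[float(rank * 10 + i) for i in range(8)] for rank in range(4)]
result = ring_allreduce(grads)
```

After the call, every entry of `result` holds the same element-wise sum of the four input vectors, which is the defining property of allreduce.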
Thanks a ton for your take on this. I appreciate it! I fear you are right that SLI is not general-purpose enough: it is built specifically for splitting up graphics rendering and physics calculations in games or VR. I also could not get hold of any transfer speed numbers anywhere. Upon reading up more, the compromise with PCIe: due to CPU PCIe lane restrictions in consumer hardware, lanes are multiplexed, and four physical x16 PCIe slots mostly end up as four logical x8 ones (or x16/x16 or x16/x8/x8). I guess a performance prediction covering the different factors (transfer speed, latency, ...) would be very hard to make, so I'll simply take the plunge in the coming weeks. Thanks again.
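To put rough numbers on the x16 versus x8 question, here is a back-of-envelope sketch. It rests on several assumptions that are not from this thread: PCIe 3.0 links (8 GT/s per lane with 128b/130b encoding), a bandwidth-optimal ring allreduce in which each GPU sends about 2(N-1)/N times the payload, zero latency, and a hypothetical 100 MB gradient buffer. Real throughput will be lower than these theoretical peaks, and traffic that crosses between root complexes can be slower still.

```python
# Theoretical PCIe 3.0 throughput per lane, per direction, in GB/s:
# 8 GT/s line rate * 128/130 encoding efficiency / 8 bits per byte.
PCIE3_GBS_PER_LANE = 8.0 * 128 / 130 / 8


def pcie3_bandwidth_gbs(lanes):
    """Theoretical peak one-direction bandwidth of a PCIe 3.0 link in GB/s."""
    return lanes * PCIE3_GBS_PER_LANE


def ring_allreduce_seconds(payload_bytes, n_gpus, link_gbs):
    """Bandwidth-only estimate for a ring allreduce (latency ignored).

    Each GPU sends 2 * (N - 1) / N times the payload over its link.
    """
    traffic_bytes = 2 * (n_gpus - 1) / n_gpus * payload_bytes
    return traffic_bytes / (link_gbs * 1e9)


payload = 100e6  # hypothetical 100 MB of gradients per GPU
for lanes in (16, 8):
    bw = pcie3_bandwidth_gbs(lanes)
    t = ring_allreduce_seconds(payload, n_gpus=4, link_gbs=bw)
    print(f"x{lanes}: {bw:.2f} GB/s peak, about {t * 1e3:.1f} ms per allreduce")
```

Under this model, halving the lane count simply doubles the transfer time, so whether x8 hurts in practice depends on how much of each training step is spent communicating rather than computing.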
Your best bet is to buy the motherboard that has the maximum number of GPUs per PCI-E root complex and still fits in your budget. You have to go through the motherboard's specs to figure this out, as it is not always spelled out clearly. Tyan or SuperMicro should have something.
For quite some time I've been wondering how to get a "poor-man's" 4-GPU-accelerated DL machine at maximum price / performance ratio.
PCIe is said to be the communications bottleneck.
DGX-1 @ 125k is a no-go, and the DIGITS DevBox hardware is partly dated and not exactly a bargain at 15k...
Now there will be the GTX 1080 Ti @ half the Titan price... What would a 4-way hardware setup @ rock-bottom price / performance look like?
Nvidia says SLI is fast, but not how fast. Any chance to use SLI as comm between the GPUs for allreduce?
Thanks
G.