tutel is slower than the naive p2p using 2DH for small scale #214

DongyuXu77 · 2023-08-27T15:55:46Z

Hello, and thank you for providing this excellent codebase. I have a question about our experiment: we observed that 2DH inter-node communication is slower than the naive peer-to-peer approach, even though both methods transmit the same amount of data. This contradicts our initial intuition. What might be the reasons behind it?

ghostplant · 2023-08-29T07:01:20Z

Did you run with a large number of GPUs? In our experiments, 2DH gradually becomes faster than Linear A2A once the number of distributed A100s exceeds 256. At smaller scales, 2DH is slower than Linear A2A.

DongyuXu77 · 2023-08-29T12:04:42Z

Oh, that may be the reason. In our experiment, the number of experts is 32, which is less than 256.

ghostplant · 2023-08-30T05:24:33Z

Yes, 2DH is faster only at large scales.
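
One way to see why such a crossover can exist while both methods move the same inter-node volume is a toy latency/bandwidth model. The sketch below is not Tutel's implementation and is not taken from this thread. It assumes that Linear A2A sends one small chunk to every peer, while 2DH first regroups data inside each node and then sends fewer, larger inter-node messages, at the price of extra intra-node traffic and a fixed reordering cost. Every name and number in it (`PARAMS`, `linear_a2a_time`, `two_dh_time`, the latencies and bandwidths) is a made-up assumption for illustration, so the crossover it produces depends entirely on those values and should not be read as a measurement.

```python
# Toy alpha-beta cost model. All parameter values are hypothetical.
PARAMS = {
    "gpus_per_node": 8,
    "bytes_per_gpu": 16e6,     # total payload each GPU sends in one all-to-all
    "alpha_inter": 1e-6,       # assumed per-message cost across nodes (s)
    "bw_inter": 25e9,          # assumed inter-node bandwidth (bytes/s)
    "alpha_intra": 1e-6,       # assumed per-message cost inside a node (s)
    "bw_intra": 200e9,         # assumed intra-node bandwidth (bytes/s)
    "reorder_overhead": 1e-4,  # assumed fixed cost of 2DH local data reordering (s)
}


def inter_node_bytes_linear(num_gpus, p=PARAMS):
    """Bytes one GPU sends across nodes with a flat all-to-all."""
    chunk = p["bytes_per_gpu"] / num_gpus
    return (num_gpus - p["gpus_per_node"]) * chunk


def inter_node_bytes_2dh(num_gpus, p=PARAMS):
    """Bytes one GPU sends across nodes in the hierarchical scheme."""
    nodes = num_gpus // p["gpus_per_node"]
    return (nodes - 1) * (p["bytes_per_gpu"] / nodes)


def linear_a2a_time(num_gpus, p=PARAMS):
    """Flat all-to-all: one chunk of size S/N to each of the other N-1 GPUs."""
    m = p["gpus_per_node"]
    chunk = p["bytes_per_gpu"] / num_gpus
    inter = (num_gpus - m) * (p["alpha_inter"] + chunk / p["bw_inter"])
    intra = (m - 1) * (p["alpha_intra"] + chunk / p["bw_intra"])
    return inter + intra


def two_dh_time(num_gpus, p=PARAMS):
    """Hierarchical all-to-all: intra-node exchange, then inter-node exchange."""
    m = p["gpus_per_node"]
    nodes = num_gpus // m
    # Phase 1: each GPU sends S/m to each of its m-1 local peers.
    phase1 = (m - 1) * (p["alpha_intra"] + (p["bytes_per_gpu"] / m) / p["bw_intra"])
    # Phase 2: each GPU sends S/nodes to its counterpart on every other node.
    phase2 = (nodes - 1) * (p["alpha_inter"] + (p["bytes_per_gpu"] / nodes) / p["bw_inter"])
    return p["reorder_overhead"] + phase1 + phase2


if __name__ == "__main__":
    for n in (32, 128, 256, 512, 2048):
        print(n, f"linear={linear_a2a_time(n):.6f}s", f"2dh={two_dh_time(n):.6f}s")
```

In this model the inter-node bandwidth term is identical for both schemes, so the difference comes only from the number of inter-node messages (which favors 2DH more as the GPU count grows) against the extra intra-node work and fixed overhead (which 2DH always pays).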

ghostplant changed the title ~~tutel is slower than the naive p2p~~ tutel is slower than the naive p2p using 2DH for small scale Oct 14, 2023
