
Cleans tensorParallelDegree with MultiDevice #1222

Open

zachgk wants to merge 1 commit into master
Conversation

@zachgk (Contributor) commented Oct 26, 2023

This is a refactor to simplify the handling of the tensor parallel degree. Before, it was read independently in three or more places in the code, and the behavior determining the tpDegree was hard to follow. This moves the reading to a single place and represents the result as a MultiDevice. It also makes the behavior fully visible: a worker group shows up with the tensor parallel devices that its workers will be using.
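A minimal sketch of the idea described above, assuming hypothetical names: `TpDeviceResolver`, the nested `MultiDevice` stand-in, and the property handling below are illustrations, not the actual DJL API or this PR's code.

```java
// Sketch: read tensor_parallel_degree once and expand it into the set of
// devices a worker group will use, instead of re-reading it in each caller.
// The MultiDevice class here is a stand-in for the real ai.djl type.
import java.util.ArrayList;
import java.util.List;

public final class TpDeviceResolver {

    /** Minimal stand-in for a multi-device container. */
    public static final class MultiDevice {
        private final List<String> devices;

        public MultiDevice(List<String> devices) {
            this.devices = devices;
        }

        public List<String> getDevices() {
            return devices;
        }

        @Override
        public String toString() {
            return devices.toString();
        }
    }

    /**
     * Resolves the tensor parallel degree in one place, then returns the
     * worker group's devices as a single MultiDevice.
     */
    public static MultiDevice resolve(String tpDegreeProp, int baseGpuId) {
        int tpDegree = tpDegreeProp == null ? 1 : Integer.parseInt(tpDegreeProp);
        List<String> devices = new ArrayList<>(tpDegree);
        for (int i = 0; i < tpDegree; i++) {
            devices.add("gpu" + (baseGpuId + i));
        }
        return new MultiDevice(devices);
    }

    public static void main(String[] args) {
        // e.g. tensor_parallel_degree=4 starting at gpu0 -> [gpu0, gpu1, gpu2, gpu3]
        System.out.println(resolve("4", 0));
    }
}
```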

@zachgk zachgk requested review from frankfliu and a team as code owners October 26, 2023 00:58
@sindhuvahinis (Contributor) commented
Now that we have MultiDevice, getVisibleDevices here could make use of this as well, right? https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/src/main/java/ai/djl/python/engine/Connection.java#L207

@zachgk (Contributor, Author) commented Oct 27, 2023

> Now that we have MultiDevice, getVisibleDevices here could make use of this as well, right? https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/src/main/java/ai/djl/python/engine/Connection.java#L207

Yeah. The engine would already have the list of devices that should be part of the connection. It doesn't use CUDA_VISIBLE_DEVICES, though.

Also, I talked with @frankfliu, and we may save this PR for post-re:Invent, as it is hard to get the behavior right and we don't test every case, even in integration. That said, I did add a bunch of tests, which may be sufficient.
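A hedged sketch of the getVisibleDevices suggestion quoted above: deriving the CUDA_VISIBLE_DEVICES value for a worker's connection from the device list the engine already holds. The helper name and the `List<Integer>` shape are assumptions for illustration, not the actual Connection.java signature.

```java
// Sketch: join the device ids the engine already knows about into a
// CUDA_VISIBLE_DEVICES-style string, rather than re-reading configuration.
import java.util.List;
import java.util.stream.Collectors;

public final class VisibleDevices {

    /** Joins GPU device ids into a CUDA_VISIBLE_DEVICES-style string. */
    static String getVisibleDevices(List<Integer> deviceIds) {
        return deviceIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // A worker group spanning gpus 2..5 -> "2,3,4,5"
        System.out.println(getVisibleDevices(List.of(2, 3, 4, 5)));
    }
}
```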
