MPI GPU interface refactoring #2577

ethanglaser · 2023-11-14T21:33:25Z

Description

Changes proposed in this pull request:

Add a virtual get_mpi_offload_support function to the base communicator; it returns false by default in nearly all cases
Add logic to get_mpi_offload_support in mpi/communicator.hpp that checks the MPI libraries for the correct symbol and determines whether Level Zero is supported
Add a conditional in detail/communicator.cpp that uses the result of get_mpi_offload_support to decide whether to copy data to the host (the previous default) or leave it as is, which improves performance when the MPI library supports GPU offload
Modify the sendrecv_replace arguments to include an optional additional buffer, accommodating an MPICH workaround that calls sendrecv with two GPU buffers
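
The dispatch described in the list above can be illustrated with a small host-only sketch. It is not the oneDAL implementation: `communicator_base`, `recording_communicator`, `send`, and `send_impl` are hypothetical names chosen for this example, only `get_mpi_offload_support` comes from the description, and the "device" buffer is ordinary host memory so the code runs without MPI or a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the base communicator; not the oneDAL class.
class communicator_base {
public:
    virtual ~communicator_base() = default;

    // Conservative default: assume the MPI library cannot read device memory.
    virtual bool get_mpi_offload_support() const {
        return false;
    }

    // Sends `count` floats. `device_buf` models a GPU allocation; in this
    // host-only sketch it is ordinary memory so the example runs anywhere.
    void send(const float* device_buf, std::size_t count) {
        assert(device_buf != nullptr);
        if (get_mpi_offload_support()) {
            // Offload-capable MPI: hand the buffer over unchanged.
            send_impl(device_buf, count);
        }
        else {
            // Previous default: stage the data through a host copy first.
            std::vector<float> host_copy(device_buf, device_buf + count);
            send_impl(host_copy.data(), count);
        }
    }

protected:
    virtual void send_impl(const float* buf, std::size_t count) = 0;
};

// Test double that records what the backend was actually given.
class recording_communicator : public communicator_base {
public:
    explicit recording_communicator(bool offload) : offload_(offload) {}

    bool get_mpi_offload_support() const override {
        return offload_;
    }

    const float* last_ptr = nullptr;
    std::vector<float> last_data;

protected:
    void send_impl(const float* buf, std::size_t count) override {
        last_ptr = buf;
        last_data.assign(buf, buf + count);
    }

private:
    bool offload_;
};
```

With `recording_communicator comm(true)`, `comm.send(data, n)` passes `data` straight through; with `false`, the backend receives a staged copy instead. A real backend would answer `get_mpi_offload_support` by inspecting the loaded MPI library, which this sketch does not attempt.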

ethanglaser · 2023-11-14T21:36:01Z

/intelci: run

Alexandr-Solovev

Looks like a great opportunity to get more speedup across all algorithms

cpp/oneapi/dal/detail/communicator.cpp

cpp/oneapi/dal/test/engine/thread_communicator.hpp

ethanglaser · 2023-11-15T22:50:50Z

Looks like a great opportunity to get more speedup across all algorithms

Thanks! Yeah, it's pretty ugly right now; I'm working towards functional first, then will clean things up. But good points.

ethanglaser · 2023-12-19T14:50:58Z

/intelci: run

* Profiling additions for benchmarking
* dblock cap+last iter,split_table profile,var names
* trying revert of data_management
* custom max and split table event
* address some todos and cleanup finalize
* remove temp_resp_ + clang
* send recv replace debug
* updated debug
* extended profiling
* temporary for CI build
* cleanup and removal of unneeded profiling
* syncing data_management with master
* I_MPI_OFFLOAD condition for green bazel
* temporary conditionals add for bench
* for bench only
* detailed select_indexed profiling
* removing select_indexed_local calls
* restoring communicator (see #2577)
* select_indexed debugging removals
* search_dpc debugging cleanup
* knn cleanup and clang
* single gpu/distributed unification
* addressing comments
* correction to previous
* clean up comments
* addressing some comments
* clang

ethanglaser · 2024-01-02T14:29:47Z

/intelci: run

ethanglaser · 2024-01-02T16:14:38Z

/intelci: run

ethanglaser · 2024-01-02T17:16:05Z

/intelci: run

ethanglaser · 2024-01-03T22:47:00Z

/intelci: run

ethanglaser · 2024-01-08T16:25:45Z

/intelci: run

ethanglaser · 2024-01-11T16:33:59Z

/intelci: run

ethanglaser · 2024-01-29T19:51:02Z

/intelci: run

ethanglaser · 2024-01-30T18:34:45Z

/intelci: run

ethanglaser · 2024-02-13T18:43:57Z

/intelci: run

ethanglaser · 2024-05-29T22:29:24Z

/intelci: run

ethanglaser · 2024-05-30T17:27:45Z

/intelci: run

ethanglaser · 2024-05-31T18:31:46Z

/intelci: run

ethanglaser · 2024-05-31T23:40:16Z

Job with infra branch: http://intel-ci.intel.com/ef1f7d20-65f3-f16f-89f4-a4bf010d0e2e

ethanglaser · 2024-06-03T19:01:53Z

/intelci: run

ethanglaser · 2024-06-03T20:48:41Z

/intelci: run

ethanglaser · 2024-06-04T20:46:06Z

/intelci: run

ethanglaser · 2024-06-06T00:29:21Z

Job with infra branch: http://intel-ci.intel.com/ef2396dd-0148-f19e-b1e2-a4bf010d0e2e

ethanglaser · 2024-06-18T13:31:28Z

/intelci: run

ethanglaser · 2024-06-18T21:18:32Z

Job with infra branch: http://intel-ci.intel.com/ef2396dd-0148-f19e-b1e2-a4bf010d0e2e

Updated job: http://intel-ci.intel.com/ef2dbbfc-328c-f15d-82ad-a4bf010d0e2e

cpp/oneapi/dal/detail/mpi/communicator.hpp

host transfers to thread_comm, dev upds

98e9a9d

ethanglaser changed the title ~~host transfers to thread_comm, dev upds~~ MPI GPU interface refactoring Nov 14, 2023

Alexandr-Solovev reviewed Nov 15, 2023

cpp/oneapi/dal/detail/communicator.cpp

cpp/oneapi/dal/test/engine/thread_communicator.hpp (outdated)

ethanglaser added a commit to ethanglaser/oneDAL that referenced this pull request Dec 1, 2023

restoring communicator (see oneapi-src#2577)

433bae9

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

eb6d8c1

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

4e15049

memory from detail not backend

125172a

add detail prefix

8ae3504

ethanglaser added 2 commits January 3, 2024 14:45

using memcpy with policy instead of queue

c3f5729

clang

bee8fd1

temp for debug build

3e4bac4

ethanglaser and others added 3 commits January 11, 2024 12:12

revert last

39eb70b

forgot to add deps wait

e506250

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

647d2cf

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

3c9cd24

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

a5be0e5

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

36989ee

ethanglaser and others added 3 commits May 29, 2024 15:26

add function for workaround

5591e41

clang

7050d6f

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

06c96bb

another debug...

3693d5a

ethanglaser added 2 commits May 31, 2024 10:14

alternative workaround

39fd503

remove debug

46aae4a

create function to identify mpi backend

4f7f694

revert previous

95705a1

revised workaround condition

eee67ec

ethanglaser requested a review from Alexandr-Solovev June 6, 2024 00:34

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

29528c9

ethanglaser added 2 commits August 9, 2024 10:06

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

7ca9614

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

a81899d

ethanglaser commented Aug 17, 2024

cpp/oneapi/dal/detail/mpi/communicator.hpp

Temporarily comment mpich sendrecv workaround

a2c95d9

ethanglaser commented Aug 17, 2024

cpp/oneapi/dal/detail/mpi/communicator.hpp

ethanglaser and others added 3 commits August 17, 2024 14:37

Update cpp/oneapi/dal/detail/mpi/communicator.hpp

70e2a89

Merge branch 'oneapi-src:main' into dev/eglaser-mpi-gpu

c5d02cf

remove temporary workaround

8e0ee3c

ethanglaser merged commit a8df345 into oneapi-src:main Aug 30, 2024
16 checks passed
