MPI GPU interface refactoring #2577

Merged
ethanglaser merged 81 commits into oneapi-src:main on Aug 30, 2024

Conversation

ethanglaser
Contributor

@ethanglaser ethanglaser commented Nov 14, 2023

Description

Changes proposed in this pull request:

  • Add a virtual get_mpi_offload_support function to the base communicator; it defaults to false in nearly all cases
  • Add logic to get_mpi_offload_support in mpi/communicator.h that checks the MPI libraries for the correct symbol to determine whether Level Zero is supported
  • Add a conditional in detail/communicator.cpp that uses the result of get_mpi_offload_support to decide whether to copy data to the host (the previous default) or leave it as is (which improves performance when MPI supports GPU offload)
  • Extend the sendrecv_replace arguments with an optional additional buffer, to accommodate the MPICH workaround of calling sendrecv with two GPU buffers
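
The first and third points above can be illustrated with a minimal, self-contained sketch. It is not the oneDAL implementation: `communicator_base`, `buffer`, `recording_communicator`, and `send_with_optional_staging` are hypothetical names invented for illustration, and the device-to-host copy is simulated with a plain `std::vector` copy rather than a SYCL memcpy. Only the name `get_mpi_offload_support` and the idea of defaulting it to false come from the description.

```cpp
#include <vector>

// Hypothetical stand-in for a data buffer; on_device marks GPU memory.
struct buffer {
    std::vector<float> data;
    bool on_device;
};

// Hypothetical stand-in for the base communicator interface.
class communicator_base {
public:
    virtual ~communicator_base() = default;

    // Defaults to false, so backends without GPU-aware MPI keep the host path.
    virtual bool get_mpi_offload_support() const {
        return false;
    }

    // Backend-specific transfer (assumed signature, for illustration only).
    virtual void send(const buffer& buf) = 0;
};

// Test double that records which kind of memory reached the backend.
class recording_communicator : public communicator_base {
public:
    explicit recording_communicator(bool offload) : offload_(offload) {}

    bool get_mpi_offload_support() const override {
        return offload_;
    }

    void send(const buffer& buf) override {
        last_sent_on_device = buf.on_device;
    }

    bool last_sent_on_device = false;

private:
    bool offload_;
};

// Stages device data through the host only when the backend cannot take
// GPU buffers directly. Returns true if a host staging copy was made.
bool send_with_optional_staging(communicator_base& comm, const buffer& buf) {
    if (buf.on_device && !comm.get_mpi_offload_support()) {
        // Simulated device-to-host copy (a real backend would use the device runtime).
        buffer host_copy{ buf.data, false };
        comm.send(host_copy);
        return true;
    }
    // GPU-aware path: pass the buffer through unchanged.
    comm.send(buf);
    return false;
}
```

With a `recording_communicator` constructed with `false`, a device buffer is staged through the host before sending; constructed with `true`, the same buffer is passed through untouched, which is where the performance gain described above would come from.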

@ethanglaser ethanglaser changed the title from "host transfers to thread_comm, dev upds" to "MPI GPU interface refactoring" Nov 14, 2023
@ethanglaser
Contributor Author

/intelci: run

Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Looks like a great opportunity to get more speedup across all algorithms

Review thread on cpp/oneapi/dal/detail/communicator.cpp (resolved)
Review thread on cpp/oneapi/dal/test/engine/thread_communicator.hpp (outdated, resolved)
@ethanglaser
Contributor Author

Looks like a great opportunity to get more speedup across all algorithms

Thanks! Yeah, it's pretty ugly right now; I'm working towards functional first and will clean things up after. But good points.

ethanglaser added a commit to ethanglaser/oneDAL that referenced this pull request Dec 1, 2023
@ethanglaser
Contributor Author

/intelci: run

ethanglaser added a commit that referenced this pull request Dec 21, 2023
* Profiling additions for benchmarking

* dblock cap+last iter,split_table profile,var names

* trying revert of data_management

* custom max and split table event

* address some todos and cleanup finalize

* remove temp_resp_ + clang

* send recv replace debug

* updated debug

* extended profiling

* temporary for CI build

* cleanup and removal of unneeded profiling

* syncing data_management with master

* I_MPI_OFFLOAD condition for green bazel

* temporary conditionals add for bench

* for bench only

* detailed select_indexed profiling

* removing select_indexed_local calls

* restoring communicator (see #2577)

* select_indexed debugging removals

* search_dpc debugging cleanup

* knn cleanup and clang

* single gpu/distributed unification

* addressing comments

* correction to previous

* clean up comments

* addressing some comments

* clang
@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

ethanglaser commented Jun 18, 2024

Job with infra branch: http://intel-ci.intel.com/ef2396dd-0148-f19e-b1e2-a4bf010d0e2e

Updated job: http://intel-ci.intel.com/ef2dbbfc-328c-f15d-82ad-a4bf010d0e2e

@ethanglaser ethanglaser merged commit a8df345 into oneapi-src:main Aug 30, 2024
16 checks passed
4 participants