ch4/ofi: refactor gpu pipeline #6891
base: main
Commits on Mar 5, 2024
misc: rename MPIR_gpu_req to MPIR_async_req
MPIR_gpu_req is a union of an MPL_gpu_request and an MPIR_Typerep_req, so it is not GPU-specific. The type could also be extended to hold other internal async task handles. We therefore rename it to MPIR_async_req and establish the convention of naming such variables async_req.
Commit b2074d1
Add an inline wrapper for testing MPIR_async_req. Modify the order of header inclusion due to the dependency on typerep_pre.h.
Commit 20464b6
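To illustrate the idea behind the rename and the test wrapper, here is a minimal sketch of one request type that covers two kinds of async work. It is not MPICH's actual definition: every `demo_*` name is hypothetical, the two inner structs are plain stand-ins for MPL_gpu_request and MPIR_Typerep_req, and the `kind` tag is an assumption added so the wrapper knows which member is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two underlying request kinds. */
typedef struct { bool done; } demo_gpu_request;
typedef struct { bool done; } demo_typerep_req;

/* Assumed discriminator; the real type may track this differently. */
typedef enum {
    DEMO_ASYNC_NONE,
    DEMO_ASYNC_GPU,
    DEMO_ASYNC_TYPEREP
} demo_async_kind;

typedef struct {
    demo_async_kind kind;
    union {
        demo_gpu_request gpu;
        demo_typerep_req typerep;
    } u;
} demo_async_req;

/* Inline wrapper: returns true once the underlying task has completed. */
static inline bool demo_async_test(const demo_async_req *async_req)
{
    assert(async_req != NULL);
    switch (async_req->kind) {
        case DEMO_ASYNC_GPU:
            return async_req->u.gpu.done;
        case DEMO_ASYNC_TYPEREP:
            return async_req->u.typerep.done;
        default:
            return true;        /* nothing pending */
    }
}
```

Callers only see `demo_async_test(&async_req)`, so adding another kind of internal task would mean a new enum value and union member rather than a new code path at every call site.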
Commit e0e64ee
ch4/ofi: refactor pipeline recv async copy
Refactor the async copy in receive events using MPIR_async facilities.
Commit 010f231
ch4/ofi: refactor pipeline send async copy
Refactor the async copy before sending a chunk.
Commit 79f020f
ch4/ofi: remove MPIDI_OFI_gpu_progress_task
Both gpu_send_task_queue and gpu_recv_task_queue have been ported to the MPIR_async facilities, so the progress task is no longer needed.
Commit 70469a4
ch4/ofi: refactor pipeline send
Pipeline send allocates chunk buffers and then spawns the async copy. The allocation may run out of genq buffers, so it is designed as an async task. The send copy is triggered upon completion of the buffer allocation, so it is renamed to spawn_send_copy and turned into an internal static function. This removes MPIDI_OFI_global.gpu_send_queue.
Commit 0ee16a6
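The allocate-then-spawn pattern described for pipeline send can be sketched as a pollable task. This is only an illustration under stated assumptions: the fixed-size `demo_pool` stands in for a genq pool, all `demo_*` names are hypothetical, and trying one allocation per poll is a simplification rather than a statement about the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_POOL_SIZE 2
#define DEMO_MAX_CHUNKS 8

/* Hypothetical fixed-size buffer pool standing in for a genq pool. */
typedef struct {
    char bufs[DEMO_POOL_SIZE][64];
    bool in_use[DEMO_POOL_SIZE];
} demo_pool;

static void *demo_pool_alloc(demo_pool *pool)
{
    for (int i = 0; i < DEMO_POOL_SIZE; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return pool->bufs[i];
        }
    }
    return NULL;                /* pool exhausted: caller retries later */
}

static void demo_pool_free(demo_pool *pool, void *buf)
{
    for (int i = 0; i < DEMO_POOL_SIZE; i++) {
        if (buf == (void *) pool->bufs[i]) {
            pool->in_use[i] = false;
            return;
        }
    }
}

typedef enum { DEMO_TASK_PENDING, DEMO_TASK_DONE } demo_task_state;

typedef struct {
    demo_pool *pool;
    int n_chunks;                       /* total chunks to send */
    int n_spawned;                      /* chunks whose copy was started */
    void *chunk_buf[DEMO_MAX_CHUNKS];   /* buffer used by each chunk */
} demo_send_task;

/* Placeholder for starting the async copy into the chunk buffer. */
static void demo_spawn_send_copy(demo_send_task *task, void *buf)
{
    task->chunk_buf[task->n_spawned++] = buf;
}

/* Poll function: try one allocation; stay pending if the pool is empty. */
static demo_task_state demo_send_alloc_poll(demo_send_task *task)
{
    assert(task->n_chunks <= DEMO_MAX_CHUNKS);
    assert(task->n_spawned < task->n_chunks);

    void *buf = demo_pool_alloc(task->pool);
    if (buf == NULL)
        return DEMO_TASK_PENDING;

    demo_spawn_send_copy(task, buf);
    return (task->n_spawned == task->n_chunks) ? DEMO_TASK_DONE : DEMO_TASK_PENDING;
}
```

A progress loop would keep calling `demo_send_alloc_poll` until it reports `DEMO_TASK_DONE`; a failed allocation costs nothing except a later retry, which is why no dedicated send queue is needed in this sketch.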
ch4/ofi: refactor pipeline recv
Pipeline recv allocates chunk buffers and then posts fi_trecv. The allocation may run out of genq buffers, and we also control the number of outstanding recvs, so it is designed as an async task. The async recv copy is triggered in the recv event when data arrives. This removes MPIDI_OFI_global.gpu_recv_queue. All ofi-layer progress routines for gpu pipelining are now removed.
Commit e297eee
ch4/ofi: move gpu pipeline events into ofi_gpu_pipeline.c
Consolidate the gpu pipeline code. MPIDI_OFI_gpu_pipeline_request is now an internal struct in ofi_gpu_pipeline.c, renamed to struct chunk_req. MPIDI_OFI_gpu_pipeline_recv_copy is now an internal function, renamed to start_recv_copy.
Commit 1756c22
ch4/ofi: move all gpu pipeline code into ofi_gpu_pipeline.c
Move all gpu-pipeline-specific code into ofi_gpu_pipeline.c. Add a new function MPIDI_OFI_gpu_pipeline_recv that fills rreq with persistent pipeline_info data. Rename the original MPIDI_OFI_gpu_pipeline_recv to the static function start_recv_chunk.
Commit 4baf414
ch4/ofi: refactor pipeline_info into a union
Separate the pipeline_info type into a union of send and recv parts to make the code cleaner.
Commit 4ed7909
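As a rough picture of what splitting pipeline_info into a send/recv union could look like, consider the sketch below. The field names and the `is_send` flag are assumptions chosen for illustration, not the fields used in the PR; all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State that only the sender needs (hypothetical fields). */
typedef struct {
    const void *src;
    size_t chunk_size;
} demo_pipeline_send_info;

/* State that only the receiver needs (hypothetical fields). */
typedef struct {
    void *dst;
    size_t chunk_size;
    int max_outstanding;
} demo_pipeline_recv_info;

/* A request is either a send or a recv, so the two can share storage. */
typedef union {
    demo_pipeline_send_info send;
    demo_pipeline_recv_info recv;
} demo_pipeline_info;

typedef struct {
    bool is_send;
    demo_pipeline_info pipeline_info;
} demo_request;

/* Access goes through the member that matches the request direction. */
static size_t demo_chunk_size(const demo_request *req)
{
    assert(req != NULL);
    return req->is_send ? req->pipeline_info.send.chunk_size
                        : req->pipeline_info.recv.chunk_size;
}
```

Naming the direction at each access (`pipeline_info.send.…` versus `pipeline_info.recv.…`) makes it visible at the call site which side's state is being touched.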
ch4/ofi: use explicit counters to track gpu pipeline
Don't overload cc_ptr for chunk tracking. Use separate, explicit counters to track the progress and completion of chunks.
Commit 52b93ad
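A minimal sketch of explicit chunk counters, assuming one counter for issued chunks and one for completed chunks. The struct and function names are hypothetical and do not reflect the PR's actual fields.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int n_chunks;       /* total chunks in the message */
    int n_issued;       /* chunks posted so far */
    int n_completed;    /* chunks whose completion event has arrived */
} demo_chunk_tracker;

/* True once every chunk has been posted. */
static bool demo_all_issued(const demo_chunk_tracker *t)
{
    return t->n_issued == t->n_chunks;
}

/* Record that one more chunk was posted. */
static void demo_chunk_issued(demo_chunk_tracker *t)
{
    assert(t->n_issued < t->n_chunks);
    t->n_issued++;
}

/* Record a completion; returns true if it was the final chunk. */
static bool demo_chunk_completed(demo_chunk_tracker *t)
{
    assert(t->n_completed < t->n_issued);
    t->n_completed++;
    return t->n_completed == t->n_chunks;
}
```

Keeping "issued" and "completed" apart lets the request's own completion counter retain a single meaning, and the assertions catch a completion event that arrives for a chunk that was never posted.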
ch4/ofi: use internal tag for pipeline chunk match_bits
Following an approach similar to nonblocking collectives, internal pipeline chunks use a separate tag space (MPIDI_OFI_GPU_PIPELINE_SEND) and incrementing tags to avoid mismatches with regular messages.
Commit 96988a1
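The separate-tag-space idea can be sketched with a reserved bit in the match bits plus a wrapping counter. The bit layout below is invented for illustration and is not MPICH's actual match_bits format; `DEMO_PIPELINE_FLAG`, `DEMO_TAG_BITS`, and the `demo_*` functions are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TAG_BITS      20
#define DEMO_TAG_MASK      ((UINT64_C(1) << DEMO_TAG_BITS) - 1)
#define DEMO_PIPELINE_FLAG (UINT64_C(1) << 62)  /* assumed protocol bit */

/* Match bits for a regular user message: tag only, flag clear. */
static uint64_t demo_user_match_bits(uint32_t tag)
{
    return (uint64_t) tag & DEMO_TAG_MASK;
}

/* Match bits for an internal pipeline chunk: flag set plus internal tag. */
static uint64_t demo_pipeline_match_bits(uint32_t internal_tag)
{
    return DEMO_PIPELINE_FLAG | ((uint64_t) internal_tag & DEMO_TAG_MASK);
}

typedef struct {
    uint32_t next_tag;
} demo_tag_counter;

/* Hand out incrementing internal tags, wrapping within the tag field. */
static uint32_t demo_next_pipeline_tag(demo_tag_counter *c)
{
    assert(c != NULL);
    uint32_t tag = c->next_tag;
    c->next_tag = (uint32_t) ((c->next_tag + 1) & DEMO_TAG_MASK);
    return tag;
}
```

Because the flag bit is never set for user messages in this layout, a pipeline chunk cannot match a regular receive even when the numeric tags coincide.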
ch4/ofi: refactor gpu pipeline recv_alloc
Separate the recv tasks for the initial header and for the chunks, since the code paths clearly separate them. Use a single async item for all chunk recvs rather than unnecessarily enqueuing individual chunks, since we can track the chunks in the state.
Commit 3e741c7
ch4/ofi: include ofi_impl.h in ofi_gpu_pipeline.c
It is needed to compile under the noinline configuration.
Commit 2ff2048
ch4/ofi: move some inline util functions
Move these utility functions to ofi_impl.h since they are simple and non-specific. This also makes it easier to figure out which file to include, especially for .c files.
Commit 8e5a2c2
ch4/ofi: remove limit in pipeline recv chunk progress
Remove the limit on posting gpu pipeline recv chunks. Posting is instead bounded by the maximum number of chunks in MPIDI_OFI_global.gpu_pipeline_recv_pool or by libfabric returning EAGAIN. When progressing recv_chunk_alloc, we issue as many chunks as we can instead of one at a time. Refactor the code to have a single exit point.
Commit 8aacd18
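The "post as many chunks as possible" behavior with a single exit point might look roughly like the following. This is a sketch under assumptions: `demo_env` fakes both the recv pool and the fabric, `DEMO_EAGAIN` stands in for the libfabric EAGAIN return, and none of the `demo_*` names come from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_SUCCESS = 0, DEMO_EAGAIN = -11 };   /* stand-in return codes */

/* Fake environment: a buffer pool and a fabric with limited capacity. */
typedef struct {
    int pool_free;          /* buffers left in the hypothetical recv pool */
    int fabric_credits;     /* posts accepted before returning EAGAIN */
} demo_env;

typedef struct {
    int n_chunks;           /* chunks expected for this message */
    int n_posted;           /* chunks with a recv already posted */
} demo_recv_state;

/* Stand-in for posting a tagged receive for one chunk. */
static int demo_post_recv(demo_env *env)
{
    if (env->fabric_credits == 0)
        return DEMO_EAGAIN;
    env->fabric_credits--;
    return DEMO_SUCCESS;
}

/* Poll function: returns true once every chunk has been posted. */
static bool demo_recv_chunk_alloc_poll(demo_env *env, demo_recv_state *st)
{
    bool done = false;
    assert(env != NULL && st != NULL);

    while (st->n_posted < st->n_chunks) {
        if (env->pool_free == 0)
            goto fn_exit;               /* pool exhausted: retry later */
        env->pool_free--;

        if (demo_post_recv(env) == DEMO_EAGAIN) {
            env->pool_free++;           /* give the buffer back */
            goto fn_exit;               /* fabric busy: retry later */
        }
        st->n_posted++;
    }
    done = true;

  fn_exit:
    return done;
}
```

Each poll makes as much progress as the pool and the fabric allow, and both early-out conditions funnel through the same `fn_exit` label, so any cleanup added later has one place to live.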