From 36da5d4a0a0bb9fcbe2ed794e4b4709ef851ed7e Mon Sep 17 00:00:00 2001
From: Simon Byrne
Date: Mon, 24 Jul 2023 11:43:31 -0400
Subject: [PATCH 1/2] move prefs to end

---
 .../juliacon_2023_presentation.md | 58 +++++++------------
 1 file changed, 22 insertions(+), 36 deletions(-)

diff --git a/docs/juliacon_2023/juliacon_2023_presentation.md b/docs/juliacon_2023/juliacon_2023_presentation.md
index f821b7dda..e4f497f72 100644
--- a/docs/juliacon_2023/juliacon_2023_presentation.md
+++ b/docs/juliacon_2023/juliacon_2023_presentation.md
@@ -337,51 +337,19 @@ Where are the compressed chunks and can we decompress them in parallel?
 
 ---
 
-# Parallelization via Message Passing Interface (MPI)
+# Parallelization via MPI
 
 - Message Passing Interface (MPI) is an interface for single-program, multiple-data (SPMD) parallelism.
 - Launch multiple processes running the same program
-    ```sh
-    mpiexec -n <nprocs> program ...
-    ```
+  ```sh
+  mpiexec -n <nprocs> program ...
+  ```
 - Programs determine what they should do based on their identifier (_rank_).
   - Each process determines what communication operations it should do (messages)
 - Multiple implementations (Open MPI, MPICH, vendor-specific)
 - Widely used in HPC for large-scale distributed parallelism.
 - MPI.jl provides Julia bindings
 
-----
-
-## Configuring HDF5 with MPI (in upcoming 0.17 release)
-
-- Now works with default MPI & HDF5 JLLs
-- On HPC clusters, will typically want to use the system-provided MPI library
-  - Integrate with resource manager, make use of specialized network hardware, GPU-aware interfaces
-
-### Option 1: use MPItrampoline
-Requires building a wrapper library around your MPI library.
-```julia
-MPIPreferences.use_jll_binary("MPItrampoline_jll")
-```
-- HDF5.jl should work directly.
-
-----
-### Option 2: use system binary directly
-Requires system-provided MPI + HDF5 libraries.
-
-```julia
-using MPIPreferences
-MPIPreferences.use_system_binary()
-```
-Need to set corresponding preferences for HDF5
-```julia
-using Preferences, HDF5
-set_preferences!(HDF5,
-    "libhdf5" => "/path/to/your/libhdf5.so",
-    "libhdf5_hl" => "/path/to/your/libhdf5_hl.so",
-    force = true)
-```
-
 ---
 
 ## Using MPI + HDF5
@@ -409,6 +377,24 @@ Usage otherwise same as normal:
 
 ---
 
+# Configuring HDF5 (in upcoming 0.17 release)
+
+- May want to use specific HDF5 library
+  - interoperability with other languages (e.g. h5py)
+  - linked against custom MPI binary
+  - specific hardware features (burst buffers)
+
+- Preferences.jl to specify custom HDF5 binary
+```julia
+using Preferences, HDF5
+set_preferences!(HDF5,
+    "libhdf5" => "/path/to/your/libhdf5.so",
+    "libhdf5_hl" => "/path/to/your/libhdf5_hl.so",
+    force = true)
+```
+
+---
+
 # Summary
 
 * HDF5 is a format, C library, and data model for storing hierarchical information.

From e0d07a75b1dec46114997a076069748edab6d75d Mon Sep 17 00:00:00 2001
From: Simon Byrne
Date: Mon, 24 Jul 2023 12:12:32 -0400
Subject: [PATCH 2/2] add virtual dataset

---
 .../juliacon_2023_presentation.md | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/juliacon_2023/juliacon_2023_presentation.md b/docs/juliacon_2023/juliacon_2023_presentation.md
index e4f497f72..3da5271e9 100644
--- a/docs/juliacon_2023/juliacon_2023_presentation.md
+++ b/docs/juliacon_2023/juliacon_2023_presentation.md
@@ -335,6 +335,28 @@
 * Currently, HDF5.jl allows contiguous datasets to be memory mapped into arrays allowing for multithreaded reads.
 * With efficient chunk iteration, could we perform parallel decompression in HDF5.jl by reading compressed chunks directly?
 
+---
+# Virtual datasets
+
+- Maps multiple datasets into a single dataset
+  - Can be same or different files
+  - Supports patterns for sequentially numbered files/datasets
+
+- e.g. consider a dataset made up of 100×10 blocks, across 4 files
+  - `data00.h5`, `data01.h5`, etc.
+
+```julia
+space = dataspace((100,40))
+create_dataset(h5f, "dataset", datatype, space;
+    virtual=[HDF5.VirtualMapping(
+        HDF5.hyperslab(space, (1:100, HDF5.BlockRange(1:10; count = -1))), # block pattern
+        "./data0%b.h5",     # filenames (%b block pattern)
+        "data",             # path to source dataset in file
+        dataspace((100,10)) # view into source dataset
+    )]
+)
+```
+
 ---
 
 # Parallelization via MPI
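
Aside for reviewers, not part of either patch: the slide's `create_dataset` call is schematic, so below is a minimal, untested end-to-end sketch of the same virtual-dataset API. The element type (`Float64`), the source files `data00.h5` to `data03.h5` written on the fly, and the output file `vds.h5` are hypothetical choices for illustration; it assumes an HDF5.jl version with `HDF5.VirtualMapping` support (the upcoming 0.17 referenced in patch 1).

```julia
using HDF5

# Write the four 100×10 source blocks the slide assumes (data00.h5 .. data03.h5),
# each containing a dataset named "data".
for b in 0:3
    h5open("data0$b.h5", "w") do f
        f["data"] = fill(Float64(b), 100, 10)  # block b is filled with the value b
    end
end

# Create a 100×40 virtual dataset that stitches the blocks together column-wise.
h5open("vds.h5", "w") do h5f
    space = dataspace((100, 40))
    create_dataset(h5f, "dataset", datatype(Float64), space;
        virtual=[HDF5.VirtualMapping(
            HDF5.hyperslab(space, (1:100, HDF5.BlockRange(1:10; count = -1))), # repeated block pattern
            "./data0%b.h5", # %b is substituted with the block index
            "data",         # source dataset within each file
            dataspace((100, 10))
        )]
    )
end

# Reading the virtual dataset assembles the blocks transparently.
A = h5open("vds.h5", "r") do h5f
    read(h5f["dataset"])
end
@assert size(A) == (100, 40)
@assert all(A[:, 11:20] .== 1.0)  # columns 11:20 came from data01.h5
```

Relative source paths such as `./data0%b.h5` are, by default, resolved relative to the location of the file holding the virtual dataset, so the sketch keeps everything in one directory.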