Commit e0d07a7 by simonbyrne, Jul 24, 2023: add virtual dataset (1 changed file, +22: docs/juliacon_2023/juliacon_2023_presentation.md, parent 36da5d4)
Where are the compressed chunks and can we decompress them in parallel?
* Currently, HDF5.jl allows contiguous datasets to be memory-mapped into arrays, allowing multithreaded reads.
* With efficient chunk iteration, could we perform parallel decompression in HDF5.jl by reading compressed chunks directly?
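A rough sketch of the memory-mapping point above (the file name `example.h5`, dataset name `contig`, and sizes are illustrative, not from the slides):

```julia
using HDF5

# Write a contiguous (unchunked, uncompressed) dataset; chunked or
# compressed datasets cannot be memory mapped.
h5open("example.h5", "w") do f
    f["contig"] = rand(Float64, 1000, 100)
end

h5open("example.h5", "r") do f
    A = HDF5.readmmap(f["contig"])  # Array backed by the file's bytes, no copy
    # Threads read disjoint columns in parallel.
    sums = zeros(size(A, 2))
    Threads.@threads for j in 1:size(A, 2)
        sums[j] = sum(@view A[:, j])
    end
    sums
end
```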

---
# Virtual datasets

- Maps multiple source datasets into a single virtual dataset
- Sources can be in the same file or in different files
- Supports printf-style patterns for sequentially numbered files/datasets

- e.g. consider a 100×40 dataset assembled from 100×10 blocks across 4 files
- `data00.h5`, `data01.h5`, etc.

```julia
space = dataspace((100,40))
create_dataset(h5f, "dataset", datatype(Float64), space;  # element type: example
virtual=[HDF5.VirtualMapping(
HDF5.hyperslab(space, (1:100, HDF5.BlockRange(1:10; count = -1))), # block pattern
"./data0%b.h5", # filenames (%b block pattern)
"data", # path to source dataset in file
dataspace((100,10)) # view into source dataset
)]
)
```
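A minimal end-to-end sketch of the mapping above (assumes each source file holds a 100×10 `data` dataset; `Float64` and the fill values are illustrative):

```julia
using HDF5

# Write the 4 source files data00.h5 … data03.h5, each a 100×10 block.
for b in 0:3
    h5open("data0$b.h5", "w") do f
        f["data"] = fill(Float64(b), 100, 10)
    end
end

A = h5open("virtual.h5", "w") do h5f
    space = dataspace((100, 40))
    create_dataset(h5f, "dataset", datatype(Float64), space;
        virtual = [HDF5.VirtualMapping(
            HDF5.hyperslab(space, (1:100, HDF5.BlockRange(1:10; count = -1))),
            "./data0%b.h5",   # %b is replaced by the block index 0, 1, …
            "data",
            dataspace((100, 10))
        )]
    )
    read(h5f["dataset"])  # 100×40 array stitched from the 4 source files
end
```

Reading `"dataset"` resolves each 100×10 block back to its source file, so column blocks 1:10, 11:20, … come from `data00.h5`, `data01.h5`, … respectively (paths are resolved relative to the virtual file).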

---

# Parallelization via MPI
