Commit

added num_workers param in config
Saiyam26 committed Nov 7, 2024
1 parent 59e5623 commit ac30dce
Showing 2 changed files with 6 additions and 0 deletions.
5 changes: 5 additions & 0 deletions config/README.md
@@ -36,6 +36,11 @@ default: `50000`
Useful for low resource utilization. This will ensure all data is stored in multiple chunks of almost `sample_chunksize` samples. This does not hamper any logic in algorithms but simply ensures that the entire dataset is never loaded all at once on the RAM.
`null` value will disregard this optimization.

**num_workers** {int}: `int | null`
default: `1`
This param sets the number of workers that run in parallel to speed up writing data to disk. Choose it
with the number of cores available on the device in mind. *Note that this doesn't increase the memory usage of the pipeline*. The best improvement was found at `num_workers = 3`.

**train_val_test** {dict}:
This section splits the data using the splitting technique specified in `splitter_config`, together with required params such as `split_ratio` and the `stratify` options. Example below.

1 change: 1 addition & 0 deletions config/config.yaml
@@ -13,6 +13,7 @@ experiment:
# DATA CONFIG.
data:
    sample_chunksize: 20000
    num_workers: 1

    train_val_test:
        full_datapath: '/path/to/anndata.h5ad'
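
To make the new setting concrete, here is a minimal sketch of how a `num_workers` value could drive parallel chunk writing. It is an illustration under stated assumptions, not the project's implementation: the helper names (`split_into_chunks`, `write_chunk`, `write_chunks`), the JSON chunk files, and the use of a standard-library thread pool are all hypothetical stand-ins, and a `null` value is assumed to fall back to a single worker.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_into_chunks(samples, sample_chunksize):
    """Split samples into consecutive chunks of at most `sample_chunksize` items."""
    if sample_chunksize is None:
        # Assumption: a null chunk size means "keep everything in one chunk".
        return [samples]
    return [
        samples[start:start + sample_chunksize]
        for start in range(0, len(samples), sample_chunksize)
    ]


def write_chunk(out_dir, index, chunk):
    """Write one chunk to its own file and return the file path (hypothetical format)."""
    path = Path(out_dir) / f"chunk_{index}.json"
    path.write_text(json.dumps(chunk))
    return path


def write_chunks(samples, out_dir, sample_chunksize=50000, num_workers=1):
    """Write all chunks to disk, using up to `num_workers` workers at a time."""
    chunks = split_into_chunks(samples, sample_chunksize)
    workers = num_workers or 1  # Assumption: null falls back to one worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_chunk, out_dir, index, chunk)
            for index, chunk in enumerate(chunks)
        ]
        # Collect results in submission order so paths line up with chunk indices.
        return [future.result() for future in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_chunks(list(range(10)), tmp, sample_chunksize=4, num_workers=3)
        print([p.name for p in paths])
```

A thread pool is used here only because it keeps the sketch short and portable. A process-based pool is another plausible choice, and it is the case where the README's advice about matching the worker count to available cores matters most.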