Utility/Wrapper for interacting with the Computational Memory Lab's computing resources using the Dask distributed computing framework
Clone this repository (using SSH):

```bash
git clone git@github.com:pennmem/cmldask.git
```
Install using pip:

```bash
pip install -e cmldask -r cmldask/requirements.txt
```
See the included notebooks for more detailed instructions and examples, but in short:
- Initialize a new client with Dask: `client = CMLDask.new_dask_client_{sge,slurm}("job_name", "1GB")` (depending on whether you're connecting to SGE or SLURM for job scheduling).
- Optional: open the dashboard using the instructions printed when you run the first step.
- Define some function `func` that takes an argument `arg` (or many arguments) and returns a result.
- Define an iterable of `arguments` for `func` that you want to compute in parallel.
- Use the client to map these arguments to the function on different parallel workers: `futures = client.map(func, arguments)`. Dask's Futures API launches parallel jobs and starts computing results immediately, but delays collecting them from distributed memory.
- Gather the results into the memory of your current process: `result = client.gather(futures)`. A complete sketch of this workflow follows the list.
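Putting these steps together, here is a minimal end-to-end sketch. The SLURM constructor and its arguments follow the call shown above; the import path, the `square` function, and its arguments are illustrative assumptions, and `client.map`/`client.gather` are standard Dask distributed API:

```python
from cmldask import CMLDask  # assumed import path

# Start a client on the SLURM scheduler; use new_dask_client_sge on SGE
# clusters. "1GB" is the memory requested per worker.
client = CMLDask.new_dask_client_slurm("square_job", "1GB")

# An illustrative placeholder function to run in parallel.
def square(x):
    return x ** 2

arguments = range(10)

# Launch one task per argument on the parallel workers (Futures API).
futures = client.map(square, arguments)

# Collect the finished results back into this process's memory.
results = client.gather(futures)
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

client.close()
```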
Refer to the Dask Futures documentation, which covers this preferred approach to parallel computing in depth.
Some cases might warrant using Dask Delayed, which evaluates functions lazily.
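As a quick illustration of the difference, here is a minimal sketch using the standard `dask.delayed` decorator; the `increment` and `add` functions are placeholders:

```python
import dask

@dask.delayed
def increment(x):
    return x + 1

@dask.delayed
def add(a, b):
    return a + b

# Nothing executes yet: each call only adds a node to the task graph.
total = add(increment(1), increment(2))

# compute() evaluates the graph, using the distributed client if one is active.
print(total.compute())  # 5
```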
Dask clients, once initialized in your kernel, will also automatically parallelize operations on a Dask DataFrame or a Dask Array. It is straightforward to convert between standard pandas, numpy, and xarray objects and their Dask counterparts for distributed computing. If your data fits comfortably in memory, though, you are unlikely to benefit much from these implementations, since they carry significant overhead.
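A minimal sketch of those conversions, using the standard `dask.dataframe.from_pandas` and `dask.array.from_array` entry points (the sample data is made up for illustration):

```python
import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd

# pandas -> dask: split the frame into partitions that workers can process.
pdf = pd.DataFrame({"subject": ["R1001P", "R1002P"], "recall": [0.31, 0.27]})
ddf = dd.from_pandas(pdf, npartitions=2)
print(ddf["recall"].mean().compute())  # .compute() returns a concrete value

# numpy -> dask: chunk the array; operations run chunk-by-chunk in parallel.
arr = np.arange(1_000_000)
darr = da.from_array(arr, chunks=100_000)
print(darr.sum().compute())
```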
Here is a good YouTube walkthrough of the Dask interactive monitoring dashboard.