Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start move to dask histEFT #422

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Start move to dask histEFT #422

wants to merge 16 commits into from

Conversation

btovar
Copy link
Contributor

@btovar btovar commented Jul 9, 2024

@bryates This is as far as I can confidently go. I tried to update the code in corrections.py, but it involves knowing the dimensions of the arrays with regards to physics, so I didn't want to introduce errors there.

The rough roadmap you want to follow is to replace numpy with dask.array, and awkard with dask_awkward. Also, the only time you should call dask.compute is in run_analysis.

My first roadblock in corrections.py is to create a random matrix with similar dimensions as a dask_awkward array. (Looking at the code, I think you don't need to generate the full matrix?) The issue is that the original code used to flatten/unflatten arrays, and such operations don't really make sense in dask when the arrays are not known. I think that probably someone that understand the physics can change that computation to use dak.mask, etc.

@bryates
Copy link
Contributor

bryates commented Jul 9, 2024

@bryates This is as far as I can confidently go. I tried to update the code in corrections.py, but it involves knowing the dimensions of the arrays with regards to physics, so I didn't want to introduce errors there.

The rough roadmap you want to follow is to replace numpy with dask.array, and awkard with dask_awkward. Also, the only time you should call dask.compute is in run_analysis.

My first roadblock in corrections.py is to create a random matrix with similar dimensions as a dask_awkward array. (Looking at the code, I think you don't need to generate the full matrix?) The issue is that the original code used to flatten/unflatten arrays, and such operations don't really make sense in dask when the arrays are not known. I think that probably someone that understand the physics can change that computation to use dak.mask, etc.

Thanks @btovar! I agree with you that the randomness in the Rochester can be done better. We can take a look at other things as well.

@btovar
Copy link
Contributor Author

btovar commented Jul 9, 2024

The futures test was not removed, it was renamed to test_dask. There is not a "futures" executor in the new coffea.

@bryates
Copy link
Contributor

bryates commented Jul 9, 2024

The futures test was not removed, it was renamed to test_dask. There is not a "futures" executor in the new coffea.

Yes, that's what I meant to put, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch processor to dask_awkward [example](https://github.com/cmstas/ewkcoffea/pull/14/files)
2 participants