Hello, I am using the coffea.nanoevents.NanoEventsFactory.from_root function from coffea 2024.5.0, and I am specifying chunking as defined in https://github.com/scikit-hep/uproot5/blob/v5.1.2/src/uproot/_dask.py#L109-L132 (as suggested by coffea). I am running this on lxplus, reading files from EOS over XRootD. I am running into behavior that I find odd, though it may just be different from what I expect.
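For reference, the call pattern looks roughly like this (a minimal sketch: the EOS path and step size are placeholders, and I'm assuming step_size is forwarded to uproot.dask via uproot_options):

```python
import dask
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema

# Build the lazy dask-awkward events array; the file path, tree name,
# and step size below are placeholders for illustration.
events = NanoEventsFactory.from_root(
    {"root://eosuser.cern.ch//eos/user/u/user/sample.root": "Events"},
    schemaclass=NanoAODSchema,
    uproot_options={"step_size": 10_000},  # events per partition
).events()

# The crash described below happens during this compute step.
(jet_pt,) = dask.compute(events.Jet.pt)
```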
Initially, I arbitrarily chose chunks of 10,000 events (equivalent to about 16 MB in the ROOT file). This worked until I moved to a larger number of files: with more total files, my RAM would fill up and my script would crash during dask.compute(). With smaller chunks, my RAM would fill up and it would crash even faster (the smaller the chunks, the sooner the crash). I ended up having to increase my chunk size by a factor of 10 for it not to crash.
Could this be happening because, with chunks this small, the amount of file I/O required overwhelms the RAM? Or is this possibly a bug in either coffea or uproot?
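For scale, here is a rough back-of-envelope with made-up numbers showing how quickly the partition count grows as the chunks shrink:

```python
# Rough back-of-envelope (hypothetical numbers): each partition is a
# task in the dask graph, and the whole graph lives in client memory,
# so shrinking the chunks multiplies the number of tasks.
n_files = 500                # hypothetical
events_per_file = 1_000_000  # hypothetical

for chunk_size in (1_000, 10_000, 100_000):
    n_partitions = n_files * events_per_file // chunk_size
    print(f"chunk_size={chunk_size:>7,} -> {n_partitions:>9,} partitions")

# chunk_size=  1,000 ->   500,000 partitions
# chunk_size= 10,000 ->    50,000 partitions
# chunk_size=100,000 ->     5,000 partitions
```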
Thanks for your help!
Hello, can you post some of the code that causes this behavior? If you can isolate it in a simple reproducer, it'll help us identify the cause more quickly.
I saw similar behavior while doing the coffea-casa scale tests a few weeks ago. Very small chunk sizes (initially a bug where I accidentally passed an O(100) number as the step size instead of steps_per_file), presumably small fractions of TBasket sizes, seem to lead to a serious struggle. I haven't followed up on that yet (and probably can't for the next couple of weeks), but I intend to scan over it for v1.1 of my simple-benchmark code.
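For context, the two chunking knobs in uproot.dask are easy to mix up (a minimal sketch; the file name and numbers are placeholders):

```python
import uproot

# step_size sets the partition size directly, while steps_per_file
# splits each file a fixed number of ways.
arrays = uproot.dask(
    {"sample.root": "Events"},
    step_size=100_000,      # partition size in *events* (or e.g. "100 MB")
    # steps_per_file=10,    # alternative: split each file into 10 steps
)

# Passing an O(100) number as step_size yields ~100-event partitions,
# far below a typical TBasket, so each tiny task still reads and
# decompresses whole baskets.
```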