-
Right now, However, there is a full [1] https://github.com/cms-nanoAOD/correctionlib/blob/master/src/correctionlib/highlevel.py#L93
-
You are stating that
-
I'll just mention that if you want to try using coffea 2023 with LPCJobQueue at the LPC cluster, you can use the standard container and then simply do:

    from distributed import Client
    from lpcjobqueue import LPCCondorCluster

    cluster = LPCCondorCluster(ship_env=True)
    cluster.adapt(minimum=1, maximum=10)
    client = Client(cluster)
-
As of dask-awkward 2023.7.0 and awkward 2.3.0
-
Hey guys,
Note that the single quotes for the
-
I'll start building up here a migration guide for folks moving from coffea 0.7 to coffea 2023. There are significant differences in functionality due to the evolution of the awkward array package from v1 to v2, notably that all delayed computation is now accomplished through dask, using dask-awkward and dask-histogram (the latter via the hist.dask extension).
Using these packages we have been able to maintain the functionality and interfaces of coffea, now fully integrated with the dask task-graph building system. This change is well justified since it brings qualitatively new functionality to coffea (like on-demand skimming) and makes analysis design significantly more flexible (data exploration and analysis scaling no longer need the processor pattern). Usage of the dask-awkward, dask-histogram, and hist.dask packages is mandatory. The new, advanced functionality afforded by these packages is completely opt-in to ease migration through piecewise adoption, and processors still function as they have in the past with some minor conversion required (a wrapper is in preparation, watch #882).
In broad strokes, to migrate an analysis to coffea 2023 you will need to make the following changes:
- To get concrete results, call `array.compute()`, `ahistogram.compute()`, or `dask.compute({"some": array, "another": histogram})`.
  - Avoid calling `.compute()` more than necessary, as it can drastically slow down your analysis code (and within a processor it is taken care of for you by coffea's executors).
- Use `import hist.dask as hda`, and perform array operations as you used to with bare awkward array (`ak.some_function` will call `dak.some_function` if passed a `dask-awkward` array), then instantiate histograms using these packages with the syntax you are accustomed to from `awkward` and `hist`.
  - You still need `import hist` for convenient definition of axes.
- Keep `import awkward as ak` and set `permit_dask=True` in `NanoEventsFactory.from_root`.
  - To use `dask_awkward`-specific functionality (like getting the list of expected columns to read) you'll need to import `dask_awkward` itself.
- Not all functions in `awkward` are available in `dask_awkward`, and if you encounter a problem or a missing piece of functionality that you need, you should open an issue at the dask-awkward GitHub page.
- `hist.dask` histograms behave like empty `hist` histograms, and will return a filled `hist.hist.Hist` when `.compute()` is called.
- If your corrections derive from `coffea.lookup_tools.lookup_base` (through the evaluator interface or with objects you make yourself), the dask wrapping is done for you automatically. Otherwise you should wrap your correction in a `dask.delayed` object and on that object do `.persist()`.