Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PlaceholderArray supports only trivial slices, not ndarray hit with vector addition after field addition to Jets #1231

Open
ikrommyd opened this issue Dec 13, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ikrommyd
Copy link
Collaborator

This code:

import awkward as ak
import dask
from coffea.nanoevents import NanoEventsFactory

jet_field = "Jet"
delta_r_val = 0.3

events = NanoEventsFactory.from_root(
    {"../coffea/tests/samples/nano_dy.root": "Events"},
    delayed=True,
).events()

events_with_jet = events[ak.num(events[jet_field]) > 0]
events_with_jet[jet_field, "Delta_r_bb"] = events_with_jet[jet_field].metric_table(events_with_jet.GenPart)
nearest_jet = events_with_jet[jet_field][ak.argmin(events_with_jet[jet_field].Delta_r_bb, axis=1)]
dask.compute(nearest_jet + events_with_jet.Muon.sum(axis=1))[0]

fails with

TypeError: PlaceholderArray supports only trivial slices, not ndarray

This error occurred while calling

    numpy.add.__call__(
        repr-raised-TypeError
        <CandidateArray [{x: 0, y: 0, z: 0, ...}, ...] type='40 * Candidate...'>
    )

switching to delayed=False works fine. Also not masking events and doing events_with_jet = events or not attaching Delta_r_bb to the Jets and jeeping it as a separate variable solve the problem.

@ikrommyd ikrommyd added the bug Something isn't working label Dec 13, 2024
@ikrommyd
Copy link
Collaborator Author

This is also another example with the same error:

events = NanoEventsFactory.from_root(
    {"../coffea/tests/samples/nano_dy.root": "Events"},
    delayed=True,
).events()

events_with_muon = events[ak.num(events.Muon) > 0]
a, b = ak.unzip(ak.cartesian([events_with_muon.Muon, events_with_muon.Jet], nested=False))
metric = a.delta_r(b)
events_with_muon["Muon", "dr"] = ak.min(metric, axis=1)
nearest_muon = events_with_muon.Electron.nearest(events_with_muon.Muon)
dask.compute(nearest_muon + nearest_muon)[0]

In the above code, the dr calculation is actually useless. Commenting out events_with_muon["Muon", "dr"] = ak.min(metric, axis=1) will solve the problem or any of the other fixes in the previous comment (separate variable or not masking events) will solve it as well.

@pfackeldey
Copy link
Contributor

pfackeldey commented Dec 16, 2024

I'm looking at your second reproducer and so far I can tell that the problem is that not all necessary columns are loaded - or the form is not pruned correctly - for the nearest_muon + nearest_muon calculation, in particular I can see the following:

Placeholder: dxy
Placeholder: dxyErr
Placeholder: dz
Placeholder: dzErr
Loaded: eta
Placeholder: ip3d
Placeholder: jetPtRelv2
Placeholder: jetRelIso
Loaded: mass
Placeholder: miniPFRelIso_all
Placeholder: miniPFRelIso_chg
Placeholder: pfRelIso03_all
Placeholder: pfRelIso03_chg
Placeholder: pfRelIso04_all
Loaded: phi
Loaded: pt
Placeholder: ptErr
Placeholder: segmentComp
Placeholder: sip3d
Placeholder: softMva
Placeholder: tkRelIso
Placeholder: tunepRelPt
Placeholder: mvaLowPt
Placeholder: mvaTTH
Loaded: charge
Placeholder: jetIdx
Placeholder: nStations
Placeholder: nTrackerLayers
Placeholder: pdgId
Placeholder: tightCharge
Placeholder: fsrPhotonIdx
Placeholder: highPtId
Placeholder: inTimeMuon
Placeholder: isGlobal
Placeholder: isPFcand
Placeholder: isTracker
Placeholder: looseId
Placeholder: mediumId
Placeholder: mediumPromptId
Placeholder: miniIsoId
Placeholder: multiIsoId
Placeholder: mvaId
Placeholder: pfIsoId
Placeholder: softId
Placeholder: softMvaId
Placeholder: tightId
Placeholder: tkIsoId
Placeholder: triggerIdLoose
Placeholder: genPartIdx
Placeholder: genPartFlav
Placeholder: cleanmask
Placeholder: fsrPhotonIdxG
Placeholder: genPartIdxG
Placeholder: jetIdxG
Loaded: dr

For dask.compute(nearest_muon) it would mark all fields as "Loaded" in this test. There seems to be a problem with the determination of which columns are needed, you can see that yourself by comparing:

print(dak.necessary_columns(nearest_muon))
# {'from-uproot-5df425e0c71472fdf0835badd6bb7337': frozenset({'Muon_isTracker', 'Muon_ptErr', 'Muon_fsrPhotonIdx', 'Muon_mediumPromptId', 'Muon_mvaId', 'Muon_tkRelIso', 'Muon_mvaLowPt', 'Muon_pfRelIso04_all', 'Muon_genPartIdx', 'Muon_triggerIdLoose', 'Muon_dxy', 'Muon_nStations', 'nFsrPhoton', 'Jet_eta', 'nGenPart', 'Muon_miniIsoId', 'Muon_multiIsoId', 'Muon_ip3d', 'Muon_genPartFlav', 'Jet_phi', 'Muon_jetIdx', 'Muon_highPtId', 'Muon_tightId', 'Electron_phi', 'Muon_sip3d', 'Muon_segmentComp', 'Muon_dzErr', 'Muon_cleanmask', 'Muon_inTimeMuon', 'Muon_tkIsoId', 'Muon_pfRelIso03_chg', 'Electron_eta', 'Muon_tightCharge', 'Muon_jetRelIso', 'nMuon', 'Muon_mass', 'Muon_miniPFRelIso_chg', 'Muon_softId', 'Muon_dxyErr', 'nElectron', 'Muon_miniPFRelIso_all', 'Muon_pfRelIso03_all', 'Muon_mvaTTH', 'Muon_eta', 'Muon_isGlobal', 'Muon_mediumId', 'nJet', 'Muon_pt', 'Muon_dz', 'Muon_softMvaId', 'Muon_jetPtRelv2', 'Muon_charge', 'Muon_isPFcand', 'Muon_looseId', 'Muon_tunepRelPt', 'Muon_pfIsoId', 'Muon_pdgId', 'Muon_phi', 'Muon_nTrackerLayers', 'Muon_softMva'})}

print(dak.necessary_columns(nearest_muon + nearest_muon))
# {'from-uproot-5df425e0c71472fdf0835badd6bb7337': frozenset({'Muon_mass', 'Muon_pt', 'nElectron', 'Electron_phi', 'Muon_charge', 'Jet_eta', 'Muon_eta', 'Muon_phi', 'Electron_eta', 'nJet', 'Jet_phi', 'nMuon'})}

Obviously in the second case the reported necessary columns are not enough to fully build a MuonArray, which is why the computation fails.

I'm still investigating, but my current assumption is that the typetracer after the + thinks it's just a CandidateArray(?) that holds nothing except for x,y,z,t,charge, and thus upon compute doesn't want to load all other fields.

You can see this aswell when running dask.compute(nearest_muon + nearest_muon, optimize_graph=False)[0], or nearest_muon._meta + nearest_muon._meta.

@ikrommyd I assume returning a "CandidateArray" is expected in this case, as this happens in the eager case aswell. If this assumption is correct then my best guess currently is that the uproot.dask function is not updating the ak.form correctly (i.e. removing all fields from the form that are not necessary).

If you were expecting a "MuonArray" as a result, then I think that something needs to be updated in coffea's behaviors, because currently a add operation of any Candidate subclass (e.g. Muon) is not returning the subclass, but the Candidate class instance.

I still need to understand what the __setitem__ has to do with it. It may just be a coincidence that triggered what I described above.

@ikrommyd
Copy link
Collaborator Author

Yes, Muon + Muon is a CandidateArray

In [2]: events = NanoEventsFactory.from_root({"https://github.com/scikit-hep/coffea/raw/refs/heads/master/tests/samples/nano_dy.root": "Events"}, delayed=True).events()
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_electronIdx => Electron
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_genPartIdx => GenPart
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_photonIdx => Photon
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(

In [3]: events.Muon + events.Muon
Out[3]: dask.awkward<add, npartitions=1>

In [4]: (events.Muon + events.Muon).compute()
Out[4]: <CandidateArray [[], [], [], [], ..., [], [], [], []] type='40 * var * Cand...'>

In [5]: events = NanoEventsFactory.from_root({"https://github.com/scikit-hep/coffea/raw/refs/heads/master/tests/samples/nano_dy.root": "Events"}, delayed=False).events()
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_electronIdx => Electron
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_genPartIdx => GenPart
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_photonIdx => Photon
  warnings.warn(
/Users/iason/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(

In [6]: events.Muon + events.Muon
Out[6]: <CandidateArray [[], [], [], [], ..., [], [], [], []] type='40 * var * Cand...'>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants