Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR 3119 introduced overtouching #3185

Closed
ikrommyd opened this issue Jul 17, 2024 · 13 comments
Closed

PR 3119 introduced overtouching #3185

ikrommyd opened this issue Jul 17, 2024 · 13 comments
Labels
bug The problem described is something that must be fixed

Comments

@ikrommyd
Copy link

ikrommyd commented Jul 17, 2024

Version of Awkward Array

>=2.6.5

Description and code to reproduce

Reproducer:

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak
import hist.dask as hda

def test_necessary_columns():
    events, report = NanoEventsFactory.from_root(
        {
            "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
        },
        metadata={"dataset": "Test"},
        uproot_options={"allow_read_errors_with_report": True},
    ).events()

    events["Electron", "pdgId"] = -11 * events.Electron.charge
    events["Muon", "pdgId"] = -13 * events.Muon.charge
    events["leptons"] = ak.concatenate(
        [events.Electron, events.Muon],
        axis=1,
    )
    events = events[ak.num(events.leptons) >= 3]
    pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])
    pair = pair[(events.leptons[pair.l1].pdgId == -events.leptons[pair.l2].pdgId)]

    pair = pair[
        ak.singletons(
            ak.argmin(
                abs((events.leptons[pair.l1] + events.leptons[pair.l2]).mass - 91.2),
                axis=1,
            )
        )
    ]

    events = events[ak.num(pair) > 0]
    pair = pair[ak.num(pair) > 0][:, 0]

    l3 = ak.local_index(events.leptons)
    l3 = l3[(l3 != pair.l1) & (l3 != pair.l2)]
    l3 = l3[ak.argmax(events.leptons[l3].pt, axis=1, keepdims=True)]
    l3 = events.leptons[l3][:, 0]

    mt = np.sqrt(2 * l3.pt * events.MET.pt * (1 - np.cos(events.MET.delta_phi(l3))))
    q8_hist = (
        hda.Hist.new.Reg(100, 0, 200, name="mt", label="$\ell$-MET transverse mass [GeV]")
        .Double()
        .fill(mt)
    )
    print(dak.necessary_columns(q8_hist))
    if len(str(dak.necessary_columns(q8_hist))) > 1000:
        return 1
    else:
        return 0

exit(test_necessary_columns())

The bug is introduced by #3119

My git bisect report:

(work) ➜  awkward git:(v2.6.5) ✗ git bisect start
M	awkward-cpp/rapidjson
Previous HEAD position was 45c708d2 chore: the next release will be 2.6.5
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
status: waiting for both good and bad commits
(work) ➜  awkward git:(main) ✗ git bisect bad
status: waiting for good commit(s), bad commit known
(work) ➜  awkward git:(main) ✗ git bisect good 727ff9ca4759aed52e0b4f21d3daa66724582c61
Bisecting: 11 revisions left to test after this (roughly 4 steps)
[02858f895f89f7e6292dc3a8d7e079d596885286] feat: Adding awkward.semipublic submodule (#3152)
(work) ➜  awkward git:(02858f89) ✗ git bisect run python bug.py
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-781bf991cccf86d40c7bf9a7cf6777ff': frozenset({'Muon_segmentComp', 'Electron_dxyErr', 'Muon_miniPFRelIso_chg', 'Electron_mvaFall17V2Iso_WP80', 'Muon_miniIsoId', 'Electron_cutBased', 'Muon_inTimeMuon', 'Electron_cutBased_Fall17_V1', 'Electron_tightCharge', 'Electron_jetRelIso', 'nGenPart', 'Muon_pfRelIso03_chg', 'Muon_tightId', 'Electron_cutBased_HEEP', 'Electron_mvaFall17V2noIso_WPL', 'Electron_vidNestedWPBitmap', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V2Iso_WP90', 'Electron_dr03TkSumPt', 'Muon_nTrackerLayers', 'Electron_hoe', 'Electron_pfRelIso03_chg', 'Electron_miniPFRelIso_chg', 'Muon_isTracker', 'Muon_nStations', 'Muon_fsrPhotonIdx', 'Muon_dz', 'Muon_sip3d', 'Muon_genPartIdx', 'Electron_dz', 'Electron_dr03HcalDepth1TowerSumEt', 'Muon_tunepRelPt', 'Electron_dr03TkSumPtHEEP', 'nMuon', 'Electron_mvaFall17V2noIso_WP80', 'Muon_dzErr', 'Electron_dzErr', 'Muon_looseId', 'Muon_mvaLowPt', 'Muon_pfIsoId', 'Electron_energyErr', 'Electron_mvaFall17V1Iso', 'Electron_eta', 'Muon_softId', 'Muon_pfRelIso03_all', 'Electron_charge', 'Electron_mvaFall17V1noIso', 'Electron_miniPFRelIso_all', 'Electron_pt', 'Electron_mvaFall17V1noIso_WPL', 'Muon_mvaId', 'Muon_isPFcand', 'Muon_isGlobal', 'Electron_convVeto', 'Muon_mediumPromptId', 'Muon_tightCharge', 'nFsrPhoton', 'Muon_charge', 'Electron_mvaFall17V1Iso_WP80', 'Electron_sip3d', 'Electron_pfRelIso03_all', 'Electron_eInvMinusPInv', 'Muon_pfRelIso04_all', 'Electron_mvaFall17V1Iso_WPL', 'Muon_tkRelIso', 'Muon_mvaTTH', 'Muon_eta', 'Muon_cleanmask', 'nJet', 'Muon_ptErr', 'Electron_genPartFlav', 'Electron_lostHits', 'Muon_miniPFRelIso_all', 'Muon_dxy', 'Electron_vidNestedWPBitmapHEEP', 'Muon_phi', 'Electron_mvaFall17V1noIso_WP90', 'Electron_mass', 'MET_pt', 'Muon_multiIsoId', 'Electron_mvaFall17V1noIso_WP80', 'Muon_mass', 'Electron_r9', 'Muon_softMvaId', 'Electron_eCorr', 'Electron_jetPtRelv2', 'Electron_mvaTTH', 'Muon_jetIdx', 'Muon_mediumId', 'Electron_dr03EcalRecHitSumEt', 'Muon_pt', 'Muon_softMva', 'Electron_phi', 'Muon_triggerIdLoose', 'Electron_photonIdx', 'Electron_jetIdx', 'MET_phi', 'Electron_ip3d', 'Electron_seedGain', 'Electron_deltaEtaSC', 'Muon_genPartFlav', 'Electron_mvaFall17V2noIso_WP90', 'Muon_jetRelIso', 'nPhoton', 'Electron_cleanmask', 'Electron_mvaFall17V2Iso', 'Muon_jetPtRelv2', 'Muon_highPtId', 'Electron_dxy', 'Muon_dxyErr', 'Electron_mvaFall17V2Iso_WPL', 'nElectron', 'Electron_genPartIdx', 'Muon_tkIsoId', 'Muon_ip3d', 'Electron_isPFcand', 'Electron_sieie', 'Electron_mvaFall17V2noIso'})}
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[53dc0e266dc346146ef3c581deb98545f89270f4] ci: Ensure attestations for awkward-cpp sdist and wheels (#3135)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-81595a16d812d8e421ed4238952ce794': frozenset({'Electron_eCorr', 'Electron_mvaFall17V2Iso', 'Muon_nStations', 'Electron_deltaEtaSC', 'Electron_jetIdx', 'Muon_ptErr', 'Muon_softMvaId', 'Electron_cutBased', 'nFsrPhoton', 'Muon_tunepRelPt', 'Electron_vidNestedWPBitmapHEEP', 'Electron_seedGain', 'Electron_convVeto', 'Muon_tkIsoId', 'Electron_hoe', 'Electron_dxy', 'Electron_mvaFall17V1Iso_WP90', 'Electron_dr03HcalDepth1TowerSumEt', 'nGenPart', 'Electron_mvaFall17V2Iso_WP90', 'Muon_charge', 'Muon_pt', 'Electron_dz', 'Muon_ip3d', 'Muon_phi', 'Electron_genPartIdx', 'Muon_miniIsoId', 'Electron_mvaTTH', 'MET_pt', 'Electron_dr03TkSumPt', 'Muon_mvaId', 'Muon_dxyErr', 'Electron_mvaFall17V1Iso', 'Electron_mvaFall17V2noIso_WP90', 'Electron_mvaFall17V2Iso_WPL', 'Muon_dxy', 'MET_phi', 'Electron_jetRelIso', 'Electron_pfRelIso03_chg', 'Muon_nTrackerLayers', 'Muon_mvaLowPt', 'Electron_mvaFall17V1noIso_WP80', 'nJet', 'Muon_mediumPromptId', 'Electron_ip3d', 'Muon_segmentComp', 'Electron_pt', 'Electron_dr03TkSumPtHEEP', 'Muon_pfRelIso04_all', 'Muon_mediumId', 'Electron_jetPtRelv2', 'Electron_cutBased_HEEP', 'Electron_mvaFall17V1Iso_WPL', 'Muon_dzErr', 'Electron_mvaFall17V1noIso_WP90', 'Electron_sip3d', 'Muon_genPartIdx', 'Electron_phi', 'nMuon', 'Electron_mvaFall17V1Iso_WP80', 'Electron_miniPFRelIso_all', 'Muon_genPartFlav', 'Muon_tightId', 'Muon_pfIsoId', 'Electron_sieie', 'Electron_eta', 'Electron_isPFcand', 'Muon_jetRelIso', 'Muon_tkRelIso', 'Electron_genPartFlav', 'Electron_pfRelIso03_all', 'Muon_looseId', 'Electron_vidNestedWPBitmap', 'Muon_fsrPhotonIdx', 'Electron_r9', 'Electron_mvaFall17V2noIso_WP80', 'Muon_softId', 'Electron_cleanmask', 'Muon_jetPtRelv2', 'Electron_dr03EcalRecHitSumEt', 'Electron_cutBased_Fall17_V1', 'Muon_triggerIdLoose', 'Electron_mvaFall17V2noIso_WPL', 'Muon_isPFcand', 'Electron_mvaFall17V2noIso', 'Muon_jetIdx', 'Muon_multiIsoId', 'Muon_sip3d', 'Muon_mvaTTH', 'Electron_mass', 'Electron_mvaFall17V1noIso_WPL', 'Muon_mass', 'Electron_photonIdx', 'nPhoton', 'Muon_miniPFRelIso_chg', 'nElectron', 'Muon_tightCharge', 'Muon_isGlobal', 'Muon_highPtId', 'Muon_cleanmask', 'Electron_dzErr', 'Electron_mvaFall17V1noIso', 'Muon_eta', 'Electron_mvaFall17V2Iso_WP80', 'Muon_miniPFRelIso_all', 'Electron_energyErr', 'Electron_tightCharge', 'Muon_isTracker', 'Muon_dz', 'Muon_softMva', 'Electron_dxyErr', 'Muon_pfRelIso03_all', 'Electron_eInvMinusPInv', 'Muon_inTimeMuon', 'Electron_charge', 'Electron_lostHits', 'Electron_miniPFRelIso_chg', 'Muon_pfRelIso03_chg'})}
Bisecting: 2 revisions left to test after this (roughly 1 step)
[28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc] fix: prevent exponential memory growth in UnionArray (#3119)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-685afe5860956e841b4f6d433d7c6718': frozenset({'Electron_mvaTTH', 'Muon_multiIsoId', 'Electron_mvaFall17V1Iso_WP80', 'Electron_dr03EcalRecHitSumEt', 'Muon_dxyErr', 'Electron_convVeto', 'Muon_tightCharge', 'Electron_mvaFall17V2Iso_WP90', 'Electron_mvaFall17V1noIso_WP80', 'Muon_miniPFRelIso_chg', 'Electron_cleanmask', 'Electron_pt', 'Electron_mvaFall17V2noIso_WP90', 'Electron_mvaFall17V2Iso', 'Muon_mvaTTH', 'Muon_mvaId', 'Electron_cutBased_Fall17_V1', 'Muon_pfRelIso04_all', 'Muon_jetRelIso', 'Electron_dxy', 'Electron_mvaFall17V2noIso', 'Muon_ip3d', 'Muon_jetPtRelv2', 'Electron_cutBased_HEEP', 'Electron_vidNestedWPBitmap', 'Electron_jetIdx', 'Muon_genPartIdx', 'Electron_seedGain', 'Electron_lostHits', 'Muon_phi', 'Muon_mediumPromptId', 'Muon_sip3d', 'nGenPart', 'Muon_isTracker', 'Electron_isPFcand', 'Muon_miniPFRelIso_all', 'Electron_mvaFall17V2noIso_WPL', 'Electron_tightCharge', 'Electron_mvaFall17V2noIso_WP80', 'Muon_tkIsoId', 'Electron_r9', 'Electron_ip3d', 'Electron_pfRelIso03_chg', 'Electron_phi', 'Muon_isPFcand', 'Muon_nTrackerLayers', 'Electron_eInvMinusPInv', 'Electron_vidNestedWPBitmapHEEP', 'Electron_jetRelIso', 'nJet', 'Electron_mvaFall17V1Iso_WPL', 'Muon_triggerIdLoose', 'Muon_genPartFlav', 'Electron_genPartIdx', 'Electron_eta', 'Electron_mvaFall17V2Iso_WP80', 'Electron_mvaFall17V1noIso_WP90', 'Electron_dz', 'Electron_dr03HcalDepth1TowerSumEt', 'Electron_sip3d', 'Muon_mass', 'Electron_dzErr', 'Muon_miniIsoId', 'MET_pt', 'Muon_nStations', 'Electron_mass', 'Muon_eta', 'Muon_pfRelIso03_chg', 'Electron_dxyErr', 'nPhoton', 'Muon_pfIsoId', 'MET_phi', 'Muon_isGlobal', 'Electron_cutBased', 'Muon_softMvaId', 'Electron_jetPtRelv2', 'nElectron', 'Muon_looseId', 'Muon_softId', 'Muon_cleanmask', 'Muon_highPtId', 'Muon_segmentComp', 'Electron_photonIdx', 'Muon_pfRelIso03_all', 'nFsrPhoton', 'Muon_jetIdx', 'Electron_mvaFall17V1noIso', 'Muon_dzErr', 'Electron_energyErr', 'Muon_mediumId', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V1noIso_WPL', 'Electron_miniPFRelIso_chg', 'Electron_charge', 'Muon_charge', 'Muon_pt', 'Muon_inTimeMuon', 'Electron_pfRelIso03_all', 'Electron_sieie', 'Muon_ptErr', 'nMuon', 'Electron_miniPFRelIso_all', 'Electron_dr03TkSumPtHEEP', 'Electron_deltaEtaSC', 'Muon_tightId', 'Muon_mvaLowPt', 'Muon_fsrPhotonIdx', 'Muon_tkRelIso', 'Muon_tunepRelPt', 'Muon_softMva', 'Electron_eCorr', 'Electron_hoe', 'Electron_genPartFlav', 'Electron_dr03TkSumPt', 'Muon_dz', 'Electron_mvaFall17V2Iso_WPL', 'Muon_dxy', 'Electron_mvaFall17V1Iso'})}
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[6cff8e939f8b9365bece63b59919b823cd35304c] fix: wait for Jitify performing a one-time only warm-up (#3113)
running 'python' 'bug.py'
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/methods/candidate.py:11: FutureWarning: In version 2024.7.0 (target date: 2024-06-30 11:59:59-05:00), this will be an error.
To raise these warnings as errors (and get stack traces to find out where they're called), run
    import warnings
    warnings.filterwarnings("error", module="coffea.*")
after the first `import coffea` or use `@pytest.mark.filterwarnings("error:::coffea.*")` in pytest.
Issue: coffea.nanoevents.methods.vector will be removed and replaced with scikit-hep vector. Nanoevents schemas internal to coffea will be migrated. Otherwise please consider using that package!.
  from coffea.nanoevents.methods import vector
/Users/iason/miniforge3/envs/work/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
{'from-uproot-bb74eb145870034e6b34d269e2b468e2': frozenset({'Muon_charge', 'nElectron', 'Electron_mass', 'Electron_pt', 'Electron_eta', 'Electron_phi', 'Muon_pt', 'MET_pt', 'nMuon', 'Muon_phi', 'Muon_eta', 'Muon_mass', 'MET_phi', 'Electron_charge'})}
28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc is the first bad commit
commit 28a89daaa0ca2fbf4242c897eb6760cb2a9b4ebc
Author: Jim Pivarski <jpivarski@users.noreply.github.com>
Date:   Tue May 28 20:13:46 2024 +0530

    fix: prevent exponential memory growth in UnionArray (#3119)

 src/awkward/contents/unionarray.py                 |  4 +--
 tests/test_2713_from_buffers_allow_noncanonical.py |  7 +++--
 ...vent_exponential_memory_growth_in_unionarray.py | 34 ++++++++++++++++++++++
 3 files changed, 40 insertions(+), 5 deletions(-)
 create mode 100644 tests/test_3118_prevent_exponential_memory_growth_in_unionarray.py
bisect found first bad commit
(work) ➜  awkward git:(6cff8e93) ✗
@ikrommyd ikrommyd added the bug (unverified) The problem described would be a bug, but needs to be triaged label Jul 17, 2024
@ikrommyd ikrommyd changed the title https://github.com/scikit-hep/awkward/pull/3119 introduced overtouching PR 3119 introduced overtouching Jul 17, 2024
@lgray
Copy link
Contributor

lgray commented Jul 24, 2024

a ping on this one @jpivarski @agoose77 ... this is rather badly noticeable.

@jpivarski
Copy link
Member

I'm trying to reproduce this using only Awkward, both to narrow the scope of search for the bug and to add a test to our test suite. Since you've narrowed in on #3119 as a cause, I think it must have something to do with RecordArrays in IndexedArrays, and ak.argcombinations or the application of it (i.e. ak.combinations) is also suspicious because it makes combinations of records by hanging them inside of IndexedArrays. I don't think the issue could be related to any of the calculations on scalars, since #3119 can't have anything to do with non-RecordArrays.

But so far, I haven't been able to reproduce it. I'm starting from a dataset that's already grouped into "electron" and "muon" records, for convenience:

dataset = ak.from_parquet("https://github.com/jpivarski-talks/2024-07-24-codas-hep-ml/raw/main/data/SMHiggsToZZTo4L.parquet")

I make leptons, so that we have a UnionArray:

leptons = ak.concatenate([dataset.electron, dataset.muon], axis=1)

then make a TypeTracerArray report to track touching:

layout, report = ak.typetracer.typetracer_with_report(leptons.layout.form_with_key())

and it's initially empty:

print(report.data_touched, report.shape_touched)
# [] []

Slicing through both the electrons and muons touches only the expected buffers:

ak.Array(layout).pt
print(report.data_touched, report.shape_touched)
# ['node1', 'node3', 'node14'] ['node0', 'node1', 'node3', 'node14']

The node0 is the outer ListOffsetArray's offsets, the node1 is the UnionArray tags and index, node3 is the electron pT and node14 is the muon pT.

None of the following touch any more buffers than the above:

  • ak.num(layout)
  • pair = ak.argcombinations(leptons, 2, fields=["l1", "l2"])
  • tmp = leptons[pair.l1]
  • ak.Array(layout)[pair.l1].pt
  • ak.Array(layout)[pair.l1].pt == ak.Array(layout)[pair.l2].pt
  • ak.Array(layout)[pair.l1][:, 0]

Printing arrays to the screen would touch more buffers (hence the precaution of assigning to tmp) and calculating masses would pull in all of the kinematics (node4, node5, node6, node15, node16, node17), but at that point, we're not working on RecordArrays anymore. I'm looking for something that you did that touches ~everything in this example.

Okay, ak.local_index touches everything:

ak.local_index(ak.Array(layout))

and I can't think of why that would be. I'm looking into it now...

@jpivarski
Copy link
Member

Actually, the above only touches the shape of everything. Is that consistent with what you were seeing? That the shape of everything was being touched (as opposed to the data)?

@jpivarski
Copy link
Member

But

ak.local_index(ak.Array(layout))

touching the shape of many buffers also happens if I revert #3119, so it might not be your issue.

@jpivarski
Copy link
Member

ak.local_index has a default argument of axis=-1, so determining the actual axis depth goes through ak._layout.maybe_posaxis, and this touches everybody's shape. It happens on this line:

is_branching, additional_depth = layout.branch_depth

Perhaps branch_depth shouldn't be touching shape. But before I dig deeper into this, is it your issue? For instance, is the issue caused by your

    l3 = ak.local_index(events.leptons)

line? (If you stop computing just before this line, do you get no error? If you stop computing just after this line, do you get the error?)

Also, the maybe_posaxis/branch_depth touching would go away with

    l3 = ak.local_index(events.leptons, axis=1)

(instead of the default axis=-1). Does that also make the error go away? If so, then we know the cause and I think I can find a way to fix it. If not, then this maybe_posaxis/branch_depth thing might or might not be an issue, but it isn't your issue.

@lgray
Copy link
Contributor

lgray commented Jul 29, 2024

No, this does not fix the issue. :-/

I'll see if I can synthesize an awkward only reproducer.

@ikrommyd
Copy link
Author

ikrommyd commented Jul 29, 2024

Got it a little more minimal for now. This is still overtouching

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak

events = NanoEventsFactory.from_root(
    {
        "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
    },
).events()

events["leptons"] = ak.concatenate(
    [events.Electron, events.Muon],
    axis=1,
)
pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

pair = pair[
    ak.argmin(
        (events.leptons[pair.l1] + events.leptons[pair.l2]).pt,
        axis=1,
    )
]

events = events[ak.num(pair) > 0]
l3 = events.leptons

print(dak.necessary_columns(l3.pt))
print(l3.pt.compute())

Everything here seems improtant. If I try do remove anything, like the events = events[ak.num(pair) > 0], the concatenation and just do events["leptons"] = evens.Electron or within the argmin have something like events.leptons[pair.l1].pt that doesn't include both l1 and l2, overtouchin goes away.

@ikrommyd ikrommyd reopened this Jul 29, 2024
@ikrommyd
Copy link
Author

This is also happening. There's overtouching on events.Muon.pt

import awkward as ak
import numpy as np

from coffea.nanoevents import NanoEventsFactory
import dask_awkward as dak

events = NanoEventsFactory.from_root(
    {
        "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
    },
).events()

events["leptons"] = ak.concatenate(
    [events.Electron, events.Muon],
    axis=1,
)
pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

pair = pair[
    ak.argmin(
        (events.leptons[pair.l1] + events.leptons[pair.l2]).pt,
        axis=1,
    )
]

events = events[ak.num(pair) > 0]

print(dak.necessary_columns(events.Muon.pt))
{'from-uproot-85eb9d204817e27e080b72855fb7c5d3': frozenset({'Muon_isTracker', 'Electron_mass', 'Electron_mvaFall17V1noIso', 'nJet', 'Electron_eta', 'Muon_dxy', 'Electron_mvaTTH', 'Muon_softMvaId', 'Muon_sip3d', 'Muon_pfRelIso03_chg', 'nMuon', 'Electron_pfRelIso03_all', 'Muon_isGlobal', 'Muon_tunepRelPt', 'Muon_miniPFRelIso_chg', 'Electron_isPFcand', 'Muon_jetIdx', 'Electron_dz', 'nFsrPhoton', 'Muon_pfRelIso03_all', 'nGenPart', 'Electron_dzErr', 'nPhoton', 'Electron_pdgId', 'Muon_genPartIdx', 'Muon_highPtId', 'Muon_jetPtRelv2', 'Muon_tightCharge', 'Electron_mvaFall17V2Iso_WP80', 'Electron_mvaFall17V2noIso_WP80', 'Electron_convVeto', 'Electron_mvaFall17V1Iso', 'Electron_lostHits', 'Muon_tkIsoId', 'Electron_eCorr', 'Electron_dxy', 'Electron_genPartFlav', 'Muon_mediumPromptId', 'Muon_miniPFRelIso_all', 'Electron_miniPFRelIso_all', 'Muon_pfRelIso04_all', 'Muon_inTimeMuon', 'Electron_jetRelIso', 'Muon_pt', 'Muon_mvaLowPt', 'Electron_cutBased', 'Electron_cutBased_HEEP', 'Muon_fsrPhotonIdx', 'Electron_charge', 'Muon_miniIsoId', 'Muon_mass', 'Electron_cleanmask', 'Electron_mvaFall17V2Iso_WPL', 'Electron_mvaFall17V1noIso_WP90', 'Electron_vidNestedWPBitmapHEEP', 'Electron_pfRelIso03_chg', 'Muon_dz', 'Muon_isPFcand', 'Muon_triggerIdLoose', 'Electron_miniPFRelIso_chg', 'Electron_dr03EcalRecHitSumEt', 'Electron_sieie', 'Muon_tightId', 'Electron_photonIdx', 'Muon_pfIsoId', 'Electron_r9', 'Electron_mvaFall17V1noIso_WP80', 'Electron_mvaFall17V1noIso_WPL', 'Electron_mvaFall17V1Iso_WP90', 'Electron_mvaFall17V1Iso_WP80', 'Muon_multiIsoId', 'Electron_mvaFall17V1Iso_WPL', 'Muon_eta', 'Electron_mvaFall17V2noIso_WP90', 'Electron_phi', 'Muon_tkRelIso', 'Muon_ip3d', 'Electron_jetPtRelv2', 'Electron_cutBased_Fall17_V1', 'Muon_softMva', 'Electron_hoe', 'Muon_jetRelIso', 'Muon_dzErr', 'Muon_genPartFlav', 'Muon_pdgId', 'Electron_jetIdx', 'Muon_dxyErr', 'Electron_energyErr', 'Muon_cleanmask', 'Electron_deltaEtaSC', 'Electron_dxyErr', 'Muon_looseId', 'Muon_charge', 'Muon_mvaTTH', 'Electron_seedGain', 'Electron_mvaFall17V2Iso', 'Electron_ip3d', 'nElectron', 'Electron_dr03HcalDepth1TowerSumEt', 'Electron_sip3d', 'Muon_softId', 'Electron_dr03TkSumPt', 'Electron_dr03TkSumPtHEEP', 'Electron_mvaFall17V2Iso_WP90', 'Electron_mvaFall17V2noIso', 'Muon_ptErr', 'Electron_tightCharge', 'Muon_mvaId', 'Muon_nStations', 'Muon_phi', 'Muon_segmentComp', 'Muon_mediumId', 'Electron_pt', 'Electron_eInvMinusPInv', 'Muon_nTrackerLayers', 'Electron_vidNestedWPBitmap', 'Electron_mvaFall17V2noIso_WPL', 'Electron_genPartIdx'})}

@ikrommyd
Copy link
Author

Could dask-contrib/dask-awkward#526 be somehow part of the problem as well since concatenation and argcombinations are required?

@lgray
Copy link
Contributor

lgray commented Jul 29, 2024

Here is a minimal reproducer using awkward and one coffea behavior:

import awkward as ak

from coffea.nanoevents.methods import nanoaod

def test_necessary_columns():

    with open("nanoevents_form.json", "r") as fin:
        form = ak.forms.from_json(fin.read())

    events_layout, report = ak.typetracer.typetracer_with_report(form)

    events = ak.Array(events_layout, behavior=nanoaod.behavior)

    events["leptons"] = ak.concatenate(
        [events.Electron, events.Muon],
        axis=1,
    )
    pair = ak.argcombinations(events.leptons, 2, fields=["l1", "l2"])

    # looks like all that is needed to trigger it is the sum with behaviors?
    ptsum = (events.leptons[pair.l1] + events.leptons[pair.l2]).pt

    if len(str(report.data_touched)) > 1000:
        print("bad!")
        return 1
    else:
        print("good!")
        return 0

exit(test_necessary_columns())

I have attached the json of the input form to this post.
nanoevents_form.json

@jpivarski
Copy link
Member

@pfackeldey, @maxgalli, and I looked into this. The over-touching problem comes from broadcasting through a union (the leptons) to apply a ufunc (the np.add) here:

nextinputs.append(x[mask].project(next(it_j_contents)))

Before PR #3119, UnionArray.project replaced a RecordArray with an IndexedArray of RecordArray ("lazily projecting"), such that if the UnionArray only exposed a subset of records, the IndexedArray would only have indexes for those records. Whatever fields the record contained were not touched.

After PR #3119, UnionArray.project became eager: replacing a RecordArray with a RecordArray of only the records that the UnionArray exposed—the change is here. Since we're slicing the records non-lazily, it must touch all the fields of the record.

PR #3119 was necessary because assignments like cat["another", "w"] = three.x were causing the memory to explode. As in tests/test_3118_prevent_exponential_memory_growth_in_unionarray.py, every time nested records in a union were assigned to with

cat["another", "w"] = three.x

the size of an internal buffer doubled. (So imagine assigning 20 fields into such a record.) The exponential growth of that internal buffer had something to do with the interplay between broadcasting and lazy projection, and so I fixed it by removing lazy projection.

@pfackeldey is looking into the problem, to see if there's some other way to avoid exponential memory growth of that buffer, while keeping this one project operation lazy so that it doesn't touch all fields.

Both of these problems are tightly coupled to the fact that these are UnionArrays. They wouldn't happen with any other array type.

@ikrommyd
Copy link
Author

Thanks for investigating!

@jpivarski jpivarski added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Jul 29, 2024
pfackeldey added a commit that referenced this issue Aug 1, 2024
* [Indexed*Array] adding '_trim()' method to fix exponential memory growth

* [test] enhance test_3118 with a third array

* [Indexed*Array] _trim: safer index treating

* [test] add two tests, one with 2 and one with 3 arrays

* style: pre-commit fixes

* [Indexed*Array] _trim: properly handle typetracers (and all negative indices for IndexOptionArrays)

* [test] revert commit 698b015 for 'tests/test_2713_from_buffers_allow_noncanonical.py'

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@jpivarski
Copy link
Member

Fixed by #3193.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

No branches or pull requests

3 participants