-
Reproducing the lines of code here for less clicking, etc. Here's the nested ak.where stuff from above:
jj_mass = ak.where(
    jjb_triplets.i0.btagDeepFlavB > btagwpl,
    (jjb_triplets.i1 + jjb_triplets.i2).mass,
    (
        ak.where(
            jjb_triplets.i1.btagDeepFlavB > btagwpl,
            (jjb_triplets.i0 + jjb_triplets.i2).mass,
            (jjb_triplets.i0 + jjb_triplets.i1).mass,
        )
    ),
)
Then in the argcombo / local_index:
# Attach the local index to the lepton objects
lep_collection["idx"] = ak.local_index(lep_collection, axis=1)
# Make all pairs of leptons
ll_pairs = ak.combinations(lep_collection, 2, fields=["l0", "l1"])
ll_pairs_idx = ak.argcombinations(lep_collection, 2, fields=["l0", "l1"])
# Check each pair to see how far it is from the Z
dist_from_z_all_pairs = abs((ll_pairs.l0 + ll_pairs.l1).mass - 91.2)
# Mask out the pairs that are not SFOS (so that we don't include them when finding the one that's closest to Z)
# And then of the SFOS pairs, get the index of the one that's closest to the Z
sfos_mask = ll_pairs.l0.pdgId == -ll_pairs.l1.pdgId
dist_from_z_sfos_pairs = ak.mask(dist_from_z_all_pairs, sfos_mask)
sfos_pair_closest_to_z_idx = ak.argmin(dist_from_z_sfos_pairs, axis=-1, keepdims=True)
# Construct a mask (of the shape of the original lep array) corresponding to the leps that are part of the Z candidate
mask = lep_collection.idx == ak.flatten(ll_pairs_idx.l0[sfos_pair_closest_to_z_idx])
mask = mask | (
lep_collection.idx == ak.flatten(ll_pairs_idx.l1[sfos_pair_closest_to_z_idx])
)
mask = ak.fill_none(mask, False)
-
For case 1, this reminds me also of the FuncADL Q6 problem:
The latter plot is implemented with
maxBtag = np.maximum(
    trijet.j1.btag, np.maximum(trijet.j2.btag, trijet.j3.btag,),
)
much like the
jjb_triplets = ak.stack(ak.combinations(jet_collection, 3))
# remove any triplets with more or less than 1 b
jjb_triplets = jjb_triplets[ak.sum(jjb_triplets.btagDeepFlavB > btagwpl, axis=2) == 1]
# select b jet (flatten the singletons array after masking)
is_b_candidate = jjb_triplets.btagDeepFlavB > btagwpl
b_cand = ak.flatten(jjb_triplets[is_b_candidate], axis=2)
# compute mass of remaining two jets (sum() is a coffea vector mixin method)
jj_mass = jjb_triplets[~is_b_candidate].sum().mass
An implementation for
tmp = ak.combinations(jet_collection, 3)
jjb_triplets = ak.concatenate([tmp.i0, tmp.i1, tmp.i2], axis=-1)
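The masking idea in this comment can be tried out in plain NumPy, since every triplet has exactly three jets and so fits a rectangular array. This is only a sketch under assumptions: the arrays `btag` and `p4`, their values, and the threshold `btagwpl = 0.5` are made up for illustration, and jets are stored as (E, px, py, pz) rows rather than coffea vector objects.

```python
import numpy as np

# Hypothetical inputs: 3 triplets of jets stored as rectangular arrays.
# btag has shape (n_triplets, 3); p4 has shape (n_triplets, 3, 4) as (E, px, py, pz).
btagwpl = 0.5  # assumed working point, for illustration only
btag = np.array([
    [0.9, 0.1, 0.2],   # b jet in slot 0
    [0.1, 0.8, 0.3],   # b jet in slot 1
    [0.7, 0.9, 0.1],   # two b jets: this triplet is rejected
])
p4 = np.array([
    [[50.0, 0.0, 0.0, 0.0], [5.0, 3.0, 0.0, 0.0], [5.0, -3.0, 0.0, 0.0]],
    [[5.0, 0.0, 4.0, 0.0], [60.0, 0.0, 0.0, 0.0], [5.0, 0.0, -4.0, 0.0]],
    [[10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]],
])

is_b = btag > btagwpl
# Keep only the triplets with exactly one b-tagged jet
keep = is_b.sum(axis=1) == 1
is_b, p4 = is_b[keep], p4[keep]

# Sum the four-vectors of the two non-b jets by zeroing out the b jet
jj = np.where(is_b[:, :, None], 0.0, p4).sum(axis=1)
jj_mass = np.sqrt(jj[:, 0] ** 2 - (jj[:, 1:] ** 2).sum(axis=1))
```

The point of the sketch is that one boolean mask replaces the nested branches: the same `is_b` array both selects the b candidate and, negated, selects the two jets whose mass is wanted.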
-
There's a general class of "iterate until converged" problems that are all hard for array-oriented programming. Here is my write-up of it and a set of exercises for working through it, all in pure NumPy/no Awkward.
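As a small illustration of this class of problem, here is a sketch of one common workaround in pure NumPy: keep a boolean "active" mask and update only the elements that have not converged yet. The example problem (square roots by Newton's method) and the function name `newton_sqrt` are chosen here for illustration and are not taken from the write-up.

```python
import numpy as np

def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Square roots of positive numbers by Newton's method.

    Each element keeps iterating until it has converged on its own.
    """
    a = np.asarray(a, dtype=float)
    x = np.where(a > 1.0, a, 1.0)           # starting guess
    active = np.ones(a.shape, dtype=bool)   # elements that still need updating
    n_iter = np.zeros(a.shape, dtype=int)   # per-element iteration count
    for _ in range(max_iter):
        if not active.any():
            break
        x_new = 0.5 * (x[active] + a[active] / x[active])
        converged = np.abs(x_new - x[active]) <= tol * x_new
        x[active] = x_new
        n_iter[active] += 1
        # Retire the elements that converged on this step
        active[active] = ~converged
    return x, n_iter

roots, steps = newton_sqrt([1.0, 4.0, 1e6])
```

Each pass of the loop is still a whole-array operation, but the number of passes is set by the slowest element, which is what makes these problems awkward to express in a columnar style.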
-
Just to bring this out of Slack: I was thinking about combinatorics embedded DSLs and it occurred to me that something that might make sense is an extension to (or something heavily inspired by) the einsum language.
ak.combexpr(
    leptons,
    "f((i,j)),g((m,n,k))->k != i,j, m,n == i,j",
    f=select_dileptons,
    g=select_third_lepton,
)
could easily be the majority of code in ADL benchmark #8. I think this could be expanded to multiple inputs:
-
Bringing another problem here from Slack: say you want to select two jets in each event with different requirements. For example,
The catch is that instead of throwing out events that don't have such qualifying jets in the zeroth and first indices, you would move on to the next indices to check the requirements there. @nsmith- helped solve this one...
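To make the shape of this problem concrete, here is a minimal NumPy sketch of the "move on to the next index" logic on zero-padded arrays. Everything specific in it is an assumption for illustration: the requirements (jet A needs pt > 40 and |eta| < 2.4, jet B is any other jet with pt > 20), the array values, and the helper `first_true` are hypothetical, and this is not necessarily the solution that was found.

```python
import numpy as np

# Hypothetical padded jet arrays: one row per event, pt-ordered, zero-padded
pt = np.array([
    [80.0, 50.0, 30.0],
    [90.0, 45.0, 25.0],
    [35.0, 30.0, 0.0],
])
eta = np.array([
    [3.0, 1.0, 0.5],   # leading jet is too forward, so jet A is index 1
    [0.2, 2.9, 1.1],
    [0.1, 0.3, 0.0],   # no jet passes requirement A
])

def first_true(mask):
    """Index of the first True in each row, or -1 if the row has none."""
    return np.where(mask.any(axis=1), mask.argmax(axis=1), -1)

# Assumed requirement A, for illustration only
pass_a = (pt > 40.0) & (np.abs(eta) < 2.4)
idx_a = first_true(pass_a)

# Jet B: first jet passing the assumed requirement B that is not jet A
cols = np.arange(pt.shape[1])
pass_b = (pt > 20.0) & (cols != idx_a[:, None])
idx_b = np.where(idx_a >= 0, first_true(pass_b), -1)
```

Events keep a valid pair as long as some later jet qualifies, and an index of -1 marks events where no jet satisfies a requirement.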
-
This post is meant to start a discussion about problems that are challenging to solve in columnar ways, as discussed at the 6/5/23 coffea users' meeting.
To start things off, here are two examples of cases that have been challenging for me during various projects over the past couple years:
1. Nested ak.where statements. This came up for me a few times when I was working on a project that involved trying to identify hadronic tops. An example line is here (conceptually, this line is just trying to find the mass of the two jets in the triplet that are not the b jet). For me, nested ak.where statements are difficult because there is too much happening on a single line (it's hard to write, hard to read, and hard to debug). It might have been better to find the index of the b in each triplet and construct a mask based on the indices, but then we run into the challenge described in 2.
2. Indices (arg) to keep track of. An example came up when I was trying to implement some WWZ 4l selection. I was trying to get a hold of the leptons that are consistent with coming from the Z and get a hold of the leptons that are consistent with coming from the Ws. This might be just me, but for me columnar stuff is much easier for cases where you want to ask something like "is there a same flavor opposite sign pair close to the Z" than it is for cases where you actually want to get a hold of those leptons. For the former case, you can use things like max or min or any. But for the latter case, you have to get a hold of the indices, and to me that makes the code both harder to think about and harder to read. E.g. for the WWZ stuff, this usage of ak.argcombinations and ak.local_index is not very intuitive or readable for me (even though I implemented it just a few weeks ago, at this point I'd have to put in a bunch of print statements to remind myself which indices correspond to what). But again, I might have been missing a more straightforward solution to this problem.

Looking forward to hearing if anyone has any thoughts on these challenges and to hearing about other problems that others find to be difficult to solve with the columnar approach.
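The index bookkeeping described in case 2 can be seen in a small rectangular NumPy sketch, which may be easier to step through than the jagged version. It rests on assumptions made only for illustration: every event has exactly 4 leptons, and the `pdg_id` and `pair_mass` values are invented (in a real analysis the pair masses would come from the lepton four-vectors).

```python
import numpy as np

# Hypothetical inputs: 2 events with exactly 4 leptons each
pdg_id = np.array([
    [11, -11, 13, -13],
    [13, 13, -13, 11],
])
n_lep = pdg_id.shape[1]
# All lepton pairs, in the order (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
i0, i1 = np.triu_indices(n_lep, k=1)
# Made-up invariant mass of each pair, in the same order
pair_mass = np.array([
    [60.0, 91.0, 40.0, 85.0, 70.0, 89.0],
    [91.2, 50.0, 30.0, 95.0, 20.0, 10.0],
])

# Distance from the Z for every pair; non-SFOS pairs get inf so argmin skips them
sfos = pdg_id[:, i0] == -pdg_id[:, i1]
dist = np.where(sfos, np.abs(pair_mass - 91.2), np.inf)
best = dist.argmin(axis=1)
has_z = sfos.any(axis=1)

# Mask (same shape as the lepton arrays) marking the leptons in the Z candidate
lep_idx = np.arange(n_lep)
z_mask = (lep_idx == i0[best][:, None]) | (lep_idx == i1[best][:, None])
z_mask &= has_z[:, None]
```

The three layers of indices (lepton position, pair position, and the pair-to-lepton lookup through `i0` and `i1`) are the part that is hard to keep in one's head; the leptons attributed to the Ws would then be the ones where `z_mask` is False.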