
Add functionality for array-event-wise aggregation of dl1 image parameters #2497

Open

LukasBeiske wants to merge 15 commits into main from feature_aggregator
Conversation

@LukasBeiske (Contributor) commented on Jan 19, 2024

To define event types based on the quality of the direction reconstruction, machine-learning models that predict the direction-reconstruction error for array events are needed. Such models use the mean, standard deviation, maximum, and minimum of some DL1 image parameters as input. This PR adds the functionality to compute these input features, while #2503 adds the reconstruction of the error itself.

To be more specific:

  • This adds a new Component and a Tool for aggregating the telescope-wise DL1 image parameters for each array event (a minimal sketch of the idea follows this list).
  • BaseStatisticsContainer is introduced, which does not contain the higher-order moments present in StatisticsContainer.
  • The helper functions for vectorizing numpy calculations in ctapipe.reco.stereo_combination are refactored and expanded in a new module called ctapipe.vectorization.
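
To make the intent concrete, here is a minimal sketch of this kind of per-array-event aggregation on a toy table; the column names and layout are illustrative only, not the exact schema used by the new Component:

    import numpy as np
    from astropy.table import Table

    # toy telescope-events table: one row per telescope image
    tel_events = Table({
        "obs_id": [1, 1, 1, 1],
        "event_id": [100, 100, 101, 101],
        "hillas_intensity": [250.0, 310.0, 80.0, 95.0],
    })

    # aggregate per array event, keyed on (obs_id, event_id)
    grouped = tel_events.group_by(["obs_id", "event_id"])
    rows = []
    for key, group in zip(grouped.groups.keys, grouped.groups):
        vals = group["hillas_intensity"]
        rows.append({
            "obs_id": key["obs_id"],
            "event_id": key["event_id"],
            "hillas_intensity_mean": np.mean(vals),
            "hillas_intensity_std": np.std(vals),
            "hillas_intensity_max": np.max(vals),
            "hillas_intensity_min": np.min(vals),
        })
    aggregated = Table(rows=rows)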

TODO:

  • Add/update unit tests

@LukasBeiske marked this pull request as draft on January 19, 2024

codecov bot commented Jan 23, 2024

Codecov Report

Attention: Patch coverage is 91.04478%, with 30 lines in your changes missing coverage. Please review.

Project coverage is 92.47%. Comparing base (e2a848e) to head (0fcecaa).
Report is 15 commits behind head on main.

❗ Current head 0fcecaa differs from the pull request's most recent head 16f184e. Consider uploading reports for the commit 16f184e to get more accurate results.

Files                                      Patch %   Lines
src/ctapipe/vectorization/aggregate.py     63.49%    23 Missing ⚠️
src/ctapipe/image/statistics.py            91.07%     5 Missing ⚠️
src/ctapipe/tools/aggregate_features.py    94.87%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2497      +/-   ##
==========================================
- Coverage   92.66%   92.47%   -0.19%     
==========================================
  Files         232      242      +10     
  Lines       20220    20268      +48     
==========================================
+ Hits        18736    18743       +7     
- Misses       1484     1525      +41     


@kosack (Contributor) commented on Jan 25, 2024

Can you explain in the description what this is and why it is needed? What does it mean to "aggregate DL1 parameters", and what is being aggregated over (telescopes in an event, time, or both)?

@kosack mentioned this pull request on Jan 25, 2024

@Tobychev (Contributor) commented on Jan 25, 2024

> What does it mean to "aggregate DL1 parameters", and what is being aggregated over (telescopes in an event, time, or both)?

It looks like it groups things by trigger, so I guess it is for getting average intensity and width and such, maybe?

@kosack (Contributor) commented on Jan 25, 2024

I don't quite understand how this will be used, but perhaps I need to see an example. I imagine that by "mean" you really mean "weighted mean", weighted by the impact parameter and intensity, so basically the same as the "mean reduced scaled {width, length}" used in standard non-machine-learning gamma-hadron separation methods in HESS and VERITAS (but not MARS, if I understand correctly)? If that's true, we've basically come full circle and re-invented those.

I only ask because the original reason we decided not to compute these mean-reduced-scaled parameters was that they were not needed: telescope-wise machine-learning models were better at predicting such quantities (like "single-telescope reconstruction error"), which can then be merged in a weighted average like we do for energy and gammaness (see the sketch below). So are we now undoing that design decision?
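
For context, the per-telescope-then-combine pattern described above is, schematically, just a weighted mean of the single-telescope predictions; the toy function below only illustrates that idea (ctapipe's actual StereoMeanCombiner in ctapipe.reco.stereo_combination is configurable and more involved):

    import numpy as np

    def combine_telescope_predictions(values, weights):
        """Toy stand-in: weighted mean of per-telescope predictions
        for one array event."""
        values = np.asarray(values, dtype=float)
        weights = np.asarray(weights, dtype=float)
        return np.average(values, weights=weights)

    # e.g. per-telescope energy estimates weighted by image intensity
    energy = combine_telescope_predictions([1.1, 0.9, 1.3], [250.0, 310.0, 500.0])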

@LukasBeiske (Contributor, author) commented on Jan 25, 2024

> which can then be merged in a weighted average like we do for energy and gammaness. So are we now undoing that design decision?

AFAIK these averaged features are only to be used as input for the model estimating the error of the direction reconstruction, while the event type based on background contamination will use the averaged gammaness scores predicted by the telescope models.
Therefore, it would also be possible to compute these averaged features only on the fly when training and applying this direction-reconstruction-error estimator, similar to the existing FeatureGenerator.

The separate tool to calculate and write these averaged features into files is only meant to keep this separate from the implementation of the direction reconstruction error estimator itself. If there really is no other use for these averaged features, the on-the-fly solution might be better.

@maxnoe (Member) commented on Jan 25, 2024

@kosack this is part of the implementation of the event-type-based approach developed by @TarekHC and @JBernete.

It works by training a machine-learning regressor on a feature table for subarray events. That feature table contains multiplicities and the reconstructed values themselves, but also these image features aggregated over the telescopes (see the sketch below).
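
A sketch of what such a feature table could look like, joining per-event reconstructed quantities with the aggregated image features; all column and key names here are illustrative, not the actual ctapipe schema:

    from astropy.table import Table, join

    # per-event reconstructed quantities (illustrative DL2-level columns)
    dl2 = Table({
        "obs_id": [1, 1],
        "event_id": [100, 101],
        "tel_multiplicity": [4, 2],
        "reco_energy_TeV": [1.2, 0.3],
    })

    # aggregated image features, one row per array event
    agg = Table({
        "obs_id": [1, 1],
        "event_id": [100, 101],
        "hillas_intensity_mean": [280.0, 87.5],
        "hillas_intensity_std": [30.0, 7.5],
    })

    # the regressor's input table: one row per subarray event
    features = join(dl2, agg, keys=["obs_id", "event_id"])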

@kosack (Contributor) commented on Jan 25, 2024

> AFAIK these averaged features are only to be used as input for the model estimating the error of the direction reconstruction, while the event type based on background contamination will use the averaged gammaness scores predicted by the telescope models.

Yes, that I understand; the question is more why this is being done using mean-scaled array parameters rather than per-telescope estimators that are then combined, which is how we do the other machine-learning methods. At least then all ML methods would be symmetric and there would be no need for this extra complexity.

In fact, I really expected to see something that leverages how the reconstruction algorithm actually works, for example computing some aggregate parameters not on the telescopes but on the telescope pairs used in reconstruction, e.g. the mean/std/min/max of the delta-phi angle between image pairs. Wouldn't that have much more power to estimate the uncertainty? To first order, the reconstruction error for a 2-telescope pair is just related to this angle: a small angle is poorly reconstructed, a large angle better. That is why we weight by cos(delta_phi)--implicitly via the cross-product--in the algorithm itself, which leads to lower reconstruction uncertainty.

Anyhow, it might not be too hard to add pair-wise aggregation to this as well if needed.
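
If pair-wise aggregation were added, it could look roughly like the following; delta_phi_stats is a hypothetical helper, and the psi values are assumed to be the Hillas image-axis orientations (in degrees) of the images used in the reconstruction:

    import numpy as np
    from itertools import combinations

    def delta_phi_stats(psi_deg):
        """Mean/std/min/max of the angle between image axes,
        over all telescope pairs in one array event."""
        deltas = []
        for a, b in combinations(np.deg2rad(psi_deg), 2):
            # angle between the two axes; fold into [0, 90] deg,
            # since an image axis is a line (psi defined mod 180 deg)
            d = np.rad2deg(abs(np.arctan2(np.sin(a - b), np.cos(a - b))))
            deltas.append(min(d, 180.0 - d))
        deltas = np.array(deltas)
        return deltas.mean(), deltas.std(), deltas.min(), deltas.max()

    mean, std, dmin, dmax = delta_phi_stats([10.0, 75.0, 120.0])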

"""Fill event container with aggregated image parameters."""
table = None
for tel_id in event.trigger.tels_with_trigger:
t = collect_features(event, tel_id)
A reviewer (Contributor) commented on the snippet above:

It's possible I missed it, but you are collecting features from all telescopes in the event; shouldn't it include only those used in reconstruction? In other words, instead of tels_with_trigger, you should have something like event.dl2.stereo.geometry.telescopes.

@LukasBeiske (Contributor, author) replied on Jan 25, 2024:

Right now this is done via a quality query in aggregate_table, which should use the same quality criteria as the reconstruction for which the error will be estimated. But if this is integrated into the train/apply tool for the error estimator, doing it that way would be better, I agree.
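
As a stand-in for that quality query, filtering the telescope-events table before aggregation could look like this; the criteria and thresholds are arbitrary examples, not defaults used anywhere in ctapipe:

    from astropy.table import Table

    tel_events = Table({
        "hillas_intensity": [250.0, 40.0, 310.0],
        "morphology_n_pixels": [12, 3, 20],
    })

    # example criteria mirroring what a reconstruction quality query might apply
    mask = (
        (tel_events["hillas_intensity"] > 50)
        & (tel_events["morphology_n_pixels"] >= 5)
    )
    selected = tel_events[mask]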

@LukasBeiske marked this pull request as ready for review on January 29, 2024

@LukasBeiske (Contributor, author) commented:
Codecov fails because of the new numba-jitted code and the already existing numba functions in image.statistics.

@LukasBeiske (Contributor, author) commented on Jan 30, 2024

> In fact, I really expected to see something that leverages how the reconstruction algorithm actually works, for example computing some aggregate parameters not on the telescopes but on the telescope pairs used in reconstruction, e.g. the mean/std/min/max of the delta-phi angle between image pairs. Wouldn't that have much more power to estimate the uncertainty? To first order, the reconstruction error for a 2-telescope pair is just related to this angle: a small angle is poorly reconstructed, a large angle better. That is why we weight by cos(delta_phi)--implicitly via the cross-product--in the algorithm itself, which leads to lower reconstruction uncertainty.

Since this is specific to the reconstruction algorithm, it might be better to do something like this separately, as part of the reconstruction algorithm itself, or as part of the error estimator in #2503. In the latter case, delta_phi_{mean,std,max,min} could then be used as additional input features for the estimator, and it might be interesting to see how much more accurate the estimator's output is compared to just using delta_phi_mean as the error estimate.

I can't think of a way to calculate these angles efficiently on tabular data, so integrating it into the reconstructor might be best.

@LukasBeiske (Contributor, author) commented on Feb 23, 2024

flake8 fails now because of already existing lines being too long. Is this something I should fix here, or would it be better to fix it everywhere in a separate PR?

@LukasBeiske force-pushed the feature_aggregator branch 2 times, most recently from 270ffda to 5d2294f on February 23, 2024