Histogram Helpers #9
I've put a lot of thought into this, and I think it is a good idea to have a clear delineation between when the user should interact with the data files from Python and when they should do so from C++. I think the clearest separation is at histogram filling. At this point in an analysis, we transition from "heavy-duty" calculations to making plots "pretty", so I think we should intentionally avoid implementing a C++-based histogram-filling tool. Instead, I think a Python module that helps the user fill histograms with numpy and serialize them with h5py is appropriate. This enforces the separation: C++ processors calculate *new* event objects, while Python fills, merges, and plots histograms.

Some calculations would still be classified as "analysis", but instead of enforcing a binning decision at the C++ level, we can encourage users to calculate their final analysis variables and put those variables into the event, then fill and plot them later in Python. In the HEP arena, many users call this "ntuplizing": the hierarchical data is flattened in order to make Python analysis easier. The way fire serializes hierarchical data already makes it "flattened", but users can still have C++ processors do analysis tasks like filtering, summing, etc. and create new event objects that a Python plotter can access.
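A minimal sketch of what such a Python helper could look like, assuming the final analysis variable has already been written into the event by a C++ processor and loaded as a flat numpy array (how that array is read out of a fire output file is not shown here). The function names `fill_histogram`, `write_histogram`, and `read_histogram`, and the on-disk layout (one HDF5 group per histogram holding `counts` and `edges` datasets), are hypothetical choices for illustration, not an existing fire API.

```python
import numpy as np
import h5py


def fill_histogram(values, bins, value_range):
    """Fill a 1D histogram from a flat array of per-event values.

    Returns the bin counts and the bin edges (len(edges) == len(counts) + 1).
    """
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    return counts, edges


def write_histogram(path, name, counts, edges):
    """Store one histogram as an HDF5 group with 'counts' and 'edges' datasets.

    Hypothetical layout: the group name is the histogram name. The file is
    created if it does not exist, and an existing histogram with the same
    name is overwritten.
    """
    with h5py.File(path, "a") as f:
        if name in f:
            del f[name]
        group = f.create_group(name)
        group.create_dataset("counts", data=counts)
        group.create_dataset("edges", data=edges)


def read_histogram(path, name):
    """Load a histogram written by write_histogram as numpy arrays."""
    with h5py.File(path, "r") as f:
        return f[name]["counts"][...], f[name]["edges"][...]
```

For example, `fill_histogram(np.array([0.5, 1.5, 1.7, 3.2]), bins=4, value_range=(0.0, 4.0))` yields counts `[1, 2, 0, 1]` with edges `[0, 1, 2, 3, 4]`. Storing the edges alongside the counts keeps each file self-describing, so a later merge or plotting step can check that binnings agree instead of assuming it.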
What type of histograms would be pooled here? Numpy?
(Email reply to Tom Eichlersmith's comment of Wed, Feb 2, 2022, 7:53 AM.)
Sorry, to be clear: this issue was originally about potentially implementing a HistogramPool in the C++ processing chain. My comments above would shift the focus to Python helpers for serializing numpy histograms to and from HDF5 files and merging them. This would handle the use case of filling histograms in parallel over a large data set and then merging the results for final plotting.
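A minimal sketch of the merge step for that parallel use case, assuming each parallel job wrote its histograms to its own HDF5 file using a hypothetical layout: every top-level group is one histogram holding a `counts` dataset and an `edges` dataset. Because numpy histograms with identical binning combine by elementwise addition of their counts, merging reduces to summing arrays. The function name `merge_histogram_files` is hypothetical, not an existing fire or h5py API.

```python
import numpy as np
import h5py


def merge_histogram_files(input_paths, output_path):
    """Sum same-named histograms across input files and write the result.

    Hypothetical layout: each top-level group in every input file is one
    histogram with 'counts' and 'edges' datasets. Histograms with the same
    name must share identical bin edges; otherwise a ValueError is raised.
    Returns a dict mapping histogram name to the merged counts array.
    """
    totals = {}
    edges_by_name = {}
    for path in input_paths:
        with h5py.File(path, "r") as f:
            for name in f:
                counts = f[name]["counts"][...]
                edges = f[name]["edges"][...]
                if name not in totals:
                    totals[name] = counts
                    edges_by_name[name] = edges
                elif not np.array_equal(edges, edges_by_name[name]):
                    raise ValueError(
                        f"binning mismatch for histogram '{name}' in {path}"
                    )
                else:
                    # Same binning, so merging is elementwise addition.
                    totals[name] = totals[name] + counts
    with h5py.File(output_path, "w") as out:
        for name, counts in totals.items():
            group = out.create_group(name)
            group.create_dataset("counts", data=counts)
            group.create_dataset("edges", data=edges_by_name[name])
    return totals
```

Usage would look like `merge_histogram_files(["job0.h5", "job1.h5"], "merged.h5")` (file names hypothetical). Checking the edges before adding is the main design choice here: silently summing histograms with different binnings would produce a plausible-looking but wrong result.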
Name: TBD
Goals:
Features:
Strict Dependencies:
Optional Dependencies:
I need to determine whether fire should support a HistogramPool. This would significantly affect how a merging program (#4) would operate, and it may not even be beneficial given how efficient h5py and numpy are on the analysis end.