This repository has been archived by the owner on Sep 6, 2024. It is now read-only.

Histogram Helpers #9

Open
tomeichlersmith opened this issue Jan 17, 2022 · 4 comments
Labels
enhancement New feature or request
Milestone

Comments

@tomeichlersmith
Member

I need to determine whether fire should support a HistogramPool. This would significantly affect how a merging program (#4) would operate, and it may not even be beneficial given how efficient h5py and numpy are on the analysis end.

@tomeichlersmith
Member Author

I've put a lot of thought into this and I think it is a good idea to have a clear delineation of when the user should use Python-based interaction with the data files and when they should use C++-based interaction. I think the clearest separation is at histogram filling. At this point in an analysis, we transition from "heavy-duty" calculations to making plots "pretty", so I think it is a good idea to intentionally avoid implementing a C++-based histogram filling tool.

Instead, I think a Python module that helps the user fill histograms with numpy and serialize them with h5py is appropriate. This enforces the separation where C++ processors should be used to calculate new event objects while Python is used to fill histograms, merge them, and plot them.
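As a rough illustration of what such a helper could look like, here is a minimal sketch that fills a histogram with numpy and round-trips it through h5py. The function names `write_histogram` and `read_histogram`, the `counts`/`edges` layout, and the file name are assumptions made up for this example, not an existing fire API. The file is kept in memory only so the sketch leaves nothing on disk.

```python
import h5py
import numpy as np


def write_histogram(group, name, counts, edges):
    """Store one histogram as a sub-group holding two datasets (hypothetical layout)."""
    hist = group.create_group(name)
    hist.create_dataset("counts", data=counts)
    hist.create_dataset("edges", data=edges)


def read_histogram(group, name):
    """Load the (counts, edges) pair written by write_histogram."""
    hist = group[name]
    return hist["counts"][()], hist["edges"][()]


# Stand-in for a variable that a C++ processor calculated and put into the event.
rng = np.random.default_rng(seed=0)
energy = rng.normal(loc=50.0, scale=10.0, size=10_000)

# The binning decision is made here, on the Python side.
counts, edges = np.histogram(energy, bins=20, range=(0.0, 100.0))

# driver="core" with backing_store=False keeps the HDF5 file in memory.
with h5py.File("histograms.h5", "w", driver="core", backing_store=False) as f:
    write_histogram(f, "energy", counts, edges)
    stored_counts, stored_edges = read_histogram(f, "energy")
```

Storing the bin edges next to the counts is what would later let a merging step verify that two histograms are compatible.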

Notice that some calculations would be classified as "analysis", but instead of enforcing a binning decision at the C++ level, we can encourage users to calculate their final analysis variables and put those variables into the event, then fill and plot them later in Python. In the HEP arena, many users call this "ntuplizing", where the hierarchical data is flattened in order to make Python analysis easier. The method by which fire serializes hierarchical data already makes it "flattened", but users can still have C++ processors do analysis tasks like filtering, summing, etc. and create new event objects that can be accessed by a Python plotter.

@omar-moreno
Member

omar-moreno commented Feb 2, 2022 via email

@tomeichlersmith
Member Author

Sorry, to be clear, this issue was focused on potentially implementing a HistogramPool in the C++ processing chain.

My comments above would shift the focus to having Python helpers for serializing and merging numpy histograms to/from hdf5 files. This would handle the use case of parallel histogram filling over a large data set and then merging the resulting histograms for final plotting.
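To make the merging use case concrete, here is a minimal sketch assuming each parallel job produces a `(counts, edges)` pair as returned by `numpy.histogram`. The name `merge_histograms` is hypothetical, and requiring exactly equal bin edges is one possible policy, not a settled design.

```python
import numpy as np


def merge_histograms(histograms):
    """Sum the counts of (counts, edges) pairs that share identical bin edges."""
    histograms = list(histograms)
    if not histograms:
        raise ValueError("need at least one histogram to merge")
    first_counts, edges = histograms[0]
    total = np.array(first_counts, copy=True)
    for counts, other_edges in histograms[1:]:
        # Refuse to add counts that were binned differently.
        if not np.array_equal(edges, other_edges):
            raise ValueError("cannot merge histograms with different bin edges")
        total = total + counts
    return total, edges


# Pretend four parallel jobs each filled a histogram over a slice of the data.
rng = np.random.default_rng(seed=1)
data = rng.uniform(0.0, 1.0, size=1000)
partial = [
    np.histogram(part, bins=10, range=(0.0, 1.0))
    for part in np.array_split(data, 4)
]

merged_counts, merged_edges = merge_histograms(partial)
```

Because every job uses the same fixed binning, the merged counts should match a single histogram filled over the whole data set.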

@tomeichlersmith changed the title from "Histogram Pool" to "Histogram Helpers" on Feb 2, 2022
@tomeichlersmith added the "enhancement" (New feature or request) label on Feb 11, 2022
@tomeichlersmith added this to the v1.0.0 milestone on Aug 19, 2022
@tomeichlersmith
Member Author

Name: TBD

@AnmolS1Z

Goals:

  • Python package that is "easily" distributed
  • Some executables that do common tasks
  • Library available for more complicated tasks

Features:

  • "Extend" numpy.histogram to allow h5py.Dataset as an input
  • Merge 2+ numpy.histograms checking that they have the same bin edges
  • Write and read histogram objects to/from HDF5 files (via h5py)
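One way the first feature could work is sketched below, under the assumption that the input is one-dimensional and supports `len()` and numpy-style slicing, which `h5py.Dataset` does. The name `histogram_dataset` and its parameters are hypothetical. Reading in chunks avoids loading the whole dataset into memory at once. A plain numpy array stands in for the dataset here so the sketch runs without a data file.

```python
import numpy as np


def histogram_dataset(dataset, bins, value_range, chunk_size=100_000):
    """Histogram a 1D sliceable dataset chunk by chunk with fixed, uniform bins."""
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    counts = np.zeros(bins, dtype=np.int64)
    for start in range(0, len(dataset), chunk_size):
        # Slicing an h5py.Dataset reads only this chunk from the file.
        chunk = dataset[start:start + chunk_size]
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        counts += chunk_counts
    return counts, edges


# A numpy array stands in for an h5py.Dataset in this sketch.
rng = np.random.default_rng(seed=2)
values = rng.normal(loc=0.0, scale=1.0, size=2500)

chunked_counts, chunked_edges = histogram_dataset(
    values, bins=25, value_range=(-5.0, 5.0), chunk_size=1000
)
```

The returned `(counts, edges)` pair has the same shape as the output of `numpy.histogram`, so it could be passed directly to merging or serialization helpers.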

Strict Dependencies

  • numpy
  • h5py

Optional Dependencies

  • matplotlib
  • pickle


2 participants