Tool to generate a DL2 file from a DL1 file #599

Closed · wants to merge 20 commits

Conversation

@Bultako (Collaborator) commented Feb 23, 2021

This PR is the continuation of #573, which was developed in a forked repo; the code now lives in a branch of the lstchain repo.

It adds a new Tool, lstchain_create_dl2_file, adapting the existing code from the script scripts/lstchain_dl1_to_dl2.py.

Generate an HDF5 file with reconstructed energy, disp and gammaness of events

Options
=======
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--source-dependent
    Perform source dependent analysis
    Equivalent to: [--ReconstructionHDF5Writer.source_dependent=True]
-q, --quiet
    Disable console logging.
    Equivalent to: [--Tool.quiet=True]
-i, --input=<Path>
    Path to a DL1 HDF5 file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.input]
-o, --output=<Path>
    Path where to store the reconstructed DL2
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.output_dir]
--energy-model=<Path>
    Path where to find the Energy trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_energy_model]
--disp-model=<Path>
    Path where to find the Disp trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_disp_model]
--gh-model=<Path>
    Path where to find the Gammaness trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_gh_model]
--config=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
    Equivalent to: [--Tool.config_file]
--log-level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
    Equivalent to: [--Tool.log_level]
-l, --log-file=<Path>
    Filename for the log
    Default: None
    Equivalent to: [--Tool.log_file]
--log-file-level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
    Equivalent to: [--Tool.log_file_level]

Class options
=============
The command-line option below sets the respective configurable class-parameter:
    --Class.parameter=value
This line is evaluated in Python, so simple expressions are allowed.
For instance, to set `C.a=[0,1,2]`, you may type this:
    --C.a='range(3)'

Application(SingletonConfigurable) options
------------------------------------------
--Application.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--Application.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--Application.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--Application.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--Application.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False

Tool(Application) options
-------------------------
--Tool.config_file=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
--Tool.log_config=<key-1>=<value-1>...
    Default: {'version': 1, 'disable_existing_loggers': False, 'formatters...
--Tool.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--Tool.log_file=<Path>
    Filename for the log
    Default: None
--Tool.log_file_level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
--Tool.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--Tool.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--Tool.provenance_log=<Path>
    Default: None
--Tool.quiet=<Bool>
    Default: False
--Tool.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--Tool.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False

ReconstructionHDF5Writer(Tool) options
--------------------------------------
--ReconstructionHDF5Writer.classification_features=<list-item-1>...
    List of classification features
    Default: []
--ReconstructionHDF5Writer.config_file=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
--ReconstructionHDF5Writer.events_filters=<key-1>=<value-1>...
    Dictionary with information to filter events
    Default: {}
--ReconstructionHDF5Writer.input=<Path>
    Path to a DL1 HDF5 file
    Default: None
--ReconstructionHDF5Writer.log_config=<key-1>=<value-1>...
    Default: {'version': 1, 'disable_existing_loggers': False, 'formatters...
--ReconstructionHDF5Writer.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--ReconstructionHDF5Writer.log_file=<Path>
    Filename for the log
    Default: None
--ReconstructionHDF5Writer.log_file_level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
--ReconstructionHDF5Writer.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--ReconstructionHDF5Writer.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--ReconstructionHDF5Writer.output_dir=<Path>
    Path where to store the reconstructed DL2
    Default: None
--ReconstructionHDF5Writer.path_disp_model=<Path>
    Path where to find the Disp trained RF model file
    Default: None
--ReconstructionHDF5Writer.path_energy_model=<Path>
    Path where to find the Energy trained RF model file
    Default: None
--ReconstructionHDF5Writer.path_gh_model=<Path>
    Path where to find the Gammaness trained RF model file
    Default: None
--ReconstructionHDF5Writer.provenance_log=<Path>
    Default: None
--ReconstructionHDF5Writer.quiet=<Bool>
    Default: False
--ReconstructionHDF5Writer.regression_features=<list-item-1>...
    List of regression features
    Default: []
--ReconstructionHDF5Writer.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--ReconstructionHDF5Writer.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False
--ReconstructionHDF5Writer.source_dependent=<Bool>
    Is the analysis source dependent?
    Default: False
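
For illustration, a hypothetical invocation assembled from the aliases listed above (the file names are placeholders, not taken from the PR):

    lstchain_create_dl2_file \
        --input dl1_run.h5 \
        --output ./dl2 \
        --energy-model reg_energy.sav \
        --disp-model reg_disp_vector.sav \
        --gh-model cls_gh.sav \
        --config lstchain_config.json \
        --log-level INFO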

codecov bot commented Feb 23, 2021

Codecov Report

Attention: Patch coverage is 78.04878% with 27 lines in your changes missing coverage. Please review.

Project coverage is 83.16%. Comparing base (a217b5d) to head (78f1287).
Report is 2817 commits behind head on main.

Files with missing lines Patch % Lines
lstchain/tools/lstchain_create_dl2_file.py 77.17% 21 Missing ⚠️
lstchain/reco/dl1_to_dl2.py 70.00% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #599    +/-   ##
========================================
  Coverage   83.16%   83.16%            
========================================
  Files          54       56     +2     
  Lines        4353     4455   +102     
========================================
+ Hits         3620     3705    +85     
- Misses        733      750    +17     


@Bultako changed the title from "Tool to generate a DL2 file from a DL1 file." to "Tool to generate a DL2 file from a DL1 file" on Feb 23, 2021
@Bultako marked this pull request as draft on February 25, 2021 15:34
@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 0187642 to 96b1b7b on March 4, 2021 22:29

def setup(self):

self.log.info("Reading configuration")
Member:
Tool already supports reading configuration files. Please don't add any new configuration file logic. Adapt the tool / config to work with the existing configuration file logic.

There should absolutely be no code to deal with configurations in here.

Member:

We lose most of the advantages of using Tools if we again are not using it as intended but reimplement a new configuration system on top of it.

self.log.info("Reading DL1 file")
self.data = pd.read_hdf(self.input, key=dl1_params_lstcam_key)
self.data = add_delta_t_key(self.data)
if self.configuration["source_dependent"]:
Member:
E.g. this should be solved by adding a traitlet source_dependent = Bool().tag(config=True) and then just checking if self.source_dependent here.
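
As a minimal sketch of that suggestion (placement and surrounding code are hypothetical; the traitlet name and help text mirror the class options listed above):

import pandas as pd
from ctapipe.core import Tool
from ctapipe.core.traits import Bool


class ReconstructionHDF5Writer(Tool):
    # configurable traitlet; settable from the config file or via the
    # --source-dependent alias instead of a hand-rolled configuration dict
    source_dependent = Bool(
        default_value=False,
        help="Is the analysis source dependent?",
    ).tag(config=True)

    def start(self):
        if self.source_dependent:
            # compute and append the source-dependent parameters here
            ...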

@maxnoe (Member) left a comment:

Please do not reimplement the configuration loading system.

Since many of the options are for functions / things further down, the best way would be to convert those into Components as well. But that could be left for another PR. Having the tool is a good start, but it should use traitlets / the config system correctly, otherwise we're not much better off than with the scripts.
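
To illustrate the Component direction, a rough sketch (the class name is hypothetical; the help text is taken from the options listed above):

from ctapipe.core import Component
from ctapipe.core.traits import Path


class GammaHadronClassifier(Component):
    """Hypothetical Component wrapping the trained g/h separation RF model."""

    model_path = Path(
        help="Path where to find the Gammaness trained RF model file"
    ).tag(config=True)

Inside the Tool it would be created with GammaHadronClassifier(parent=self), so the same configuration file and command line drive it without any extra config-loading code.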

("i", "input"): "ReconstructionHDF5Writer.input",
("o", "output"): "ReconstructionHDF5Writer.output_dir",
"models": "ReconstructionHDF5Writer.path_models",
"config": "ReconstructionHDF5Writer.config_file",
Member:
No need for this, Tool already comes with a config traitlet for setting the configuration file.

output_dir = traits.Path(
help="Path where to store the reconstructed DL2", file_ok=False
).tag(config=True)
config_file = traits.Path(
Member:

This is not needed, as Tool already implements the configuration handling.

self.data = pd.concat([self.data, data_src_dep], axis=1)

self.log.info("Reading RF models")
self.reg_energy = joblib.load(self.path_models / "reg_energy.sav")
@maxnoe (Member) commented Mar 5, 2021:
In general, I am not a big fan of setting an input directory and then having hard-coded file names inside that directory. I would go for three Path traitlets, one for each of the models. This is much more flexible, e.g. for debugging or benchmarking different models.
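
A sketch of the three-traitlet variant (class-level declarations; the attribute names and aliases mirror the help output shown at the top of this PR):

from ctapipe.core import traits

# one Path traitlet per trained model instead of a single models directory
path_energy_model = traits.Path(
    help="Path where to find the Energy trained RF model file"
).tag(config=True)
path_disp_model = traits.Path(
    help="Path where to find the Disp trained RF model file"
).tag(config=True)
path_gh_model = traits.Path(
    help="Path where to find the Gammaness trained RF model file"
).tag(config=True)

aliases = {
    "energy-model": "ReconstructionHDF5Writer.path_energy_model",
    "disp-model": "ReconstructionHDF5Writer.path_disp_model",
    "gh-model": "ReconstructionHDF5Writer.path_gh_model",
}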

self.cls_gh = None
self.dl2 = None

def setup(self):
Member:
I think setup should only initialize components that need the configuration. Any actual work, like loading input data and so on should happen in run.

Collaborator (author):
Do you suggest moving everything that is now in setup to start?

Member:
No, the code checking the traitlets and so on belongs here, but I would move the loading of the data and the models.

Any actual work should be done in start rather than setup
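
A rough sketch of that split, reusing the calls already shown in the diff (import paths are assumed from the existing script, and the configuration-error check is illustrative):

import joblib
import pandas as pd
from ctapipe.core.tool import ToolConfigurationError
from lstchain.io.io import dl1_params_lstcam_key
from lstchain.reco.utils import add_delta_t_key


def setup(self):
    # configuration checks and component construction only
    if self.input is None:
        raise ToolConfigurationError("An input DL1 file is required (--input)")

def start(self):
    # the actual work: load data and trained models, then apply them
    self.log.info("Reading DL1 file")
    self.data = pd.read_hdf(self.input, key=dl1_params_lstcam_key)
    self.data = add_delta_t_key(self.data)

    self.log.info("Reading RF models")
    self.reg_energy = joblib.load(self.path_energy_model)
    self.reg_disp_vector = joblib.load(self.path_disp_model)
    self.cls_gh = joblib.load(self.path_gh_model)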

self.data.az_tel = -np.pi / 2.0
self.data = filter_events(
self.data,
filters=self.configuration["events_filters"],
Member:
Add a List traitlet for this and give filters=self.filters

self.data = filter_events(
self.data,
filters=self.configuration["events_filters"],
finite_params=self.configuration["regression_features"]
Member:
Same here, add a List traitlet and add self.regression_features.

self.data,
filters=self.configuration["events_filters"],
finite_params=self.configuration["regression_features"]
+ self.configuration["classification_features"],
Member:
And again. Do not use self.configuration!
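
Taken together, the three comments above amount to something like this sketch (traitlet types taken from the class options listed earlier: a Dict for the filters, List for the feature lists):

from ctapipe.core.traits import Dict, List

# class-level traitlets replacing the self.configuration dict lookups
events_filters = Dict(
    help="Dictionary with information to filter events"
).tag(config=True)
regression_features = List(
    help="List of regression features"
).tag(config=True)
classification_features = List(
    help="List of classification features"
).tag(config=True)

# ...and inside start():
self.data = filter_events(
    self.data,
    filters=self.events_filters,
    finite_params=self.regression_features + self.classification_features,
)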

@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 4f2caf3 to d1d009e on March 6, 2021 21:24
@Bultako (Collaborator, author) commented Mar 8, 2021

Thanks @maxnoe for the code review.

I have tried to set the config file parameter as mandatory, and worked with an internal self._dict_conf dict instead. For event filters, regression features, classification features, I think it is better to declare that info in a config file than in traits input params. I have set up the three model files used in training as traits input params.

Otherwise, tests with data have been added and the CI is green.
I'm marking this PR as ready for review.

@Bultako marked this pull request as ready for review on March 8, 2021 09:23
@maxnoe (Member) commented Mar 8, 2021

and worked with an internal self._dict_conf dict instead.

This is exactly what I wanted to avoid! Still using a separate configuration system.

For event filters, regression features, classification features, I think it is better to declare that info in a config file than in traits input params

Traits define what can be in a config file! That's the whole point of them. That you can also set these via the command line is a bonus.

self.reg_disp_vector = None
self.cls_gh = None

self._dict_conf = replace_config(standard_config, self.get_current_config())
Member:
Still using a separate config system. Please remove it and use traitlets properly. Otherwise the switch to Tools is kind of pointless.

self.cls_gh,
self.reg_energy,
self.reg_disp_vector,
custom_config=self._dict_conf,
Member:
If you want to be minimally invasive, construct custom_config from the traitlets here, but the better approach would be to implement a Component for apply_models.
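
A sketch of the minimally-invasive variant (the dict keys assume the corresponding traitlets are declared on the tool; the assignment target and first argument follow the surrounding code, which the excerpt above only partly shows):

# build the dict apply_models expects from the declared traitlets,
# instead of keeping a parallel self._dict_conf configuration
custom_config = {
    "source_dependent": self.source_dependent,
    "events_filters": self.events_filters,
    "regression_features": self.regression_features,
    "classification_features": self.classification_features,
}
self.dl2 = dl1_to_dl2.apply_models(
    self.data,
    self.cls_gh,
    self.reg_energy,
    self.reg_disp_vector,
    custom_config=custom_config,
)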

@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 5cd77c8 to 994a17a on March 30, 2021 16:48
@vuillaut (Member) commented:
Closing as this is stalled.

@vuillaut closed this on Sep 10, 2024