Tool to generate a DL2 file from a DL1 file #599

Closed · wants to merge 20 commits

Conversation

@Bultako (Collaborator) commented Feb 23, 2021

This PR is the continuation of #573, which was developed in a forked repo; the code now lives in a branch of the lstchain repo.

It adds a new Tool, lstchain_create_dl2_file, adapting the existing code from the script scripts/lstchain_dl1_to_dl2.py.

Generate an HDF5 file with reconstructed energy, disp and gammaness of events

Options
=======
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--source-dependent
    Perform source dependent analysis
    Equivalent to: [--ReconstructionHDF5Writer.source_dependent=True]
-q, --quiet
    Disable console logging.
    Equivalent to: [--Tool.quiet=True]
-i, --input=<Path>
    Path to a DL1 HDF5 file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.input]
-o, --output=<Path>
    Path where to store the reconstructed DL2
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.output_dir]
--energy-model=<Path>
    Path where to find the Energy trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_energy_model]
--disp-model=<Path>
    Path where to find the Disp trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_disp_model]
--gh-model=<Path>
    Path where to find the Gammaness trained RF model file
    Default: None
    Equivalent to: [--ReconstructionHDF5Writer.path_gh_model]
--config=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
    Equivalent to: [--Tool.config_file]
--log-level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
    Equivalent to: [--Tool.log_level]
-l, --log-file=<Path>
    Filename for the log
    Default: None
    Equivalent to: [--Tool.log_file]
--log-file-level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
    Equivalent to: [--Tool.log_file_level]

Class options
=============
The command-line option below sets the respective configurable class-parameter:
    --Class.parameter=value
This line is evaluated in Python, so simple expressions are allowed.
For instance, to set `C.a=[0,1,2]`, you may type this:
    --C.a='range(3)'

Application(SingletonConfigurable) options
------------------------------------------
--Application.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--Application.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--Application.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--Application.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--Application.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False

Tool(Application) options
-------------------------
--Tool.config_file=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
--Tool.log_config=<key-1>=<value-1>...
    Default: {'version': 1, 'disable_existing_loggers': False, 'formatters...
--Tool.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--Tool.log_file=<Path>
    Filename for the log
    Default: None
--Tool.log_file_level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
--Tool.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--Tool.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--Tool.provenance_log=<Path>
    Default: None
--Tool.quiet=<Bool>
    Default: False
--Tool.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--Tool.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False

ReconstructionHDF5Writer(Tool) options
--------------------------------------
--ReconstructionHDF5Writer.classification_features=<list-item-1>...
    List of classification features
    Default: []
--ReconstructionHDF5Writer.config_file=<Path>
    name of a configuration file with parameters to load in addition to command-
    line parameters
    Default: None
--ReconstructionHDF5Writer.events_filters=<key-1>=<value-1>...
    Dictionary with information to filter events
    Default: {}
--ReconstructionHDF5Writer.input=<Path>
    Path to a DL1 HDF5 file
    Default: None
--ReconstructionHDF5Writer.log_config=<key-1>=<value-1>...
    Default: {'version': 1, 'disable_existing_loggers': False, 'formatters...
--ReconstructionHDF5Writer.log_datefmt=<Unicode>
    The date format used by logging formatters for %(asctime)s
    Default: '%Y-%m-%d %H:%M:%S'
--ReconstructionHDF5Writer.log_file=<Path>
    Filename for the log
    Default: None
--ReconstructionHDF5Writer.log_file_level=<Enum>
    Logging Level for File Logging
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 'INFO'
--ReconstructionHDF5Writer.log_format=<Unicode>
    The Logging format template
    Default: '[%(name)s]%(highlevel)s %(message)s'
--ReconstructionHDF5Writer.log_level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
--ReconstructionHDF5Writer.output_dir=<Path>
    Path where to store the reconstructed DL2
    Default: None
--ReconstructionHDF5Writer.path_disp_model=<Path>
    Path where to find the Disp trained RF model file
    Default: None
--ReconstructionHDF5Writer.path_energy_model=<Path>
    Path where to find the Energy trained RF model file
    Default: None
--ReconstructionHDF5Writer.path_gh_model=<Path>
    Path where to find the Gammaness trained RF model file
    Default: None
--ReconstructionHDF5Writer.provenance_log=<Path>
    Default: None
--ReconstructionHDF5Writer.quiet=<Bool>
    Default: False
--ReconstructionHDF5Writer.regression_features=<list-item-1>...
    List of regression features
    Default: []
--ReconstructionHDF5Writer.show_config=<Bool>
    Instead of starting the Application, dump configuration to stdout
    Default: False
--ReconstructionHDF5Writer.show_config_json=<Bool>
    Instead of starting the Application, dump configuration to stdout (as JSON)
    Default: False
--ReconstructionHDF5Writer.source_dependent=<Bool>
    Is the analysis source dependent?
    Default: False
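
For illustration, a hypothetical invocation assembled from the aliases listed above (the file names are placeholders, not taken from the PR):

    lstchain_create_dl2_file \
        --input dl1_run.h5 \
        --output ./dl2 \
        --energy-model reg_energy.sav \
        --disp-model reg_disp_vector.sav \
        --gh-model cls_gh.sav \
        --config lstchain_config.json \
        --log-level INFO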

codecov bot commented Feb 23, 2021

Codecov Report

Attention: Patch coverage is 78.04878% with 27 lines in your changes missing coverage. Please review.

Project coverage is 83.16%. Comparing base (a217b5d) to head (78f1287).
Report is 2817 commits behind head on main.

Files with missing lines Patch % Lines
lstchain/tools/lstchain_create_dl2_file.py 77.17% 21 Missing ⚠️
lstchain/reco/dl1_to_dl2.py 70.00% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #599    +/-   ##
========================================
  Coverage   83.16%   83.16%            
========================================
  Files          54       56     +2     
  Lines        4353     4455   +102     
========================================
+ Hits         3620     3705    +85     
- Misses        733      750    +17     


@Bultako changed the title from "Tool to generate a DL2 file from a DL1 file." to "Tool to generate a DL2 file from a DL1 file" on Feb 23, 2021
@Bultako marked this pull request as draft on February 25, 2021 15:34
@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 0187642 to 96b1b7b on March 4, 2021 22:29

def setup(self):

self.log.info("Reading configuration")
Member:
Tool already supports reading configuration files. Please don't add any new configuration file logic. Adapt the tool / config to work with the existing configuration file logic.

There should absolutely be no code to deal with configurations in here.

Member:

We lose most of the advantages of using Tools if we again are not using it as intended but reimplement a new configuration system on top of it.

self.log.info("Reading DL1 file")
self.data = pd.read_hdf(self.input, key=dl1_params_lstcam_key)
self.data = add_delta_t_key(self.data)
if self.configuration["source_dependent"]:
Member:
E.g. this should be solved by adding a traitlet source_dependent = Bool().tag(config=True) and then just checking if self.source_dependent here.
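
As a minimal sketch of that suggestion (placement and surrounding code are hypothetical; the traitlet name and help text mirror the class options listed above):

import pandas as pd
from ctapipe.core import Tool
from ctapipe.core.traits import Bool


class ReconstructionHDF5Writer(Tool):
    # configurable traitlet; settable from the config file or via the
    # --source-dependent alias instead of a hand-rolled configuration dict
    source_dependent = Bool(
        default_value=False,
        help="Is the analysis source dependent?",
    ).tag(config=True)

    def start(self):
        if self.source_dependent:
            # compute and append the source-dependent parameters here
            ...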

@maxnoe (Member) left a comment:

Please do not reimplement the configuration loading system.

Since many of the options are for functions / things further down, the best way would be to convert those into Components as well. But that could be left for another PR. Having the tool is a good start, but it should use traitlets / the config system correctly, otherwise we're not much better off than with the scripts.
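
To illustrate the Component direction, a rough sketch (the class name is hypothetical; the help text is taken from the options listed above):

from ctapipe.core import Component
from ctapipe.core.traits import Path


class GammaHadronClassifier(Component):
    """Hypothetical Component wrapping the trained g/h separation RF model."""

    model_path = Path(
        help="Path where to find the Gammaness trained RF model file"
    ).tag(config=True)

Inside the Tool it would be created with GammaHadronClassifier(parent=self), so the same configuration file and command line drive it without any extra config-loading code.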

("i", "input"): "ReconstructionHDF5Writer.input",
("o", "output"): "ReconstructionHDF5Writer.output_dir",
"models": "ReconstructionHDF5Writer.path_models",
"config": "ReconstructionHDF5Writer.config_file",
Member:
No need for this, Tool already comes with a config traitlet for setting the configuration file.

output_dir = traits.Path(
help="Path where to store the reconstructed DL2", file_ok=False
).tag(config=True)
config_file = traits.Path(
Member:

This is not needed, as Tool already implements the configuration handling.

self.data = pd.concat([self.data, data_src_dep], axis=1)

self.log.info("Reading RF models")
self.reg_energy = joblib.load(self.path_models / "reg_energy.sav")
@maxnoe (Member) commented Mar 5, 2021:
In general, I am not a big fan of setting an input directory and then having hard-coded file names inside that directory. I would go for three Path traitlets, one for each of the models. This is much more flexible, e.g. for debugging or benchmarking different models.
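
A sketch of the three-traitlet variant (class-level declarations; the attribute names and aliases mirror the help output shown at the top of this PR):

from ctapipe.core import traits

# one Path traitlet per trained model instead of a single models directory
path_energy_model = traits.Path(
    help="Path where to find the Energy trained RF model file"
).tag(config=True)
path_disp_model = traits.Path(
    help="Path where to find the Disp trained RF model file"
).tag(config=True)
path_gh_model = traits.Path(
    help="Path where to find the Gammaness trained RF model file"
).tag(config=True)

aliases = {
    "energy-model": "ReconstructionHDF5Writer.path_energy_model",
    "disp-model": "ReconstructionHDF5Writer.path_disp_model",
    "gh-model": "ReconstructionHDF5Writer.path_gh_model",
}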

self.cls_gh = None
self.dl2 = None

def setup(self):
Member:
I think setup should only initialize components that need the configuration. Any actual work, like loading input data and so on should happen in run.

Collaborator (author):
Do you suggest moving everything that is now in setup to start?

Member:
No, the code checking the traitlets and so on belongs here, but I would move the loading of the data and the models.

Any actual work should be done in start rather than setup
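
A rough sketch of that split, reusing the calls already shown in the diff (import paths are assumed from the existing script, and the configuration-error check is illustrative):

import joblib
import pandas as pd
from ctapipe.core.tool import ToolConfigurationError
from lstchain.io.io import dl1_params_lstcam_key
from lstchain.reco.utils import add_delta_t_key


def setup(self):
    # configuration checks and component construction only
    if self.input is None:
        raise ToolConfigurationError("An input DL1 file is required (--input)")

def start(self):
    # the actual work: load data and trained models, then apply them
    self.log.info("Reading DL1 file")
    self.data = pd.read_hdf(self.input, key=dl1_params_lstcam_key)
    self.data = add_delta_t_key(self.data)

    self.log.info("Reading RF models")
    self.reg_energy = joblib.load(self.path_energy_model)
    self.reg_disp_vector = joblib.load(self.path_disp_model)
    self.cls_gh = joblib.load(self.path_gh_model)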

self.data.az_tel = -np.pi / 2.0
self.data = filter_events(
self.data,
filters=self.configuration["events_filters"],
Member:
Add a List traitlet for this and give filters=self.filters

self.data = filter_events(
self.data,
filters=self.configuration["events_filters"],
finite_params=self.configuration["regression_features"]
Member:
Same here, add a List traitlet and add self.regression_features.

self.data,
filters=self.configuration["events_filters"],
finite_params=self.configuration["regression_features"]
+ self.configuration["classification_features"],
Member:
And again. Do not use self.configuration!
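
Taken together, the three comments above amount to something like this sketch (traitlet types taken from the class options listed earlier: a Dict for the filters, List for the feature lists):

from ctapipe.core.traits import Dict, List

# class-level traitlets replacing the self.configuration dict lookups
events_filters = Dict(
    help="Dictionary with information to filter events"
).tag(config=True)
regression_features = List(
    help="List of regression features"
).tag(config=True)
classification_features = List(
    help="List of classification features"
).tag(config=True)

# ...and inside start():
self.data = filter_events(
    self.data,
    filters=self.events_filters,
    finite_params=self.regression_features + self.classification_features,
)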

@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 4f2caf3 to d1d009e on March 6, 2021 21:24
@Bultako (Collaborator, author) commented Mar 8, 2021

Thanks @maxnoe for the code review.

I have tried to set the config file parameter as mandatory, and worked with an internal self._dict_conf dict instead. For event filters, regression features, classification features, I think it is better to declare that info in a config file than in traits input params. I have set up the three model files used in training as traits input params.

Otherwise, tests with data have been added and the CI is green.
I'm marking this PR as ready for review.

@Bultako marked this pull request as ready for review on March 8, 2021 09:23
@maxnoe (Member) commented Mar 8, 2021

and worked with an internal self._dict_conf dict instead.

This is exactly what I wanted to avoid! Still using a separate configuration system.

For event filters, regression features, classification features, I think it is better to declare that info in a config file than in traits input params

Traits define what can be in a config file! That's the whole point of them. That you can also set these via the command line is a bonus.

self.reg_disp_vector = None
self.cls_gh = None

self._dict_conf = replace_config(standard_config, self.get_current_config())
Member:
Still using a separate config system. Please remove it and use traitlets properly. Otherwise the switch to Tools is kind of pointless.

self.cls_gh,
self.reg_energy,
self.reg_disp_vector,
custom_config=self._dict_conf,
Member:
If you want to be minimally invasive, construct custom_config from the traitlets here, but the better approach would be to implement a Component for apply_models.
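
A sketch of the minimally-invasive variant (the dict keys assume the corresponding traitlets are declared on the tool; the assignment target and first argument follow the surrounding code, which the excerpt above only partly shows):

# build the dict apply_models expects from the declared traitlets,
# instead of keeping a parallel self._dict_conf configuration
custom_config = {
    "source_dependent": self.source_dependent,
    "events_filters": self.events_filters,
    "regression_features": self.regression_features,
    "classification_features": self.classification_features,
}
self.dl2 = dl1_to_dl2.apply_models(
    self.data,
    self.cls_gh,
    self.reg_energy,
    self.reg_disp_vector,
    custom_config=custom_config,
)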

@Bultako force-pushed the tool-dl2 branch 2 times, most recently from 5cd77c8 to 994a17a on March 30, 2021 16:48
@vuillaut (Member) commented:
Closing as this is stalled.

@vuillaut closed this on Sep 10, 2024