Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method to write default config to Tool and Component. #2378

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

ccossou
Copy link
Contributor

@ccossou ccossou commented Jul 6, 2023

After discussions wit Karl and Tomas, I worked on a way to extract default configuration .yaml file, recursively, directly from the class, without having to instanciate anything.

This is not yet ready for merging, I create the pull request more for discussing what are your thoughts and requests to make this usefull.

Current issues are:

  • We use the default value of EVERY traits in every class in the hierarchy of a given Tool or Component.
  • When default value is undefined, the default config is then non valid and make it difficult to understand what value we're supposed to use, on top of making the yaml file invalid too, because Undefined can't be a valid input.

Knowing that:

  • Will this functionality ever be useful?
  • If so, how do we deal with undefined, does that mean that, apart some key parameters such as input file, we force a valid parameter as default?

Here is how you can create a yaml config file from a given class:

from ctapipe.tools.process import ProcessorTool
from ctapipe.reco.sklearn import EnergyRegressor, ParticleClassifier

ProcessorTool.write_default_config()
EnergyRegressor.write_default_config()
ParticleClassifier.write_default_config()

ProcessorTool.yml
Example config file for cta-process is attached. The filename is the default name attributed if none is given to .write_default_config()

TODO:

  • Another thing missing is that the options for on-the-fly constructed classes do not appear, e.g. there is an option image_cleaner_type (Add method to write default config to Tool and Component. #2378 (comment))
  • better default value using dynamic generator "get_default_generator")
  • probably a bug for quality query (need to check), but I need to write a list of list, and not a list of tuple (that is not correctly read by yaml

Done:

  • each class has help info as well (a docstring), so you could add that as a comment? (request by Karl)
  • Add parameter type at the top of the corresponding help section above.

@ccossou ccossou requested a review from kosack July 6, 2023 11:40
@ccossou ccossou marked this pull request as draft July 6, 2023 11:44
@ccossou
Copy link
Contributor Author

ccossou commented Jul 7, 2023

In the new commits, I commented every parameter that is "traitlets.Undefined" by default. So this gives the user an idea of the possibilities, without having to changes too many parameters for no added value.

@kosack
Copy link
Contributor

kosack commented Jul 7, 2023

can you add a space before each class? Or better yet, each class has help info as well (a docstring), so you could add that as a comment?

@ccossou
Copy link
Contributor Author

ccossou commented Jul 7, 2023

What do you mean by "space"? In the recent commits, I added a newline before every class to make sure it's not close to the previous parameter.

As for the class comment, this is certainly possible but a bit more difficult to do. I'll add that to the todo-list.

@kosack
Copy link
Contributor

kosack commented Jul 7, 2023

In the attached file above, there is a space missing before each class:
image
Though I haven't tried running it, so maybe that is out of date?

@kosack
Copy link
Contributor

kosack commented Jul 7, 2023

Another thing missing is that the options for on-the-fly constructed classes do not appear, e.g. there is an option image_cleaner_type which is defined like:

  image_cleaner_type = ComponentName(
        ImageCleaner, default_value="TailcutsImageCleaner"
    ).tag(config=True)

You should then show the options for ImageCleaner and all of it's non-abstract subclasses (e.g. [ImageCleaner, non_abstract_subclasses(ImageCleaner)])

@ccossou
Copy link
Contributor Author

ccossou commented Jul 7, 2023

I've updated the example file at the top.

Now:

  • Newline before classes
  • summary line of docstring is now help message of class section
  • help string (either docstring or trait.help is now cleaned of indentation and newline for better formatting of the yml file).

@maxnoe
Copy link
Member

maxnoe commented Jul 11, 2023

The traitlets application already has a method generate_config_file that seems to do the same, why do we need custom code here?

@ccossou
Copy link
Contributor Author

ccossou commented Jul 12, 2023

This is a good question.

Because:

  1. This function from traitlets generate a string that correspond to a Python script, not a yaml file. To my knowledge, there's no config generator in .yaml in traitlets.
    Example:
# Configuration file for frac-eff-plots.

c = get_config()  #noqa

#------------------------------------------------------------------------------
# Application(SingletonConfigurable) configuration
#------------------------------------------------------------------------------
## This is an application.

## The date format used by logging formatters for %(asctime)s
#  Default: '%Y-%m-%d %H:%M:%S'
# c.Application.log_datefmt = '%Y-%m-%d %H:%M:%S'

## The Logging format template
#  Default: '[%(name)s]%(highlevel)s %(message)s'
# c.Application.log_format = '[%(name)s]%(highlevel)s %(message)s'
  1. To a lesser degree, because the function you cited need an instance, while this shouldn't be necessary, a class should be enough.
  2. It has the same default that I'm still trying to solve w.r.t default values as it doesn't handle dynamic default values or dynamic generators (I solved the first, but not the second issue yet)

For comparison, here is a short example of what this branch create (as opposed to the default traitlet output)/


# Create benchmark plots to show how much survives from a certain cut.
FractionSurvivingTool:

  # [Unicode] Name used to separate the benchmark output
  analysis_name: ''

  # [List(Path)]
  config_files: 

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ccossou
Copy link
Contributor Author

ccossou commented Nov 24, 2023

Discussed today at the ctapipe meeting, the workaround with dynamic default is to detect it's dynamic, then write "it's dynamic, check documentation".

I need to dig into this again because I've mostly forgotten where I was with this.

@ccossou
Copy link
Contributor Author

ccossou commented Nov 24, 2023

For now, I have an issue that I don't know how to solve.

I comment traits when the default values is invalid. Problem is, for EventSource for instance, all traits are invalid, and I end up with an empty Section that fail on read with this error:

ValueError: values whose keys begin with an uppercase char must be Config instances: 'EventSource', None

I tried to parse the string after creating it, but not only is this ugly design, this is also trickier than expected, because you need to check per indentation level, and that means the function need to be recursive to be doable.

But this is also tricky to do when reading the traits, because you need to check ALL traits for that sections before even writing the section, to know if you need to comment the section or not.

I'm open to suggestions on this one.

PS: I already solved the dynamic default apparently, months ago, I didn't even recall doing it.

@maxnoe
Copy link
Member

maxnoe commented Nov 24, 2023

I think something went wrong in your last push...

@maxnoe
Copy link
Member

maxnoe commented Nov 24, 2023

I comment traits when the default values is invalid. Problem is, for EventSource for instance,

While I understand the general issue, I don't understand why it happens for EventSource.
The defaults for max_events and allowed_tels should be perfectly valid.

@ccossou
Copy link
Contributor Author

ccossou commented Nov 28, 2023

@maxnoe What went wrong? What is the issue that you noticed?

@ccossou
Copy link
Contributor Author

ccossou commented Nov 28, 2023

In EventSource:

  • default for max_events is None. While this is allowed, this is not a value accepted in the .yaml (or else I'm missing something)
  • allowed_tels also has a default set to None in EventSource (ctapipe.io.eventsource.py line 104)

But in case, another example is QualityQuery. The only parameter, quality_criteria, has no default, and I end up with an empty Section that crash my code.

Issues are: when default value is undefined, the default config is then non valid and make it difficult to understand what value we're supposed to use.
…mented but not crashing everything in case they're optional.
In practice, it's the first line of its docstring. Formatting is done in a separate function because I also need that for trait help, as it's sometime a multiline string that is very badly rendered otherwise.
…undefined.

Note that get_default_value() was deprecated anyway, and they advise to use .default_value instead. But this is the static value, meaning you'll get more undefined that what really is.
…st of values is encountered (e.g. quality criteria)
isinstance(obj, list) was not working because traitlets.List and derivatives are not considered list type.

config_files trait is also commented because it crashed with the empty list as default.
@maxnoe
Copy link
Member

maxnoe commented Nov 29, 2023

  • default for max_events is None. While this is allowed, this is not a value accepted in the .yaml (or else I'm missing something)
  • allowed_tels also has a default set to None in EventSource (ctapipe.io.eventsource.py line 104)

The yaml representation of None is null or even just missing data:

In [1]: import yaml
In [2]: config = {"EventSource": {"max_events": None, "allowed_tels": None}}
In [3]: print(yaml.dump(config))
EventSource:
  allowed_tels: null
  max_events:

Copy link

codecov bot commented Nov 29, 2023

Codecov Report

Attention: 85 lines in your changes are missing coverage. Please review.

Comparison is base (b175f69) 92.46% compared to head (8642e35) 92.07%.

Files Patch % Lines
ctapipe/core/config_writer.py 16.66% 65 Missing ⚠️
ctapipe/core/tool.py 31.25% 11 Missing ⚠️
ctapipe/core/component.py 41.66% 7 Missing ⚠️
ctapipe/io/metadata.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2378      +/-   ##
==========================================
- Coverage   92.46%   92.07%   -0.39%     
==========================================
  Files         234      235       +1     
  Lines       19868    19980     +112     
==========================================
+ Hits        18370    18397      +27     
- Misses       1498     1583      +85     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxnoe
Copy link
Member

maxnoe commented Dec 5, 2023

Checking the ruamel.yaml docs again, this is how you can programatically create a commented yaml document:

from ruamel.yaml import YAML, CommentedMap
import sys

yaml = YAML()

doc = CommentedMap()
doc["EventSource"] = CommentedMap()
doc.yaml_set_comment_before_after_key("EventSource", before="Config of EventSources")

doc["EventSource"]["max_events"] = None
doc["EventSource"].yaml_add_eol_comment("Maximum number of events", key="max_events")
doc["EventSource"]["allowed_tels"] = None
doc["EventSource"].yaml_add_eol_comment("Allowed telescopes", key="allowed_tels")

Output:

# Config of EventSources
EventSource:
  max_events:  # Maximum number of events
  allowed_tels: # Allowed telescopes

@ccossou
Copy link
Contributor Author

ccossou commented Dec 5, 2023

Thanks. It took me time to understand that what I needed was CommentedMap, because I couldn't find that information from the documentation alone.

But I've used yaml_set_comment_before_after_key even for the traits, as sometimes the comments are big, and assumed it would be problematic at the end of the line.

I'm still in the debug phase, but even after that, comments are not indented as the key they are attached to, so that will be my next item on the list, once I can get it to run at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants