Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose CEP-4: module structure #2489

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

maxnoe
Copy link
Member

@maxnoe maxnoe commented Jan 10, 2024

Hi all,

I made a first rough draft of CEP-4, to address the module structure. Please comment on the proposed modules.

Copy link

codecov bot commented Jan 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8d61c03) 92.47% compared to head (46430c8) 92.47%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2489   +/-   ##
=======================================
  Coverage   92.47%   92.47%           
=======================================
  Files         234      234           
  Lines       19965    19965           
=======================================
  Hits        18462    18462           
  Misses       1503     1503           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


- ``.image`` is split into ``.image_cleaning`` and ``.image_features``
- ``.calib`` is split into ``.camera_calibration``, ``.data_volume_reduction``, ``.gain_selection``
- ``.image.extractor`` is moved to ``.camera_calibration``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like that image extraction is in camera_calibration: it's not a calibration, but rather a way to reduce noise by integration in the time domain, so it's closer to image cleaning. Calibration should really be limited to things that correct for instrumental differences between simulations and observations. Maybe have a waveform_processing module instead?

Copy link
Contributor

@kosack kosack Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more clear:

  • camera calibration should contain things that relate to:
    • DC to PE conversion
    • flat-fielding
    • pedestals
    • noisy pixel detection
    • pointing correction? → maybe in its own module, as this is not camera, but structure related.
  • waveform processing: generation of images from waveforms
    • ImageExtractors
    • related code for peak detection and pulse characterization in general: time over threshold,FWHM, risetime, falltime, etc
    • gain channel selection (or should that be in camera calibration? It's still basically a waveform algorithm looking for an exceeded threshold)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointing correction? → maybe in its own module, as this is not camera, but structure related.

Definitely it's own module, there won't be any code overlap, so it would again lead to a situation where people only wanting to import the pointing correction module would import all the low-level calibrations resulting in large import times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataVolumeReducers

I added a module specifically for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I noticed that and updated the comment...

- ``.calib`` is split into ``.camera_calibration``, ``.data_volume_reduction``, ``.gain_selection``
- ``.image.extractor`` is moved to ``.camera_calibration``
- ``.image.muon`` is moved to ``.muon``
- ``.containers`` is renamed to ``.data_model``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean core.container.py is moved to its own module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, containers.py as written

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see!

Then I would add I think it is quite confusing that some of the modules are in their own sub-dirs and others are just files hanging out in the root src/ directory.

I suggest we put everything that is supposed to be a proper module into its own subdirectory, leaving only things like version.py and conftest.py in the root. This also makes a natural place for a dedicated tests/ subdirectory for things like atmosphere and containers instead of having a central one collecting test for various, but not all, modules.

- ``.image.muon`` is moved to ``.muon``
- ``.containers`` is renamed to ``.data_model``
- the configuration related (``Tool``, ``Component``, ``.traits``) parts of ``.core`` are moved to ``.config``
- ``.coordinates``, ``.instrument``, ``.visualization``, ``.reconstruction`` remain as is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably say that reco is renamed reconstruction.

.visualization


Looking at it from the current structure, we'd have the following changes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.io and .utlis looks to be missing?

Also, maybe not technically a module, but I wonder what happens to tests/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add .fitting and .exceptions from to the list of top level modules missing from the list

@ccossou
Copy link
Contributor

ccossou commented Feb 7, 2024

I've read the document but my understanding could make my comment not very usefull (take it as what a new developper would see if it looked at the ctapipe structure (old and new). My concern is about the different categories. It's difficult to understand, from the list alone, the hierarchy of the various tools. I know this is not the current plan, but I would have added a layer of sub-module to further categorise the functions based on the datalevel they are supposed to use as input, i.e (waveform functions, image functions, other?). From a quick look at the structure, it's not easy (at least for me) to understand what type of analysis and input data I am expecting in the function inside).

It is fine for me to keep a sub layer, even in core, so as to have as little sub-module in the top level. Tool and Component, in particular, are part of core, so either in a core.config, or in .core.

Does it make sense to rename .processors into just .api or something to clearly identify that inside we'll find the high level functions?

To rephrase my comment, if I take the list of new submodules, I don't really know what to expect in the following submodules
.atmosphere: is it processing related to atmosphere, or classes to define or read an atmosphere file?
.coordinates: I wouldn't mix a submodule about coordinates with another submodule with processing functions about cleaning,...
.data_volume_reduction: this is related to IO to me, and should be, along IO function, in an IO submodule
.gain_selection: what level are we talking about, waveform?
.instrument: if we define instrument type, I would put that into core
.processors: this is a high level API, but Tool are also in the API I think and could be regrouped
.reconstruction: I assume it's machine learning but it's a bit generic, what's the idea behind this category, we put every machine learning code in it? Even if they are all reconstruction models, in practice, they require different data level and are not interchangeable, so at the very least I would split further in that submodule to reflect the hierarchy of the different models.
.tools: the word tools usually mean it's a submodule to regroup random things compared to the rest in general (like utils), while here it's the top level interface and should be visible in a quick glance
.visualization: is it for plot? specific kind of plot, or any plot like visualisation of telescope shape, image, etc...?

My general idea would be to have a very high level organisation for the first level, this would help to immediately visualise what every sub-module of each section will be about. e.g. data_model/muon is a muon datamodel, but io/muon is about reading or writing muons (just a random useless example):
.core
config
.data_model
.io
data_volume_reduction

.api: everything that can be called from the command line)
processors
tools
.steps
dl0: not pushing hard for this list of names...
dl1
dl2a
dl2b
dl2c
...

My view is external to ctapipe still. I haven't worked enough on ctapipe to have a clear understanding of it just by looking at the code structure, and these are the comments that comes to mind when I look at it, judging from other code base I've seen before.

And I wouldn't be afraid of adding a lot of folders, so that each folder can be imported separately as an entity. I would add a folder for each analysis step for instance, and each high level processor so that you can further split into sub files if needed, and not have one gigantic file with multiple things in it.

For instance the pipeline I know is like this:
├── core
├── datamodels
│   ├── datamodels
│   │   ├── metaschema
│   │   ├── schemas
│   │   └── tests
│   └── tests
│   └── schemas
├── lib
├── pipeline
│   ├── detector1.cfg
│   ├── detector1.py
│   ├── init.py
└── steps
├── dark_current
│   └── tests
├── dq_init
│   └── tests
├── gain_scale
│   └── tests
├── group_scale
│   └── tests
├── ipc
├── jump
│   └── tests
├── linearity
│   └── tests
├── ramp_fitting
│   └── tests
│   │   ├── test_ramp_fit_cases.py
    │   ├── test_ramp_fit.py
   │   └── test_ramp_fit_step.py
├── refpix
│   └── tests
├── saturation
│   └── tests
└── superbias
└── tests

the pipeline submodule are the high level interface, they just reuse steps that are defined in the steps submodule. Each step (see the ramp_fitting folder that I did not strip of its internal structure) need a *_step.py that define the main class, the rest is implementation specific and up to the developper. Every abstract class is defined in core (Pipeline, Step, ...)
Every step is self contained in its folder. You only need to import pipeline if you want just the interface.

I'm not saying this is the best view, but I figured I could provide another type of input, in the hope it's usefull.

Best,
Christophe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants