Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A time clustering algorithm for image cleaning #2401

Open
wants to merge 81 commits into
base: main
Choose a base branch
from

Conversation

clara-escanuela
Copy link
Contributor

No description provided.


X = np.column_stack((time[precut_mask] / t_scale, pix_x, pix_y))

db = DBSCAN(eps=eps, min_samples=minpts).fit(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly shorter: labels = DBSCAN(...).fit_predict(X).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also use sample_weights=image? Otherwise you are using image only for the initial cut

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried sample_weight before. I used this weighting:
$\frac{1}{1+exp\left(-\frac{image+s}{d}\right)}$
It introduces two new parameters but I can add weights as an option

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply weight with the image itself? I.e. cluster the photo electrons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking about this and I think we could introduce a fourth dimension in the cluster, which is 1/SNR, SNR is the signal-to-noise ratio, image/pedestal. Using just image does not work as the algorithm finds clusters with similar data points but if you plot 1/image, we find clusters:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not seem to make any difference though

@maxnoe
Copy link
Member

maxnoe commented Sep 22, 2023

Docs failure is fixed in main, please rebase

@kosack
Copy link
Contributor

kosack commented Sep 26, 2023

Not sure if it's interesting, but they just introduced an improved DBSCAN algorithm in scikit-learn

n_noise=self.noise_cut.tel[tel_id],
d_scale=self.d_scale.tel[tel_id],
t_scale=self.t_scale.tel[tel_id],
pedestal=self.pedestal.tel[tel_id],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you should really use the correct pedestal, not some hard-coded value. There is for example

ped = event.mon.tel[7].calibration.pedestal_per_sample

That is per channel, so you need to use the high/low gain switch mask. We probably should properly process simulated pedestals into DL1 information though, so you don't have to do this.

Copy link
Contributor Author

@clara-escanuela clara-escanuela Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should change the name but what I need is the variance of the reconstructed noise per pixel in units of pe. Is that available?

Copy link
Contributor

@kosack kosack Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i realized that just after posting... the pedestals themselves are not what you need. In the simulations currently we don't have the variances, but this is something we really need to fix. Either we should properly compute pedestals and not rely on the ones given in the sims (which are technically inputs, not outputs), or see if somewhere the variances are already included or ask for them to be included. That's not just or this PR, but for any "realistic" cleaning method, e.g. the Whipple-10m-style cleaning that is just tailcuts but with the thresholds expressed in pedestal variance units.

This should probably be a separate issue and PR. For now, what you can maybe do is just don't have these pedestal dispersions hard-coded in the class, but rather make them a TelescopeParameter that is configurable. That way we could adapt to multiple prods without having to edit the code. Once we have a "correct" way to read the pedestal dispersions for sims, we can then properly fix it.

Copy link
Contributor

@kosack kosack Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should definitely change the name, since "pedestal" is the mean value of the noise, not the dispersion. Here, you are assuming it's Gaussian, right (even though it's not)? And you are using it as a standard deviation, not variance, so maybe pedestal_std

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it also occurs to me that this threshold could also just be a threshold in PE for now (like the tailcuts), which would work fine for simulations, where the pedestal variances don't vary across the FOV or in time. Then later we upgrade it to use pedestal variances once we have those properly implemented for simulations.

ctapipe/image/cleaning.py Outdated Show resolved Hide resolved
ctapipe/image/cleaning.py Outdated Show resolved Hide resolved
@clara-escanuela
Copy link
Contributor Author

This is the basic implementation of the method. I think we could finish it soon and then if we need to add some more details, which is possible, we could do so. I am just wondering about unit tests

Copy link

codecov bot commented Jan 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@18832ea). Click here to learn what that means.

❗ Current head 1f94003 differs from pull request most recent head afc1704. Consider uploading reports for the commit afc1704 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2401   +/-   ##
=======================================
  Coverage        ?   92.48%           
=======================================
  Files           ?      234           
  Lines           ?    20005           
  Branches        ?        0           
=======================================
  Hits            ?    18502           
  Misses          ?     1503           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

kosack
kosack previously approved these changes Jan 18, 2024
ctapipe/image/cleaning.py Outdated Show resolved Hide resolved
ctapipe/image/cleaning.py Outdated Show resolved Hide resolved
ctapipe/image/cleaning.py Show resolved Hide resolved
ctapipe/image/cleaning.py Show resolved Hide resolved
ctapipe/image/cleaning.py Outdated Show resolved Hide resolved
"""

space_scale_m = FloatTelescopeParameter(
default_value=0.25, help="Pixel space scaling parameter in m"
Copy link
Member

@maxnoe maxnoe Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we determine the scale parameters from the information in CameraDescription?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. That parameter needs optimization, it could 2x or 3x the pixel spacing of the camera and teh optimized number depends on the telescope

@Tobychev
Copy link
Contributor

@kosack @maxnoe This PR seems to have picked up quite a few unrelated changes, presumably from an attempt to rebase on main, what is the preferred way to deal with this?

@maxnoe
Copy link
Member

maxnoe commented Feb 22, 2024

I am not sure what happened here.

@clara-escanuela What git commands did you run?

I see issues like this one from time to time but never could figure out what mistake leads to these.

The proper way of updating a branch is (assuming upstream points to cta-observatory/ctapipe and origin points to clara-escanuela/ctapipe)

$ git fetch upstream # get latest changes from main repo
$ git rebase upstream/main
# fix any conflicts that might occure by editing and then running git add / git rebase --continue

I think could locally salvage this branch here by cherry-picking the commits that seem to be originally from @clara-escanuela on this branch after resetting the branch hard to upstream/main.

$ git reset --hard upstream/main
$ git cherry-pick 37e5ea864 55cf8bafb 7e8daf8d4 895ff9774 a9f81e99d 357d02b51 722dfcf17 3c799aaee b9ab65e67 afc170480

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants