Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tool to generate a HDF table of sampling interval coefficients #690

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

SeiyaNozaki
Copy link
Collaborator

@SeiyaNozaki SeiyaNozaki commented Apr 12, 2021

This PR is a continuation of #255.
I added a new tool and class to create sampling interval coefficients tables, which is used for TimeSamplingCorrection.

This tool has three stages:

  • count pulse peak on each capacitor

    • we usually use 1 million events, so better to submit a subrun-wise job to save time. It seems some modules have some problems with this method (almost synchronization between test pulse and the first capacitor id), but this trouble appears randomly for all of the modules. So we take 2-3 runs for a single gain to generate sampling interval coefficients.
  • stack single fits file, convert into sampling interval coefficients, calculate charge resolution with new coefficients for the verification

    • stack multiple fits files generated by subrun-wise jobs and convert the stucked peak counts into sampling interval coefficients for each run. Then, charge resolutions are calculated with new coefficients for the verification. If a given module has trouble in a specific run, charge resolution would be not good than expected. So the final coefficient values would be what achieved the best resolutions. The final coefficients and used run for the generation of coefficients on each module are saved for each gain.
  • merge coefficients of both gains into a single HDF table

The SamplingIntervalCalculate class to create a sampling interval coefficient table for LST readout system using chip DRS4.
"""
def __init__(self):
self.peak_count = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
Copy link
Member

@maxnoe maxnoe Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zeros creates a float64 array by default, counts should be integers, right? So specify dtype=np.int64 or another appropriate int dtype.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I set the proper data type (uint16 is enough).

"""
def __init__(self):
self.peak_count = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
self.fc_count = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

self.charge_reso_array_after_corr ={}

self.charge_reso_final = np.zeros(N_PIXELS)
self.used_run = np.zeros(N_PIXELS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run is also integer

self.fc_count[np.arange(N_PIXELS), fc % N_CAPACITORS_CHANNEL] += 1


def stuck_single_sampling_interval(self, file_list, gain):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does stuck mean here? Are you sure that's the right word?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it is a typo... it should be 'stack'


if gain_in_file == gain:
if not run_id in self.peak_count_stuck.keys():
self.peak_count_stuck[run_id] = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
Copy link
Member

@maxnoe maxnoe Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype?

if gain_in_file == gain:
if not run_id in self.peak_count_stuck.keys():
self.peak_count_stuck[run_id] = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
self.fc_count_stuck[run_id] = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype?

@codecov
Copy link

codecov bot commented Apr 12, 2021

Codecov Report

Merging #690 (51970ce) into master (640eb41) will increase coverage by 0.69%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #690      +/-   ##
==========================================
+ Coverage   84.35%   85.05%   +0.69%     
==========================================
  Files          61       61              
  Lines        5004     5285     +281     
==========================================
+ Hits         4221     4495     +274     
- Misses        783      790       +7     
Impacted Files Coverage Δ
lstchain/version.py 43.93% <0.00%> (-1.52%) ⬇️
lstchain/irf/tests/test_hdus.py 100.00% <0.00%> (ø)
lstchain/reco/r0_to_dl1.py 94.21% <0.00%> (+0.73%) ⬆️
lstchain/irf/hdu_table.py 92.14% <0.00%> (+2.28%) ⬆️
lstchain/tools/lstchain_create_irf_files.py 92.65% <0.00%> (+3.94%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 640eb41...51970ce. Read the comment docs.

self.sampling_interval_coefficient[run_id] = np.zeros([N_PIXELS, N_CAPACITORS_CHANNEL + N_SAMPLES])

for pixel in range(N_PIXELS):
self.sampling_interval_coefficient[run_id][pixel, :N_CAPACITORS_CHANNEL] = \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use implicit line continuations inside parentheses instead of these line breaks with \.

Also, it looks like this loop is not necessary but could be avoided by using the axis keyword for the numpy functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the comment. done.
(Concerning the loop, please let me know if there is a more straightforward way...!)

r1_sample_end = r0_r1_calibrator.r1_sample_end.tel[tel_id]

# Check pulse
pulse_event_flag = np.sum(np.max(waveform[gain], axis=1) > 100) > 1800
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more context and probably the unnamed constants here should be transformed into either global constants or configurable traitlets.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added contexts, and set global constants for them



def set_charge_array(self, gain):
self.charge_array_before_corr = np.zeros([N_PIXELS, 60000])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 60000?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified related codes. If max_events is given, this given value will be used to generate a new array.
If not (max_events is None by default), just use a large enough number (55000) to cover all events in a single subrun.
(A single subrun usually contains ~53000 events.)
I'm sure it is not the best way... so I'm very happy if you have any good idea...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually get the number of events by doing len(source.multi_file), I will also add __len__ to the LSTEventSource, but that will require a new release.

A more flexible solution would be to use a python list and append to it, converting to a 2d numpy array at the end.

self.charge_reso_array_before_corr = np.zeros(N_PIXELS)

for run_id in self.peak_count_stuck.keys():
self.charge_array_after_corr[run_id] = np.zeros([N_PIXELS, 60000])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above


__all__ = ['SamplingIntervalCalculate']

N_PIXELS = 1855
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can import these from ctapipe_io_lst.constants.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

self.log.debug('start peak counting')

for i, event in enumerate(self.eventsource):
if event.index.event_id % 500 == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use tqdm for a proper progress bar instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I didn't understand tqdm before:) Now tqdm is used during even loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this line is not needed anymore

from lstchain.calib.camera.sampling_interval_coefficient_calculate import SamplingIntervalCalculate
from lstchain.paths import run_info_from_filename

class SamplingIntervalCoefficientHDFWriter(Tool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems just to be three tools in one that don't really share code and even use different traitlets. I think really doing three smaller tools would be better and less confusing (since the help will only show the options for each step, not for all steps).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Actually, I just wanted to gather all of the procedures in a single file. Is it possible to put three smaller tools in a single file and call one of the tools when executing the command...? or one file per one tool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do multiple Tools per file and have different entrypoints, but I would just make three files

lstchain_create_sampling_interval_coefficient_file --help
"""

name = "SamplingIntervalCoefficientHDFWriter"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Seiya,
if I understand correctly you are writing fits files, instead of a HDF5 file.
Would be it possible to write indeed a h5 file? (we decided to prefer this format)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget my comment, I see that the final file is in h5 and also that you put together the gains at the ends.
Perhaps a bit misleading the fact that you have single tool for the three steps

@SeiyaNozaki
Copy link
Collaborator Author

Thank you for the comments, @maxnoe and @FrancaCassol !
I refactored the codes to make them simple and easy to understand.
Now there are two steps (lstchain_create_peak_count_file_for_sampling_interval_calib and lstchain_create_sampling_interval_coefficient_file): the first one is to count the pulse peak on each capacitor and the second is to stack, convert and verify the sampling interval coefficients.
The summary of the procedure is below:
image

I also added a function to produce a result pdf file like below:
sampling_interval_coefficient_results_HG.pdf

It would be very appreciated to have a look again.

@SeiyaNozaki
Copy link
Collaborator Author

Hi @FrancaCassol @maxnoe,
Sorry for the delay, but came back again.
I'm happy if you can check the updated codes again when you have time!

def plot_results(self, verify_data_path, output_file):

plt.rcParams['font.size']=15
camera = CameraGeometry.from_name('LSTCam-002')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an outdated camera (we have now version 4).
Better to use:

from ctapipe_io_lst import load_camera_geometry
camera = load_camera_geometry()
camera = camera.transform_to(EngineeringCameraFrame())

input_fits = traits.Path(
help="Path to the generated peak count fits files",
input_peak_count_fits = traits.Path(
help="Path to the generated peak count fits files by lstchain_create_peak_count_file_for_sampling_interval_calib",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SeiyaNozaki ,

I do not see this tool in your PR, am I wrong?

self.sampling_interval_coefficient_calculate.convert_to_samp_interval_coefficient()
self.sampling_interval_coefficient_calculate.set_charge_array()

for i, event in enumerate(tqdm(
Copy link
Member

@maxnoe maxnoe Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for enumerate, the event already has event.count

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants