Update rates-based statistics to be modular #4608

GarethCabournDavies · 2024-01-24T12:04:02Z

Overview of changes

The exponential fit statistics are all similar, with simple factors added or subtracted, as discussed in #4594

I have refactored the ExpFitStatistic so that the different features are selected with a --statistic-features option. All of these statistics then use exp_fit as the --ranking-statistic option.

The available features are:

| Feature | Description | Notes |
|---|---|---|
| `phasetd` | Apply a factor describing how well the phase, time and amplitude differences match what is expected for signals. | |
| `kde` | Apply a factor according to a kernel density estimate of the ratio of signal and noise distributions for each template. | |
| `dq` | Apply a factor according to any data quality channel information. | |
| `chirp_mass` | Apply a reweighting according to the chirp mass of the template. | |
| `sensitive_volume` | Apply a factor of the log of sensitive volume (compared to a median for the template). This means that the statistic takes into account any changes in the detector sensitivity and that, e.g., we expect to see more HL coincidences than HV coincidences. | |
| `normalize_fit_rate` | Normalize the rates fits by the analysis time. This is needed so that the statistic is comparable over chunks of different lengths. | This was done for all `exp_fit` statistics, but is implemented here explicitly so that the `exp_fit_csnr` statistic can reuse the `lognoiserate` function from the `ExpFitStatistic`. |
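
As a rough illustration of the modular idea (this is not the code in `pycbc/events/stat.py`), the sketch below treats each enabled feature as one optional additive log-factor on top of a base value. The function name, the numerical terms and the way terms are combined are all hypothetical; only the feature names are taken from the table above.

```python
# Hypothetical sketch: every enabled feature contributes one additive
# log-factor. None of these helper names come from PyCBC itself.
ALLOWED_FEATURES = [
    "phasetd",
    "kde",
    "dq",
    "chirp_mass",
    "sensitive_volume",
    "normalize_fit_rate",
]


def combine_log_terms(base_log_rate, feature_terms, enabled):
    """Add the log-factor of each enabled feature to a base value."""
    unknown = set(enabled) - set(ALLOWED_FEATURES)
    if unknown:
        raise ValueError(f"unrecognised features: {sorted(unknown)}")
    total = base_log_rate
    for name in enabled:
        # A feature with no precomputed term contributes nothing.
        total += feature_terms.get(name, 0.0)
    return total


# Made-up numbers, purely to show the bookkeeping.
terms = {"phasetd": 1.5, "kde": -0.25, "dq": -0.5}
stat = combine_log_terms(10.0, terms, ["phasetd", "kde"])
```

Here the `dq` term is available but not enabled, so it is left out of the sum.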

I have also removed the different treatment of triggers with sngl_ranking below threshold; this must now be requested explicitly with --statistic-keywords alpha_below_thresh:6. Again, this is so that the exp_fit_csnr statistic can reuse the lognoiserate function from the ExpFitStatistic.
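
A minimal sketch of how `key:value` keywords of this kind could be turned into a dictionary, with numeric values converted to floats. The helper name `parse_keywords_sketch` is hypothetical and is not the parser used by PyCBC; the keyword is spelled as in the testing table.

```python
def parse_keywords_sketch(pairs):
    """Turn ['key:value', ...] into a dict, converting numbers to float.

    Hypothetical helper for illustration only.
    """
    out = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"expected key:value, got {pair!r}")
        try:
            out[key] = float(value)
        except ValueError:
            # Leave non-numeric values as plain strings.
            out[key] = value
    return out


kwargs = parse_keywords_sketch(["alpha_below_thresh:6"])
```

Converting at parse time means downstream code can compare the value with a ranking directly, without a string-versus-number mismatch.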

In addition, there are some minor changes to fix some of the statistics which didn't work at all on previous master. There is a PR at #4607, for reference only, which shows these fixes.

Testing

I have tested all existing (and working) statistics to check that given the appropriate features, the output remains identical.

The testing is done against the codes in #4607, so that we can test statistics against what they should be, rather than what they are.

Initial tests with a very small fraction of the bank have shown that the SNR-like statistics output files have identical hashes. The exp_fit statistics outputs are all the same to within a numpy.isclose test, i.e. ~1e-6 difference for values O(1). But I will add the results of more stringent testing here.
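
The kind of comparison described above can be sketched as follows. The arrays are made up for illustration, and `compare_stats` is a hypothetical helper; `numpy.isclose` is used with its default tolerances (`rtol=1e-5`, `atol=1e-8`).

```python
import numpy as np


def compare_stats(old, new):
    """Return (all values close?, maximum absolute difference)."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    all_close = bool(np.isclose(old, new).all())
    max_diff = float(np.abs(old - new).max())
    return all_close, max_diff


# Made-up statistic values with differences of order 1e-6.
old = np.array([8.0, 9.5, 12.25])
new = old + np.array([0.0, 1e-6, -1e-6])
ok, max_diff = compare_stats(old, new)
```

Reporting the maximum absolute difference alongside the boolean result gives a number that can go straight into a comparison table.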

| Statistic | New statistic | Features | Keywords | `pycbc_sngls_findtrigs` max stat difference | `pycbc_coinc_findtrigs` max stat difference |
|---|---|---|---|---|---|
| quadsum | quadsum | -- | -- | File hash the same | File hash the same |
| single_ranking_only | single_ranking_only | -- | -- | File hash the same | File hash the same |
| phasetd | phasetd | -- | -- | File hash the same | File hash the same |
| exp_fit_csnr | exp_fit_csnr | -- | -- | File hash the same | File hash the same |
| phasetd_exp_fit_fgbg_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate` | `alpha_below_thresh:6` | 9.5e-7 | 3.8e-6 |
| phasetd_exp_fit_fgbg_bbh_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate chirp_mass` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
| phasetd_exp_fit_fgbg_kde | exp_fit | `phasetd sensitive_volume normalize_fit_rate kde` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
| dq_phasetd_exp_fit_fgbg_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate dq` | `alpha_below_thresh:6` | 9.5e-7 | 3.8e-6 |
| dq_phasetd_exp_fit_fgbg_kde | exp_fit | `phasetd sensitive_volume normalize_fit_rate dq kde` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
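
For anyone translating an old configuration, the rows of this table can be written as a lookup. This is only a convenience sketch: `equivalent_options` is a hypothetical helper, and it assumes the features are passed as space-separated values after `--statistic-features`, as the table entries suggest.

```python
# Old statistic name -> features for the modular exp_fit statistic,
# copied from the comparison table.
OLD_TO_FEATURES = {
    "phasetd_exp_fit_fgbg_norm":
        ["phasetd", "sensitive_volume", "normalize_fit_rate"],
    "phasetd_exp_fit_fgbg_bbh_norm":
        ["phasetd", "sensitive_volume", "normalize_fit_rate", "chirp_mass"],
    "phasetd_exp_fit_fgbg_kde":
        ["phasetd", "sensitive_volume", "normalize_fit_rate", "kde"],
    "dq_phasetd_exp_fit_fgbg_norm":
        ["phasetd", "sensitive_volume", "normalize_fit_rate", "dq"],
    "dq_phasetd_exp_fit_fgbg_kde":
        ["phasetd", "sensitive_volume", "normalize_fit_rate", "dq", "kde"],
}


def equivalent_options(old_name):
    """Build the option list matching an old exp_fit-style statistic."""
    features = OLD_TO_FEATURES[old_name]
    return [
        "--ranking-statistic", "exp_fit",
        "--statistic-features", *features,
        "--statistic-keywords", "alpha_below_thresh:6",
    ]


print(" ".join(equivalent_options("dq_phasetd_exp_fit_fgbg_kde")))
```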

GarethCabournDavies · 2024-01-25T11:33:08Z

pycbc/events/stat.py

+    for feature in opts.statistic_features:
+        if feature not in _allowed_statistic_features:
+            err_msg = f"--statistic-feature {feature} not recognised"
+            raise NotImplementedError(err_msg)


This shouldn't actually happen due to the argparse choices, but safety is best
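
To show why the check is expected to be redundant, here is a small standalone sketch (not the PyCBC option-parsing code) in which `argparse` itself rejects an unknown feature through `choices`. The parser setup, including `nargs="*"`, is an assumption made for the example.

```python
import argparse

# Feature names taken from the PR description; the parser itself is a
# standalone illustration, not PyCBC's option-parsing code.
_allowed = [
    "phasetd", "kde", "dq", "chirp_mass",
    "sensitive_volume", "normalize_fit_rate",
]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--statistic-features", nargs="*", default=[], choices=_allowed
)

opts = parser.parse_args(["--statistic-features", "phasetd", "kde"])


def is_rejected(argv):
    """Return True if argparse refuses the given arguments."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        # argparse reports the invalid choice and exits.
        return True
    return False


bad = is_rejected(["--statistic-features", "not_a_feature"])
```

The explicit loop in the diff still protects callers that build the options object without going through the parser.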

GarethCabournDavies · 2024-01-25T11:39:51Z

bin/all_sky_search/pycbc_sngls_findtrigs

- **extra_kwargs)
- trigger_times = sds['end_time']
+ stat_t = rank_method.rank_stat_single((ifo, sds))
+ trigger_times = trigs['end_time'][:][trigger_keep_ids]


Some singles objects don't have the end time included

GarethCabournDavies · 2024-01-25T11:40:32Z

pycbc/events/stat.py

@@ -33,9 +33,18 @@
 from .eventmgr_cython import logsignalrateinternals_computepsignalbins
 from .eventmgr_cython import logsignalrateinternals_compute2detrate

+_allowed_statistic_features = [


I'm not sure where is best to describe each feature here to be honest

GarethCabournDavies · 2024-01-25T11:43:52Z

pycbc/events/stat.py

 # Assume best case scenario and use maximum signal rate
 s1 -= 2. * self.hist_max
 s1[s1 < 0] = 0
 return s1 ** 0.5


-class ExpFitStatistic(QuadratureSumStatistic):
+class ExpFitStatistic(PhaseTDStatistic):


Subclassing PhaseTDStatistic here in order to get the phasetd stuff in init

GarethCabournDavies · 2024-02-06T09:16:54Z

The numbers in the comparison table have been updated

I am sure it is not a coincidence that the error in the coincs is double that of the singles, but I think a ~4e-6 difference is not too important given the dynamic range of the statistic.
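
One possible reading, offered as an assumption and not something verified in this PR: the quoted maximum differences (9.5e-7, 1.9e-6, 3.8e-6) are close to the spacing between adjacent single-precision floats for values starting at 8, 16 and 32 respectively, so they may be one-bit rounding differences. Whether the statistic values are stored as float32 and fall in those ranges is not stated in this thread. The sketch below just prints the spacings.

```python
import numpy as np

# Gap between adjacent float32 values at a few magnitudes. That the
# statistic is stored as float32 is an assumption, not a known fact.
spacings = {
    x: float(np.spacing(np.float32(x))) for x in (8.0, 16.0, 32.0)
}

for value, gap in spacings.items():
    print(f"float32 spacing at {value:g}: {gap:.2e}")
```

The spacing doubles with each power of two, which would be consistent with a factor of two between the singles and coincident differences if the coincident statistic values are roughly twice as large.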

GarethCabournDavies · 2024-02-06T10:33:56Z

I am adding a description of the statistics to the docs - I am writing it in the pycbc_make_offline_search_workflow documentation at the moment, but that can be moved if requested.

GarethCabournDavies · 2024-02-06T14:48:42Z

Note that I found and fixed a bug in the way that sngl_ranking_ keywords are handled and passed through to the ranking module

GarethCabournDavies · 2024-05-21T08:16:43Z

I thought it best to check the memory usage, and for pycbc_coinc_findtrigs with 1/140 of the bank and the dq_phasetd_exp_fit_fgbg_norm (and modular equivalent) statistic, we see:
NEW:

User time (seconds): 3586.64
System time (seconds): 197.45
Maximum resident set size (kbytes): 432688

OLD:

User time (seconds): 4094.07
System time (seconds): 184.80
Maximum resident set size (kbytes): 414844


For the same statistic with `pycbc_sngls_findtrigs` and 1/1400 of the bank, we get:
NEW:

User time (seconds): 74.30
System time (seconds): 13.22
Maximum resident set size (kbytes): 277556


OLD:

User time (seconds): 58.19
System time (seconds): 13.82
Maximum resident set size (kbytes): 279236


The increase in user time seems to be because of a lower % of CPU (27 vs 75)

So basically this looks like it doesn't change anything with regard to performance, as I would expect
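
For reference, the relative changes implied by the numbers quoted above can be computed with a few lines. The figures are copied from this comment; `percent_change` is a hypothetical helper.

```python
# (user time in seconds, maximum resident set size in kbytes),
# copied from the measurements quoted in this comment.
runs = {
    "pycbc_coinc_findtrigs": {
        "new": (3586.64, 432688),
        "old": (4094.07, 414844),
    },
    "pycbc_sngls_findtrigs": {
        "new": (74.30, 277556),
        "old": (58.19, 279236),
    },
}


def percent_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old


changes = {}
for name, run in runs.items():
    (t_new, rss_new), (t_old, rss_old) = run["new"], run["old"]
    changes[name] = {
        "user_time": percent_change(t_new, t_old),
        "max_rss": percent_change(rss_new, rss_old),
    }
    print(
        f"{name}: user time {changes[name]['user_time']:+.1f}%, "
        f"max RSS {changes[name]['max_rss']:+.1f}%"
    )
```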

maxtrevor · 2024-05-21T17:23:34Z

Noting here that it was suggested on today's call that we should wait to merge this until after creating the new PyCBC Live branch intended for the rest of O4

GarethCabournDavies added the offline search label Jan 24, 2024

GarethCabournDavies self-assigned this Jan 24, 2024

GarethCabournDavies mentioned this pull request Jan 29, 2024

REFERENCE: fixes to the some broken statistics #4607

Closed

GarethCabournDavies force-pushed the modular_stat branch from 3b1dd80 to bfa528a Compare February 5, 2024 16:28

GarethCabournDavies requested a review from spxiwh February 6, 2024 10:34

GarethCabournDavies force-pushed the modular_stat branch from 5f7cd92 to 9c9bbee Compare March 11, 2024 13:27

GarethCabournDavies force-pushed the modular_stat branch 2 times, most recently from 2af01a3 to 2d7148a Compare April 18, 2024 13:18

GarethCabournDavies force-pushed the modular_stat branch from 2d7148a to e0214a1 Compare May 20, 2024 10:25

GarethCabournDavies added 15 commits June 28, 2024 05:20

fixes to the snr-like statistics

e2e91fe

Move exp_fit statistics into a modular framework

63c7d96

remove unused statistic

6aeeec7

use keyword:value rather than feature for alpha below

df4ede5

Codeclimate complaints

f172965

use new-style statistic in CI

0afa54d

fix in case the fit_by_template is not stored in the fit_over file

8361ab2

remove testing change

66ed84e

fix usage of parse_statistic_feature_options in test

2b604ab

Docstrings for various functions

950849e

Add back in the changes from gwastro#4603

43e23e9

Add description of the statistics to the documentation

5129c6b

fix error if passing keywords which need to be floats, rework the alpha_below_thresh keyword

b8fd602

Allow sngl_ranking keywords to actually be used

2da730e

CC

7052925

GarethCabournDavies added 14 commits June 28, 2024 05:21

try this

c8af586

maybe

d28f93a

single-word titles

824991e

Fix a bunch of line-too-long errors

c51fcaf

lines-too-long

714977f

These tables are annoying me

2997d5a

CC again

bd7834f

Fix errors in the tables

f40198c

run black on pycbc/events/stat.py

9047322

Start getting recent stat changes into module

c177b11

fixes post-rebase

934a7c1

run black on pycbc/events/ranking in order to try and get codeclimate to be quiet

2085dd8

Revert "run black on pycbc/events/ranking in order to try and get codeclimate to be quiet" This reverts commit 4f082ea.

67ad84b

minor fixes

33678e4

GarethCabournDavies force-pushed the modular_stat branch from 55fb556 to 33678e4 Compare June 28, 2024 12:45

GarethCabournDavies mentioned this pull request Jul 16, 2024

Add mechanism for re-loading the statistic files in live #4816

Merged

1 task
