Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism for re-loading the statistic files in live #4816

Merged

Conversation

GarethCabournDavies
Copy link
Contributor

Statistic files will be changed by a supervisor script in live for future uses. This PR adds the ability to check the sha1 hashes of the file and compare it to a stored copy so that if the file has changed, it can be reloaded.

In this usage, the file in the command line arguments would be overwritten (or a symlink which is updated), and then the statistic would refresh at an appropriate rate.

Standard information about the request

This is a new feature
This change affects the live search; Offline code is touched but not affected
This change changes scientific output

This change follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines

Motivation

We want to be able to use a proper ranking statistic in Live, but do not want to have to use static files, or restart the search regularly in order to update them. This change means that the ranking statistic files can be reloaded

Contents

  • Add functions to check the hashes of all files associated with the statistic object
  • Add threads for each statistic object which will monitor if the files have changed. These sleep until the next refresh time
  • Add functions to each statistic which detects whether the files needed by that statistic have been updated, and runs the appropriate function needed to reload the information from that file.

Links to any issues or associated PRs

Link to draft PR: https://github.com/GarethCabournDavies/pycbc/pull/5/files
#4689, #4527, #4813

Note that this will significantly clash with #4608

Testing performed

  • Started pycbc live - printed some values from the fits files.

  • Updated fits files, checked that the values are updated after this, and that the updated files were logged

  • The author of this pull request confirms they will adhere to the code of conduct

Comment on lines 113 to 132
def files_changed(self):
"""
Compare hashes of files now with the ones we have cached
"""
changed_file_hashes = self.get_file_hashes()
for stat, old_hash in self.file_hashes.items():
if changed_file_hashes[stat] != old_hash:
logger.info(
"%s statistic file %s has changed",
''.join(self.ifos),
stat,
)
else:
logger.debug(
"No %s statistic files have changed",
''.join(self.ifos)
)
return []

return list(changed_file_hashes.keys())
Copy link
Contributor

@maxtrevor maxtrevor Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not seem right to me.

My understanding of the

for:
    ...
else:

code is that the else block will always be executed unless the loop ends by a break command. So this seems like it will always return an empty list?

Additionally, if it somehow avoids executing the else block, it seems like the returned list will always be the full set of files, and not just the files that were actually changed?

Everything else in this MR makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out - for some reason I had it in my head that if you had if statements which were never met, then the else would be implemented. Kind of backwards to what it actually is I suppose.

I have updated to correct functionality with the latest commit

@GarethCabournDavies
Copy link
Contributor Author

I think the failing tests are in a dependency, I'll ask in the code slack channel

Copy link
Contributor

@maxtrevor maxtrevor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving now that Gareth has fixed the faulty code. CI is still blocking merge but I don't think its related to any of the changes made in this merge request.

@maxtrevor
Copy link
Contributor

@GarethCabournDavies Looks like all of the required checks have now passed. Are you ready to merge?

@GarethCabournDavies
Copy link
Contributor Author

Yup, I just hadn't noticed that the latest reattempt had worked!

@GarethCabournDavies GarethCabournDavies merged commit 3d9f7e2 into gwastro:master Jul 23, 2024
32 checks passed
@GarethCabournDavies GarethCabournDavies deleted the stat_object_reload branch July 23, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants