Audits (how to run as needed)

Naomi Dushay edited this page Dec 16, 2022 · 7 revisions

Information about how to run audits

  • Moab to Catalog (M2C) existence/version check
  • Catalog to Moab (C2M) existence/version check
  • Checksum Validation (CV)
  • (TBD) Catalog to Archive / replication audit
  • (TBD) Moab Validation (which is in the moab-versioning gem, but called by PreservationCatalog)

Moab to Catalog (M2C) existence/version check

See the [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about M2C validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name
  • Checks will be run asynchronously via MoabToCatalogJob.

RAILS_ENV=production bundle exec rake prescat:audit:m2c[root_name]

Via Rails Console

In console, first locate a MoabStorageRoot, then call m2c_check! to enqueue asynchronous executions via MoabToCatalogJob. Storage root information is available from settings.yml (shared_configs for deployments).

Single Root

msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.m2c_check!

All Roots

MoabStorageRoot.find_each { |msr| msr.m2c_check! }

Single Druid

To M2C a single druid synchronously, in console:

CatalogUtils.check_existence_for_druid('jj925bx9565')

Druid List

For a predetermined list of druids, a convenience wrapper for the above command is check_existence_for_druid_list.

  • The parameter is the file path of a CSV file listing the druids.
    • The first column of the CSV should contain druids, without the prefix.
    • The file should not contain a header row.

CatalogUtils.check_existence_for_druid_list('/file/path/to/your/csv/druid_list.csv')

Note: it should not typically be necessary to serialize a list of druids to CSV. Just iterate over them and use the "Single Druid" approach.
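The iteration suggested in the note can be sketched as a small helper. `check_each_druid` is a hypothetical name, not part of PreservationCatalog. It takes the per-druid check as a block so the sketch can run outside the Rails console, and it collects error messages so one failing druid does not stop the loop.

```ruby
# Hypothetical helper (not part of PreservationCatalog): run a per-druid check
# over an in-memory list and collect the druids whose check raised an error.
def check_each_druid(druids)
  failures = {}
  druids.each do |druid|
    begin
      yield druid
    rescue StandardError => e
      failures[druid] = e.message
    end
  end
  failures
end

# Assumed usage in the Rails console, with the call from "Single Druid":
#   check_each_druid(%w[jj925bx9565]) { |d| CatalogUtils.check_existence_for_druid(d) }
```

The returned hash maps each failed druid to its error message; an empty hash means no check raised.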

Catalog to Moab (C2M) existence/version check

See the [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about C2M validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name.
  • You cannot provide a date threshold: it will perform the validation for every MoabRecord prescat has for the root.
  • Checks will be run asynchronously via CatalogToMoabJob.

RAILS_ENV=production bundle exec rake prescat:audit:c2m[root_name]

Via Rails Console

In console, first locate a MoabStorageRoot, then call c2m_check! to enqueue asynchronous executions for the MoabRecords associated with that root via CatalogToMoabJob. Storage root information is available from settings.yml (shared_configs for deployments).

  • The optional date/timestamp argument is a threshold: the check runs on all catalog entries whose last version check was BEFORE that time. You can pass a string such as '2018-01-22 22:54:48 UTC' or an ActiveSupport time expression such as 1.week.ago. The default threshold is the current time, which selects every entry last checked before now.
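To illustrate how a threshold selects entries, here is a plain-Ruby sketch that runs outside Rails. The `last_checked` hash and the `due_for_check` helper are hypothetical stand-ins for catalog records. The real selection happens inside c2m_check!, whose implementation is not shown on this page.

```ruby
require 'time'

# Hypothetical stand-in for catalog entries: druid => time of last version check.
last_checked = {
  'aa111bb2222' => Time.parse('2018-01-20 10:00:00 UTC'),
  'cc333dd4444' => Time.parse('2018-01-23 09:30:00 UTC')
}

# Select the druids whose last version check happened strictly BEFORE the threshold.
def due_for_check(last_checked, threshold)
  last_checked.select { |_druid, checked_at| checked_at < threshold }.keys
end

# Same string format as in the text above.
threshold = Time.parse('2018-01-22 22:54:48 UTC')
due = due_for_check(last_checked, threshold)
puts due.inspect
```

With this sample data only the first druid is selected, because the second was checked after the threshold.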

Single Root

This enqueues work for all the objects associated with the first MoabStorageRoot in the database, then the last:

MoabStorageRoot.first.c2m_check!
MoabStorageRoot.last.c2m_check!

This enqueues checks for the objects on a given root that have not been checked in the past 3 days:

msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.c2m_check!(3.days.ago)

All Roots

This enqueues the same checks for every root, again limited to objects not checked in the past 3 days:

MoabStorageRoot.find_each { |msr| msr.c2m_check!(3.days.ago) }

Checksum Validation (CV)

See the [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about CV validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name.
  • It will perform checksum validation for every MoabRecord prescat has for the root. It ignores the fixity_ttl threshold (currently 90 days), which otherwise limits CV to records whose last validation is older than that.
  • Checks will be run asynchronously via ChecksumValidationJob.

RAILS_ENV=production bundle exec rake prescat:audit:cv[root_name]
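For contrast with the rake task, which ignores the fixity_ttl threshold, here is a sketch of what "older than fixity_ttl" means. It is an illustration under assumptions, not PreservationCatalog's implementation: `FIXITY_TTL_SECONDS` and `checksum_expired?` are hypothetical names, and treating a never-validated record as expired is an assumption.

```ruby
# Value taken from the text above: fixity_ttl is currently 90 days.
FIXITY_TTL_SECONDS = 90 * 24 * 60 * 60

# Hypothetical predicate: a record is due for CV when its last checksum
# validation is older than fixity_ttl. Treating nil (never validated) as
# expired is an assumption made for this sketch.
def checksum_expired?(last_validated_at, now: Time.now, ttl: FIXITY_TTL_SECONDS)
  return true if last_validated_at.nil?

  last_validated_at < now - ttl
end
```

Passing `now:` explicitly keeps the predicate deterministic, which makes it easy to try with fixed dates.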

Via Rails Console

In console, first locate a MoabStorageRoot, then call validate_expired_checksums! to enqueue asynchronous executions for the MoabRecords associated with that root via ChecksumValidationJob. Storage root information is available from settings.yml (shared_configs for deployments).

Single Root

From console, this queues objects on the named storage root for asynchronous CV:

msr = MoabStorageRoot.find_by!(name: 'fixture_sr3')
msr.validate_expired_checksums!

All Roots

This is also asynchronous, for all roots:

MoabStorageRoot.find_each { |msr| msr.validate_expired_checksums! }

Single Druid

Synchronously, from Rails console (will take a long time for very large objects):

Audit::ChecksumValidatorUtils.validate_druid(druid)

Druid List

  • Give the file path of the CSV as the parameter. The first column of the CSV should contain druids, without the prefix, and the file should not contain a header row.

Synchronously, from Rails console:

Audit::ChecksumValidatorUtils.validate_list_of_druids('/file/path/to/your/csv/druid_list.csv')
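If you need to produce such a file from a list of druids, a minimal sketch follows. `druid_list_csv` is a hypothetical helper, and the assumption that the prefix to strip is `druid:` is not confirmed on this page. Because bare druids contain no commas or quotes, a single-column, headerless CSV is simply one druid per line.

```ruby
# Hypothetical helper: build the contents of a headerless, single-column CSV
# from a list of druids, stripping a leading "druid:" prefix (assumed format).
def druid_list_csv(druids)
  druids.map { |druid| druid.sub(/\Adruid:/, '') }.join("\n") + "\n"
end

# Assumed usage, writing to the path that is then passed to the validator:
#   File.write('/file/path/to/your/csv/druid_list.csv', druid_list_csv(druids))
```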

Druids with a particular status on a particular storage root

For example, if you wish to run CV on all the "validity_unknown" druids on storage root 15, from console:

Audit::ChecksumValidatorUtils.validate_status_root(:validity_unknown, 'services-disk15')

Valid status strings
