Audits (how to run as needed)
Information about how to run audits
- Moab to Catalog (M2C) existence/version check
- Catalog to Moab (C2M) existence/version check
- Checksum Validation (CV)
- (TBD) Catalog to Archive / replication audit
- (TBD) Moab Validation (which is in the moab-versioning gem, but called by PreservationCatalog)
See [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about M2C validation.
- You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
- You do NOT need quotes for the root name
- Checks will be run asynchronously via MoabToCatalogJob
RAILS_ENV=production bundle exec rake prescat:audit:m2c[root_name]
In console, first locate a MoabStorageRoot, then call m2c_check! to enqueue asynchronous executions via MoabToCatalogJob. Storage root information is available from settings.yml (shared_configs for deployments).
msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.m2c_check!
MoabStorageRoot.find_each { |msr| msr.m2c_check! }
To M2C a single druid synchronously, in console:
CatalogUtils.check_existence_for_druid('jj925bx9565')
For a predetermined list of druids, check_existence_for_druid_list is a convenience wrapper for the above command.
- The parameter is the file path of a CSV file listing the druids.
- The first column of the CSV should contain druids, without prefix.
- The file should not contain headers.
CatalogUtils.check_existence_for_druid_list('/file/path/to/your/csv/druid_list.csv')
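If the druids start out in memory, a file in the layout described above can be produced with plain Ruby. This is a minimal sketch and not part of preservation_catalog: the helper name `write_druid_list`, the example druids, and the temp-directory path are hypothetical, and it assumes that "prefix" means a leading `druid:`.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of preservation_catalog): writes one druid per
# line with no header row. A leading "druid:" prefix is stripped, and blank
# entries and duplicates are dropped.
def write_druid_list(druids, path)
  rows = druids.map { |druid| druid.strip.delete_prefix('druid:') }.reject(&:empty?).uniq
  File.write(path, rows.map { |row| "#{row}\n" }.join)
  path
end

path = write_druid_list(['druid:jj925bx9565', 'bj102hs9687'], File.join(Dir.tmpdir, 'druid_list.csv'))
# In the Rails console the resulting file could then be passed along, e.g.
#   CatalogUtils.check_existence_for_druid_list(path)
```

Because druids contain no commas or quotes, writing the lines directly is enough here; a CSV library would only be needed if further columns were added.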
Note: it should not typically be necessary to serialize a list of druids to CSV. Just iterate over them and use the "Single Druid" approach.
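Iterating over an in-memory list might look like the sketch below. The helper `each_bare_druid` is hypothetical and not part of preservation_catalog; it assumes a leading `druid:` prefix should be stripped, and it records exceptions per druid so that one failure does not stop the rest of the list.

```ruby
# Hypothetical helper: yields each druid without its "druid:" prefix and
# collects the block's return value (or the raised error) keyed by druid.
def each_bare_druid(druids)
  druids.each_with_object({}) do |druid, results|
    bare = druid.delete_prefix('druid:')
    results[bare] = begin
      yield bare
    rescue StandardError => e
      e
    end
  end
end

# In the Rails console, the block would call the single-druid check:
#   each_bare_druid(druids) { |druid| CatalogUtils.check_existence_for_druid(druid) }
# Outside Rails, a stand-in block shows the shape of the result:
results = each_bare_druid(['druid:jj925bx9565']) { |druid| "checked #{druid}" }
```

Since each check runs synchronously, a long list will block the console until it finishes.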
See [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about C2M validation.
- You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
- You do NOT need quotes for the root name.
- You cannot provide a date threshold: it will perform the validation for every MoabRecord prescat has for the root.
- Checks will be run asynchronously via CatalogToMoabJob
RAILS_ENV=production bundle exec rake prescat:audit:c2m[root_name]
In console, first locate a MoabStorageRoot, then call c2m_check! to enqueue asynchronous executions for the MoabRecords associated with that root via CatalogToMoabJob. Storage root information is available from settings.yml (shared_configs for deployments).
- The (date/timestamp) argument is a threshold: the check runs on all catalog entries whose last version check happened BEFORE the argument. You can use a string format like '2018-01-22 22:54:48 UTC' or ActiveRecord Date/Time expressions like 1.week.ago. The default threshold is the current time, so anything not checked since right now is included.
This enqueues work for all the objects associated with the first MoabStorageRoot in the database, then the last:
MoabStorageRoot.first.c2m_check!
MoabStorageRoot.last.c2m_check!
This enqueues work for the objects on a given root that have not been checked in the past 3 days.
msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.c2m_check!(3.days.ago)
This enqueues the checks from all roots similarly.
MoabStorageRoot.find_each { |msr| msr.c2m_check!(3.days.ago) }
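To make the threshold semantics concrete, the sketch below reproduces the selection rule in plain Ruby, outside Rails. It is an illustration only: the `Entry` struct, the helper `due_for_check`, and the sample timestamps are hypothetical, and treating a never-checked entry (`nil`) as due is an assumption rather than something stated on this page.

```ruby
require 'time'

# Hypothetical stand-in for catalog entries; the real records live in the database.
Entry = Struct.new(:druid, :last_version_audit)

# Selects entries whose last version check happened before the threshold.
# The threshold may be a Time or a string such as '2018-01-22 22:54:48 UTC'.
# Assumption: entries that were never checked (nil) count as due.
def due_for_check(entries, threshold)
  cutoff = threshold.is_a?(String) ? Time.parse(threshold) : threshold
  entries.select { |entry| entry.last_version_audit.nil? || entry.last_version_audit < cutoff }
end

entries = [
  Entry.new('jj925bx9565', Time.utc(2018, 1, 1)),  # checked before the threshold
  Entry.new('bj102hs9687', Time.utc(2018, 2, 1)),  # checked after the threshold
  Entry.new('bz514sm9647', nil)                    # never checked
]
due = due_for_check(entries, '2018-01-22 22:54:48 UTC').map(&:druid)
```

With these sample values, the first and third entries are selected and the second is skipped, because its last check is more recent than the threshold.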
See [Audits (basic info) wiki](https://github.com/sul-dlss/preservation_catalog/wiki/Audits-(basic-info)) for basic info about CV validation.
- You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
- You do NOT need quotes for the root name.
- It will perform checksum validation for every MoabRecord prescat has for the root, regardless of whether the record's last checksum validation is older than the fixity_ttl threshold (currently 90 days).
- Checks will be run asynchronously via ChecksumValidationJob
RAILS_ENV=production bundle exec rake prescat:audit:cv[root_name]
In console, first locate a MoabStorageRoot, then call validate_expired_checksums! to enqueue asynchronous executions for the MoabRecords associated with that root via ChecksumValidationJob. Storage root information is available from settings.yml (shared_configs for deployments).
From console, this queues objects on the named storage root for asynchronous CV:
msr = MoabStorageRoot.find_by!(name: 'fixture_sr3')
msr.validate_expired_checksums!
This is also asynchronous, for all roots:
MoabStorageRoot.find_each { |msr| msr.validate_expired_checksums! }
Synchronously, from Rails console (will take a long time for very large objects):
Audit::ChecksumValidatorUtils.validate_druid(druid)
- For a list of druids, give the file path of the CSV as the parameter. The first column of the CSV should contain druids, without the prefix, and the file should contain no headers.
Synchronously, from Rails console:
Audit::ChecksumValidatorUtils.validate_list_of_druids('/file/path/to/your/csv/druid_list.csv')
For example, if you wish to run CV on all the "validity_unknown" druids on storage root 15, from console:
Audit::ChecksumValidatorUtils.validate_status_root(:validity_unknown, 'services-disk15')