Releases: UNC-Libraries/MARC-record-set-wrangler
ruby 3 compatibility
Travis uses ruby 2.5
Merge pull request #23 from UNC-Libraries/fix-travis Travis uses ruby 2.5
Bump rake to 12.3.3
Addresses CVE-2020-8130
Handle comparing non-utf8 strings/fields
- Fix crash when detecting changes to a field whose contents are not valid UTF-8. When the normalization routine encounters such a field, it now tries to transcode the contents from MARC-8 to UTF-8 and then continues with normalization. If transcoding also fails, it skips normalization and uses the un-normalized string for comparison.
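The fallback order described above can be sketched roughly as follows. This is only an illustration, not the project's code: `comparable_value`, `normalize`, and the `transcoder` argument are hypothetical names, the normalization steps are placeholders, and the transcoder stands in for whatever MARC-8 to UTF-8 converter is used (assumed to raise when it cannot convert).

```ruby
# Hypothetical helper illustrating the fallback chain; not the project's API.
# `transcoder` is any callable that converts MARC-8 bytes to a UTF-8 string
# and is assumed to raise when conversion fails.
def comparable_value(raw, transcoder:)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return normalize(candidate) if candidate.valid_encoding?

  begin
    converted = transcoder.call(raw).dup.force_encoding(Encoding::UTF_8)
  rescue StandardError
    return raw # transcoding failed: compare the un-normalized string
  end
  converted.valid_encoding? ? normalize(converted) : raw
end

# Placeholder normalization (an assumption): Unicode NFC, lowercase,
# collapsed whitespace. The real routine may do more or less than this.
def normalize(str)
  str.unicode_normalize(:nfc).downcase.gsub(/\s+/, ' ').strip
end
```

Passing the transcoder in as an argument keeps the sketch free of any particular MARC library and makes the failure path easy to exercise.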
Add subcollections
- Incoming records can be grouped into subcollections based on whether specified fields match the subcollection's pattern.
- Subcollections can have id affixes that are added to the institution/workflow/collection affix chain.
- Incoming files are allowed to have duplicate ids when records are in separate subcollections.
- Subcollections can define parameters (e.g. "provider_param: SPIE"). Specs that add fields can reference those parameters (e.g. "value: 'Content provider: provider_param.'"). Fields are created for each record using the parameter values of the record's subcollection.
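As a rough sketch of how pattern-based grouping and parameter substitution could fit together: the `Subcollection` struct, its attribute names, the `710` tag, and the hash-based record model below are all assumptions made for illustration, and the real config keys and classes may differ.

```ruby
# Hypothetical structures for illustration only.
Subcollection = Struct.new(:name, :tag, :pattern, :params, keyword_init: true)

SUBCOLLECTIONS = [
  Subcollection.new(
    name: 'spie',
    tag: '710',                      # assumed field to test
    pattern: /SPIE/i,                # assumed pattern
    params: { 'provider_param' => 'SPIE' }
  )
].freeze

# A record is modeled here as a hash of tag => array of field strings.
# Returns the first subcollection whose pattern matches, or nil.
def subcollection_for(record, subcollections)
  subcollections.find do |sc|
    record.fetch(sc.tag, []).any? { |value| value.match?(sc.pattern) }
  end
end

# Replaces each parameter name found in the template with its value.
def fill_template(template, params)
  params.reduce(template) do |text, (name, value)|
    text.gsub(name) { value }
  end
end
```

With the example values from the notes, `fill_template('Content provider: provider_param.', 'provider_param' => 'SPIE')` yields `Content provider: SPIE.`.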
Performance improvements
- Stops writing each existing record to an individual mrc file on disk. Instead, caches the source file and the start/stop byte offsets of each record within that file. This allows quick retrieval without all of the disk writing, and still without holding everything in memory. This seems to speed up processing by about 30%.
- Hashes marc.to_s for each record early on. When comparing incoming/existing records, first compares the hash values. Only when the hash values differ do we need to do more elaborate comparisons (e.g. omitting fields from comparison per specs, normalization, etc.) to determine whether the record has changed. This might speed up processing of a set by about another 30%.
- Allows for conditionally adding MARC fields with parameters, using MarcEdit#add_conditional_field_with_parameters. See lines in config.yaml for example spec. The example adds a 590 to records when a 773 is found containing certain values. Further, the spec maps the 773 value into a parameter to include in the 590.
- Fixes gh-7. When dupe records exist in the existing set, they are no longer reported as being in the incoming set.
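The two performance changes above (byte-offset caching and hash-first comparison) might look something like this minimal sketch. All names here (`RecordLocation`, `index_marc_file`, `read_record`, `changed?`) are hypothetical, and the sketch hashes the raw record bytes with MD5 as a stand-in, whereas the notes say the tool hashes `marc.to_s`. It relies on the MARC convention that the first five bytes of each record's leader give the record length.

```ruby
require 'digest'

# Where a record lives on disk, plus a digest for cheap comparison.
RecordLocation = Struct.new(:path, :offset, :length, :digest)

# Walks a binary MARC file and returns id => RecordLocation.
# The block receives the raw record bytes and must return the record id;
# how ids are extracted is left to the caller (an assumption of this sketch).
def index_marc_file(path)
  index = {}
  File.open(path, 'rb') do |io|
    until io.eof?
      offset = io.pos
      length = io.read(5).to_i   # leader bytes 0-4: record length
      break if length < 5        # malformed or truncated data: stop
      io.seek(offset)
      raw = io.read(length)
      index[yield(raw)] =
        RecordLocation.new(path, offset, length, Digest::MD5.hexdigest(raw))
    end
  end
  index
end

# Reads one record back using its cached offset and length.
def read_record(location)
  File.binread(location.path, location.length, location.offset)
end

# Cheap check first; the block holds the slower, spec-aware comparison
# and is only called when the digests differ.
def changed?(incoming_digest, existing_location)
  return false if incoming_digest == existing_location.digest

  yield
end
```

Only the index is held in memory, so lookups stay fast without writing one file per record or loading the whole set.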
Process WCM holdings, plus
- Processes holdings information output by OCLC WorldShare Collection Manager in the 996 field. Processes "fulltext" 996s only. Ignores 996s for other formats.
- Began a significant restructuring of the code to support eventual refactoring for greater testability.
- Some changes made to the UNC/example config: it now ignores 6XXs where the vocabulary is not specified (i2 = 4), or where i2 = 7 and $2 is not one of the vocabularies we care about and retain locally.
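The 6XX rule in the last item can be expressed as a small predicate. This is a sketch only: the actual behavior is driven by the config file, and `SubjectField`, `ignore_in_comparison?`, and the vocabulary list are hypothetical placeholders rather than the values UNC uses.

```ruby
# Hypothetical field model: tag, second indicator, and the $2 value
# (nil when $2 is absent).
SubjectField = Struct.new(:tag, :ind2, :source, keyword_init: true)

# Placeholder list; the vocabularies actually retained are set in the config.
RETAINED_VOCABULARIES = %w[example_vocab_a example_vocab_b].freeze

# True when a 6XX field should be left out of record comparison.
def ignore_in_comparison?(field, retained = RETAINED_VOCABULARIES)
  return false unless field.tag.match?(/\A6\d\d\z/)

  case field.ind2
  when '4' then true                              # vocabulary not specified
  when '7' then !retained.include?(field.source)  # $2 names an unretained vocabulary
  else false
  end
end
```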