These are tools developed by the Metadata Librarian at Florida State University, University Libraries. They are intended to ease the creation of metadata for Florida State University's DigiNole. All are released under the MIT license.
This repo is under frequent development. It is best thought of as a training space and testing ground for ideas, and some scripts are one-off solutions to specific problems. Some ideas might be repackaged as stand-alone utilities:
python3
Appends the children of root in an XML snippet to a targeted MODSXML file.
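As a rough illustration of the idea, the sketch below copies each child of a snippet's root element onto the root of a MODS record using the standard library's `xml.etree.ElementTree`. The function name `append_fragment`, the `fragment` wrapper element, and the sample records are assumptions made for the example and are not taken from the script itself.

```python
import copy
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # keep the mods: prefix on output


def append_fragment(mods_root, fragment_root):
    """Append a copy of every child of fragment_root to mods_root."""
    for child in list(fragment_root):
        # deepcopy so the snippet can be reused for many records
        mods_root.append(copy.deepcopy(child))
    return mods_root


# Hypothetical sample data, for illustration only.
record = ET.fromstring(
    f'<mods:mods xmlns:mods="{MODS_NS}">'
    "<mods:titleInfo><mods:title>Example</mods:title></mods:titleInfo>"
    "</mods:mods>"
)
fragment = ET.fromstring(
    f'<fragment xmlns:mods="{MODS_NS}">'
    "<mods:location>"
    "<mods:physicalLocation>Example Library</mods:physicalLocation>"
    "</mods:location>"
    "</fragment>"
)

append_fragment(record, fragment)
print(ET.tostring(record, encoding="unicode"))
```

Copying the children rather than moving them leaves the snippet untouched, which matters when the same fragment is applied to a whole directory of records.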
python3
Identical to addFragment.
python3
Updates Research Repository records that store subject terms as keywords, supplying controlled subject terms and URIs from the id.loc.gov interface.
python3
Adds PIDs to the records of objects loaded via the Islandora ZIP loader, which does not append PIDs to records at load.
- clean-up - python3 utility for managing some XML namespace prefix issues
- hpLocationFragment - mods:physicalLocation XML snippet
- lc_vacab - python3 utilities for querying and parsing id.loc.gov for controlled subject terms and URIs, appending the results to a MODSXML document, and writing out the modified file
- oaiDefaultNamespace - sed pattern for adding default OAI-PMH namespace to OAI-PMH harvested XML
- pymods (deprecated) - python3 utility bundle for working with MODSXML
- SpCLocationFragment - mods:physicalLocation XML snippet
python3
In progress. Utility to parse MODS files out into the appropriate object directory structure for ingest packaging.
python3
csv4cat is a simple python3 script for generating the spreadsheets shared with Cataloging for the creation of MARC records. An additional function creates collection content CSVs for digital series in Archon.
python3
Queries mods:subject @authority='lcsh' values against id.loc.gov for linked data attributes and updates the records. Can parse out complex subject terms using the LCSH double-hyphen delimiter and bundle the terms in the appropriate MODS subject types.
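The double-hyphen parsing step can be sketched in a few lines. This is a minimal illustration only: `split_lcsh` is a hypothetical name, the sample heading is made up for the example, and the sketch does not attempt the id.loc.gov lookup or decide which MODS subject type each part belongs to.

```python
def split_lcsh(heading):
    """Split a complex LCSH string on the double-hyphen delimiter."""
    # Strip stray whitespace and drop empty pieces left by a trailing "--".
    return [part.strip() for part in heading.split("--") if part.strip()]


# Hypothetical sample heading, for illustration only.
parts = split_lcsh("Universities and colleges--Florida--Tallahassee--History")
print(parts)
```

Each resulting part would still need to be matched against the authority file before it could be placed in a topical, geographic, or temporal subelement.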
python3
Renames IID.xml to PID.xml
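A minimal sketch of this kind of rename is shown below. It assumes an IID-to-PID mapping is already available as a dictionary; how the actual script obtains that mapping is not described here, and the function name `rename_by_pid` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def rename_by_pid(directory, iid_to_pid):
    """Rename IID.xml files to PID.xml and return the new paths."""
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).glob("*.xml")):
        pid = iid_to_pid.get(path.stem)
        if pid is None:
            continue  # no PID known for this IID; leave the file alone
        target = path.with_name(f"{pid}.xml")
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
        renamed.append(target)
    return renamed


# Hypothetical sample data, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "FSU_example_0001.xml").write_text("<mods/>")
    rename_by_pid(tmp, {"FSU_example_0001": "fsu_101"})
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

Refusing to overwrite an existing target is a deliberate choice in the sketch, since a silent overwrite would lose a record.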
python3
Creates METS documents for newspaper objects and packages them with TIFFs and MODS for ingest into DigiNole.
python3
python3
QA tool for identifying objects by PID without .//mods:physicalLocation elements. Takes local OAI-PMH document as input.
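The check itself can be sketched with the standard library, as below. The sketch assumes the object's identifier can be read from the OAI-PMH record header and that the metadata payload is MODS; the function name `records_missing_location` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mods": "http://www.loc.gov/mods/v3",
}


def records_missing_location(oai_root):
    """Return header identifiers of records with no mods:physicalLocation."""
    missing = []
    for record in oai_root.iterfind(".//oai:record", NS):
        if record.find("oai:metadata", NS) is None:
            continue  # deleted records carry a header but no metadata
        if record.find(".//mods:physicalLocation", NS) is None:
            identifier = record.findtext(
                "oai:header/oai:identifier",
                default="(no identifier)",
                namespaces=NS,
            )
            missing.append(identifier)
    return missing


# Hypothetical sample harvest, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:fsu_1</identifier></header>
      <metadata>
        <mods xmlns="http://www.loc.gov/mods/v3">
          <location><physicalLocation>Example Library</physicalLocation></location>
        </mods>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:fsu_2</identifier></header>
      <metadata>
        <mods xmlns="http://www.loc.gov/mods/v3">
          <titleInfo><title>No location</title></titleInfo>
        </mods>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(records_missing_location(ET.fromstring(SAMPLE)))
```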
python3
QA tool for identifying objects by PID without .//mods:physicalLocation elements. Queries DigiNole's OAI-PMH feed.
shell script & python
Plow is a metadata reporting tool that harvests from the digital library's OAI-PMH feed. A basic analysis of the digital library's descriptive metadata is reported in a CSV. Plow and plowReport make use of two tools developed by Mark Phillips:
- pyoaiharvester
- metadata_breakers.
Both are available at his GitHub account.
The utility xmlstarlet is also required.
python3
Preservation tool for DigiNole ETDs & faculty publications. Harvests all necessary datastreams from DigiNole so they can be packaged for preservation ingest.
python3
Downloads local copies of MODS datastreams for editing.
See separate README.
python2
Copy of Mark Phillips' pyoaiharvester.
python3
Rename files by PID.xml.
Sysnum is a basic shell script that generates an Aleph-friendly string of system numbers for easy record export from the catalog.
The XSLT folder contains eXtensible Stylesheets for transforming XML as it moves through various metadata workflows.
- alephtoMODS - modified version of LC's MARCXML to MODS transformation
- assets - files needed by other XSLT documents
- inc - files used by LC's transformations
- langEdit - find and mods:language @type="term" for matching mods:languageTerm @type="code" within MODS document
- LCtransformations - XSLT published by the Library of Congress
- multiOutAleph - split out individual MODS files from MODScollection document generated from MARCXML process
- multiOutMODSOAI - split out individual MODS files from OAI-PMH harvest
- multiOutORefinePIDS - split out individual MODS files from MODScollection document generated from OpenRefine process. Name files by PID rather than IID.
- multiOutORefine-splitOnFilename - split out individual MODS files from MODScollection document generated from OpenRefine process. Name files to match source object filename rather than IID.
- multiOutORefine - split out individual MODS files from MODScollection document generated from OpenRefine process
- omekaDC-RDF -
- stripEmptyElements01 - clean up empty elements and attributes
- stripEmptyElements02 - clean up remaining empty elements
- stripFilenameIdentifier -
- stripLocationFromPURLs -
- stripQuotes -
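The multiOut stylesheets all follow the same pattern of splitting a modsCollection into one file per record. The Python sketch below illustrates that pattern only and is not a port of the XSLT. It assumes each record carries a mods:identifier whose type attribute names the value to use for the file name; the function name `split_collection`, the `id_type` parameter, and the fallback file name are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # keep the mods: prefix on output


def split_collection(collection_root, out_dir, id_type="IID"):
    """Write each mods:mods child of a modsCollection to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    records = collection_root.findall(f"{{{MODS_NS}}}mods")
    for index, mods in enumerate(records, start=1):
        name = None
        for identifier in mods.findall(f"{{{MODS_NS}}}identifier"):
            if identifier.get("type") == id_type and identifier.text:
                name = identifier.text.strip()
                break
        if name is None:
            name = f"record_{index:04d}"  # assumed fallback, not from the XSLT
        target = out_dir / f"{name}.xml"
        ET.ElementTree(mods).write(
            str(target), encoding="utf-8", xml_declaration=True
        )
        written.append(target)
    return written
```

Passing a different `id_type` would name the files by another identifier, which mirrors the difference between the IID and PID variants listed above.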