dc2Solr

Utility to harvest records from Digital Commons, via OAI-PMH, and index them in Apache Solr.

Instructions for Use:

  1. Clone the dc2Solr repository: git clone http://github.com/WSULib/dc2Solr
  2. Set the system-specific variables in dc2Solr.py:
    • baseURL = Location of Solr core for indexing records
    • baseOAI = Digital Commons URL + "do/oai/?" suffix (e.g. http://digitalcommons.wayne.edu/do/oai/?)
    • saxonLocation = Path to the Saxon jar file (likely "Saxon9he.jar"). This utility uses the Saxon Java command-line program to perform XSL transformations; Saxon must be downloaded separately.
  3. Configure Solr. A rough example schema is located in the /SolrConfig directory; it can likely be optimized further for faceting and memory consumption.
  4. Change permissions on the directories "setsXML" and "solrXML" so that Python and Saxon can write to them.
  5. Finally, run "python dc2Solr.py" with the action to perform:
    • download = Download all OAI sets from Digital Commons
    • transform = Transform the downloaded OAI XML into Solr-ready XML via the XSLT stylesheet "dc2solr.xsl"
    • index = Index all Solr-ready XML documents in "solrXML" into Solr
    • all = Perform all three actions, in order. This can be used to run the utility as a fully automated cron job.
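
To make the relationship between the three variables and the three actions more concrete, here is a minimal Python 3 sketch of the requests and commands involved. It is not taken from dc2Solr.py: the function names, the Solr core URL, the Saxon path, the set name, and the per-set file names are all hypothetical. It only assumes the standard OAI-PMH ListRecords request, the standard Saxon command-line options (-s, -xsl, -o), and Solr's XML update handler.

```python
from urllib.parse import urlencode

# Hypothetical values for the three variables described in step 2.
baseURL = "http://localhost:8983/solr/dc2solr"        # assumed Solr core location
baseOAI = "http://digitalcommons.wayne.edu/do/oai/?"  # example from step 2
saxonLocation = "/opt/saxon/Saxon9he.jar"             # assumed path to the Saxon jar


def list_records_url(set_spec, metadata_prefix="oai_dc"):
    """download: build an OAI-PMH ListRecords request for one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return baseOAI + query


def saxon_command(source_xml, output_xml, stylesheet="dc2solr.xsl"):
    """transform: build the Saxon command line for one downloaded file."""
    return [
        "java", "-jar", saxonLocation,
        "-s:" + source_xml,    # source document
        "-xsl:" + stylesheet,  # XSLT stylesheet to apply
        "-o:" + output_xml,    # output file
    ]


def solr_update_url(commit=True):
    """index: build the Solr XML update endpoint for the configured core."""
    url = baseURL.rstrip("/") + "/update"
    return url + "?commit=true" if commit else url


if __name__ == "__main__":
    # "example_set" and the file names below are placeholders.
    print(list_records_url("example_set"))
    print(" ".join(saxon_command("setsXML/example_set.xml",
                                 "solrXML/example_set.xml")))
    print(solr_update_url())
```

The sketch only builds URLs and argument lists, so it can be run without touching Digital Commons or Solr. A real run would fetch the ListRecords URL, pass the argument list to a process launcher such as subprocess.run, and POST each transformed file to the update URL as XML.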

Wayne State University Libraries, 2013
