MARC parsing into Bookworm

This repo is meant to build metadata for a HathiTrust Bookworm from the original MARC records. It builds on the pymarc module to pull out the metadata most likely to be useful in analysis. This includes:

Dates
Author information
Classification information
Titles
Language information
Holding library information
Scanner information

How the code is organized.

The heart of the module is the new BRecord class in bookwormMARC/bookwormMARC.py. It extends the existing pymarc.Record class with several new methods, analogous to the original Record.title(), which simply parses out the first title field.

Certain MARC fields, like 100 (creator), carry so much useful information that, rather than overload the main class, I've created a few new classes (Author for field 100; F008 for field 008) that each return a small dictionary suitable for tacking onto the main record.
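As a rough illustration of the "small dictionary" idea, the sketch below pulls a few values out of an 008 field and a 100 field. It deliberately avoids pymarc and works on a raw string and on (code, value) pairs so that it runs standalone. The function names parse_008 and parse_author and the dictionary keys are hypothetical; they are not the actual interface of the F008 and Author classes in this repo. The character positions (06 for type of date, 07-10 for Date 1, 15-17 for place, 35-37 for language) and the subfield codes ($a for name, $d for dates) follow the MARC 21 bibliographic format.

```python
def parse_008(data):
    """Return a small dict of values from a MARC 008 fixed-length field."""
    # A bibliographic 008 is 40 characters; pad short values so slicing is safe.
    data = data.ljust(40)
    date1 = data[7:11]
    return {
        "date_type": data[6],
        # Date 1 may contain fill characters such as "18uu"; treat those as missing.
        "date1": int(date1) if date1.isdecimal() else None,
        "place_code": data[15:18].strip(),
        "language_code": data[35:38].strip(),
    }


def parse_author(subfields):
    """Return a small dict from the (code, value) subfield pairs of a 100 field."""
    out = {}
    for code, value in subfields:
        if code == "a" and "author_name" not in out:
            # Strip the trailing comma that cataloging punctuation often leaves.
            out["author_name"] = value.strip().rstrip(",")
        elif code == "d" and "author_dates" not in out:
            out["author_dates"] = value.strip().rstrip(".")
    return out


# Hypothetical sample values, built piece by piece so the offsets are visible.
sample_008 = "850423" + "s" + "1859" + "    " + "enk" + " " * 17 + "eng" + " d"
sample_100 = [("a", "Darwin, Charles,"), ("d", "1809-1882.")]

metadata = {}
metadata.update(parse_008(sample_008))
metadata.update(parse_author(sample_100))
```

Keeping each field's parsing in its own helper means the main record class only has to merge dictionaries, which is the design choice described above.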

As a first pass, I'm opting to create methods for the first author, first date, etc. Although MARC allows multiple fields for all of these, Bookworm queries frequently make more sense with a single value for each. This will likely require some refactoring later.
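The "first value only" choice can be sketched as a small flattening step. This is a minimal illustration that assumes the metadata has already been gathered into lists; the helper name flatten_first and the keys in the sample are hypothetical and do not come from this repo.

```python
def flatten_first(multi_valued):
    """Collapse {name: [values, ...]} to {name: first_value}.

    Keys whose list is empty are dropped rather than set to None.
    """
    return {name: values[0] for name, values in multi_valued.items() if values}


# Hypothetical multi-valued metadata for one record.
record_metadata = {
    "author_name": ["Darwin, Charles", "Wallace, Alfred Russel"],
    "date": [1859],
    "subject": [],
}
flat = flatten_first(record_metadata)
```

The later values are discarded here, which is the information a future refactoring would need to preserve.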

Hathi-MARC vs MARC in general

There are a number of particularities of Hathi MARC records that I (Ben) don't yet fully understand. In particular, this code relies heavily on the 974 field.

The files in bookwormMARC should be generalizable classes for reading MARC records, although they may have Hathi-specific methods. I want this code to be portable to at least the Medical Heritage Library.

Done right, this could be a quick and easy way to take any library catalog and build a Bookworm around at least the metadata. That would be nice to have.
