Data requirements
Cory Lown edited this page Sep 3, 2024
·
14 revisions
POD accepts data in several different formats for MARC full dumps/deltas and deletes.
- All data MUST be syntactically valid.
- All data provided by data contributors is understood to comply with the POD Data Provider & Usage Framework.
- Data uploads are packaged as files, and placed in streams. See Streams and their files for more details on the interpretation of files in streams.
- We recommend contributing files that contain no more than 200,000 records per file for more efficient processing by POD.
- All MARC records MUST contain a 001 field with a unique record identifier. The record identifier MUST be unique within an institution. It should be your ILS's system number for the record, not an OCLC number.
- MARC data MAY contain non-standard fields or subfield codes. These fields or subfields MAY be removed or normalized for downstream use.
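The 001 requirements above can be checked before upload. The following is a minimal sketch using only the Python standard library, not POD's own validation: it walks a MARC21 binary file using the record length stored in the leader, reads the 001 field from each record's directory, and reports missing or repeated identifiers. The function names (`iter_records`, `field_001`, `check_ids`) are hypothetical. Adjacent records that share a 001 value are treated as a split oversized record rather than as duplicates, which is an assumption based on the format notes on this page.

```python
FIELD_TERMINATOR = b"\x1e"


def iter_records(data: bytes):
    """Yield raw MARC21 records, using the record length in leader bytes 0-4."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 5].decode("ascii"))
        if length < 24:
            raise ValueError(f"implausible record length at byte {pos}")
        yield data[pos:pos + length]
        pos += length


def field_001(record: bytes):
    """Return the 001 value of a raw record, or None if there is no 001."""
    base = int(record[12:17].decode("ascii"))  # base address of data
    directory = record[24:base - 1]            # 12-byte entries
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if entry[0:3] == b"001":
            length = int(entry[3:7].decode("ascii"))
            start = int(entry[7:12].decode("ascii"))
            raw = record[base + start:base + start + length]
            return raw.rstrip(FIELD_TERMINATOR).decode("utf-8")
    return None


def check_ids(data: bytes):
    """Return (record_count, problems) for a MARC21 binary file."""
    seen, problems, previous, count = set(), [], None, 0
    for record in iter_records(data):
        count += 1
        ident = field_001(record)
        if ident is None:
            problems.append(f"record {count}: missing 001")
        elif ident != previous and ident in seen:
            # Same 001 on a non-adjacent record: treat as a duplicate.
            problems.append(f"record {count}: duplicate 001 {ident!r}")
        if ident is not None:
            seen.add(ident)
        previous = ident
    return count, problems
```

The returned record count can also be compared against the recommended 200,000 records per file.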
Format | Notes |
---|---|
MARC21 binary | Records SHOULD use the UTF-8 character set. Records longer than the 99,999 byte limit MAY be split into multiple MARC records, as long as they are physically adjacent in the file and have the same MARC 001 value. |
MARC21 binary; gzipped | See MARC21 binary |
MARC21 binary; chunked | Multiple files MAY be concatenated together into a single file or uploaded as separate files |
MARCXML | The file MUST be valid XML and use the MARC21 XML namespace (http://www.loc.gov/MARC21/slim). The file MUST start with an XML declaration (e.g. <?xml version="1.0" ?>). |
MARCXML; gzipped | See MARCXML |
MARCXML; chunked | See MARCXML |
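The MARCXML requirements in the table above can be approximated with a short pre-upload check. This is a sketch under stated assumptions rather than POD's actual validation: it uses Python's standard `xml.etree.ElementTree`, tests only well-formedness (not schema validity), and the function name `check_marcxml` is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def check_marcxml(data: bytes):
    """Return a list of problems found in a MARCXML document (empty if none)."""
    problems = []
    # The file must start with an XML declaration.
    if not data.startswith(b"<?xml"):
        problems.append("file does not start with an XML declaration")
    # The file must parse as a single well-formed XML document.
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    # Records must be in the MARC21 slim namespace to be counted.
    records = list(root.iter(f"{{{MARC_NS}}}record"))
    if not records:
        problems.append("no record elements in the MARC21 slim namespace")
    return problems
```

A file made by concatenating several complete XML documents fails the parse step, and a file whose records lack the namespace is reported as containing no records.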
Error message | Description |
---|---|
Records count is 0 | If you provided a MARCXML file, check that the document declares and uses the MARCXML namespace (http://www.loc.gov/MARC21/slim) |
XML parsing error: XML declaration allowed only at the start of the document | Some systems export "MARCXML" as concatenated XML files. Ensure your file is a single valid XML document with one XML declaration at the very start. |
XML parsing error: Unescaped '<' not allowed in attributes values | Some systems fail to perform XML encoding on tags or subfield codes. Ensure your XML file is valid. |
XML parsing error: Input is not proper UTF-8, indicate encoding !Bytes: 0xA0 0x4D 0x75 0x73 | This is likely caused by non-UTF-8 data appearing in MARC21 records (even ones that claim to use UTF-8). Correct any character encoding issues present in the file. |
MARC::DataField objects can't have ControlField tag '000') | MARC fields 000 - 009 MUST be control fields, and MARC fields 010 - 999 MUST be data fields |
unacceptable file format | File cannot be identified as MARCXML, MARC21, or a delete. Ensure the files you upload conform to the data specifications. |
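For the UTF-8 error in the table above, it can help to locate the offending bytes before re-exporting. The sketch below is one way to do that with the Python standard library; the function name `find_invalid_utf8` is hypothetical, and it checks the file as a whole rather than field by field.

```python
def find_invalid_utf8(data: bytes, context: int = 4):
    """Return (offset, bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where decoding failed.
        return exc.start, data[exc.start:exc.start + context]
    return None
```

For input containing the bytes from the error message above (0xA0 0x4D 0x75 0x73), the function returns the offset of the 0xA0 byte together with those four bytes.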
The Data Lake also supports uploading a "delete" file to a stream. A delete file specifies MARC records that have been deleted.
Format | Notes |
---|---|
text/plain | A newline-delimited text file, uploaded via the application user interface with a file name ending in .del.txt, .del, or .delete, OR uploaded via the API (e.g. using curl) with the text/plain MIME type (e.g. curl -F 'upload[files][]=@20201209-deletes.del;type=text/plain'). Each line should consist of the MARC 001 identifier of a record that should be deleted. POD does not support compressed text/plain deletes. |
MARC21 binary (application/marc) | File with a .mrc file extension. A record is marked as deleted by using d in position 05 of the MARC leader. At a minimum, each record should also contain a MARC 001 field identifying the record to delete. Deletes sent as MARC21 binary records may be included in a file with added and updated records. See the other MARC data notes above. |
MARCXML (application/marcxml+xml) | File with a .xml file extension. A record is marked as deleted by using d in position 05 of the MARC leader. At a minimum, each record should also contain a MARC 001 field identifying the record to delete. Deletes sent as MARCXML records may be included in a file with added and updated records. See the other MARC data notes above. |
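A text/plain delete file as described in the table above can be produced with a few lines of code. The following is a minimal sketch, assuming the identifiers are already available as strings; the function name `write_delete_file` is hypothetical, and the suffix check reflects only the file name endings listed for uploads through the user interface.

```python
DELETE_SUFFIXES = (".del.txt", ".del", ".delete")


def write_delete_file(identifiers, path):
    """Write one MARC 001 identifier per line to an uncompressed text file."""
    if not path.endswith(DELETE_SUFFIXES):
        raise ValueError(f"file name should end with one of {DELETE_SUFFIXES}")
    # Trim whitespace, skip blank entries, and drop repeats while keeping order.
    cleaned = [i.strip() for i in identifiers if i.strip()]
    unique = list(dict.fromkeys(cleaned))
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for ident in unique:
            handle.write(ident + "\n")
    return len(unique)
```

The resulting file can then be uploaded through the user interface, or through the API with the text/plain MIME type as in the curl example in the table.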