National Transportation Library (NTL). Bureau of Transportation Statistics (BTS), U.S. Department of Transportation (USDOT). ROR ID: https://ror.org/00snbrd52
2024-08-02
Archive Link: https://github.com/ptvrdy/doi-parser
A. General Information
B. Sharing/Access & Policies Information
C. Data and Related Files Overview
D. Software Information
E. File Specific Information
F. Update Log
Title of Program: DOI Parser Version 2.0 for DataCite
Description of the Program: To streamline DOI metadata management, this program takes CSV metadata files and transforms them into DataCite Schema JSON based on the CSV headings. This version of the program is compliant with DataCite Schema Version 4.5. Its purpose is to convert existing metadata to the DataCite schema and post it to the DataCite API to create or update DOI metadata. The program then returns the API response and generates a CSV listing each DOI created or updated along with that item's title.
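The core transformation can be pictured as turning each CSV row into a DataCite JSON:API payload. The sketch below is a minimal illustration, not the program's actual code. The column names (`doi`, `title`, `creator`, `publisher`, `year`, `resource_type_general`) are hypothetical stand-ins for your own CSV headings, and only a handful of DataCite properties are mapped.

```python
from __future__ import annotations

import csv
import io


def row_to_datacite(row: dict) -> dict:
    """Map one CSV row (hypothetical headings) to a DataCite JSON:API payload."""
    attributes = {
        "doi": row["doi"],
        "titles": [{"title": row["title"]}],
        # Personal creator; an organization would use nameType "Organizational".
        "creators": [{"name": row["creator"], "nameType": "Personal"}],
        # Schema 4.5 allows publisher as an object, so a ROR publisherIdentifier can be added later.
        "publisher": {"name": row["publisher"]},
        "publicationYear": int(row["year"]),
        "types": {"resourceTypeGeneral": row["resource_type_general"]},
    }
    return {"data": {"type": "dois", "attributes": attributes}}


def convert_csv(text: str) -> list[dict]:
    """Convert every data row of a CSV string into one payload per row."""
    return [row_to_datacite(row) for row in csv.DictReader(io.StringIO(text))]


sample = (
    "doi,title,creator,publisher,year,resource_type_general\n"
    '10.5555/example-1,Example Report,"Doe, Jane",Example Library,2024,Text\n'
)
payloads = convert_csv(sample)
print(payloads[0]["data"]["attributes"]["titles"])  # [{'title': 'Example Report'}]
```

Building one payload per row keeps each DOI independent. This fits the program's approach of displaying the first converted row for review before anything is sent.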
Special Features of This Program:
- Crosswalks the metadata by mapping CSV headings to DataCite Schema properties
- For organizations that are contributors, creators, or publishers, searches the ROR API to retrieve ROR Display Names and ROR IDs
- If the organization has a ROR ID, the program asks the user to confirm the match returned by the ROR API. If the API is down or cannot confirm a match, the user can add ROR information manually. If the organization has no ROR ID, the user can skip ROR input.
- The program then takes the confirmed ROR information or the user input of ROR information and appends it to the organization creator, contributor, or publisher as a 'nameIdentifier' or 'publisherIdentifier'
- The program then saves confirmed matches, whether from user input or the ROR API, to a CSV called 'confirmed_matches_ror.csv' so that users do not have to match the same metadata/inputs again in later sessions
- Once all organizations have been matched to their ROR IDs or skipped, the program displays the first data row of the CSV converted to DataCite JSON schema so the user can validate that the information provided is correct. The user has the opportunity to continue or abort.
- If the user would like to continue, the program then asks if they would like to post/put to the DataCite API. The user has the opportunity to continue or abort.
- If the user decides to post/put to the DataCite API, each JSON object is posted to the DataCite API. The program then lets the user know the API response.
- If the API returns status 201, the response is logged in a .log file named after the input CSV.
- The DOI and title of each item submitted to and returned by the API are then recorded in a CSV named after the input CSV plus 'doi_results'. This makes it easy to retrieve draft/reserved DOIs.
- The program finishes and prints "Done!"
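The match cache described above saves confirmed ROR matches so they survive between sessions. It can be sketched with the standard `csv` module. The column layout (`organization`, `ror_name`, `ror_id`) and the function names are assumptions for illustration. The repository's `confirmed_matches.py` may store the data differently.

```python
from __future__ import annotations

import csv
from collections.abc import Callable
from pathlib import Path

FIELDS = ["organization", "ror_name", "ror_id"]  # hypothetical column layout


def load_matches(path: Path) -> dict[str, dict]:
    """Return {organization: {"ror_name": ..., "ror_id": ...}}; empty if no file exists yet."""
    if not path.exists():
        return {}
    with path.open(newline="", encoding="utf-8") as f:
        return {
            row["organization"]: {"ror_name": row["ror_name"], "ror_id": row["ror_id"]}
            for row in csv.DictReader(f)
        }


def save_matches(path: Path, matches: dict[str, dict]) -> None:
    """Rewrite the cache file with every confirmed match."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for org, match in matches.items():
            writer.writerow({"organization": org, **match})


def lookup_or_confirm(
    org: str, matches: dict[str, dict], confirm: Callable[[str], dict | None]
) -> dict | None:
    """Reuse a cached match; otherwise call `confirm` (e.g. a ROR search plus a user prompt)."""
    if org in matches:
        return matches[org]
    match = confirm(org)  # a dict, or None if the user skips ROR input
    if match is not None:
        matches[org] = match  # skipped organizations are not cached in this sketch
    return match
```

Keying the cache on the organization string as it appears in the CSV means an identical spelling is matched only once. Variant spellings would still prompt the user.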
Dataset Archive Link: https://github.com/ptvrdy/doi-parser
DataCite Schema version: https://schema.datacite.org/meta/kernel-4.5
Authorship Information:
Co-Author Contact Information
Name: Peyton Tvrdy (0000-0002-9720-4725)
Institution: National Transportation Library (ROR ID: https://ror.org/00snbrd52)
Email: peyton.tvrdy.ctr@dot.gov
Co-Author Contact Information
Name: Joseph Lambeth
Email: josephwlambeth@gmail.com
Recommended citation for the data:
Tvrdy, Peyton and Joseph Lambeth. (2024). DOI Parser Version 2.0 for DataCite. https://github.com/ptvrdy/doi-parser
Licenses/restrictions placed on the data: https://creativecommons.org/licenses/by/4.0
File List for doi-parser
- Filename: config.txt
Short Description: This file contains the authentication information the program uses for the DataCite API. Replace its contents with your own authentication information in the format "Basic YourIdentificationHere" for this program to work.
- Filename: confirmed_matches_ror.csv
Short Description: This file contains the ROR ID matches you have made using this program.
- Filename: confirmed_matches.py
Short Description: This Python file loads and saves ROR data to and from confirmed_matches_ror.csv.
- Filename: constants.py
Short Description: This file contains the constant values the program needs to run properly, including ISO 639 language codes, the ROR API link, frequently used ROR IDs, NTL collection DOIs, NTL series DOIs, and a mapping of NTL resource type values to DataCite resource types.
- Filename: doi_parser.py
Short Description: This is the main Python file. It loads the CSV, conducts the transformation, and posts the results to the DataCite API.
- Filename: LICENSE
Short Description: This is the license file.
- Filename: post_processes.py
Short Description: This Python file contains all the functions used to process the CSV data. To implement this program at your institution, this file will need to be heavily edited to change the input CSV headings and some constant values. Please contact the author if you would like assistance in mapping DataCite headings to your institution's metadata.
- Filename: README.md
Short Description: This file is the README file you are reading now. It contains helpful background information about the program and its function.
- Filename: requirements.txt
Short Description: This file lists the Python libraries this program requires. You can install them yourself or use the pip command found in Software Information.
- Filename: utils.py
Short Description: This file contains functions for searching the ROR API and manually confirming ROR information. It also deletes unnecessary columns from the input CSV. The functions and column names will need to be adjusted for your institution. Additionally, it includes a function that determines whether the program will reserve/draft DOIs using the 'POST' method or update them to the 'findable' state using the 'PUT' method.
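The POST/PUT decision described for `utils.py` maps onto two DataCite REST API calls. `POST /dois` creates a DOI, which is stored as a draft when no `event` is given. `PUT /dois/{doi}` updates an existing DOI, and an `event` of `publish` moves it to the findable state. The sketch below only assembles the request. The function names and the test-environment base URL are assumptions. The Authorization value is read from `config.txt` in the "Basic ..." format described above.

```python
from __future__ import annotations

from pathlib import Path

API_BASE = "https://api.test.datacite.org"  # assumption: DataCite test environment


def read_auth(config_path: Path) -> str:
    """Read the 'Basic <credentials>' header value stored in config.txt."""
    return config_path.read_text(encoding="utf-8").strip()


def build_request(payload: dict, auth: str, publish: bool) -> tuple[str, str, dict, dict]:
    """Return (method, url, headers, body) for a create (POST) or publish (PUT) call."""
    headers = {"Authorization": auth, "Content-Type": "application/vnd.api+json"}
    attributes = payload["data"]["attributes"]
    if publish:
        # Update an existing DOI at its own URL and move it to the findable state.
        # The payload is copied, not mutated.
        body = {"data": {**payload["data"], "attributes": {**attributes, "event": "publish"}}}
        return "PUT", f"{API_BASE}/dois/{attributes['doi']}", headers, body
    # Create a new DOI on the collection URL; with no event it stays a draft.
    return "POST", f"{API_BASE}/dois", headers, payload
```

The resulting values could then be sent with `requests.request(method, url, headers=headers, json=body)`. Checking `response.status_code` then shows whether the call succeeded (201 for a successful POST).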
Instrument or software-specific information needed to interpret the data: This software is best run from the command prompt and best edited with Visual Studio Code. Microsoft Excel was used to create the CSV files. To run this software, open the command prompt, navigate to the folder that contains this program, and type the following command:
python doi_parser.py <CSV file>
Example:
python doi_parser.py CSV_1_20240101.csv
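The command-line pattern above can be sketched as follows. The script reads the input CSV path from the first argument and derives the names of the log file and the DOI results CSV from it. The exact output naming (`<name>.log` and `<name>_doi_results.csv`) and the function name are assumptions based on the descriptions in this README, not confirmed from `doi_parser.py`.

```python
from __future__ import annotations

from pathlib import Path


def output_paths(argv: list[str]) -> tuple[Path, Path, Path]:
    """Return (input CSV, log file, DOI results CSV) for an argv list like sys.argv."""
    # Expect exactly one argument after the script name, ending in .csv.
    if len(argv) != 2 or not argv[1].lower().endswith(".csv"):
        raise SystemExit("usage: python doi_parser.py <CSV file>")
    input_csv = Path(argv[1])
    log_file = input_csv.with_suffix(".log")  # e.g. CSV_1_20240101.log
    results_csv = input_csv.with_name(input_csv.stem + "_doi_results.csv")
    return input_csv, log_file, results_csv
```

In the script itself this would be called as `output_paths(sys.argv)`. Raising `SystemExit` with a message prints it to stderr and exits with status 1.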
Required Python Libraries: For this software to work correctly, install the Python libraries 'colorama' and 'requests'. To install them automatically, run the following command in the command prompt (make sure pip is already installed):
pip install -r requirements.txt
- constants.py: This file contains information relevant to my organization, NTL. Replace these values with values relevant to your institution.
- post_processes.py: This file is where you would adjust my functions and add your own. If implementing these functions at your institution, make sure to change the function 'NTL_Hosting_Institution' to your institution or the institution that will host the item. Additionally, pay special attention to changing all identifiers and the content note for CoreTrustSeal curation levels.
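As a starting point for that change, a replacement for `NTL_Hosting_Institution` might return a DataCite contributor of type `HostingInstitution` carrying your ROR ID. The function name, its parameters, and the dictionary shape below are a hypothetical sketch that follows DataCite Schema 4.5 property names. Compare it with the existing function in `post_processes.py` before adopting it.

```python
def hosting_institution(name: str, ror_id: str) -> dict:
    """Hypothetical stand-in for NTL_Hosting_Institution: one organizational contributor."""
    return {
        "name": name,
        "nameType": "Organizational",
        "contributorType": "HostingInstitution",
        "nameIdentifiers": [
            {
                "nameIdentifier": ror_id,  # full ROR URL, e.g. https://ror.org/00snbrd52
                "nameIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


contributor = hosting_institution("National Transportation Library", "https://ror.org/00snbrd52")
print(contributor["contributorType"])  # HostingInstitution
```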
This README.md file was originally created on 2024-08-02 by Peyton Tvrdy (0000-0002-9720-4725), Data Management and Data Curation Fellow, National Transportation Library (peyton.tvrdy.ctr@dot.gov).
2024-08-02: Version 2.0 Project Launch and README created