Script to parse text file downloads from ProQuest's Global Newsstream database into CSV of metadata and full text.

chennesy/pq_parser

Parse ProQuest Metadata

This notebook includes a Python function that parses newspaper articles downloaded from ProQuest Newsstream into a pandas DataFrame with metadata and full text (when full text is available), and saves the result to CSV.
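To illustrate the general idea, here is a minimal sketch of this kind of parser. It is not the notebook's actual function. It assumes that the downloaded text file separates records with a line of underscores and that metadata appears as `Field name: value` lines, with `Full text:` as one of the fields. The names `parse_records`, `records_to_csv`, and `SAMPLE` are hypothetical, and the sample text is invented for the example. The sketch uses only the standard library; the resulting list of dictionaries could also be passed to `pandas.DataFrame(records)`.

```python
import csv
import io
import re

# Assumption: each record is preceded by a line made up of many underscores.
RECORD_SEPARATOR = re.compile(r"^_{20,}[ \t]*$", re.MULTILINE)
# Assumption: metadata fields look like "Field name: value" at the start of a line.
# Limitation: a body line that happens to match this pattern is treated as a field.
FIELD_LINE = re.compile(r"^([A-Z][A-Za-z /]{1,40}): (.*)$")


def parse_records(text):
    """Split the export into records and return a list of field dictionaries."""
    records = []
    for chunk in RECORD_SEPARATOR.split(text):
        record = {}
        current = None  # name of the field that continuation lines belong to
        for line in chunk.splitlines():
            match = FIELD_LINE.match(line)
            if match:
                current = match.group(1)
                record[current] = match.group(2).strip()
            elif current and line.strip():
                # Non-field, non-blank lines continue the previous field.
                record[current] += " " + line.strip()
        if "Title" in record:  # skip empty chunks between separators
            records.append(record)
    return records


def records_to_csv(records):
    """Serialize records to CSV text; missing fields become empty cells."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample in the assumed layout; the second record has no full text.
SAMPLE = """\
____________________________________________________________

Title: First headline

Publication title: Example Daily

Full text: First paragraph of the article.
Second paragraph of the article.

____________________________________________________________

Title: Second headline

Publication title: Example Weekly

____________________________________________________________
"""

if __name__ == "__main__":
    print(records_to_csv(parse_records(SAMPLE)))
```

Records without a `Full text` field simply get an empty cell in that column, which mirrors the "when full text is available" behavior described above.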

Created by Cody Hennesy and David Naughton (University of Minnesota, Twin Cities, Libraries). Email Cody (chennesy@umn.edu) with any questions.

For an alternative approach that uses R and saves documents as HTML files, see Jae Yeon Kim's Tidy Ethnic News parser.

See also: Factiva parser
