This notebook includes a Python function that parses newspaper articles downloaded from ProQuest Newsstream into a pandas DataFrame (and saves it to CSV), with metadata and full text (when full text is available).
Created by Cody Hennesy and David Naughton (University of Minnesota, Twin Cities, Libraries). Email Cody (chennesy@umn.edu) with any questions.
For an alternative approach using R and saving documents as HTML files, see Jae Yeon Kim's Tidy Ethnic News parser.
See also: Factiva parser