Great Recession News Corpus

Overview

This repository reconstructs the "Great Recession News" Corpus described in "Building the Great Recession News Corpus (GRNC): A contemporary diachronic corpus of economy news in English" (Research in Corpus Linguistics, 2020).

The authors share neither the source data nor the list of articles. The corpus can only be accessed through paid Sketch Engine plans.

This repository offers an alternative by publishing the digital identifiers (URLs) of the documents used in the corpus. The content can then be retrieved for non-commercial purposes through developer APIs or scrapers.
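
As one illustration of the API route, the sketch below maps a Guardian article URL to a request URL for the Guardian's Content API. It assumes that the API item id equals the path of the article URL and that `show-fields=bodyText` returns the plain text; the function name, the sample URL, and the `YOUR_KEY` placeholder are hypothetical, and nothing here is taken from this repository's code. It only builds the request URL and does not send it.

```python
from urllib.parse import urlencode, urlparse

GUARDIAN_API = "https://content.guardianapis.com"


def guardian_api_url(article_url: str, api_key: str) -> str:
    """Build a Content API request URL for a public Guardian article URL.

    Assumption: the API item id is the URL path without the leading slash.
    """
    parsed = urlparse(article_url)
    host = parsed.netloc.lower()
    if not (host.endswith("theguardian.com") or host.endswith("guardian.co.uk")):
        raise ValueError(f"not a Guardian URL: {article_url}")
    item_id = parsed.path.strip("/")
    # bodyText asks for the article text without HTML markup
    query = urlencode({"api-key": api_key, "show-fields": "bodyText"})
    return f"{GUARDIAN_API}/{item_id}?{query}"


# Hypothetical article URL, used only to show the mapping
print(guardian_api_url(
    "https://www.theguardian.com/business/2008/sep/15/example-article",
    "YOUR_KEY",
))
```

The resulting URL can be fetched with any HTTP client once a personal API key is available. New York Times articles would need a separate route, which this sketch does not cover.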

Data Description

Sketch Engine processed 18,915 articles from "The Guardian" and 13,069 articles from "New York Times". Some URLs appear more than once, a redundancy not mentioned in the original paper. The URLs retrieved in this repo match exactly what is available in Sketch Engine.

source	unique urls	total urls
New York Times	12556	13069
The Guardian	18161	18915
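
The difference between the two columns is the number of repeated URLs: 13069 - 12556 = 513 for the New York Times and 18915 - 18161 = 754 for The Guardian. A minimal sketch of how such counts could be recomputed from a list of (source, URL) pairs is given below; the input layout and the sample URLs are assumptions for illustration, since the format of the files in the data folder is not described here.

```python
from collections import defaultdict


def url_counts(rows):
    """Return {source: (unique_urls, total_urls)} for (source, url) pairs."""
    seen = defaultdict(set)   # distinct URLs per source
    total = defaultdict(int)  # all URLs per source, duplicates included
    for source, url in rows:
        seen[source].add(url)
        total[source] += 1
    return {source: (len(seen[source]), total[source]) for source in total}


# Hypothetical sample: one Guardian URL is listed twice
sample = [
    ("The Guardian", "https://www.theguardian.com/a"),
    ("The Guardian", "https://www.theguardian.com/a"),
    ("The Guardian", "https://www.theguardian.com/b"),
    ("New York Times", "https://www.nytimes.com/c"),
]
print(url_counts(sample))
```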

Methodology

I developed a Selenium bot to extract the article identifiers available from Sketch Engine.
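
The general shape of such a bot can be sketched as follows. This is not the code from the src folder: the page address and the CSS selector are placeholders, and the real Sketch Engine pages will need login handling, pagination, and different selectors. The helper keeps duplicates on purpose, so that totals can be compared with what the site lists.

```python
def collect_article_urls(driver, link_selector):
    """Return the href of every element matching link_selector, in page order.

    Duplicates are kept; elements without an href are skipped.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    elements = driver.find_elements("css selector", link_selector)
    urls = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            urls.append(href)
    return urls


def main():
    # Imported here so the helper above can be used without a browser installed
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        # Placeholder address and selector, both hypothetical
        driver.get("https://example.org/document-list")
        for url in collect_article_urls(driver, "a.document-link"):
            print(url)
    finally:
        driver.quit()
```

Calling `main()` starts a Chrome session, so it needs Selenium and a matching browser driver installed.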

maciejskorski/GreatRecessionNews
