NYTimes Corpus: Crowdsourcing Topical Relevance

This repository contains the crowdsourcing annotations for topical relevance referenced in the following paper:

Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas and Lora Aroyo: Studying Topical Relevance with Evidence-based Crowdsourcing. CIKM 2018.

If you find this data useful in your research, please consider citing:

@inproceedings{inel2018studying,
  title={Studying Topical Relevance with Evidence-based Crowdsourcing},
  author={Inel, Oana and Haralabopoulos, Giannis and Li, Dan and Van Gysel, Christophe and Szlávik, Zoltán and Simperl, Elena and Kanoulas, Evangelos and Aroyo, Lora},
  booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM)},
  year={2018},
  organization={ACM}
}

Running the notebooks

To run the notebooks and regenerate the results, install version 2.0 of the crowdtruth package from PyPI:

pip install crowdtruth==2.0
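
Before opening the notebooks, it can help to confirm that the pinned version is the one actually installed in the active environment. The following is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name `crowdtruth_status` is hypothetical and not part of this repository or of the crowdtruth package.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "2.0"  # version pinned in the install command above


def crowdtruth_status(required=REQUIRED):
    """Return a short message describing the installed crowdtruth version.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import crowdtruth itself.
    """
    try:
        installed = version("crowdtruth")
    except PackageNotFoundError:
        return "crowdtruth is not installed"
    if installed == required:
        return f"crowdtruth {installed} is installed"
    return f"crowdtruth {installed} is installed, expected {required}"


print(crowdtruth_status())
```

If the message reports a different version, rerunning the install command above in the same environment should bring it in line.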

Crowdsourcing Templates

The following crowdsourcing templates were used in the article cited above. We use the same experiment notation as in the article. The annotation templates themselves are kept in the templates folder of this repository.

EXP. TYPE	EXP. SETTING	CROWDSOURCING RELEVANCE ANNOTATION SCALE	DOCUMENT GRANULARITY	DOCUMENT PARAGRAPH ORDER	TARGET ANNOTATION
Pilot	3P-Doc-NoHigh	3-point scale (Highly Relevant, Relevant, Not Relevant)	Full Document	N/A	Relevance Value
Pilot	3P-Doc-High	3-point scale (Highly Relevant, Relevant, Not Relevant)	Full Document	N/A	Relevance Value + Text Highlight
Pilot	2P-Doc-NoHigh	2-point scale (Relevant, Not Relevant)	Full Document	N/A	Relevance Value
Pilot	2P-Doc-High	2-point scale (Relevant, Not Relevant)	Full Document	N/A	Relevance Value + Text Highlight
Pilot	2P-OrdPar-NoHigh	2-point scale (Relevant, Not Relevant)	Document Paragraphs	Document Order	Relevance Value
Pilot	2P-OrdPar-High	2-point scale (Relevant, Not Relevant)	Document Paragraphs	Document Order	Relevance Value + Text Highlight
Pilot	2P-RndPar-NoHigh	2-point scale (Relevant, Not Relevant)	Document Paragraphs	Random Order	Relevance Value
Pilot	2P-RndPar-High	2-point scale (Relevant, Not Relevant)	Document Paragraphs	Random Order	Relevance Value + Text Highlight
Main	2P-RndPar-High	2-point scale (Relevant, Not Relevant)	Document Paragraphs	Random Order	Relevance Value + Text Highlight
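
The setting names in the table follow a regular three-part pattern: scale, document unit, and whether highlighting was requested. As an illustration only, the sketch below decodes a setting name into the corresponding table columns. The `Setting` type and `parse_setting` function are hypothetical names introduced here; they are not part of this repository, and the mapping is inferred solely from the rows above.

```python
from typing import NamedTuple


class Setting(NamedTuple):
    """Decoded experiment setting (hypothetical helper type)."""
    scale_points: int
    granularity: str
    paragraph_order: str
    highlight: bool


# Middle part of the name -> (document granularity, paragraph order),
# as listed in the table above.
_UNITS = {
    "Doc": ("Full Document", "N/A"),
    "OrdPar": ("Document Paragraphs", "Document Order"),
    "RndPar": ("Document Paragraphs", "Random Order"),
}


def parse_setting(name: str) -> Setting:
    """Split a name such as '2P-RndPar-High' into its table columns."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"unrecognised setting name: {name!r}")
    scale, unit, high = parts
    if scale not in ("2P", "3P") or unit not in _UNITS or high not in ("High", "NoHigh"):
        raise ValueError(f"unrecognised setting name: {name!r}")
    granularity, order = _UNITS[unit]
    return Setting(int(scale[0]), granularity, order, high == "High")


print(parse_setting("2P-RndPar-High"))
```

For example, `parse_setting("3P-Doc-NoHigh")` describes a 3-point scale on the full document without text highlighting, matching the first pilot row.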
