Datasets used in the AAAI paper "Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric" by Ivan Yamshchikov, Viacheslav Shibaev, Nikolay Khlebnikov, and Alexey Tikhonov.
-
Yelp paraphrases - contains paraphrases for Yelp dataset
-
Yelp random - contains random pairs from Yelp dataset
-
Paraphrase - contains paraphrases from Paraphrase dataset
-
Paraphrase random - contains random pairs from Paraphrase dataset
-
Paralex paraphrases - contains paraphrases from Paralex dataset
-
Paralex random - contains random pairs from Paralex dataset
-
Bible paraphrases - contains paraphrases from Bible dataset
-
Bible random - contains random pairs from Bible dataset
-
GYAFC formal paraphrases - contains paraphrases from formal part of GYAFC dataset
-
GYAFC formal random - contains random pairs from formal part of GYAFC dataset
-
GYAFC informal paraphrases - contains paraphrases from informal part of GYAFC dataset
-
GYAFC informal random - contains random pairs from informal part of GYAFC dataset
-
GYAFC rewrites paraphrases - contains paraphrases from rewrites part of GYAFC dataset
-
GYAFC rewrites random - contains random pairs from rewrites part of GYAFC dataset
- text_1 - given text
- text_2 - text paired with text_1: a random text (in the "random" sets) or a real paraphrase of text_1 (in the "paraphrases" sets)
- label_1 - score of the first labeller
- label_2 - score of the second labeller
- label_3 - score of the third labeller
- avg_score - average of the labellers' scores

All scores are in the range from 1 to 5, where 1 means "Not Similar At All" and 5 means "Highly Similar".
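
As a rough illustration of how the fields above relate, here is a minimal Python sketch that checks a single record. It assumes that avg_score is the arithmetic mean of label_1, label_2 and label_3 (the README only says "average") and that a record is available as a dictionary keyed by field name. The helper `check_row`, the tolerance `tol`, and the example row are all hypothetical and not part of the released data; the sketch deliberately makes no assumption about the on-disk file format.

```python
from statistics import mean

# Field names as listed in this README.
LABEL_FIELDS = ("label_1", "label_2", "label_3")


def check_row(row, tol=1e-2):
    """Return True if every label is within 1..5 and avg_score
    matches the mean of the three labels up to `tol`.

    Assumption: avg_score is the arithmetic mean of the labels.
    """
    labels = [float(row[name]) for name in LABEL_FIELDS]
    if not all(1 <= score <= 5 for score in labels):
        return False
    return abs(mean(labels) - float(row["avg_score"])) <= tol


# Made-up example row for illustration only; not taken from the datasets.
row = {
    "text_1": "the food was great",
    "text_2": "the meal was excellent",
    "label_1": 5,
    "label_2": 4,
    "label_3": 5,
    "avg_score": 4.67,
}
print(check_row(row))
```

The tolerance is there because a stored average may be rounded; tighten or loosen it depending on how the values appear in the files.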