The task is to generate paraphrased questions, i.e. questions that have the same meaning but differ in vocabulary and grammar. The training data is derived from the Datasets of Paraphrased SQuAD Questions (https://github.com/nusnlp/paraphrasing-squad) and consists of 1,118 paraphrased questions. The evaluation set consists of 100 questions, also derived from the Stanford Question Answering Dataset (https://rajpurkar.github.io/SQuAD-explorer/).
https://competitions.codalab.org/competitions/28529
John P. McCrae - Data Science Institute, National University of Ireland Galway
Evaluation will be in terms of BLEU and PINC. BLEU measures the similarity of the paraphrases to the reference sentences, while PINC measures how many of the paraphrase's n-grams differ from the source question, so the two metrics together reward paraphrases that are faithful yet lexically novel; the leaderboard also reports a Harmonic Mean of the two.
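Below is a minimal sketch of how these metrics can be computed, assuming PINC follows the standard Chen and Dolan (2011) definition and that the leaderboard's Harmonic Mean is the plain harmonic mean of BLEU and PINC (both are assumptions; the competition's official scorer may differ):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """PINC: average fraction of candidate n-grams (n = 1..max_n)
    that do NOT appear in the source sentence."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

def harmonic_mean(bleu, pinc_score):
    # Assumed combination of the two metrics, not the official scorer.
    if bleu + pinc_score == 0:
        return 0.0
    return 2 * bleu * pinc_score / (bleu + pinc_score)

source = "what is the capital of france ?"
reference = "which city is the capital of france ?"
candidate = "which city serves as france 's capital ?"
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)
p = pinc(source, candidate)
print(bleu, p, harmonic_mean(bleu, p))
```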
It's a pretrained T5 transformer trained on the large Quora question-pairs dataset, and it gives good results: Harmonic Mean: 0.118 (1), BLEU: 0.096 (2), PINC: 0.586 (1).
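A minimal generation sketch with such a checkpoint; the model name below is illustrative (any T5 checkpoint fine-tuned for paraphrasing on Quora would be driven the same way):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative checkpoint name, assumed to be a Quora-trained T5 paraphraser.
model_name = "ramsrigouthamg/t5_paraphraser"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "What is the population of Ireland?"
inputs = tokenizer("paraphrase: " + question, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=64,
    num_beams=5,
    num_return_sequences=3,  # several candidate paraphrases
    early_stopping=True,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))
```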
I fine-tuned a T5 transformer on my dataset on Google Colab and then downloaded the trained models to generate paraphrases on the evaluation set. I first experimented with an 80:20 train/validation split and then with 95:5; since the dataset is small, the second split gives better results (a fine-tuning sketch follows the scores below).
With the 80:20 split: Harmonic Mean: 0.085 (1), BLEU: 0.104 (2), PINC: 0.322 (1)
With the 95:5 split: Harmonic Mean: 0.104 (1), BLEU: 0.113 (2), PINC: 0.378 (1)
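A minimal fine-tuning sketch of this setup, assuming the training file has already been parsed into (question, paraphrase) string pairs; the placeholder data, paths, and hyperparameters are illustrative:

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from transformers import T5ForConditionalGeneration, T5Tokenizer

class ParaphraseDataset(Dataset):
    """(source question, paraphrase) pairs tokenized for T5."""
    def __init__(self, pairs, tokenizer, max_len=64):
        self.pairs, self.tok, self.max_len = pairs, tokenizer, max_len
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, i):
        src, tgt = self.pairs[i]
        x = self.tok("paraphrase: " + src, max_length=self.max_len,
                     padding="max_length", truncation=True, return_tensors="pt")
        y = self.tok(tgt, max_length=self.max_len, padding="max_length",
                     truncation=True, return_tensors="pt")
        labels = y.input_ids.squeeze(0).clone()
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": x.input_ids.squeeze(0),
                "attention_mask": x.attention_mask.squeeze(0),
                "labels": labels}

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

pairs = [("What is BLEU?", "What does BLEU mean?")]  # placeholder data
data = ParaphraseDataset(pairs, tokenizer)
n_train = int(0.95 * len(data))                      # the 95:5 split
train_set, val_set = random_split(data, [n_train, len(data) - n_train])

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for epoch in range(3):
    for batch in DataLoader(train_set, batch_size=8, shuffle=True):
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
model.save_pretrained("t5_paraphrase_finetuned")  # download from Colab afterwards
```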
I also tried building a system from scratch using an encoder-decoder model. I created two models here, the second being a stacked LSTM encoder-decoder. I thought increasing the complexity would help me get better results, but the results are unsatisfactory, so I have not submitted these to the competition.
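A minimal Keras sketch of the stacked LSTM encoder-decoder idea; all hyperparameters are illustrative, since the original values are not given:

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size, emb_dim, units, max_len = 10000, 128, 256, 100  # assumed values

# Encoder: two stacked LSTM layers; the final layer's states seed the decoder.
enc_in = Input(shape=(max_len,))
enc_emb = Embedding(vocab_size, emb_dim, mask_zero=True)(enc_in)
enc_l1 = LSTM(units, return_sequences=True)(enc_emb)
_, state_h, state_c = LSTM(units, return_state=True)(enc_l1)

# Decoder: stacked LSTMs initialised with the encoder's final state,
# predicting the paraphrase token by token (teacher forcing at train time).
dec_in = Input(shape=(max_len,))
dec_emb = Embedding(vocab_size, emb_dim, mask_zero=True)(dec_in)
dec_l1 = LSTM(units, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
dec_l2 = LSTM(units, return_sequences=True)(dec_l1)
dec_out = Dense(vocab_size, activation="softmax")(dec_l2)

model = Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```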
I also experimented with BART for the paraphrasing task, but the results are not good: BART tends to reproduce the input sequence as-is, which is one of its shortcomings for this task (see the sketch after the dataset links below). For both Model 3 and Model 4 I felt that working with a bigger corpus could give much better results. Other datasets for paraphrasing can be found here:
- https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs
- https://github.com/google-research-datasets/paws#paws-wiki
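For reference, a minimal sketch of running BART on a question, assuming the off-the-shelf facebook/bart-base checkpoint; without paraphrase fine-tuning it tends to echo the input, which is the shortcoming noted above:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

question = "What is the population of Ireland?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=5, early_stopping=True)
# Often prints the input nearly verbatim, i.e. a high-BLEU, near-zero-PINC "paraphrase".
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```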
This experiment contains two variations; both fix the input length at 100, which improves the scores. In Model 6 I extended the dataset by appending the Quora dataset and then shuffling the combined data, but it scores lower than Model 5, which is simply the same model as Model 2 with the input length fixed at 100. We can consider Model 5 the best model of all the experiments.
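A minimal sketch of the fixed-input-length trick, assuming a Hugging Face tokenizer: pad shorter questions and truncate longer ones so every input the model sees is exactly 100 tokens:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
enc = tokenizer(
    "paraphrase: What is the population of Ireland?",
    max_length=100,          # fixed input length of 100
    padding="max_length",    # pad shorter inputs up to 100
    truncation=True,         # cut longer inputs down to 100
    return_tensors="pt",
)
print(enc.input_ids.shape)   # torch.Size([1, 100])
```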