Performance on DailyDialog dataset #8

Open
aman229 opened this issue Sep 19, 2020 · 1 comment
aman229 commented Sep 19, 2020

Hi,
I tried running the Seq2Seq and HRED models on the DailyDialog dataset. Here are the results I got:

Seq2Seq results:
BLEU-1: 0.215
BLEU-2: 0.0986
BLEU-3: 0.057
BLEU-4: 0.0366
ROUGE: 0.0492
Distinct-1: 0.0268; Distinct-2: 0.131
Ref distinct-1: 0.0599; Ref distinct-2: 0.3644
BERTScore: 0.1414

HRED results:
BLEU-1: 0.2121
BLEU-2: 0.0961
BLEU-3: 0.0542
BLEU-4: 0.0331
ROUGE: 0.0502
Distinct-1: 0.0208; Distinct-2: 0.0992
Ref distinct-1: 0.0588; Ref distinct-2: 0.3619
BERTScore: 0.1436
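
For reference, Distinct-n here is the ratio of unique n-grams to total n-grams over all generated responses (the "Ref" lines apply the same metric to the ground-truth replies). A minimal sketch, assuming plain whitespace tokenization, which may differ from the tokenization used by the repo's evaluation script:

```python
# Distinct-n: unique n-grams / total n-grams across all responses.
# Whitespace tokenization is an assumption; the repo's evaluation may differ.
def distinct_n(sentences, n):
    ngrams = []
    for sent in sentences:
        tokens = sent.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# e.g. distinct_n(generated_responses, 1) -> Distinct-1,
#      distinct_n(reference_responses, 2) -> Ref distinct-2
```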

These results seem to be much lower than the ones reported in the DailyDialog paper: https://www.aclweb.org/anthology/I17-1099.pdf
Do you have any idea why that is the case?
Thanks!

gmftbyGMFTBY (Owner) commented
Hi, thanks for your interest in this repo. Compared with the results in the original DailyDialog paper, the BLEU-1/2 scores are lower, but the BLEU-3/4 scores are much better. In my opinion, BLEU-3/4 are more suitable metrics than BLEU-1/2, since matching higher-order n-grams indicates that the model generates more fluent responses. So I think these results are reasonable. If you are still confused about it, feel free to contact me.
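
For reference, corpus-level BLEU-1/2/3/4 differ only in the n-gram weights. A minimal sketch with NLTK, assuming whitespace tokenization and one reference per hypothesis; the actual evaluation script may use a different tokenizer and smoothing method, which noticeably affects BLEU-3/4:

```python
# Sketch of corpus BLEU-1..4 with NLTK; tokenization and smoothing are assumptions.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_1_to_4(references, hypotheses):
    refs = [[r.split()] for r in references]   # one reference per hypothesis
    hyps = [h.split() for h in hypotheses]
    smooth = SmoothingFunction().method1       # smoothing choice matters for BLEU-3/4
    weights = {
        "BLEU-1": (1.0, 0.0, 0.0, 0.0),
        "BLEU-2": (0.5, 0.5, 0.0, 0.0),
        "BLEU-3": (1/3, 1/3, 1/3, 0.0),
        "BLEU-4": (0.25, 0.25, 0.25, 0.25),
    }
    return {name: corpus_bleu(refs, hyps, weights=w, smoothing_function=smooth)
            for name, w in weights.items()}
```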
