HWU64 odd number of samples #17

Open
jnehring opened this issue Feb 2, 2022 · 2 comments


jnehring commented Feb 2, 2022

The HWU64 dataset contains 25k samples according to the original paper. The DialoGLUE paper states the same number of samples.

However, the README states 11k samples.

When I count the samples that are actually in the HWU64 part of DialoGLUE, I get 12,112 samples (12k).

My questions:

  • Is there a reason for the difference in numbers in the original HWU64 and in the DialoGLUE HWU64? Or is it a bug?
  • Did you compute the performance of the intent prediction models on 25k, 12k or 11k samples?

Thank you for your answers :)


Shikib commented Feb 2, 2022

Thank you for bringing this discrepancy to our attention. We use the data downloading scripts provided by [https://arxiv.org/pdf/2009.13570.pdf] to get all of the intent prediction datasets, including HWU, in order to maintain consistency with prior work on intent prediction. The performance is reported on the data that is obtained by the data downloading scripts (i.e., the ~11k data points). The error here is the 25k number cited in the paper, which I will attempt to get corrected in the near future.

I'm not sure how you're counting 12,112 samples. I just ran the data downloading scripts and see:

1077 test.csv
9961 train.csv


jnehring commented Feb 2, 2022

Thank you for the quick answer :) I counted 12,112 by adding:

9960 train.csv
1076 test.csv
1076 val.csv

So I counted the val file as well. There is still a discrepancy of 1 sample, but that's OK for me.
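
For anyone who wants to reproduce the counts above, here is a minimal sketch in Python. It rests on an assumption that this thread does not confirm: that each CSV starts with a one-line header, which would explain why a raw line count (9961, 1077) is one higher per file than the sample count (9960, 1076). The directory `path/to/hwu` and the function names `count_rows` and `count_splits` are hypothetical and not part of DialoGLUE.

```python
import csv
from pathlib import Path


def count_rows(path, has_header=True):
    """Count CSV records in one file, optionally excluding a header row.

    ASSUMPTION: has_header=True reflects a guess that the HWU64 CSVs
    begin with a header line; this is not confirmed in the thread.
    """
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    return n - 1 if has_header and n > 0 else n


def count_splits(data_dir, splits=("train", "val", "test"), has_header=True):
    """Return {split: record count} for every split file that exists."""
    counts = {}
    for split in splits:
        path = Path(data_dir) / f"{split}.csv"
        if path.exists():
            counts[split] = count_rows(path, has_header)
    return counts


if __name__ == "__main__":
    # "path/to/hwu" is a placeholder; point it at the downloaded HWU64 folder.
    counts = count_splits("path/to/hwu")
    for split, n in counts.items():
        print(f"{split}: {n}")
    print(f"total: {sum(counts.values())}")
```

Note that `csv.reader` counts records rather than physical lines, so a file with quoted multi-line fields could give a different number than a plain line count. With the per-file figures quoted above, the sum is 9960 + 1076 + 1076 = 12,112.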
