HWU64 odd number of samples #17

jnehring · 2022-02-02T12:24:02Z

The HWU64 dataset contains 25k samples according to the original paper. The DialoGLUE paper states the same number of samples.

However, the Readme states 11k samples.

If I count the number of samples which are actually in the HWU64 part of DialoGLUE then I get 12,112 samples (12k).

My questions:

1. Is there a reason for the difference in numbers between the original HWU64 and the DialoGLUE HWU64? Or is it a bug?
2. Did you compute the performance of the intent prediction models on 25k, 12k or 11k samples?

Thank you for your answers :)

Shikib · 2022-02-02T13:00:20Z

Thank you for bringing this discrepancy to our attention. We use the data downloading scripts provided by [https://arxiv.org/pdf/2009.13570.pdf] to get all of the intent prediction datasets, including HWU, in order to maintain consistency with prior work on intent prediction. The performance is reported on the data that is obtained by the data downloading scripts (i.e., the ~11k data points). The error here is the 25k number cited in the paper, which I will attempt to get corrected in the near future.

I'm not sure how you're counting 12,112 samples. I just ran the data downloading scripts and see:

1077 test.csv
9961 train.csv

jnehring · 2022-02-02T13:40:16Z

Thank you for the quick answer :) I counted 12,112 by adding:

9960 train.csv
1076 test.csv
1076 val.csv

So I also counted the val file. There is still a discrepancy of 1 sample per file, but that's ok for me.
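For anyone who wants to reproduce these counts, here is a minimal sketch. It assumes each split is a CSV file named `train.csv`, `val.csv` or `test.csv`, and that each file starts with one header row. The header row is an assumption, not something confirmed in this thread, though it would explain why a raw line count gives 9961 and 1077 while the sample counts are 9960 and 1076. The function names `count_rows` and `count_splits` are made up for illustration.

```python
import csv
from pathlib import Path


def count_rows(path, has_header=True):
    """Count non-empty CSV records, optionally excluding one header row."""
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for row in csv.reader(f) if row)
    # Assumption: the first record is a header, not a sample.
    return max(n - 1, 0) if has_header else n


def count_splits(data_dir, splits=("train", "val", "test"), has_header=True):
    """Return the sample count per existing split file, plus the total."""
    counts = {}
    for split in splits:
        path = Path(data_dir) / f"{split}.csv"
        if path.exists():
            counts[split] = count_rows(path, has_header)
    counts["total"] = sum(counts.values())
    return counts
```

Calling `count_splits("path/to/hwu")` (a placeholder path) returns a dictionary of per-split counts. With the figures reported above, 9960 + 1076 + 1076 gives the 12,112 total, and leaving out the val file gives 11,036, which matches the roughly 11k in the Readme.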
