No references found for test set wmt23/* #261

Closed
kellymarchisio opened this issue Mar 28, 2024 · 4 comments

Comments

@kellymarchisio

Running the command below fails with "sacreBLEU: No references found for test set wmt23/en-de.":
cat $OUTFILE | sacrebleu -t wmt23 -l en-de

The same occurs for de-en and en-ja (I did not try other language pairs).

@martinpopel
Collaborator

I found this bug last week and fixed it in #260, which I have now merged. (Thanks for reporting. I had forgotten about this issue over the week.)

@kellymarchisio
Author

kellymarchisio commented Mar 31, 2024

Nice, thanks! Do you plan to push the change to PyPI for easy installation? (Of course, installing from source with python setup.py install is also easy, but having it on PyPI may be more user-friendly out of the box.)

@kellymarchisio
Author

kellymarchisio commented Apr 11, 2024

Hi @martinpopel - I'm just coming back to this, and it looks like I still have the issue.

  1. I cleared out the cache with rm -r /home/kelly/.sacrebleu/wmt23
  2. I reinstalled sacrebleu from source with python setup.py install
  3. I ran sacrebleu -i test.out -t wmt23 -l en-de, which results in:
sacreBLEU: No references found for test set wmt23/en-de.
sacreBLEU: System and reference streams have different lengths.
sacreBLEU: This could be an issue with your system output or with sacreBLEU's reference database if -t is given.
sacreBLEU: For the latter, try cleaning out the cache by typing:

sacreBLEU:   rm -r /home/kelly/.sacrebleu/wmt23

sacreBLEU: The test sets will be re-downloaded the next time you run sacreBLEU.
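
For anyone who prefers to script the cache-clearing step, here is a minimal Python sketch of the same idea as the rm -r command above. The helper names (cache_dir_for, clear_cache) are hypothetical and not part of sacreBLEU. The cache root is an assumption: the SACREBLEU environment variable if it is set, otherwise ~/.sacrebleu as seen in the log.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def cache_dir_for(test_set: str, root: Optional[Path] = None) -> Path:
    """Return the directory where cached files for `test_set` are expected."""
    if root is None:
        # Assumption: the cache root is $SACREBLEU if set, else ~/.sacrebleu.
        root = Path(os.environ.get("SACREBLEU", Path.home() / ".sacrebleu"))
    return Path(root) / test_set


def clear_cache(test_set: str, root: Optional[Path] = None) -> bool:
    """Delete the cached test set; return True if a directory was removed."""
    target = cache_dir_for(test_set, root)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Calling clear_cache("wmt23") would remove the same directory as the rm -r command in the log, after which the test set should be re-downloaded on the next run.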

My test.out is 557 lines, as it should be. When I run wc on the .sacrebleu cache, I see:

    557   27933  186081 wmt23.en-de.AIRC
    557   34155  233250 wmt23.en-de.GPT4-5shot
    557   33524  227284 wmt23.en-de.Lan-BridgeMT
    557   28501  192144 wmt23.en-de.NLLB_Greedy
    557   28117  188344 wmt23.en-de.NLLB_MBR_BLEU
    557   34736  234666 wmt23.en-de.ONLINE-A
    557   34410  236635 wmt23.en-de.ONLINE-B
    557   34060  231497 wmt23.en-de.ONLINE-G
    557   33365  226666 wmt23.en-de.ONLINE-M
    557   34749  234086 wmt23.en-de.ONLINE-W
    557   34450  235075 wmt23.en-de.ONLINE-Y
    557   37148  249078 wmt23.en-de.ZengHuiMT
    557     557   13278 wmt23.en-de.docid
    557     557    1671 wmt23.en-de.origlang
    557   34625  234302 wmt23.en-de.ref-refA
    557   34711  197749 wmt23.en-de.src
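
The same line-count check can be scripted. This is a rough sketch, not anything sacreBLEU provides: count_lines and mismatched_files are hypothetical helpers that count newlines the way wc -l does and list any cached file whose count differs from the expected number of segments.

```python
from pathlib import Path
from typing import Dict


def count_lines(path: Path) -> int:
    """Count newline characters in a file, like `wc -l`."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(1 << 16)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def mismatched_files(cache_dir: Path, prefix: str, expected: int) -> Dict[str, int]:
    """Return {file name: line count} for cached files that do not have `expected` lines."""
    bad = {}
    for path in sorted(Path(cache_dir).iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            lines = count_lines(path)
            if lines != expected:
                bad[path.name] = lines
    return bad
```

With the listing above, mismatched_files(Path.home() / ".sacrebleu" / "wmt23", "wmt23.en-de.", 557) should come back empty, which suggests the downloaded files themselves are complete and the problem lies elsewhere.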

Do you know my next steps for fixing this?

I can of course work around this by running cat test.out | sacrebleu --tokenize 13a ~/.sacrebleu/wmt23/wmt23.en-de.ref-refA, but I want to match the intended implementation exactly to reduce the chance of error.
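
A similar workaround is possible from Python. The sketch below assumes the sacrebleu package is installed and uses its corpus_bleu function with the 13a tokenizer. read_segments and score_against_file are hypothetical helper names. Like the command-line workaround, it scores against the cached reference file directly, so it is not guaranteed to match what -t wmt23 -l en-de would report.

```python
from typing import List


def read_segments(path) -> List[str]:
    """Read one segment per line, dropping only the trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def score_against_file(hyp_path, ref_path, tokenize: str = "13a") -> float:
    """Score a hypothesis file against a single reference file and return BLEU."""
    hyps = read_segments(hyp_path)
    refs = read_segments(ref_path)
    if len(hyps) != len(refs):
        raise ValueError(f"{len(hyps)} hypotheses vs {len(refs)} references")

    # Imported here so the helpers above work even without sacrebleu installed.
    import sacrebleu

    # corpus_bleu takes a list of reference streams; only one is used here.
    return sacrebleu.corpus_bleu(hyps, [refs], tokenize=tokenize).score
```

The explicit length check mirrors the "System and reference streams have different lengths" message, so a mismatch is reported before any scoring happens.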

@mjpost
Owner

mjpost commented Apr 12, 2024

I just released v2.4.2, which includes this bugfix, and also adds a domain field (available with --echo) for WMT22 and WMT23.
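
To check whether an environment already has the release with the fix, something like the following sketch may help. It reads the installed version through the standard library's importlib.metadata and compares it with 2.4.2. parse_release, has_fix, and installed_sacrebleu are hypothetical helpers, and the comparison is deliberately simple (it ignores pre-release and dev suffixes).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '2.4.2' into (2, 4, 2), stopping at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed: Optional[str], fixed_in: str = "2.4.2") -> bool:
    """True if the installed version is at least the release named in this thread."""
    return installed is not None and parse_release(installed) >= parse_release(fixed_in)


def installed_sacrebleu() -> Optional[str]:
    """Return the installed sacrebleu version string, or None if it is not installed."""
    try:
        return version("sacrebleu")
    except PackageNotFoundError:
        return None
```

Running has_fix(installed_sacrebleu()) after pip install --upgrade sacrebleu should then return True.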

mjpost closed this as completed on Apr 12, 2024