Granular QE Tagging #2

Draft: wants to merge 23 commits into base: main

Conversation

gsarti
Owner

@gsarti gsarti commented Apr 13, 2023

No description provided.

@gsarti
Owner Author

gsarti commented May 5, 2023

Had a look at the progress and everything seems coherent to me! 👍 Some comments for final steps:

  • I agree on the idea of adopting Levenshtein as fallback measure if embeddings for similarity are not provided.
  • A new flag add_nametbd_quality_tags needs to be added to scripts/preprocess.py and to parse_from_folder in divemt/parse_utils.py, following the approach used for WMT22 quality tags.
  • Since divemt/qe_taggers.py has become long and quite hard to navigate, it would make sense to create a divemt/taggers folder to separate its contents (a file for the QETagger ABC, a file for the WMT22QETagger and one for the NameTBDTagger, plus tag_utils.py and wmt22qe_utils.py from the original folder). Paths in scripts will need to be adjusted accordingly!

The final goal is to be able to run preprocess.py to produce outputs containing QE annotations produced by the NameTBDTagger.
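
To make the proposed split concrete, here is a minimal sketch of the layout, not the actual divemt code. The `tag` method signature, the `ExactMatchTagger` stand-in and the use of argparse in `build_parser` are all assumptions for illustration; only the names QETagger and add_nametbd_quality_tags come from the points above.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List


class QETagger(ABC):
    """Base class that could live in its own file under divemt/taggers.

    The method name and signature are hypothetical.
    """

    @abstractmethod
    def tag(self, src: List[str], mt: List[str], pe: List[str]) -> List[str]:
        """Return one quality tag per MT token."""


class ExactMatchTagger(QETagger):
    """Toy stand-in for a concrete tagger such as the NameTBDTagger.

    It marks an MT token OK if the same token appears anywhere in the PE.
    """

    def tag(self, src: List[str], mt: List[str], pe: List[str]) -> List[str]:
        pe_tokens = set(pe)
        return ["OK" if tok in pe_tokens else "BAD" for tok in mt]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wiring; the real preprocess.py may parse flags differently."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--add_nametbd_quality_tags", action="store_true")
    return parser
```

With this shape, each concrete tagger gets its own file and the preprocessing script only needs to check the boolean flag before calling `tag`.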

@k4black
Collaborator

k4black commented May 5, 2023

@gsarti Thanks for checking!

  • Regarding Levenshtein, I am not sure how to use it for MT-SRC
  • It's currently full of debug code, that's why I do not commit it yet
  • Sure! Makes sense

@gsarti
Owner Author

gsarti commented May 5, 2023

Yeah, Levenshtein can be used only for MT-PE matching; for MT-SRC, embeddings should be required.
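
A rough sketch of that rule, assuming a character-level, length-normalized Levenshtein similarity. The function names, the `"mt-pe"` / `"mt-src"` labels and the `embed_sim` parameter are hypothetical and not taken from the divemt code.

```python
from typing import Callable, Optional, Sequence

Similarity = Callable[[str, str], float]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for fully different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def pick_similarity(pair: str, embed_sim: Optional[Similarity] = None) -> Similarity:
    """Prefer an embedding-based similarity; fall back to Levenshtein for MT-PE only."""
    if embed_sim is not None:
        return embed_sim
    if pair == "mt-pe":
        # MT and PE are in the same language, so surface edit distance is meaningful.
        return levenshtein_similarity
    # MT and SRC are in different languages: string distance says little here.
    raise ValueError("embeddings are required for MT-SRC matching")
```

The fallback is limited to MT-PE because both sides share a language, whereas MT-SRC pairs cross languages and need an embedding-based similarity.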
