Within COMET, there are several evaluation models available. The primary reference-based and reference-free models are:
- Default Model:
Unbabel/wmt22-comet-da
- This model employs a reference-based regression approach and is built upon the XLM-R architecture. It has been trained on direct assessments from WMT17 to WMT20 and provides scores ranging from 0 to 1, where 1 signifies a perfect translation. - Reference-free Model:
Unbabel/wmt22-cometkiwi-da
- This reference-free model employs a regression approach and is built on top of InfoXLM. It has been trained using direct assessments from WMT17 to WMT20, as well as direct assessments from the MLQE-PE corpus. Similar to other models, it generates scores ranging from 0 to 1. For those interested, we also offer larger versions of this model:Unbabel/wmt23-cometkiwi-da-xl
with 3.5 billion parameters andUnbabel/wmt23-cometkiwi-da-xxl
with 10.7 billion parameters. - eXplainable COMET (XCOMET):
Unbabel/XCOMET-XXL
- Our latest model is trained to identify error spans and assign a final quality score, resulting in an explainable neural metric. We offer this version in XXL with 10.7 billion parameters, as well as the XL variant with 3.5 billion parameters (Unbabel/XCOMET-XL
). These models have demonstrated the highest correlation with MQM and are our best performing evaluation models.
Please be aware that different models may be subject to varying licenses. To learn more, kindly refer to the LICENSES.models and model licenses sections.
If you intend to compare your results with papers published before 2022, it's likely that they used older evaluation models. In such cases, please refer to Unbabel/wmt20-comet-da and Unbabel/wmt20-comet-qe-da, which were the primary checkpoints used in previous versions (<2.0) of COMET.
UniTE Metric was developed by the NLP2CT Lab at the University of Macau and Alibaba Group, and all credits should be attributed to these groups. COMET framework fully supports running UniTE and thus we made the original UniTE-MUP checkpoint available in Hugging Face Hub. Additionally, we also trained our own UniTE model using the same data as wmt22-comet-da
. You can access both models here:
Unbabel/unite-mup
- This is the original UniTE Metric proposed in the UniTE: Unified Translation Evaluation paper.Unbabel/wmt22-unite-da
- This model was trained for our paper (Rei et al., ACL 2023) and it uses the same data asUnbabel/wmt22-comet-da
thus, the outputed scores are between 0 and 1.Unbabel/unite-xxl
- xCOMET models (Guerreiro et al. 2023) are training following a curriculum. The checkpoint resulting from the first phase of that curriculum (before the introduction of a sequence tagging task and MQM data) is a UniTE Model. An XL version is also available.
All other models developed through the years can be accessed through the following links:
Model | Download Link | Paper |
---|---|---|
emnlp20-comet-rank |
🔗 | 🔗 |
wmt20-comet-qe-da |
🔗 | 🔗 |
wmt21-comet-da |
🔗 | 🔗 |
wmt21-comet-mqm |
🔗 | 🔗 |
wmt21-comet-qe-da |
🔗 | 🔗 |
wmt21-comet-qe-mqm |
🔗 | 🔗 |
wmt21-comet-qe-da |
🔗 | 🔗 |
wmt21-cometinho-mqm |
🔗 | 🔗 |
wmt21-cometinho-da |
🔗 | 🔗 |
eamt22-cometinho-da |
🔗 | 🔗 |
eamt22-prune-comet-da |
🔗 | 🔗 |
Example :
wget https://unbabel-experimental-models.s3.amazonaws.com/comet/eamt22/eamt22-cometinho-da.tar.gz
tar -xf eamt22-cometinho-da.tar.gz
comet-score -s src.de -t hyp1.en -r ref.en --model eamt22-cometinho-da/checkpoints/model.ckpt