v1.0.1: Multi-GPU, HF dataloaders, MonoT5 rerankers and a brand new Wiki page
Multiple changes have been made to the repository since the last version. The latest changes are summarized below:
1. Brand New Wiki page for BEIR
Starting from v1.0.1, we have created a new Wiki page for the BEIR benchmark. We will keep it updated with the latest datasets available out there, examples of how you can evaluate your models on BEIR, the leaderboard, etc. Correspondingly, we have shortened our README.md to display only the necessary information. For a full overview, one can view the BEIR Wiki.
You can view the BEIR Wiki here: https://github.com/beir-cellar/beir/wiki.
2. Multi-GPU evaluation with SBERT dense retrievers using distributed evaluation
Thanks to @NouamaneTazi, we now support multi-GPU evaluation of SBERT models across all datasets in BEIR. This especially benefits evaluation on large datasets such as BioASQ, where encoding takes at least a day to complete on a single GPU. With access to multiple GPUs, one can now evaluate such large datasets much faster than with the old single-GPU evaluation. The only caveat: running on multiple GPUs requires the evaluate library to be installed, which in turn requires Python >= 3.7.
Example: evaluate_sbert_multi_gpu.py
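For reference, here is a minimal sketch of the multi-GPU flow using the standard BEIR evaluation API. The parallel search class name (`DenseRetrievalParallelExactSearch`) and the model checkpoint used here are assumptions; please refer to the example script above for the exact usage.

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
# Assumed class name for the multi-GPU (distributed) exact search.
from beir.retrieval.search.dense import DenseRetrievalParallelExactSearch as DRPES

# Download and load a BEIR dataset (NFCorpus used here as a small example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Wrap an SBERT model; encoding is sharded across all visible GPUs.
model = DRPES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="dot")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```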
3. Hugging Face data loaders for BEIR datasets. Uploaded all datasets to HF.
We added Hugging Face data loaders for all public BEIR datasets. One can use them to easily work with BEIR datasets available on Hugging Face. We also made the corpus and queries (e.g. BeIR/fiqa) and the qrels (e.g. BeIR/fiqa-qrels) for all public BEIR datasets available on Hugging Face. This means one no longer needs to download the datasets and keep them locally in RAM. Again, thanks to @NouamaneTazi.
You can find all datasets here: https://huggingface.co/BeIR
Example: evaluate_sbert_hf_loader.py
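As a quick illustration, the datasets can also be pulled directly from the Hub with the datasets library. The config and split names below ("corpus", "queries", "test") are assumptions based on the BeIR/fiqa and BeIR/fiqa-qrels repositories; check the dataset cards for the exact names.

```python
from datasets import load_dataset

# Corpus and queries live in the BeIR/fiqa repository
# (config/split names "corpus" and "queries" are assumed here).
corpus = load_dataset("BeIR/fiqa", "corpus", split="corpus")
queries = load_dataset("BeIR/fiqa", "queries", split="queries")

# Relevance judgements live in the separate BeIR/fiqa-qrels repository.
qrels = load_dataset("BeIR/fiqa-qrels", split="test")

print(corpus[0])   # inspect a corpus record
print(qrels[0])    # inspect a relevance judgement
```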
4. Added support for the T5 reranking model: monoT5 reranker
We added support for the monoT5 reranking model within BEIR. These are stronger (but more computationally expensive) rerankers that can be used to attain the best reranking performance currently on the BEIR benchmark.
Example: evaluate_bm25_monot5_reranking.py
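Below is a minimal sketch of the BM25 + monoT5 reranking flow. The MonoT5 import path and the castorini/monot5-base-msmarco checkpoint are assumptions; consult evaluate_bm25_monot5_reranking.py for the exact usage.

```python
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.reranking import Rerank
# Assumed import path for the new monoT5 reranker.
from beir.reranking.models import MonoT5

# `bm25_results` is assumed to hold the first-stage BM25 rankings,
# e.g. obtained via the Elasticsearch lexical search in BEIR.
cross_encoder = MonoT5("castorini/monot5-base-msmarco")
reranker = Rerank(cross_encoder, batch_size=32)

# Rerank only the top-100 BM25 candidates per query with monoT5.
rerank_results = reranker.rerank(corpus, queries, bm25_results, top_k=100)
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, rerank_results, [1, 10, 100])
```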
5. Fix: Added ignore_identical_ids to BEIR evaluation
Thanks to @kwang2049, we added a check to ignore identical ids within the evaluation script. This particularly affects the ArguAna and Quora datasets, where a query and a document can be identical (and share the same id). By default, we now remove these ids and evaluate the dataset accordingly. With this fix, one can evaluate Quora and ArguAna and obtain accurate and reproducible nDCG@10 scores.
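For illustration, the check can be toggled on the evaluation call. The keyword name ignore_identical_ids comes from this fix, but its placement on EvaluateRetrieval.evaluate and its default value are assumptions; check the evaluation code for details.

```python
from beir.retrieval.evaluation import EvaluateRetrieval

# Drop query-document pairs that share the same id (e.g. in ArguAna and Quora)
# before computing metrics; pass ignore_identical_ids=False to keep the old behaviour.
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(
    qrels, results, k_values=[1, 10, 100], ignore_identical_ids=True
)
```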
6. Added HNSWSQ method to the Faiss retrieval methods
We added support for the HNSWSQ Faiss index method as a memory-compression-based technique for evaluation across the BEIR datasets.
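Here is a minimal sketch of how such a Faiss-based dense search plugs into the standard evaluation flow. The HNSWSQFaissSearch class name and its parameters are assumptions; please refer to the Faiss examples in the repository for the exact usage.

```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
# Assumed class name for the new HNSW + scalar-quantization Faiss index.
from beir.retrieval.search.dense import HNSWSQFaissSearch

# Build an HNSW index over scalar-quantized SBERT embeddings to cut memory usage.
model = HNSWSQFaissSearch(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="dot")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```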
7. Added dependency on the datasets library in setup.py
To support the HF data loaders, we added the datasets library as a dependency in our setup.py.