Official repository for the paper *FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions*. Official evaluation can be done by installing the `mteb` library and evaluating your MTEB-compatible model with zero (or only a few) lines of code changes!
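For example, the library can be installed from PyPI:

```bash
pip install mteb
```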
| Binary | Description |
|---|---|
| FollowIR-7B | A 7B-parameter model that reranks documents given a query and instructions. It is finetuned from Mistral-7B on the datasets below. |
| FollowIR-train | The dataset used to train FollowIR-7B. It consists of TREC instructions and queries, along with GPT-generated synthetic documents that have been filtered. |
| FollowIR-train-raw | The version of the training set above before filtering. It was not used for training, as some of the GPT-generated data is incorrect. |
You can also find the individual annotated test data (Robust04, Core17, and News21), although the format is best used with MTEB's evaluation code.
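If you just want to inspect the training data outside of MTEB, a minimal sketch using Hugging Face `datasets` is shown below. The hub ID used here is an assumption; substitute the exact ID from the links above if it differs.

```python
# Minimal sketch: load the FollowIR training data with Hugging Face datasets.
# NOTE: the hub ID below is an assumption; use the ID from the table above.
from datasets import load_dataset

dataset = load_dataset("jhu-clsp/FollowIR-train")
print(dataset)            # shows the available splits
print(dataset["train"][0])  # one query/instruction/document example (split name may differ)
```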
If you wish to reproduce the experiments in the paper, you can use the following code:
```bash
git clone https://github.com/orionw/FollowIR.git
cd FollowIR/
conda create -n followir python=3.9 -y
conda activate followir
pip install -r requirements.txt
bash launch_all_jobs.sh
```
If your model is SentenceTransformer-compatible and requires no special tokens for concatenating the query and instructions, you can simply use the following one-line command:

```bash
mteb -m $MODEL_NAME -t $DATASET
```

for each of the datasets in {Robust04InstructionRetrieval, Core17InstructionRetrieval, News21InstructionRetrieval}.
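Equivalently, you can run the evaluation from Python. The sketch below assumes the standard `mteb` Python API and a SentenceTransformer-compatible model; the model name is only a placeholder.

```python
# Sketch of the equivalent Python usage; the model name is a placeholder.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-model-name-here")
evaluation = MTEB(tasks=[
    "Robust04InstructionRetrieval",
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
])
evaluation.run(model, output_folder="results")
```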
If you have a bi-encoder model but want to do something different from simply appending the instruction to the query with a space, you can extend `DenseRetrievalExactSearch` and check for instructions in `kwargs` (see `models/base_sentence_transformers/` as a starting place for small modifications and `models/e5/` for an example with larger modifications).
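As a rough illustration of what such an extension looks like, the sketch below wraps a SentenceTransformer and checks `kwargs` for instructions before encoding queries. The kwarg name, prompt template, and model name are assumptions, and how instructions are passed may vary by `mteb` version, so follow the files above for the exact interface used in this repo.

```python
# Rough sketch of a bi-encoder wrapper that handles instructions itself,
# following the encode_queries/encode_corpus interface expected by a
# DenseRetrievalExactSearch-style searcher. The "instructions" kwarg name,
# prompt template, and model name below are assumptions, not the repo's API.
from sentence_transformers import SentenceTransformer


class InstructionBiEncoder:
    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        # placeholder model; swap in your own bi-encoder
        self.model = SentenceTransformer(model_name)

    def encode_queries(self, queries, batch_size=32, **kwargs):
        instructions = kwargs.get("instructions")
        if instructions is not None:
            # do something other than a plain space-join, e.g. a template
            queries = [
                f"Instruct: {inst}\nQuery: {q}" if inst else q
                for q, inst in zip(queries, instructions)
            ]
        return self.model.encode(queries, batch_size=batch_size)

    def encode_corpus(self, corpus, batch_size=32, **kwargs):
        # corpus entries may be dicts with "title"/"text" fields in MTEB
        texts = [
            (doc.get("title", "") + " " + doc["text"]).strip()
            if isinstance(doc, dict) else doc
            for doc in corpus
        ]
        return self.model.encode(texts, batch_size=batch_size)
```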
Rerankers have now been added to MTEB! If you are using a reranker model, you will need to extend the `DenseRetrievalExactSearch` class and define an `__init__` and a `predict` function (see the `models/rerankers` section for a variety of reranker examples). Your `predict` function should take in `input_to_rerank`, which will be a tuple of the form:
```python
# if there are no instructions, instructions will be a list of Nones
# Instructions will be present for all of the FollowIR datasets
queries, passages, instructions = list(zip(*input_to_rerank))
```
Your `predict` function should use these and return a list containing a score for each tuple item.
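For concreteness, here is a minimal sketch of such a `predict` function built on a cross-encoder. For brevity it is written as a standalone class rather than a `DenseRetrievalExactSearch` subclass, the model name is only a placeholder, and prepending the instruction to the query is just one possible choice; see the `models/rerankers` examples for the full pattern used in this repo.

```python
# Minimal sketch of a reranker predict function using the tuple format above.
# The model name and the way instructions are folded into the query are
# assumptions for illustration, not the repo's exact implementation.
from sentence_transformers import CrossEncoder


class MyReranker:
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2", **kwargs):
        self.model = CrossEncoder(model_name)

    def predict(self, input_to_rerank, **kwargs):
        # if there are no instructions, instructions will be a list of Nones
        queries, passages, instructions = list(zip(*input_to_rerank))
        # prepend the instruction to the query when one is given
        queries = [
            f"{inst} {q}" if inst is not None else q
            for q, inst in zip(queries, instructions)
        ]
        # passages are assumed to be plain strings here
        scores = self.model.predict(list(zip(queries, passages)))
        return [float(s) for s in scores]  # one score per tuple item
```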
If you found the code, data, or model useful, feel free to cite:
```bibtex
@misc{weller2024followir,
      title={FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions},
      author={Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
      year={2024},
      eprint={2403.15246},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
```