FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Official repository for the paper FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. Official evaluation can be done by installing the mteb library and evaluating your MTEB-compatible model with zero (or only a few) lines of code changes!

Links

Binary | Description
FollowIR-7B | 7B-parameter model that reranks documents given a query and instructions. It is fine-tuned from Mistral-7B on the dataset below.
FollowIR-train | The dataset used to train FollowIR-7B. It consists of TREC instructions and queries paired with GPT-generated synthetic documents that have been filtered.
FollowIR-train-raw | The pre-filtered version of the training set above. It was not used for model training because some of the GPT-generated data is incorrect.

You can also find the individually annotated test data (Robust04, Core17, and News21), although the format is best used with MTEB's evaluation code.

Installation

If you wish to reproduce the experiments in the paper, you can use the following commands:

git clone https://github.com/orionw/FollowIR.git
cd FollowIR/
conda create -n followir python=3.9 -y
conda activate followir
pip install -r requirements.txt
bash launch_all_jobs.sh

Usage

If your model is SentenceTransformer-compatible and requires no special tokens for concatenating the query and instructions, you can simply use the following one-line command:

mteb -m $MODEL_NAME -t $DATASET

for each of the datasets in {Robust04InstructionRetrieval, Core17InstructionRetrieval, News21InstructionRetrieval}.
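
If you prefer to run the evaluation from Python instead of the CLI, a minimal sketch along these lines should work (assuming a recent mteb version; the model name is only a placeholder):

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder model; any SentenceTransformer-compatible model works
model = SentenceTransformer("intfloat/e5-base-v2")

# Evaluate on all three FollowIR instruction-retrieval tasks
evaluation = MTEB(tasks=[
    "Robust04InstructionRetrieval",
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
])
evaluation.run(model, output_folder="results")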

If you have a bi-encoder model but want to do something other than simply appending the instruction to the query with a space, you can extend DenseRetrievalExactSearch and check for instructions in kwargs (see models/base_sentence_transformers/ as a starting place for small modifications and models/e5/ for an example with larger modifications). A sketch of this pattern follows.
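
For illustration only, such a subclass might look like the sketch below; the class name, prompt template, import path, and how instructions is keyed are assumptions rather than this repository's code:

from mteb.evaluation.evaluators import DenseRetrievalExactSearch

class InstructionAwareSearch(DenseRetrievalExactSearch):
    # Hypothetical subclass: joins instruction and query with a custom
    # template instead of a single space before encoding
    def search(self, corpus, queries, top_k, score_function, **kwargs):
        instructions = kwargs.get("instructions")  # may be absent or None
        if instructions:
            queries = {
                qid: f"Instruct: {instructions[qid]}\nQuery: {text}"
                for qid, text in queries.items()
            }
        return super().search(corpus, queries, top_k, score_function, **kwargs)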

Reranker Usage

Rerankers have now been added to MTEB! If you are using a reranker model, you will need to extend the DenseRetrievalExactSearch class and define an __init__ and a predict function (see the models/rerankers directory for a variety of reranker examples). Your predict function should take in input_to_rerank, which will be a list of (query, passage, instruction) tuples that you can unpack as:

# if there are no instructions, instructions will be a list of Nones
# Instructions will be present for all of the FollowIR datasets
queries, passages, instructions = list(zip(*input_to_rerank))

Your predict function should use these and return a list containing one relevance score per input tuple, as in the sketch below.
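
As an illustration, a minimal reranker wrapper might look like the following; the class name and the underlying cross-encoder checkpoint are assumptions, not the repository's own models:

from sentence_transformers import CrossEncoder
from mteb.evaluation.evaluators import DenseRetrievalExactSearch

class CrossEncoderReranker(DenseRetrievalExactSearch):
    # Hypothetical reranker: scores (query, passage) pairs with a
    # sentence-transformers CrossEncoder
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2", **kwargs):
        self.model = CrossEncoder(model_name)

    def predict(self, input_to_rerank, **kwargs):
        queries, passages, instructions = list(zip(*input_to_rerank))
        # Prepend the instruction to the query when one is present
        if instructions[0] is not None:
            queries = [f"{inst} {q}".strip() for inst, q in zip(instructions, queries)]
        scores = self.model.predict(list(zip(queries, passages)))
        return scores.tolist()  # one float per (query, passage) tuple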

Citing

If you found the code, data, or model useful, feel free to cite:

@misc{weller2024followir,
      title={FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions}, 
      author={Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
      year={2024},
      eprint={2403.15246},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
