Yuankai Li*, Jia-Chen Gu*, Di Wu, Kai-Wei Chang and Nanyun Peng
*Equal contribution
This repository hosts the code and data for our paper BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression.

BRIEF (Bridging Retrieval and Inference through Evidence Fusion) is a lightweight, T5-based approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries for integration into in-context learning. To enable learning compression for multi-hop reasoning, we curate synthetic data by extracting atomic proposition expressions that encapsulate distinct factoids from the source documents and composing them into synthetic summaries.
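To make the setup concrete, here is a minimal sketch of query-aware compression with an off-the-shelf T5-style model from Hugging Face `transformers`. The checkpoint name and the `question: ... documents: ...` input template are placeholders we are assuming for illustration, not the released BRIEF interface:

```python
# Minimal sketch of query-aware compression with a T5-style seq2seq model.
# NOTE: "google/flan-t5-large" is a placeholder checkpoint and the
# "question: ... documents: ..." template is an assumption, not the
# repository's actual interface.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

query = "Who directed the film that won Best Picture in 1998?"
documents = [
    "Titanic won the Academy Award for Best Picture in 1998.",
    "Titanic was directed by James Cameron.",
]

# Concatenate the query with the retrieved documents so the compressor can
# keep only the factoids that matter for answering this particular query.
source = f"question: {query} documents: {' '.join(documents)}"
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_new_tokens=128)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)  # a dense, query-focused summary to place in the reader's context
```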
BRIEF generates more concise summaries and enables a range of LLMs to achieve exceptional open-domain question answering (QA) performance. For example, on HotpotQA, BRIEF improves the compression rate by 2 times compared to the state-of-the-art baseline, while outperforming it by 3.00% EM and 4.16% F1 with Flan-UL2 as the reader LM. It also generates more concise summaries than proprietary GPT-3.5, while demonstrating nearly identical QA performance.

- [10/29] We released the test sets for Multihop-NQ and Multihop-TriviaQA, as well as the compressed results.
- [10/21] We released the compressed results and the evaluation script for TriviaQA, NQ, HotpotQA, and MuSiQue.
- Clone this repository and navigate to the BRIEF code folder

```bash
git clone https://github.com/JasonForJoy/BRIEF
cd BRIEF/code
```
- Install the package

```bash
conda create -n brief python=3.9 -y
conda activate brief
pip install --upgrade pip
pip install -e .
```
If this doesn't work, just install the newest versions of `pytorch`, `transformers`, and `accelerate`.
To adjust `accelerate` to your needs, you may run `accelerate config` first. Then, the code below shows how to run inference with Flan-UL2 using the summarized documents. The following is an example for TriviaQA.
```bash
accelerate launch --main_process_port 29500 flanul2_reader.py \
    --inference_type ours \
    --proposition_name_or_path ../data/TriviaQA_brief_reply.json \
    --output_path TriviaQA_read.json \
    --downstream_dataset tqa
```
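Conceptually, the reader step prepends the compressed summary to the question and lets Flan-UL2 answer from that context. Below is a minimal single-process sketch of this idea; the prompt template and the JSON field names (`summary`, `question`) are assumptions, and the repository script additionally handles batching and multi-GPU launch via `accelerate`:

```python
# Minimal sketch of the reader step: answer a question from a compressed
# summary with Flan-UL2. Field names ("question", "summary") are assumed.
# Flan-UL2 is a 20B model; device_map="auto" shards it across available GPUs.
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")

with open("../data/TriviaQA_brief_reply.json") as f:
    examples = json.load(f)

for ex in examples[:3]:
    prompt = f"{ex['summary']}\nQuestion: {ex['question']}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```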
To evaluate the downstream performance of our model, simply run the code below.
```bash
python evaluation.py --total_set ../data/ground_truth/TriviaQA_GT.json \
    --input_path TriviaQA_read.json \
    --output_path TriviaQA_result.json
```
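The evaluation reports exact match (EM) and F1. For reference, here is a minimal sketch of the standard SQuAD-style versions of these metrics; the repository's `evaluation.py` may differ in details such as answer aliasing:

```python
# Standard SQuAD-style EM and F1 between a prediction and a gold answer.
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("James Cameron", "james cameron!"))        # 1.0
print(f1_score("the director James Cameron", "James Cameron"))  # 0.8
```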
You can also check the length of the summaries:

```bash
python summary_length.py --input_path ../data/TriviaQA_brief_reply.json
```
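For a quick look without the script, something like the following computes the average summary length in words. It assumes the reply JSON stores one summary string per example under a `summary` field, which is an assumption about the file format:

```python
# Average summary length in words; the "summary" field name is assumed.
import json

with open("../data/TriviaQA_brief_reply.json") as f:
    examples = json.load(f)

lengths = [len(ex["summary"].split()) for ex in examples]
print(f"{len(lengths)} summaries, avg length {sum(lengths) / len(lengths):.1f} words")
```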
We also provide instructions for running the compression baselines compared in the paper:

- RECOMP
- LLMLingua
  - Follow the example at `LLMLingua/examples/RAG.ipynb`
- Selective-Context
  - Follow Selective Context for LLMs
  - Note: we explicitly use `python==3.9.19`, `typing_extensions==4.8.0`, `thinc==8.0.17`, `spacy==3.2.0`, `pydantic==1.7.4`, `torch==2.4.1`, and `numpy==1.26.4`, since the original `requirements.txt` is broken (a quick version check follows below).
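To verify that the pinned environment for Selective-Context actually took effect, a quick check like this can help; it is a sketch, with the version numbers taken from the note above:

```python
# Sanity-check that the pinned baseline versions are installed
# (pins copied from the note above).
from importlib.metadata import version

PINNED = {
    "typing_extensions": "4.8.0",
    "thinc": "8.0.17",
    "spacy": "3.2.0",
    "pydantic": "1.7.4",
    "torch": "2.4.1",
    "numpy": "1.26.4",
}

for pkg, want in PINNED.items():
    got = version(pkg)
    status = "OK" if got == want else f"MISMATCH (want {want})"
    print(f"{pkg}=={got} {status}")
```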
If you find BRIEF useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{li2024brief,
  title   = {BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression},
  author  = {Li, Yuankai and Gu, Jia-Chen and Wu, Di and Chang, Kai-Wei and Peng, Nanyun},
  journal = {arXiv preprint arXiv:2410.15277},
  year    = {2024}
}
```