BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression


Yuankai Li*, Jia-Chen Gu*, Di Wu, Kai-Wei Chang and Nanyun Peng

*Equal contribution

This repository hosts the code and data for our paper on BRIEF, a lightweight, T5-based approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries for integration into in-context learning.

Overview

BRIEF (Bridging Retrieval and Inference through Evidence Fusion) is a lightweight approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries for integration into in-context learning. To enable learning compression for multi-hop reasoning, we curate synthetic data by extracting atomic propositions, expressions that each encapsulate a distinct factoid from the source documents, and composing them into synthetic summaries.



BRIEF generates more concise summaries and enables a range of LLMs to achieve strong open-domain question answering (QA) performance. For example, on HotpotQA, BRIEF achieves a 2× higher compression rate than the state-of-the-art baseline while outperforming it by 3.00% EM and 4.16% F1 with Flan-UL2 as the reader LM. It also generates more concise summaries than proprietary GPT-3.5 while achieving nearly identical QA performance.
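As a sanity check on the compression numbers above, the compression rate can be measured as the ratio of retrieved-document length to summary length. A minimal sketch, assuming a whitespace-token count (the paper's exact tokenization may differ):

```python
def compression_rate(documents: str, summary: str) -> float:
    """Ratio of source-document length to summary length, in whitespace tokens.

    A 2x higher value means the summary is half as long relative to its
    source documents. This is an illustrative definition, not the exact
    metric implementation from the paper.
    """
    return len(documents.split()) / len(summary.split())


# Toy example: a 4-token "document" compressed to a 2-token summary.
rate = compression_rate("a b c d", "a b")  # → 2.0
```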


Release

  • [10/29] We released the test sets for Multihop-NQ and Multihop-TriviaQA, as well as the compressed results.
  • [10/21] We released the compressed results and evaluation scripts for TriviaQA, NQ, HotpotQA, and MuSiQue.

Installation and Setup

  1. Clone this repository and navigate to the BRIEF folder

    git clone https://github.com/JasonForJoy/BRIEF
    cd BRIEF/code
  2. Install packages

    conda create -n brief python=3.9 -y
    conda activate brief
    pip install --upgrade pip  
    pip install -e .

    If this does not work, install the latest versions of the core dependencies directly: `pip install torch transformers accelerate`

Start with FlanUL2

To adapt `accelerate` to your hardware, run `accelerate config` first. The command below then runs inference with Flan-UL2 over the summarized documents. The following is an example for TriviaQA.

accelerate launch --main_process_port 29500 flanul2_reader.py \
--inference_type ours \
--proposition_name_or_path ../data/TriviaQA_brief_reply.json \
--output_path TriviaQA_read.json \
--downstream_dataset tqa

Evaluation Code

To evaluate the downstream performance of our model, run the command below.

python evaluation.py --total_set ../data/ground_truth/TriviaQA_GT.json \
--input_path TriviaQA_read.json \
--output_path TriviaQA_result.json
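The results reported above use EM and F1. A minimal sketch of the standard SQuAD-style metrics; the normalization here (lowercasing, stripping punctuation and articles) is the common convention and may differ from the exact logic in `evaluation.py`:

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> bool:
    """EM: normalized prediction equals normalized gold answer."""
    return normalize(pred) == normalize(gold)


def f1(pred: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```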

You can also check the length of the generated summaries.

python summary_length.py --input_path ../data/TriviaQA_brief_reply.json

Baselines of Other Models

Citation

If you find BRIEF useful for your research and applications, please cite using this BibTeX:

@article{li2024brief,
  title   = {BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression},
  author  = {Li, Yuankai and Gu, Jia-Chen and Wu, Di and Chang, Kai-Wei and Peng, Nanyun},
  journal = {arXiv preprint arXiv:2410.15277},
  year    = {2024}
}
