ODSum is a benchmark for the task of open-domain multi-document summarization. It is designed to evaluate how well modern summarization models perform in multi-document settings where the relevant documents span an open domain and must first be retrieved.
The raw documents and the queries paired with summaries are in the `data/story/raw` folder. The `data/story/oracle` folder associates these queries with their respective 'ground truth' articles.
For the retrieval part, three distinct strategies are provided:
- Sparse Retrieval (`data/story/sparse`)
- Dense Retrieval (`data/story/dense`)
- LLM-Embedding Retrieval (`data/story/LLM-embedding`)
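As a rough illustration of the dense-retrieval setting, the sketch below embeds the query and the candidate documents with a sentence-transformers model and ranks documents by cosine similarity. The encoder name and the top-k cutoff are placeholders, not the exact configuration used to build `data/story/dense`.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder encoder; the retriever behind data/story/dense may use a different model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def dense_retrieve(query, documents, top_k=5):
    """Rank documents by cosine similarity between query and document embeddings."""
    doc_emb = encoder.encode(documents, convert_to_tensor=True)
    query_emb = encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_emb, top_k=top_k)[0]
    return [(hit["corpus_id"], hit["score"]) for hit in hits]
```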
Each of these retrieval folders contains three sub-versions:
- `min`: contains the least number of retrieved documents based on relevancy.
- `mean`: an average number of retrieved documents.
- `max`: contains the maximum number of documents deemed relevant by the retriever.
Files in each folder:
- Raw Data:
  - `documents`: contains the stories or documents.
  - `queries`: queries paired with four human-written summaries. There is no clear relationship between the query and the story in this raw form.
- Oracle Data:
  - Maps each query to its corresponding 'ground truth' articles.
- Retrieval Data (applies to `sparse`, `dense`, and `LLM-embedding`):
  - `min`: data with the minimum number of retrieved documents.
  - `mean`: data with an average number of retrieved documents.
  - `max`: data with the maximum number of retrieved documents based on their relevancy.
Note: The retrievers rank the documents based on their relevancy to the query, and they select the most relevant few. The number of retrieved documents is variable, depending on the retrieval strategy and the version (min, mean, max).
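For instance, a sparse retriever in the spirit of this note could score documents with BM25 and keep only the most relevant few. Below is a minimal sketch using the `rank_bm25` package; the actual retriever implementation and the min/mean/max cutoffs used for ODSum may differ.

```python
from rank_bm25 import BM25Okapi

def bm25_retrieve(query, documents, top_k=5):
    """Score documents against the query with BM25 and return the most relevant few."""
    bm25 = BM25Okapi([doc.split() for doc in documents])
    scores = bm25.get_scores(query.split())
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    # top_k plays the role of the min/mean/max cutoffs described above.
    return [documents[i] for i in ranked[:top_k]]
```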
BART
- Description: A sequence-to-sequence Transformer pre-trained with both a sentence-permutation and a text-infilling objective.
- Checkpoint & Training: The BART-Large variant fine-tuned on the CNN/DailyMail dataset is used. It is further fine-tuned on ODSum with the AdamW optimizer, using an input format that merges the query with the retrieved documents.
- Limitation: With a context length of only 1024 tokens, BART serves as a baseline model.

PRIMERA
- Description: Designed explicitly for multi-document summarization, PRIMERA processes concatenated documents with an efficient encoder-decoder Transformer.
- Implementation: Fine-tuned on each ODSum setting. With a maximum input length of 4K tokens, documents are truncated to fit within this constraint.

GPT
- Description: A well-known family of language models from OpenAI with proven efficacy in text summarization.
- Variants & Training: Both gpt-3.5-turbo-16k-0613 and gpt-4-0613 are used. Prompts are crafted to guide GPT's summarization, placing the query after the articles for better output.
- Limitation: Stories and meetings had to be truncated to fit GPT's maximum token limit.

Llama-2
- Description: A series of auto-regressive text models with capabilities ranging from logical reasoning to text generation.
- Checkpoint: The Llama-2-70b-Chat variant, which is optimized for dialogue, is used. For efficiency during inference, 4-bit NF4 quantization is employed.
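A minimal sketch of loading the Llama-2-70b-Chat checkpoint with 4-bit NF4 quantization via Hugging Face `transformers` and `bitsandbytes`; the compute dtype and other inference settings here are assumptions and may differ from those used for ODSum.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization as described above; compute dtype is an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```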
To process the data and convert it into formats compatible with various summarization models, refer to `data_process.ipynb`.
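As a rough sketch of what such a conversion can look like, the snippet below merges a query with its retrieved documents into a single input string, placing the query after the articles as described for the GPT prompts. The file name and JSON field names are assumptions; see `data_process.ipynb` for the actual schema.

```python
import json

def build_model_input(query, documents):
    """Concatenate the retrieved documents and append the query at the end."""
    return "\n\n".join(documents) + "\n\nQuery: " + query

# Hypothetical file name and fields; the real layout is defined in data_process.ipynb.
with open("data/story/sparse/mean.json") as f:
    examples = json.load(f)

inputs = [build_model_input(ex["query"], ex["documents"]) for ex in examples]
```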
Retrieval Performance
Summarization Performance for ODSum-story
Summarization Performance for ODSum-meeting