Long Paper Accepted at the EMNLP 2022 Main Conference!
The ECTSum dataset can be found under the data folder.
Dataset | # Docs. | Coverage | Density | Compression Ratio | # Tokens (Doc.) | # Tokens (Summary) |
---|---|---|---|---|---|---|
Arxiv/PubMed | 346,187 | 0.87 | 3.94 | 31.17 | 5179.22 | 257.44 |
BillSum | 23,455 | _ | 4.12 | 13.64 | 1813.0 | 207.7 |
BigPatent | 1,341,362 | 0.86 | 2.38 | 36.84 | 3629.04 | 116.67 |
GovReport | 19,466 | _ | 19.01 | 19.01 | 9409.4 | 553.4 |
BookSum | 12,293 | 0.78 | 1.69 | 15.97 | 5101.88 | 505.32 |
------------ | --------- | --------- | ------- | ------------ | --------- | ---------- |
ECTSum | 2,425 | 0.85 | 2.43 | 103.67 | 2916.44 | 49.23 |
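The Coverage, Density, and Compression columns are not defined in this README. The sketch below shows one way they could be computed for a single document-summary pair, assuming they follow the extractive-fragment statistics of Grusky et al. (2018). That is an assumption, not something stated here. The function names and the whitespace tokenization are illustrative only, and dataset-level figures would presumably be averages over all pairs.

```python
from typing import List, Tuple


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily collect the longest token spans the summary shares with the article."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best: List[str] = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match for as long as both sequences agree.
                k = 0
                while (i + k < len(summary) and j + k < len(article)
                       and summary[i + k] == article[j + k]):
                    k += 1
                if k > len(best):
                    best = summary[i:i + k]
                j += k
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip past the matched span, or past one unmatched token.
        i += max(len(best), 1)
    return fragments


def summary_stats(article: List[str], summary: List[str]) -> Tuple[float, float, float]:
    """Return (coverage, density, compression) for one tokenized pair."""
    if not summary:
        raise ValueError("summary must contain at least one token")
    frags = extractive_fragments(article, summary)
    n = len(summary)
    coverage = sum(len(f) for f in frags) / n       # share of summary tokens copied
    density = sum(len(f) ** 2 for f in frags) / n   # rewards long copied spans
    compression = len(article) / n                  # document length / summary length
    return coverage, density, compression


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog".split()
    summ = "quick brown fox sleeps".split()
    print(summary_stats(doc, summ))
```

Under these definitions, a high compression value together with a short summary length, as in the ECTSum row, means each summary is far shorter than its source document.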
Code and instructions for our proposed model, ECT-BPS, can be found under codes/ECT-BPS.
Code and instructions for our baseline models can be found under codes/baselines.
brew install pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
Open a new terminal tab so that the updated ~/.zshrc is loaded, then run:
cd <project_folder>
pyenv install 3.9.16
pyenv local 3.9.16
pyenv which python
pip install torch torchvision torchaudio
# CPU-only wheels (for machines without a GPU):
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
pip install sentence-transformers
pip install num2words
pip install word2number
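Before running the data preparation script, it can help to confirm that the interpreter and dependencies are in place. The following is a minimal sketch using only the standard library. The list of import names is an assumption about how the pip packages above map to modules (for example, sentence-transformers is imported as sentence_transformers), and the helper names are hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names assumed to correspond to the pip packages installed above.
REQUIRED = [
    "torch",
    "torchvision",
    "torchaudio",
    "sentence_transformers",
    "num2words",
    "word2number",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major: int, minor: int) -> bool:
    """Check whether the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    if not python_matches(3, 9):
        print("Warning: expected Python 3.9, found", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the same python that pyenv which python reports, so that the check covers the interpreter the script will actually use.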
python prepare_data_gpt3.py
The processed data is saved under out-data/.
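To confirm that the script produced output, the contents of the output folder can be listed. This is a small sketch that assumes only the out-data/ folder name given above. The helper name is hypothetical, and nothing is assumed about the names or formats of the files inside.

```python
from pathlib import Path
from typing import List


def list_outputs(out_dir: str = "out-data") -> List[str]:
    """Return the relative paths of all files under out_dir, sorted."""
    root = Path(out_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Output folder not found: {root}")
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    try:
        for name in list_outputs():
            print(name)
    except FileNotFoundError as err:
        print(err)
```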