This work introduces DART (Document-level Aspect-based Representation from Transformers), a hierarchical Transformer-based architecture that encodes information at different levels of granularity and uses attention aggregation mechanisms to learn local and global aspect-specific document representations.
The directory structure of this project is:

```
├── configs                 <- Hydra configuration files
│   ├── logdir              <- Logger configs
│   ├── data                <- Datamodule configs
│   ├── model               <- Modelmodule configs
│   ├── experiment          <- Experiment configs
│   └── cfg.yaml            <- Main config for training
│
├── dataset                 <- Project data
├── datamodules             <- Datamodules (TripAdvisor, BeerAdvocate, SocialNews)
├── models                  <- Models (DART, Longformer, BigBird)
├── logs                    <- Logs generated by Hydra and Lightning loggers
├── outputs                 <- Saved generated data
├── utils                   <- Utility scripts
│
├── run.py                  <- Run training and evaluation
└── README.md
```
Step 0. Download and install Miniconda from the official website.
Step 1. Install DART and dependencies.
Step 2. Specify `root_dir` in `configs/cfg.yaml`.
```bash
# clone project
git clone https://github.com/YanZehong/dart
cd dart

# [OPTIONAL] create conda environment and activate it
conda create -n dart -y python=3.10 pip
conda activate dart

# install pytorch according to instructions
# https://pytorch.org/get-started/
# conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge

# install requirements
pip install -r requirements.txt

# IMPORTANT: modify the project path in configs/cfg.yaml
# root_dir: ''
```
Note: To install the requirements, run `pip install -r requirements.txt`. Please ensure that you have met the PyTorch prerequisites and installed the corresponding version.
Train the model with a chosen experiment configuration from `configs/experiment/`. For the different datasets, please use the recommended experiment settings (`tripadvisor-dart`, `beeradvocate-dart`, and `socialnews-dart`).

```bash
python run.py experiment=socialnews-dart gpu=1
```
Train the model with the default configuration:

```bash
# train on 1 GPU
python run.py gpu=1

# train with DDP (Distributed Data Parallel) on 3 GPUs
python run.py gpu=[0,1,2]
```
Warning: There are currently problems with DDP mode; read this issue to learn more.
You can override any parameter from the command line like this:

```bash
python run.py gpu=3 train.num_epochs=10 train.batch_size=32
```

Note: When you only specify some variables, the remaining parameters will use the default settings from `configs/cfg.yaml`.
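For context, these command-line overrides are resolved by Hydra before training starts. Below is a minimal, hypothetical sketch of how an entry point like `run.py` typically consumes such a config; the repository's actual `run.py` may be structured differently.

```python
# Hypothetical sketch of a Hydra entry point; the repo's run.py may differ.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs", config_name="cfg")
def main(cfg: DictConfig) -> None:
    # Command-line overrides such as train.num_epochs=10 are already
    # merged into `cfg` by the time this function is called.
    print(OmegaConf.to_yaml(cfg))
    # training would then be launched with the composed config, e.g. train(cfg)


if __name__ == "__main__":
    main()
```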
```bash
# Specify the corresponding dataset by setting DATA_NAME
# to trip_advisor/beer_advocate/social_news

# evaluate on 1 GPU
python eval.py gpu=1 data=DATA_NAME ckpt_path='/path/to/ckpt/name.ckpt'

# evaluate on CPU
python eval.py ckpt_path='/path/to/ckpt/name.ckpt'
```
You can find fine-tuned checkpoints here.
We recommend using the following checkpoints:

| Data | Fine-tuned Checkpoint | Size | Accuracy |
|---|---|---|---|
| trip_advisor | epoch=3-step=8694.ckpt | 1962MB | 86.36% |
| beer_advocate | epoch=3-step=5936.ckpt | 1962MB | 88.13% |
| social_news | epoch=4-step=840.ckpt | 1962MB | 83.81% |
```bash
# Optionally, you can download them using wget or gdown as follows,
# then unzip them into the directory `ckpt_path`
pip install gdown
gdown --folder https://drive.google.com/drive/folders/1OAJw4dLMSe5ySM2QUy2lBtNPgFd74k1c
```
Note: If you get the error `mismatched input '=' expecting <EOF>`, use the escape character `\=` to fix this problem, or specify the value of `ckpt_path` in `configs/cfg.yaml`. Consider visiting the gdown page for full instructions, since the source repo may have more up-to-date instructions.
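If you want to use a downloaded checkpoint outside of `eval.py`, PyTorch Lightning modules can be restored directly from a `.ckpt` file. The snippet below is only a sketch; `ModelModule` and its import path are illustrative stand-ins for the repository's actual LightningModule class.

```python
# Illustrative only: `ModelModule` stands in for the repo's LightningModule.
import torch
from models import ModelModule  # hypothetical import path

# The checkpoint path below assumes you unzipped the download into ckpt/.
model = ModelModule.load_from_checkpoint("ckpt/epoch=3-step=8694.ckpt")
model.eval()

with torch.no_grad():
    # run inference on a prepared batch from the matching datamodule
    ...
```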
Use Miniconda for GPU environments
Use Miniconda for your Python environments (it is usually unnecessary to install the full Anaconda distribution; Miniconda should be enough). It makes it easier to install some dependencies, like cudatoolkit for GPU support, and it also allows you to access your environments globally.
Example installation:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
Create a new conda environment:

```bash
conda create -n dart python=3.10
conda activate dart
```
Use torchmetrics
Use the official torchmetrics library to ensure the proper calculation of metrics. This is especially important for multi-GPU training! For example, instead of calculating accuracy yourself, use the provided `Accuracy` class like this:
```python
from pytorch_lightning import LightningModule
from torchmetrics import Accuracy


class ModelModule(LightningModule):
    def __init__(self):
        super().__init__()
        # One metric instance per stage so that states are reduced correctly.
        # Note: recent torchmetrics versions require a `task` argument,
        # e.g. Accuracy(task="multiclass", num_classes=...).
        self.train_acc = Accuracy()
        self.val_acc = Accuracy()

    def training_step(self, batch, batch_idx):
        ...
        acc = self.train_acc(predictions, targets)
        self.log("train/acc", acc)
        ...

    def validation_step(self, batch, batch_idx):
        ...
        acc = self.val_acc(predictions, targets)
        self.log("val/acc", acc)
        ...
```
Make sure to use a separate metric instance for each stage to ensure proper value reduction over all GPU processes.
Torchmetrics provides metrics for most use cases, such as the F1 score or the confusion matrix. Read the documentation for more.
Follow PyTorch Lightning style guide
The style guide is available here.
Overview of the model: DART overcomes the 512-token restriction by splitting a long document into sentences or chunks of fewer than 512 tokens, processing each sentence/chunk, and then aggregating the results. The proposed DART framework takes a document as input and passes it through four blocks (a minimal sketch of this pattern follows the list below):
- Sentence Encoding Block.
- Global Context Interaction Block.
- Aspect Aggregation Block.
- Sentiment Classification Block.
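To make the encode-then-aggregate idea concrete, here is a minimal, hypothetical sketch of the pattern in PyTorch: a document is split into sentences, each sentence is encoded independently by a pretrained Transformer, and an attention layer pools the sentence vectors into a document representation. Names such as `AttentionAggregator` and `encode_document` are illustrative, not the repository's actual modules, and the real DART model adds global context interaction and per-aspect aggregation on top of this.

```python
# A minimal, hypothetical sketch of the hierarchical encode-then-aggregate
# pattern behind DART; names are illustrative, not the repo's actual classes.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class AttentionAggregator(nn.Module):
    """Attention-weighted pooling over sentence representations."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        # sent_embs: (num_sentences, hidden_size)
        weights = torch.softmax(self.score(sent_embs), dim=0)  # (num_sentences, 1)
        return (weights * sent_embs).sum(dim=0)                # (hidden_size,)


def encode_document(text: str, model_name: str = "bert-base-uncased") -> torch.Tensor:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    encoder = AutoModel.from_pretrained(model_name)
    aggregator = AttentionAggregator(encoder.config.hidden_size)  # untrained, for illustration

    # Split the long document into sentences so that each piece fits
    # within the 512-token limit of the backbone encoder.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)

    # One [CLS] vector per sentence, then attention pooling to a document vector.
    sent_embs = out.last_hidden_state[:, 0]
    return aggregator(sent_embs)


doc_vec = encode_document("The room was clean. The staff were rude. Breakfast was great.")
print(doc_vec.shape)  # torch.Size([768]) for bert-base-uncased
```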
Experiments are conducted on multiple datasets, including a curated dataset of long documents on social issues. Additionally, you can fine-tune the downloaded model on your own dataset of interest.
| Dataset | #aspects | #docs | #long docs (%) | #sentences/doc | #tokens/doc | #tokens/sentence |
|---|---|---|---|---|---|---|
| TripAdvisor | 7 | 28543 | 4027 (14.1%) | 12.9 | 298.9 | 23.1 |
| BeerAdvocate | 4 | 27583 | 217 (0.8%) | 11.1 | 173.5 | 15.7 |
| SocialNews | 6 | 4512 | 1031 (22.9%) | 17.5 | 389.8 | 22.2 |
All code and models are released under the Apache 2.0 license. See the `LICENSE` file for more information.
All experiments in the paper were fine-tuned on GPUs with 40GB of device RAM. Therefore, when using a GPU with 12GB-16GB of RAM, you are likely to encounter out-of-memory issues if you use the same hyperparameters described in the paper. Additionally, different models require different amounts of memory. Available memory also depends on the accelerator configuration (both type and count).
The factors that affect memory usage are:

- `data.max_num_seq`: You can fine-tune with a shorter maximum sequence length to save substantial memory.
- `train.batch_size`: Memory usage is also directly proportional to the batch size. You could decrease the batch size (e.g., `train.batch_size=8`) and decrease `train.lr` accordingly if you encounter an out-of-memory error (see the sketch after this list).
- `model.backbone`, `base` vs. `large`: The `large` model requires significantly more memory than `base`.
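As a rough guide for adjusting `train.lr` when you shrink the batch size, a common heuristic is to scale the learning rate linearly with the batch size. The base values below are placeholders, not the paper's actual hyperparameters.

```python
# Linear scaling heuristic for the learning rate; base values are placeholders.
base_batch_size = 32
base_lr = 2e-5


def scaled_lr(new_batch_size: int) -> float:
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * new_batch_size / base_batch_size


print(scaled_lr(8))  # 5e-06 when dropping to train.batch_size=8
```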
If you find this work useful, please cite it as follows:
```bibtex
@inproceedings{yan-etal-2024-modeling,
    title = "Modeling Complex Interactions in Long Documents for Aspect-Based Sentiment Analysis",
    author = "Yan, Zehong and
      Hsu, Wynne and
      Lee, Mong-Li and
      Bartram-Shaw, David",
    booktitle = "Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, {\&} Social Media Analysis",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.wassa-1.3",
    pages = "23--34",
}
```
If we submit the paper to a conference or journal, we will update the BibTeX.
For help or issues using DART, please submit a GitHub issue.
For personal communication related to DART, please contact me.
Yan Zehong