This repository contains the source code for the following papers:
- GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding.
  Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu
  ACL 2023
- MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding.
  Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Xiubo Geng, Daxin Jiang
  ACL 2021
Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction. However, existing methods for MPC usually represent interlocutors and utterances individually and ignore the inherently complicated structure of MPCs, which may provide crucial interlocutor and utterance semantics and enhance conversation understanding. To this end, we present MPC-BERT, a pre-trained model for MPC understanding that learns who says what to whom in a unified model with several elaborately designed self-supervised tasks. These tasks fall into two categories: (1) interlocutor structure modeling, including reply-to utterance recognition, identical speaker searching and pointer consistency distinction, and (2) utterance semantics modeling, including masked shared utterance restoration and shared node detection. We evaluate MPC-BERT on three downstream tasks: addressee recognition, speaker identification and response selection. Experimental results show that MPC-BERT outperforms previous methods by large margins and achieves new state-of-the-art performance on all three downstream tasks on two benchmarks.
Addressing the issue of who says what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention. However, existing methods for MPC understanding typically embed interlocutors and utterances into sequential information flows, or make only superficial use of the inherent graph structures in MPCs. To this end, we present a plug-and-play and lightweight method named graph-induced fine-tuning (GIFT), which can adapt various Transformer-based pre-trained language models (PLMs) for universal MPC understanding. In detail, the full and equivalent connections among utterances in a regular Transformer ignore the sparse but distinctive dependencies between utterances in MPCs. To distinguish different relationships between utterances, four types of edges are designed to integrate graph-induced signals into the attention mechanism, refining PLMs originally designed for processing sequential texts. We evaluate GIFT by implementing it into three PLMs and test its performance on three downstream tasks: addressee recognition, speaker identification and response selection. Experimental results show that GIFT significantly improves the performance of all three PLMs on the three downstream tasks and two benchmarks with only 4 additional parameters per encoding layer, achieving new state-of-the-art performance on MPC understanding.
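To make the graph-induced refinement concrete, here is a minimal NumPy sketch (illustrative only, not the repository's TensorFlow implementation). It assumes each of the four edge types owns one learnable scalar per encoding layer (the "4 additional parameters") and simply adds those scalars to the attention logits between utterance pairs; the edge-type names and this particular way of injecting the signal are our simplifications, so refer to the paper and code for the exact formulation.

```python
# Illustrative sketch only: edge-type-aware self-attention with 4 scalar parameters per layer.
import numpy as np

NUM_EDGE_TYPES = 4                       # e.g. reply-to, replied-by, reply-self, other (assumed names)
edge_weights = np.zeros(NUM_EDGE_TYPES)  # learnable in practice; one scalar per edge type per layer

def graph_induced_attention(Q, K, V, edge_type):
    """Q, K, V: [n, d] utterance representations; edge_type: [n, n] integer edge-type ids."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)              # standard scaled dot-product attention
    logits = logits + edge_weights[edge_type]  # graph-induced bias for every utterance pair
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True) # row-wise softmax
    return probs @ V
```

With all edge weights at zero this reduces to ordinary self-attention, which is why the method stays lightweight and plug-and-play.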
Python 3.6
TensorFlow 1.13.1
- Download the BERT model released by Google Research and move it to the path ./uncased_L-12_H-768_A-12
- We also release the pre-trained MPC-BERT model; download it and move it to the path ./uncased_L-12_H-768_A-12_MPCBERT. You only need to fine-tune it to reproduce our results.
- Download the Hu et al. (2019) dataset used in our paper and move it to the path ./data/ijcai2019/
- Download the Ouchi and Tsuboi (2016) dataset used in our paper and move it to the path ./data/emnlp2016/. Unzip the dataset and run the following commands.
cd data/emnlp2016/
python data_preprocess.py
Create the pre-training data.
python create_pretraining_data.py
Run the pre-training process.
cd scripts/
bash run_pretraining.sh
The pre-trained model will be saved to the path ./uncased_L-12_H-768_A-12_MPCBERT.
Modify the filenames in this folder so that they match those in Google's BERT release.
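Concretely, Google's uncased_L-12_H-768_A-12 release contains bert_config.json, vocab.txt and a checkpoint with the prefix bert_model.ckpt, whereas a TF 1.x pre-training run writes checkpoints named model.ckpt-<step>.*. The helper below is a sketch of our own (not part of the repository) under those assumptions; adjust it to whatever files your run actually produced.

```python
# Sketch (assumed filenames): rename the newest pre-training checkpoint so that the
# folder mirrors Google's BERT release layout.
import glob
import os
import re
import shutil

MODEL_DIR = "./uncased_L-12_H-768_A-12_MPCBERT"
BERT_DIR = "./uncased_L-12_H-768_A-12"

# Find the latest checkpoint step, e.g. model.ckpt-200000.index -> 200000.
steps = [int(re.search(r"model\.ckpt-(\d+)\.index$", f).group(1))
         for f in glob.glob(os.path.join(MODEL_DIR, "model.ckpt-*.index"))]
latest = max(steps)

# Rename model.ckpt-<latest>.{index,meta,data-...} to bert_model.ckpt.*
for f in glob.glob(os.path.join(MODEL_DIR, "model.ckpt-%d.*" % latest)):
    suffix = f.split("model.ckpt-%d" % latest)[-1]  # ".index", ".meta", ".data-00000-of-00001"
    os.rename(f, os.path.join(MODEL_DIR, "bert_model.ckpt" + suffix))

# The config and vocab are unchanged, so copy them over from the original BERT folder.
for name in ("bert_config.json", "vocab.txt"):
    shutil.copy(os.path.join(BERT_DIR, name), os.path.join(MODEL_DIR, name))
```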
Take the task of addressee recognition as an example.
Create the fine-tuning data.
python create_finetuning_data_ar.py
Run the fine-tuning process.
cd scripts/
bash run_finetuning.sh
Modify the variable restore_model_dir in run_testing.sh.
Run the testing process.
cd scripts/
bash run_testing.sh
Take the task of addressee recognition as an example.
Create the fine-tuning data.
python create_finetuning_data_ar_gift.py
Run the fine-tuning process.
cd scripts/
bash run_finetuning_gift.sh
Modify the variable restore_model_dir in run_testing_gift.sh.
Run the testing process.
cd scripts/
bash run_testing_gift.sh
Replace these scripts and their corresponding data when evaluating on the other downstream tasks:
create_finetuning_data_{ar, si, rs}_gift.py
run_finetuning_{ar, si, rs}_gift.py
run_testing_{ar, si, rs}_gift.py
Specifically for the task of response selection, an output_test.txt file which records the score of each context-response pair will be saved to the path restore_model_dir after testing.
Modify the variable test_out_filename in compute_metrics.py, and then run python compute_metrics.py; various metrics will be shown.
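For reference, here is an illustrative sketch of what such a metric computation involves (not the repository's compute_metrics.py). It assumes one candidate per line with the score as the last whitespace-separated field, candidates grouped in consecutive blocks of 10 per context, and the ground-truth response listed first in each block; the real file format and metrics are defined by the script in the repository.

```python
# Illustrative sketch only; the assumed file format may differ from output_test.txt.
def recall_at_k(scores, num_candidates=10, k=1):
    """Fraction of contexts whose ground-truth candidate (index 0) is ranked in the top k."""
    hits, total = 0, 0
    for i in range(0, len(scores), num_candidates):
        group = scores[i:i + num_candidates]
        ranked = sorted(range(len(group)), key=lambda j: group[j], reverse=True)
        hits += int(0 in ranked[:k])
        total += 1
    return hits / total

with open("output_test.txt") as f:
    scores = [float(line.split()[-1]) for line in f if line.strip()]

for k in (1, 2, 5):
    print("R10@%d = %.4f" % (k, recall_at_k(scores, num_candidates=10, k=k)))
```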
If you find our work helpful or use the code, please cite the following papers:
@inproceedings{gu-etal-2023-gift,
title = "{GIFT}: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding",
author = "Gu, Jia-Chen and
Ling, Zhen-Hua and
Liu, Quan and
Liu, Cong and
Hu, Guoping",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.651",
pages = "11645--11658",
}
@inproceedings{gu-etal-2021-mpc,
title = "{MPC}-{BERT}: A Pre-Trained Language Model for Multi-Party Conversation Understanding",
author = "Gu, Jia-Chen and
Tao, Chongyang and
Ling, Zhen-Hua and
Xu, Can and
Geng, Xiubo and
Jiang, Daxin",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.285",
pages = "3682--3692",
}
Thanks to Wenpeng Hu and Zhangming Chan for providing the processed Hu et al. (2019) dataset used in their paper.
Thanks to Ran Le for providing the processed Ouchi and Tsuboi (2016) dataset used in their paper.
Thanks to Prasan Yapa for providing a TF 2.0 version of MPC-BERT.
Please keep an eye on this repository if you are interested in our work. Feel free to contact us (gujc@ustc.edu.cn) or open issues.