Authors: Xingbang Liu, Hualiang Qin
BERTScore was proposed by Zhang et al. (2020) as a new metric for evaluating generated text. By leveraging contextual embeddings, it not only outperforms common metrics but is also more robust to adversarial paraphrases. The discussion section of that paper notes that, because BERTScore is differentiable, it can be integrated into the training loop as a learning loss. In this project, we incorporate BERTScore into the T5 model and train on the WebNLG 2020 dataset to evaluate its performance on data-to-text tasks. We also compare the original T5 model with the modified BERTScore T5 model to see whether the advantage of BERTScore is preserved when it is used as a learning loss.
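The core of BERTScore is a greedy soft matching between candidate and reference token embeddings, which is why it stays differentiable. The sketch below illustrates that matching in plain Python with toy vectors standing in for contextual BERT embeddings; it is not the project's training code, just the arithmetic behind the metric:

```python
# Minimal sketch of BERTScore's greedy matching (Zhang et al., 2020).
# The toy vectors below stand in for contextual BERT embeddings; in the
# real metric they come from a pretrained encoder, and the same
# computation on framework tensors remains differentiable, which is
# what allows BERTScore to serve as a learning loss.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching precision/recall/F1 over token embeddings."""
    # Precision: each candidate token matches its best reference token.
    precision = sum(max(cosine(c, r) for r in ref_emb)
                    for c in cand_emb) / len(cand_emb)
    # Recall: each reference token matches its best candidate token.
    recall = sum(max(cosine(r, c) for c in cand_emb)
                 for r in ref_emb) / len(ref_emb)
    return 2 * precision * recall / (precision + recall)

# Identical token embeddings give a perfect score of 1.0.
emb = [[1.0, 0.0], [0.0, 1.0]]
print(bertscore_f1(emb, emb))  # -> 1.0
```

In training, `1 - bertscore_f1` over encoder embeddings of the generated and reference texts can be minimized directly, which is the idea this project experiments with.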
```
.
├── LICENSE
├── README.md
├── datasets
│   ├── dev
│   ├── test
│   └── train
├── scripts
└── src
```
The dataset is WebNLG 2020 (v3.0).
The WebNLG corpus comprises sets of triplets describing facts (entities and relations between them) and the corresponding facts in the form of natural-language text. The corpus contains sets with up to 7 triplets each, along with one or more reference texts for each set.
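For a concrete picture of the data, a single entry pairs a triple set with its reference verbalisations. The entry below is an illustrative, made-up example in the corpus' `subject | predicate | object` style, not an actual record from the dataset:

```python
# Illustrative (made-up) WebNLG-style entry: a set of triples in
# "subject | predicate | object" form plus reference verbalisations.
entry = {
    "triples": [
        "Alan_Bean | birthPlace | Wheeler,_Texas",
        "Alan_Bean | occupation | Test_pilot",
    ],
    "references": [
        "Alan Bean was born in Wheeler, Texas and worked as a test pilot.",
    ],
}

# The data-to-text task: generate one of the references given the triples.
print(len(entry["triples"]), "triples,", len(entry["references"]), "reference")
```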
To read the XML files, WebNLG provides a corpus reader that extracts the triples and reference sentences.
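The XML can also be parsed directly with Python's standard library. The sketch below assumes the usual WebNLG tag names (`entry`, `mtriple`, `lex`) and uses a small inline sample rather than a real corpus file:

```python
# Minimal sketch of reading a WebNLG-style XML file directly with the
# standard library, assuming the usual tag names (<entry>, <mtriple>,
# <lex>). The inline SAMPLE stands in for a real corpus file.
import xml.etree.ElementTree as ET

SAMPLE = """<benchmark><entries>
  <entry category="Astronaut" eid="Id1">
    <modifiedtripleset>
      <mtriple>Alan_Bean | occupation | Test_pilot</mtriple>
    </modifiedtripleset>
    <lex lid="Id1">Alan Bean worked as a test pilot.</lex>
  </entry>
</entries></benchmark>"""

def read_entries(xml_text):
    """Yield (triples, reference_texts) for each <entry> element."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        triples = [t.text for t in entry.iter("mtriple")]
        texts = [l.text for l in entry.iter("lex")]
        yield triples, texts

for triples, texts in read_entries(SAMPLE):
    print(triples, texts)
```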
All other dependencies are installed directly in the Docker image.
The running environment is encapsulated in the Docker image. Follow the steps below:
- Prepare the repository with the structure shown in the Repo Structure section.
- Build the Docker image by running `sudo ./scripts/build_docker.sh` in the `BERT_score_T5/` directory.
- Run the Docker image with `sudo ./scripts/run_docker.sh`.
The above steps will create a Docker image and run it with the `BERT_score_T5/` repository mounted as a Docker volume. To learn how to customize the Docker image, check out:
- Run `bash scripts/start_jupyter.sh` after logging into the Docker image.
- Follow the instructions in the terminal and copy-paste the link into a browser.
  - If you are using a local machine, replace `hostname` with `localhost`.
  - If you are using a remote machine, use `ssh -N -f -L localhost:<remote_port>:localhost:<local_port> <remote_user_name>@<remote_ip>`, and don't forget to replace the information in `<>`.
Using the Docker image saves time: thanks to Nvidia's base images, you no longer have to worry about installing drivers and dependencies yourself.
This project is licensed under the MIT License - see the LICENSE file for details.