Fact-based Text Editing

Code and datasets for Fact-based Text Editing (Iso et al., ACL 2020).

Dataset

Datasets are created from publicly available table-to-text datasets. The dataset created from "webnlg" is referred to as "webedit", and the dataset created from "rotowire(-modified)" is referred to as "rotoedit".

To extract the data, run tar -jxvf webedit.tar.bz2 to form a webedit/ directory (and similarly for rotoedit.tar.bz2).
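If tar is not available, the same extraction can be sketched in Python with the standard-library tarfile module. The helper name extract_archive and its dest_dir parameter are illustrative and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a bzip2-compressed tar archive and return its member names.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens the archive for reading with bzip2 decompression.
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

Calling extract_archive("webedit.tar.bz2") from the directory that holds the archive should produce the same webedit/ directory as the tar command.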

Model overview

The model, which we call FactEditor, consists of three components: a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations.

FactEditor scans the text in the buffer, copies parts of the text from the buffer into the stream if they are described in the triples in the memory, deletes parts of the text if they are not mentioned in the triples, and inserts new parts of text into the stream if they are present only in the triples.
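The copy, delete, and insert behaviour can be illustrated with a toy replay loop. This is a minimal sketch, not the code in editor.py: the action names "keep", "drop", and "gen", the function apply_actions, and the example sentence are all assumptions made for illustration, and the actions are written by hand here, whereas the actual model predicts them from the draft text and the triples.

```python
def apply_actions(draft_tokens, actions):
    """Replay edit actions over a draft and return the revised tokens.

    Each action is a tuple: ("keep",), ("drop",), or ("gen", token).
    The action names are illustrative, not taken from the repository.
    """
    buffer = list(draft_tokens)  # draft text still to be scanned
    stream = []                  # revised text built so far
    position = 0                 # index of the next buffer token
    for action in actions:
        kind = action[0]
        if kind == "keep":
            # Copy the current buffer token into the stream.
            stream.append(buffer[position])
            position += 1
        elif kind == "drop":
            # Skip the current buffer token without copying it.
            position += 1
        elif kind == "gen":
            # Insert a new token without consuming the buffer.
            stream.append(action[1])
        else:
            raise ValueError(f"unknown action: {kind}")
    return stream


# Hypothetical draft whose city disagrees with the (imagined) triples.
draft = ["Alice", "was", "born", "in", "Paris", "in", "1990"]
actions = [("keep",)] * 4 + [("drop",), ("gen", "London"), ("keep",), ("keep",)]
revised = apply_actions(draft, actions)
print(" ".join(revised))  # Alice was born in London in 1990
```

Keeping the buffer position separate from the stream makes the difference between the three actions explicit: only "keep" and "drop" advance through the draft, while "gen" adds text that has no counterpart in it.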

Usage

Dependencies

The code was written for Python 3.X and requires AllenNLP.
Dependencies can be installed from requirements.txt (for example, with pip install -r requirements.txt).

Training

Set your config file path and serialization directory as environment variables:

export CONFIG=<path to the config file>
export SERIALIZATION_DIR=<path to the serialization_dir>

Then you can train FactEditor:

allennlp train $CONFIG \
            -s $SERIALIZATION_DIR \
            --include-package editor

For example, the following sample script trains the model on the WebEdit dataset:

allennlp train config/webedit.jsonnet \
            -s models/webedit \
            --include-package editor

Decoding

Set the dataset you want to decode and the model checkpoint you want to use as environment variables:

export INPUT_FILE=<path to the dev/test file>
export ARCHIVE_FILE=<path to the model archive file>

Then you can decode with FactEditor:

python predict.py $INPUT_FILE \
                  $ARCHIVE_FILE \
                  --cuda_device -1

To run on a GPU, pass --cuda_device 0 (or the ID of another CUDA device).

To run the model with a pretrained checkpoint on the development set of the WebEdit data:

python predict.py ./data/webedit/dev.jsonl \
                  ./models/webedit.tar.gz \
                  --cuda_device -1
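Before decoding, it can be useful to confirm that the input file parses. The sketch below assumes, based only on the .jsonl extension, that the file holds one JSON object per line. It does not assume any particular field names, and summarize_jsonl is a hypothetical helper rather than part of predict.py.

```python
import json


def summarize_jsonl(path):
    """Return the record count and the sorted top-level keys of a JSON Lines file.

    Hypothetical helper for illustration; assumes one JSON value per line.
    """
    count = 0
    keys = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record)
    return count, sorted(keys)
```

For instance, summarize_jsonl("./data/webedit/dev.jsonl") would report how many examples the development set contains and which fields each record carries.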

References

@InProceedings{iso2020fact,
    author = {Iso, Hayate and
              Qiao, Chao and
              Li, Hang},
    title = {Fact-based Text Editing},
    booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
    pages = {171--182},
    year = {2020}
}
