Table to text

This project provides a demo of table-to-text (infobox-to-biography) generation based on this paper.

Run the demo

Setting up the environment

conda env create -f environment.yml
conda activate table_to_txt

Run the demo web page locally

python app.py

The page is served on port 5000.

Open http://0.0.0.0:5000/ (or http://localhost:5000/ if your browser does not accept the 0.0.0.0 address) and test your input.

Input table:

Output summary:

Training the model yourself

wiki2bio

This project provides an implementation of table-to-text (infobox-to-biography) generation that takes the structure of an infobox into consideration.

Details of table-to-text generation can be found here. The implementation is based on TensorFlow 1.1.4 and Python 2.7.

This demo ships with a pretrained model. You can also train the model yourself by following the steps below.

Model Overview

wiki2bio is a natural language generation task that transforms Wikipedia infoboxes into their corresponding biographies. We encode the structure of an infobox by taking field type and position information into consideration.

In the encoding phase, we update the cell memory of the LSTM unit with a field gate and its corresponding field value in order to incorporate field information into the table representation. In the decoding phase, we propose a dual attention mechanism, which combines word-level attention and field-level attention, to model the semantic relevance between the generated description and the table.
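To make the dual attention idea concrete, here is a minimal sketch in plain Python. It assumes that word-level and field-level attention are each a softmax over per-token scores, and that the two are combined by an element-wise product followed by renormalization. The function names and the example scores are hypothetical, and the exact formulation in this repository's `dualAttentionUnit.py` may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_attention(word_scores, field_scores):
    """Combine word-level and field-level attention (assumed formulation).

    word_scores[i]  : relevance of the i-th table word to the decoder state
    field_scores[i] : relevance of the i-th token's field to the decoder state
    Returns one weight per table token; the weights sum to 1.
    """
    alpha = softmax(word_scores)    # word-level attention
    beta = softmax(field_scores)    # field-level attention
    joint = [a * b for a, b in zip(alpha, beta)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical scores for a table with three tokens.
weights = dual_attention([2.0, 0.5, 0.1], [0.2, 1.5, 0.1])
print(weights)
```

With uniform field scores, the combined weights reduce to the word-level attention alone, which is a quick way to check the combination step.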

Installation

We strongly recommend using GPUs to train the model. It takes about 64 hours to finish training on a GTX1080Ti GPU and get a decent result.

The implementation is based on TensorFlow 1.1.4.

Data

We evaluate on the WIKIBIO dataset from Lebret et al. (2016), which we preprocessed into an easy-to-use format.

The preprocessed original_data can be downloaded via Google Drive or Baidu Yunpan.

original_data
training set: train.box; train.summary
testing set:  test.box; test.summary
valid set:    valid.box; valid.summary
vocabularies: word_vocab.txt; field_vocab.txt

The *.box files in original_data contain the infoboxes from Wikipedia, one infobox per line.

The *.summary files in original_data contain the biographies corresponding to the infoboxes in *.box, one biography per line.

word_vocab.txt and field_vocab.txt are vocabularies for words (20000 words) and field types (1480 types), respectively.

The whole dataset is divided into a training set (582,659 instances, 80%), a validation set (72,831 instances, 10%) and a test set (72,831 instances, 10%).
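As a quick sanity check, the split percentages can be recomputed from the instance counts above. This is only an arithmetic sketch; the dictionary name is illustrative.

```python
# Instance counts as stated in this README.
splits = {"train": 582659, "valid": 72831, "test": 72831}

total = sum(splits.values())
shares = {name: 100.0 * count / total for name, count in splits.items()}

for name, share in shares.items():
    print("{}: {} instances, {:.2f}% of {}".format(name, splits[name], share, total))
```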

Usage

preprocess

First, we extract words, field types and position information from the original infoboxes in *.box. We then convert the extracted words and field types to integer IDs according to the word vocabulary word_vocab.txt and the field vocabulary field_vocab.txt.

python preprocess.py
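The ID conversion step can be pictured as a dictionary lookup with a fallback for out-of-vocabulary tokens. The sketch below is not taken from preprocess.py: the in-memory vocabularies, the `<unk>` entry and the choice of 0 as its ID are all assumptions made for illustration.

```python
UNK_ID = 0  # hypothetical ID reserved for tokens missing from the vocabulary


def to_ids(tokens, vocab, unk_id=UNK_ID):
    """Map each token to its vocabulary ID, falling back to unk_id."""
    return [vocab.get(token, unk_id) for token in tokens]


# Tiny hypothetical vocabularies standing in for word_vocab.txt and field_vocab.txt.
word_vocab = {"<unk>": 0, "john": 1, "doe": 2}
field_vocab = {"<unk>": 0, "name": 1, "birth_date": 2}

words = ["john", "doe", "1950"]              # would go to *.box.val
fields = ["name", "name", "birth_date"]      # would go to *.box.lab

word_ids = to_ids(words, word_vocab)         # would go to *.box.val.id
field_ids = to_ids(fields, field_vocab)      # would go to *.box.lab.id
print(word_ids, field_ids)
```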

After preprocessing, the directory structure looks as follows:

-original_data
-processed_data
  |-train
    |-train.box.pos
    |-train.box.rpos
    |-train.box.val
    |-train.box.lab
    |-train.summary.id
    |-train.box.val.id
    |-train.box.lab.id
  |-test
    |-...
  |-valid
    |-...
-results
  |-evaluation
  |-res

*.box.pos, *.box.rpos, *.box.val and *.box.lab contain the word position p+ (counted from the start of the field), the word position p- (counted from the end of the field), the field content and the field types, respectively.
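The relationship between these four files can be illustrated with a small sketch. It assumes each infobox line is a whitespace-separated sequence of tokens of the form `field_k:word`, where `k` is the 1-based position of the word inside its field. That token format, the function names and the example line are assumptions, and the real preprocess.py may handle more cases (for example, fields without an index).

```python
def parse_box(line):
    """Split an infobox line into (field, index, word) triples (assumed format)."""
    items = []
    for token in line.split():
        key, word = token.split(":", 1)
        field, _, index = key.rpartition("_")
        items.append((field, int(index), word))
    return items


def extract_features(items):
    """Return field content, field types, p+ and p- for each token."""
    # The length of a field is the largest index seen for that field.
    lengths = {}
    for field, index, _ in items:
        lengths[field] = max(lengths.get(field, 0), index)

    val = [word for _, _, word in items]                          # *.box.val
    lab = [field for field, _, _ in items]                        # *.box.lab
    pos = [index for _, index, _ in items]                        # *.box.pos (p+)
    rpos = [lengths[field] - index + 1 for field, index, _ in items]  # *.box.rpos (p-)
    return val, lab, pos, rpos


# Hypothetical infobox line with a two-word name and a one-word birth date.
line = "name_1:john name_2:doe birth_date_1:1950"
val, lab, pos, rpos = extract_features(parse_box(line))
print(val, lab, pos, rpos)
```

In this example the word "john" is first from the start of the name field (p+ = 1) and second from its end (p- = 2).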

Experiment results will be stored in the results/res directory.

train

For training, set the "mode" flag in Main.py to train:

tf.app.flags.DEFINE_string("mode",'train','train or test')

Then run Main.py:

python Main.py

During training, the model periodically reports BLEU and ROUGE scores on the validation set and saves the model parameters. The detailed results are stored in results/res/CUR_MODEL_TIME_STAMP/log.txt.

test

For testing, set the "mode" flag in Main.py to test and the "load" flag to the selected model directory:

tf.app.flags.DEFINE_string("mode",'test','train or test')
tf.app.flags.DEFINE_string("load",'YOUR_BEST_MODEL_TIME_STAMP','load directory')

Then test your model by running:

python Main.py
