Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models, including:
- Convolutional Neural Networks (CNN)
- Long Short-Term Memory (LSTM) networks
- Transformer (self-attention) networks
Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast beam search generation on both CPU and GPU
- large mini-batch training even on a single GPU via delayed updates
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, and tasks
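The "delayed updates" feature above can be pictured as gradient accumulation: gradients from several small batches are summed before a single parameter update, which mimics one large mini-batch. The following is a minimal sketch of that idea in plain Python, not fairseq's actual implementation; the toy model (`w * x`), the data, and the function names are all hypothetical, and the equivalence shown assumes equal-sized micro-batches.

```python
def batch_gradient(w, batch):
    """Mean gradient of the squared error (w*x - y)**2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def delayed_update_gradient(w, data, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches, then average."""
    chunks = [data[i:i + micro_batch_size]
              for i in range(0, len(data), micro_batch_size)]
    accumulated = 0.0
    for chunk in chunks:
        # Only the gradient is accumulated here; w is not updated yet.
        accumulated += batch_gradient(w, chunk)
    return accumulated / len(chunks)


# Hypothetical toy data: (x, y) pairs roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2)]
w = 0.5

full = batch_gradient(w, data)                                   # one large batch
delayed = delayed_update_gradient(w, data, micro_batch_size=2)   # two small batches

# A single SGD step taken after accumulation (learning rate is arbitrary).
w_after = w - 0.01 * delayed
print(full, delayed, w_after)
```

Both gradients agree up to floating-point error, so the update after accumulation matches the update a large batch would have produced, while only one micro-batch needs to be held in memory at a time.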
We also provide pre-trained models for several benchmark translation and language modeling datasets.
Requirements:
- A PyTorch installation
- For training new models, you'll also need an NVIDIA GPU and NCCL
- Python version 3.6
Currently fairseq requires PyTorch version >= 0.4.0. Please follow the instructions here: https://github.com/pytorch/pytorch#installation.
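One way to confirm these version requirements before installing is a small check like the one below. This is only a sketch: `version_tuple` and `meets_minimum` are hypothetical helpers written for this example, not part of fairseq or PyTorch, and the comparison ignores pre-release suffixes (so `0.4.0a0` is treated like `0.4.0`).

```python
import re
import sys


def version_tuple(version):
    """Turn a string such as '0.4.1' or '1.0.0a0+abc' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (numeric parts only)."""
    have, need = version_tuple(version), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))  # pad so '0.4' compares equal to '0.4.0'
    need += (0,) * (width - len(need))
    return have >= need


python_ok = sys.version_info >= (3, 6)

try:
    import torch
    torch_ok = meets_minimum(torch.__version__, "0.4.0")
except ImportError:
    torch_ok = False  # PyTorch is not installed in this environment

print("Python >= 3.6:", python_ok)
print("PyTorch >= 0.4.0:", torch_ok)
```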
If you use Docker, make sure to increase the shared memory size with either `--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
After PyTorch is installed, you can install fairseq with:
pip install -r requirements.txt
python setup.py build develop
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
We provide the following pre-trained models and pre-processed, binarized test sets:
Description | Dataset | Model | Test set(s) |
---|---|---|---|
Convolutional (Gehring et al., 2017) | WMT14 English-French | download (.tar.bz2) | newstest2014: download (.tar.bz2) newstest2012/2013: download (.tar.bz2) |
Convolutional (Gehring et al., 2017) | WMT14 English-German | download (.tar.bz2) | newstest2014: download (.tar.bz2) |
Convolutional (Gehring et al., 2017) | WMT17 English-German | download (.tar.bz2) | newstest2014: download (.tar.bz2) |
Transformer (Ott et al., 2018) | WMT14 English-French | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2) |
Transformer (Ott et al., 2018) | WMT16 English-German | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2) |
Transformer (Edunov et al., 2018; WMT'18 winner) | WMT'18 English-German | download (.tar.bz2) | See NOTE in the archive |
Description | Dataset | Model | Test set(s) |
---|---|---|---|
Convolutional (Dauphin et al., 2017) | Google Billion Words | download (.tar.bz2) | download (.tar.bz2) |
Convolutional (Dauphin et al., 2017) | WikiText-103 | download (.tar.bz2) | download (.tar.bz2) |
Description | Dataset | Model | Test set(s) |
---|---|---|---|
Stories with Convolutional Model (Fan et al., 2018) | WritingPrompts | download (.tar.bz2) | download (.tar.bz2) |
Generation with the binarized test sets can be run in batch mode as follows, e.g. for WMT 2014 English-French on a GTX-1080ti:
$ curl https://s3.amazonaws.com/fairseq-py/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - -C data-bin
$ curl https://s3.amazonaws.com/fairseq-py/data/wmt14.v2.en-fr.newstest2014.tar.bz2 | tar xvjf - -C data-bin
$ python generate.py data-bin/wmt14.en-fr.newstest2014 \
--path data-bin/wmt14.en-fr.fconv-py/model.pt \
--beam 5 --batch-size 128 --remove-bpe | tee /tmp/gen.out
...
| Translated 3003 sentences (96311 tokens) in 166.0s (580.04 tokens/s)
| Generate test with beam=5: BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
# Scoring with score.py:
$ grep ^H /tmp/gen.out | cut -f3- > /tmp/gen.out.sys
$ grep ^T /tmp/gen.out | cut -f2- > /tmp/gen.out.ref
$ python score.py --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
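To make the scoring steps above easier to follow, here is a small Python sketch that mirrors the two `grep`/`cut` pipelines and then recombines the n-gram precisions from the report line into a BLEU score. It is not fairseq's `score.py`. The function names and the sample lines are hypothetical, and the tab-separated `H-<id>`, score, text and `T-<id>`, text layout is an assumption inferred from the `cut -f3-` and `cut -f2-` commands. The formula used is the standard BLEU definition: the brevity penalty times the geometric mean of the n-gram precisions.

```python
import math


def split_generate_output(lines):
    """Mimic `grep ^H | cut -f3-` and `grep ^T | cut -f2-` on the output."""
    hypotheses, references = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("H"):
            hypotheses.append("\t".join(fields[2:]))  # drop id and score
        elif line.startswith("T"):
            references.append("\t".join(fields[1:]))  # drop id
    return hypotheses, references


def bleu_from_report(precisions, sys_len, ref_len):
    """Recombine n-gram precisions (in percent) into a BLEU score."""
    if sys_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / sys_len)
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical sample lines, made up for illustration only.
sample = [
    "S-0\tHello world .\n",
    "T-0\tBonjour le monde .\n",
    "H-0\t-0.25\tBonjour monde .\n",
]
hyps, refs = split_generate_output(sample)

# Numbers taken from the report line shown above.
score = bleu_from_report([67.5, 46.9, 34.4, 25.5], sys_len=83262, ref_len=82787)
ratio = 83262 / 82787
print(hyps, refs, round(score, 2), round(ratio, 3))
```

Because the printed precisions are rounded to one decimal place, the recombined score agrees with the reported 40.83 only up to a small rounding difference. The length ratio `syslen / reflen` rounds to the reported 1.006, and since the system output is longer than the reference, the brevity penalty is 1.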
- Facebook page: https://www.facebook.com/groups/fairseq.users
- Google group: https://groups.google.com/forum/#!forum/fairseq-users
If you use the code in your paper, then please cite it as:
@inproceedings{gehring2017convs2s,
author = {Gehring, Jonas and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N},
title = "{Convolutional Sequence to Sequence Learning}",
booktitle = {Proc. of ICML},
year = 2017,
}
fairseq(-py) is BSD-licensed. The license applies to the pre-trained models as well. We also provide an additional patent grant.
This is a PyTorch version of fairseq, a sequence-to-sequence learning toolkit from Facebook AI Research. The original authors of this reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam Gross.