- Our OpenReview paper submission to the challenge can be found here
If you don't already use one, it's a good idea to install the package into a virtual environment:
python3 -m venv my_env
source ./my_env/bin/activate
Then you can install the package via pip:
pip install reformer-fastai
This project used nbdev for all development. See their docs here to install nbdev and get started. Once you have nbdev installed, we suggest you follow the suggested contributor workflow.
A pip-installed version of this library is needed to run experiments. All experiments are run using the run_exp command, followed by the task name and then the parameters for that task. Running run_exp --help displays a list of all parameters with a brief description of each. For brevity, only one example is shown below, a Reformer language model experiment; a list of all experiment commands can be found here.
The following command generated the results in Section 4.4, "Effect of reversible layers", of our submission paper:
run_exp "lm_rev" \
--n_epochs=10 \
--bs=2 \
--max_seq_len=4096 \
--grad_accum=8 \
--save_model=True \
--clip=0.5 \
--seed=444 \
--precision=2 \
--do_wandb_logging=False
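To repeat this experiment under several random seeds, the command can be generated programmatically. The following is a minimal sketch, not part of the library: the flag names and values are copied from the example above, while `BASE_FLAGS`, `build_run_exp_cmd` and the extra seeds are hypothetical names and values chosen for illustration. It assumes that run_exp accepts its flags in any order.

```python
import shlex

# Flags copied from the lm_rev example; the seed is supplied per run.
BASE_FLAGS = {
    "n_epochs": 10,
    "bs": 2,
    "max_seq_len": 4096,
    "grad_accum": 8,
    "save_model": True,
    "clip": 0.5,
    "precision": 2,
    "do_wandb_logging": False,
}


def build_run_exp_cmd(task, seed, flags=BASE_FLAGS):
    """Return the argv list for one run_exp call with the given seed."""
    argv = ["run_exp", task]
    argv += [f"--{name}={value}" for name, value in flags.items()]
    argv.append(f"--seed={seed}")
    return argv


if __name__ == "__main__":
    # 444 is the seed from the example; 445 and 446 are illustrative only.
    for seed in (444, 445, 446):
        # Print the command instead of running it, so it can be inspected first.
        print(shlex.join(build_run_exp_cmd("lm_rev", seed)))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` once the printed commands look right.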
The main hyperparameters used are documented in the Experiment Commands page and the Experiment Configs page.
A full description of our results, including charts and tables, can be found in our paper here on OpenReview. Our results are summarised as follows:
Claims around speed on longer sequences and reduced memory footprint were validated: as sequence length increased, Locality Sensitive Hashing ("LSH") Attention became faster, and increasing the number of hashes improved performance. We could not match the performance of a traditional Transformer with Reformer. Some experiments were not run for as long as in the paper due to a lack of computational resources. The under-performance of our Reformer may be due to under-training, implementation differences or nuances in JAX vs PyTorch. In addition, exploding gradients were encountered with mixed precision training, and several model settings were found to be unstable depending on the random seed or learning rate.
- Reformer Paper
- Authors' ICLR video
- Google Blog
- Authors' code (TRAX)
- Reformer enwik8 model and training config
- @lucidrains' Reformer code
- HuggingFace: Reformer source code
- HuggingFace: Reformer notebook example
- HuggingFace: long sequences
- HuggingFace: Pretraining
Tokenizers used with these datasets can be found here
enwik8
- enwik8.zip, raw data, 100 MB
- Tensor2Tensor enwik8 data generator code, with train/dev/test split. File lengths:
- Train: 89,621,832
- Eval: 5,000,000
- Test: 5,000,000
- Tokenizer used: ByteTextTokenizer
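As the tokenizer name suggests, enwik8 is modelled at the byte level. The following is a minimal sketch of byte-level tokenization, assuming two reserved ids (for example padding and end-of-sequence) placed ahead of the 256 byte values. The class `ToyByteTokenizer` and its `n_reserved` parameter are hypothetical and written for illustration; this is not the library's ByteTextTokenizer, whose exact behaviour may differ.

```python
class ToyByteTokenizer:
    """Illustrative byte-level tokenizer (hypothetical, not the library class)."""

    def __init__(self, n_reserved=2):
        # Assumption: the first n_reserved ids are kept for special tokens.
        self.n_reserved = n_reserved

    @property
    def vocab_size(self):
        # 256 possible byte values plus the reserved ids.
        return 256 + self.n_reserved

    def encode(self, text):
        # Shift every UTF-8 byte past the reserved ids.
        return [b + self.n_reserved for b in text.encode("utf-8")]

    def decode(self, ids):
        # Drop reserved ids, undo the shift, then decode the bytes as UTF-8.
        raw = bytes(i - self.n_reserved for i in ids if i >= self.n_reserved)
        return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    tok = ToyByteTokenizer()
    ids = tok.encode("Reformer")
    print(ids)
    print(tok.decode(ids))
```

With a byte vocabulary, sequence length equals the number of UTF-8 bytes rather than the number of characters, so non-ASCII characters take more than one position.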
WMT14
- WMT on HuggingFace Datasets
- Reformer pre-trained WMT14 vocab
- Vocab size = 33300, from WMT14 model config
- Train/test split: newstest2013 for validation and newstest2014 for test, consistent with Vaswani et al. (2017) - from https://arxiv.org/pdf/2009.02070.pdf
- Tokenizer used: SubWordTextEncoder