pNLP-Mixer: an Efficient all-MLP Architecture for Language
Implementation of pNLP-Mixer in PyTorch and PyTorch Lightning.
pNLP-Mixer is the first successful application of the MLP-Mixer architecture in NLP. With a novel embedding-free projection layer, pNLP-Mixer achieves performance comparable to transformer-based models (e.g. mBERT, RoBERTa) with a significantly smaller parameter count and no expensive pretraining.
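For orientation, a mixer layer alternates a token-mixing MLP (applied across the sequence) with a channel-mixing MLP (applied across the features). Below is a minimal, illustrative PyTorch sketch of such a block; the class, argument names, and dimensions are placeholders and do not correspond to the module definitions or configuration keys used in this repository.

```python
import torch
from torch import nn

class MixerBlock(nn.Module):
    """Illustrative MLP-Mixer block: token mixing over the sequence dimension
    followed by channel mixing over the feature dimension (placeholder names)."""

    def __init__(self, seq_len: int, hidden_dim: int, token_mlp_dim: int, channel_mlp_dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, token_mlp_dim),
            nn.GELU(),
            nn.Linear(token_mlp_dim, seq_len),
        )
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(hidden_dim, channel_mlp_dim),
            nn.GELU(),
            nn.Linear(channel_mlp_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        # Token mixing: transpose so the MLP acts across the sequence dimension.
        y = self.norm1(x).transpose(1, 2)            # (batch, hidden_dim, seq_len)
        x = x + self.token_mlp(y).transpose(1, 2)    # residual connection
        # Channel mixing: the MLP acts across the feature dimension.
        return x + self.channel_mlp(self.norm2(x))

# Example: a batch of 2 sequences, 64 tokens each, 256 features per token.
out = MixerBlock(seq_len=64, hidden_dim=256, token_mlp_dim=256, channel_mlp_dim=512)(torch.randn(2, 64, 256))
```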
- Python >= 3.6.10
- PyTorch >= 1.8.0
- PyTorch Lightning >= 1.4.3
- All other requirements are listed in the `requirements.txt` file.
Please check the configuration examples and the comments in the `cfg` directory.
python projection.py -v VOCAB_FILE -c CFG_PATH -g NGRAM_SIZE -o OUTPUT_FILE
- `VOCAB_FILE`: path to the vocab file whose tokens will be hashed
- `CFG_PATH`: path to the configurations file
- `NGRAM_SIZE`: size of n-grams used during hashing
- `OUTPUT_FILE`: path where the resulting `.npy` file will be stored

A simplified sketch of the fingerprinting behind this step is shown below.
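Each vocabulary entry is reduced to a MinHash-style fingerprint computed from its character n-grams, and the stacked fingerprints are cached as an `.npy` file. The code below is a hypothetical simplification of that idea, not the actual `projection.py`: the hash function (MD5 here), the fingerprint width, the example vocabulary, and the output filename are all placeholder assumptions.

```python
import hashlib
import numpy as np

def char_ngrams(token: str, n: int):
    """Character n-grams of a token (simplified; no word-boundary markers)."""
    return [token[i:i + n] for i in range(max(len(token) - n + 1, 1))]

def minhash_fingerprint(token: str, n: int, num_hashes: int = 64) -> np.ndarray:
    """MinHash-style fingerprint: for each seeded hash function, keep the
    minimum hash value observed over the token's character n-grams."""
    fingerprint = np.empty(num_hashes, dtype=np.uint64)
    for seed in range(num_hashes):
        fingerprint[seed] = min(
            int.from_bytes(hashlib.md5(f"{seed}:{gram}".encode()).digest()[:8], "little")
            for gram in char_ngrams(token, n)
        )
    return fingerprint

# Hypothetical usage: fingerprint every vocab entry and cache the result.
vocab = ["book", "a", "flight", "to", "boston"]            # placeholder vocabulary
cache = np.stack([minhash_fingerprint(tok, n=3) for tok in vocab])
np.save("vocab_fingerprints.npy", cache)                   # placeholder OUTPUT_FILE
```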
python run.py -c CFG_PATH -n MODEL_NAME -m MODE -p CKPT_PATH
- `CFG_PATH`: path to the configurations file
- `MODEL_NAME`: model name to be used for PyTorch Lightning logging
- `MODE`: `train` or `test` (default: `train`)
- `CKPT_PATH`: (optional) checkpoint path to resume training from or to use for testing
The checkpoints used for evaluation are available here.
**MTOP**

Model Size | Reported | Ours |
---|---|---|
pNLP-Mixer X-Small | 76.9% | 79.3% |
pNLP-Mixer Base | 80.8% | 79.4% |
pNLP-Mixer X-Large | 82.3% | 82.1% |
**MultiATIS**

Model Size | Reported | Ours |
---|---|---|
pNLP-Mixer X-Small | 90.0% | 91.3% |
pNLP-Mixer Base | 92.1% | 92.8% |
pNLP-Mixer X-Large | 91.3% | 92.9% |
* Note that the paper reports the performance on the MultiATIS dataset using an 8-bit quantized model, whereas our performance was measured using a 32-bit float model.
**IMDB**

Model Size | Reported | Ours |
---|---|---|
pNLP-Mixer X-Small | 81.9% | 81.5% |
pNLP-Mixer Base | 78.6% | 82.2% |
pNLP-Mixer X-Large | 82.9% | 82.9% |
@article{fusco2022pnlp,
title={pNLP-Mixer: an Efficient all-MLP Architecture for Language},
author={Fusco, Francesco and Pascual, Damian and Staar, Peter},
journal={arXiv preprint arXiv:2202.04350},
year={2022}
}
Implemented by:

- Tony Woo @ MINDsLab Inc. (shwoo@mindslab.ai)
Special thanks to:
- Hyoung-Kyu Song @ MINDsLab Inc.
- Kang-wook Kim @ MINDsLab Inc.
To do:

- 8-bit quantization (a possible starting point is sketched below)
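As one possible starting point for the quantization item above, stock PyTorch offers dynamic quantization of `nn.Linear` layers via `torch.quantization.quantize_dynamic`. The snippet below is a hypothetical sketch on a stand-in model; it is not part of this repository, and the toy `model` and the output path are placeholders for a trained float32 pNLP-Mixer checkpoint.

```python
import torch
from torch import nn

# Stand-in for a trained float32 model; a real run would load the trained
# pNLP-Mixer weights instead of this toy stack of Linear layers.
model = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 64))

# Dynamic quantization stores Linear weights in int8 and quantizes
# activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "model_int8.pt")  # placeholder output path
```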