This repo contains the code for ASC20-21 LE: BERT for RACE, built with PyTorch Lightning and Transformers.

DCMN (reference code) and DUMA are implemented.
This repo is for experimental purposes. To achieve the best performance on distributed systems, we ejected the code from PyTorch Lightning to native PyTorch and switched the model from the Hugging Face implementation to NVIDIA's. For the ejected version, check out `bert-race-nvidia`.
```
.
├── data
│   ├── RACE
│   │   ├── dev
│   │   ├── test
│   │   └── train
│   ├── RACEDataModule.py
│   ├── RACEDataModuleForALBERT.py
│   └── RACELocalLoader.py
├── model
│   ├── bert-large-uncased
│   │   ├── config.json
│   │   ├── pytorch_model.bin
│   │   └── vocab.txt
│   ├── ALBERTForRace.py
│   ├── BertForRace.py
│   ├── BertLongAttention.py
│   ├── BertPooler.py
│   ├── CheckptEnsemble.py
│   ├── DCMNForRace.py
│   ├── DUMAForRace.py
│   ├── FuseNet.py
│   └── SSingleMatchNet.py
├── plugins
│   ├── ApexDDP.py
│   └── ApexDDPAccelerator.py
├── result
│   └── asc01
├── hp_optimize.py
├── train.py
├── predict.py
├── README.md
├── requirements.txt
└── LICENSE
```
Please put the data and the pre-trained model into `data` and `model` as shown above.
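A minimal sketch of preparing that layout (the RACE splits and the `bert-large-uncased` files themselves must be downloaded separately):

```shell
# Create the directory skeleton; fill it afterwards with the RACE
# dataset splits and the pre-trained bert-large-uncased files.
mkdir -p data/RACE/train data/RACE/dev data/RACE/test
mkdir -p model/bert-large-uncased
```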
```shell
pip install -r requirements.txt
```
You need to install `apex` separately:

```shell
scl enable devtoolset-9 bash
conda activate [env]
# then compile and install apex and other modules
```
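For reference, a source build of apex typically looks like the following (flags from apex's README of that era; exact options may differ on your system and require a CUDA toolkit matching your PyTorch build):

```shell
# Clone apex and compile it with the C++ and CUDA extensions enabled.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```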
Install `horovod`:

```shell
HOROVOD_NCCL_LIB=/usr/lib64/ HOROVOD_NCCL_INCLUDE=/usr/include/ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_NCCL_LINK=SHARED pip install --no-cache-dir horovod
```
- PyTorch Lightning
- Transformers
- Refactor RACE Dataset Loader
  - Use `datasets` from `transformers`
  - Better Interface and Format
  - Faster Data Loading (Using Rust & multi-process)
  - Cache Tokenized Results
  - Custom Datasets (Local loading)
- Mixed Precision Training (Apex)
- TensorBoard Logging
  - Change Log Dir
- Text Logging (should be the same as the baseline code; override PL's original progress bar, will be done after ejection)
- Argparse (Not that important)
- Inference & Answer Saving
- Hyper-Parameter Tuning (Optuna)
  - More parameters (will be done in ejection)
- Parallelism
  - FairScale
  - DeepSpeed (Unstable)
- Distributed (will be done after ejection)
  - DDP
  - Apex DDP (Given up)
  - Apex + Horovod (Given up)
- Cross Validation (Useless)
- Data Augmentation (Useless)
- Model Tweaks
  - DCMN (Bad test result: accuracy only around 60, far lower than the paper's; buggy now and I'm not going to debug it anymore. If anyone wants to use it, please check out a working commit, #1df19a5)
  - DUMA
  - Sentence Selection (Bad result)
  - Sliding Window (Bad result)
  - Rouge Score (small improvement on short sequences)
  - Use features from previous layers (Useless)
- Model Ensemble (Buggy, will be done after ejection)
- Find Best Seed (Useless: there will be new datasets and a new pre-trained model on-site)
- Further Speedup of the Training Process
  - LongFormer (Seems useless)
  - Nvidia BERT (will be done in ejection)
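For reference, the sliding-window idea tried above (splitting a passage that exceeds BERT's input length into overlapping chunks, scoring each chunk, and merging the predictions) can be sketched as follows; the window and stride values are illustrative, not the ones used in this repo:

```python
def sliding_windows(tokens, window=4, stride=2):
    """Split a token sequence into overlapping windows.

    In the RACE setting, each chunk would be scored by the model
    separately and the per-chunk predictions merged afterwards
    (e.g. max or mean over the option logits).
    """
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks


print(sliding_windows(list(range(10)), window=4, stride=2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```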