DRMOE: Towards Better Mixture of Experts via Dual Routing Strategy

This is the implementation of the paper "DRMOE: Towards Better Mixture of Experts via Dual Routing Strategy".

Running

You can implement out model according to the following steps:

Install the necessary packages. Run the command:
```
pip install -r requirements.txt
```
To train the DRMOE and generate the answer to test, please run the command:
```
bash ./experiments/train.sh
```
Finally, you can get the answer at results/eval_metric.csv.

To ease the configuration of the environment, I list versions of my hardware and software equipments:

Hardware:
- GPU: RTX 3090
- Cuda: 11.3.1
- Driver version: 535.146.02
Software:
- Python: 3.9.5
- Pytorch: 1.12.0+cu113
- transformers: 4.28.1
- deepspeed: 0.9.4

You can also visit this link to get the tar.gz file containing the complete virtual environment.

Method	CMeIE	CHIP-CDN	CHIP-CDEE	CHIP-MDCFNPC	CHIP-CTC	KUAKE-QIC	IMCS-V2-MRG	MedDG	Average
DRMOE	0.4675	0.8247	0.5622	0.7813	0.8927	0.8597	0.3771	0.1126	0.6097
w/o TSL&DR	0.4497	0.8229	0.5545	0.7810	0.8645	0.8563	0.3608	0.1093	0.5999
w/o TSL	0.4784	0.8355	0.5668	0.7729	0.8836	0.8021	0.3630	0.1135	0.6020
w/o DR	0.4655	0.8168	0.5685	0.7736	0.8864	0.8587	0.3805	0.1138	0.6080

These additional ablation experiments further demonstrate the effectiveness of the proposed module in our study.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.idea		.idea
.vscode		.vscode
datasets		datasets
experiments		experiments
resources		resources
results		results
src		src
LICENSE		LICENSE
README.md		README.md
debug.json		debug.json
requirements.txt		requirements.txt
run_mlora.py		run_mlora.py
training_args.json		training_args.json