Note: For most of the experimentation, we adapted various official GitHub repos; you can find them in the References section. That said, below are the scripts that we either wrote from scratch or adapted from an existing script and extended:
GenderBiasAnalysis/TinyBERT/FT_Bert_Classification.py
GenderBiasAnalysis/TinyBERT/task_distill.py
GenderBiasAnalysis/TinyBERT/bias_analysis.py
GenderBiasAnalysis/TinyBERT/result.ipynb
GenderBiasAnalysis/imdbtests/res_data/IMDB_data_preparation_script.py
GenderBiasAnalysis/imdbtests/rate.py
GenderBiasAnalysis/imdbtests/res_plots/biases.ipynb
GenderBiasAnalysis/imdbtests/res_plots/tables.ipynb
GenderBiasAnalysis/TinyBERT/seat_analysis.ipynb
GenderBiasAnalysis/TinyBERT/seat_analysis.py
GenderBiasAnalysis/TinyBERT/seat_bert_encoder.ipynb
Note: We used bert-base-uncased as our teacher model for experimentation. You can find the GitHub repo of this project at [https://github.com/HemaDevaSagar35/GenderBiasAnalysis]. We also share a Google Drive link below, where we uploaded our models and data (which are too large to put on Canvas or GitHub).
There are two facets here:
- Training BERT and TinyBERT models on the MLMA and IMDB datasets
- Running various bias analyses with the models obtained from Step 1
You can skip Step 1 and run the Step 2 scripts using the models we generated from our experiments. You can find the models we generated and the relevant data here [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link]
But in case you want to run Step 1 and regenerate the models, here is what you have to do:
- Go to GenderBiasAnalysis/TinyBERT/ and run
pip install -r requirements.txt
- Download the GloVe embeddings from [https://nlp.stanford.edu/data/glove.6B.zip]
- Unzip the archive into the GenderBiasAnalysis/TinyBERT/embeddings folder
- Download bert-base-uncased folder from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link] and place it in GenderBiasAnalysis/TinyBERT/
- Download tinybert-gkd-model folder from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link] and place it in GenderBiasAnalysis/TinyBERT/
- Download the glue_data folder from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link] and place it in GenderBiasAnalysis/data/
- Change the working directory to GenderBiasAnalysis/TinyBERT/
- Fine-tune BERT on the MLMA dataset with the following command
python FT_Bert_Classification.py --data_dir ../data/glue_data/MLMA \
--pre_trained_bert bert-base-uncased \
--task_name MLMA \
--do_lower_case \
--output_dir output_models \
--num_train_epochs 30
- Do intermediate distillation of TinyBERT on MLMA using the following command
python task_distill.py --teacher_model output_models \
--student_model tinybert-gkd-model \
--data_dir ../data/glue_data/MLMA \
--task_name MLMA \
--output_dir tiny_temp_model \
--max_seq_length 128 \
--train_batch_size 32 \
--num_train_epochs 20 \
--aug_train \
--do_lower_case
- Do prediction layer distillation of TinyBERT on MLMA using the following command
python task_distill.py --pred_distill \
--teacher_model output_models \
--student_model tiny_temp_model \
--data_dir ../data/glue_data/MLMA \
--task_name MLMA \
--output_dir tinybert_model \
--aug_train \
--do_lower_case \
--learning_rate 3e-5 \
--num_train_epochs 3 \
--max_seq_length 64 \
--train_batch_size 32
- Change the working directory to GenderBiasAnalysis/TinyBERT/
- Fine-tune BERT on the IMDB dataset using the following command
python FT_Bert_Classification.py --data_dir ../data/glue_data/IMDB \
--pre_trained_bert bert-base-uncased \
--task_name IMDB \
--do_lower_case \
--output_dir imdb_output_models \
--num_train_epochs 30
- Do intermediate distillation of TinyBERT on IMDB using the following command
python task_distill.py --teacher_model imdb_output_models \
--student_model tinybert-gkd-model \
--data_dir ../data/glue_data/IMDB \
--task_name IMDB \
--output_dir tiny_temp_imdb_model \
--max_seq_length 128 \
--train_batch_size 32 \
--num_train_epochs 20 \
--do_lower_case
- Do prediction layer distillation of TinyBERT on IMDB using the following command
python task_distill.py --pred_distill \
--teacher_model imdb_output_models \
--student_model tiny_temp_imdb_model \
--data_dir ../data/glue_data/IMDB \
--task_name IMDB \
--output_dir tinybert_imdb_model \
--do_lower_case \
--learning_rate 3e-5 \
--num_train_epochs 3 \
--max_seq_length 64 \
--train_batch_size 32
As mentioned at the start, you can run Step 2 either using our models directly or by first running Step 1 and regenerating the models. For convenience, the instructions below are organized by the analysis we did.
If you want to run an analysis in Step 2 directly using our models, complete the following prerequisites first:
- Download the folders output_models, tinybert_model, imdb_output_models, and tinybert_imdb_model from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link] and place them in GenderBiasAnalysis/TinyBERT/
- Download the glue_data folder from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S?usp=share_link] and place it in GenderBiasAnalysis/data/
The following are the commands you have to run to get all the results under this analysis in Step 2.
Run the Jupyter notebook GenderBiasAnalysis/TinyBERT/analysis.ipynb. The notebook is self-explanatory and performs all the analysis presented in the report and the presentation slides.
First, place the models tinybert_imdb_model and imdb_output_models in the GenderBiasAnalysis/imdbtests/res_models/models folder (download them from [https://drive.google.com/drive/folders/1XmLXSMbYAur1mZfqGJfmUaTQGa8BuX1S]) and rename them 'IMDB_tinybert_original' and 'imdb_bertbase_original', respectively.
Change the working directory to GenderBiasAnalysis/, then run the following command
python imdbtests/res_data/IMDB_data_preparation_script.py | tee imdbtests/data_prep.txt
Then run the following command to get the bias calculations for the specs
python -c 'from imdbtests import rate; rate.rate()'
After this, you will find the results in the GenderBiasAnalysis/imdbtests/res_results folder
Use GenderBiasAnalysis/imdbtests/res_plots/biases.ipynb and GenderBiasAnalysis/imdbtests/res_plots/tables.ipynb to consolidate the bias results for the model of your choice in table and figure form.
Complete the two prerequisite steps mentioned in the Unintended Bias section, then run the Jupyter notebook GenderBiasAnalysis/TinyBERT/seat_analysis.ipynb. The notebook is self-explanatory. The SEAT tests, results, and plots can be found in the SEAT folder.
Follow steps 1 and 2 mentioned in the Unintended Bias section. Then change the working directory to GenderBiasAnalysis/ and run the log probability bias tests with the following command
python TinyBERT/log_probability_bias_analysis.py \
--eval Log_Probability_Bias/Corpus_Creation/BEC-Pro/BEC-Pro_EN.tsv \
--model [MODEL PATH] \
--out Log_Probability_Bias/results/[SAMPLE NAME].csv
[MODEL PATH] for the four models would be TinyBERT/output_models, TinyBERT/imdb_output_models, TinyBERT/tinybert_model, and TinyBERT/tinybert_imdb_model. To run all four models, replace [MODEL PATH] with each model path in turn and rerun the command. Also replace [SAMPLE NAME].csv with the file name you want the results stored under.
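The four runs can also be driven by a small loop. A minimal dry-run sketch, assuming the model paths listed above; the per-model output file names here are our own choice, not fixed by the script:

```shell
# Dry-run helper: prints the log probability bias command for one model folder.
# Pipe the output to `sh` (or drop the echo) to actually execute the runs.
log_prob_cmd() {
  echo "python TinyBERT/log_probability_bias_analysis.py" \
       "--eval Log_Probability_Bias/Corpus_Creation/BEC-Pro/BEC-Pro_EN.tsv" \
       "--model TinyBERT/${1}" \
       "--out Log_Probability_Bias/results/${1}_result.csv"
}

# Print the command for each of the four model folders.
for m in output_models imdb_output_models tinybert_model tinybert_imdb_model; do
  log_prob_cmd "$m"
done
```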
Sample command:
python TinyBERT/log_probability_bias_analysis.py \
--eval Log_Probability_Bias/Corpus_Creation/BEC-Pro/BEC-Pro_EN.tsv \
--model TinyBERT/tinybert_imdb_model \
--out Log_Probability_Bias/results/tinybert_result.csv
Results and other resources related to the log probability tests can be found in the Log_Probability_Bias folder.
First, change the working directory to GenderBiasAnalysis/categorical_bias_score/, then run
pip install -r requirements.txt
Evaluation script: run it on a pretrained model using the command below. First rename the model's bert_config file to config.json, since the code loads the model through the open-source transformers library.
python categorical_score.py --lang en --custom_model_path [MODEL PATH]
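The rename can be done once for all four model folders. A minimal sketch, assuming the folder locations from the earlier download steps (paths relative to categorical_bias_score/):

```shell
# Copy bert_config.json to config.json (if not already present) so that the
# open-source transformers library can load the model from each folder.
rename_bert_config() {
  if [ -f "$1/bert_config.json" ] && [ ! -f "$1/config.json" ]; then
    cp "$1/bert_config.json" "$1/config.json"
  fi
}

# The four folder paths are assumed from the download steps above.
for d in ../TinyBERT/output_models ../TinyBERT/imdb_output_models \
         ../TinyBERT/tinybert_model ../TinyBERT/tinybert_imdb_model; do
  rename_bert_config "$d"
done
```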
References:
https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT
https://github.com/sciphie/bias-bert
https://github.com/W4ngatang/sent-bias
https://github.com/marionbartl/gender-bias-BERT
https://github.com/jaimeenahn/ethnic_bias