Sexism Custom Classifier detects sexism automatically using natural language processing. In this work, the dataset from Samory et al. (2020) was used to train three models on top of four different feature sets, both individually and in combination. The experiments aim to answer two research questions:
- How informative are the different feature sets for the sexism detection task?
- How much does the sexism classifiers' performance improve when different feature sets are introduced?
The following four feature sets are used, both individually and in combination:

- Sentiment: The sentiment intensity scores provided by the VADER sentiment analysis method (see the feature-extraction sketch after this list).
- Word N-grams: Term Frequency-Inverse Document Frequency (TF-IDF) weights of word n-grams provided by scikit-learn.
- Type Dependency: The type dependency relationships provided by the Stanford Parser. The parser can be downloaded from here.
- Document Embeddings: The document embeddings provided by the BERT language representation model.
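As a rough, illustrative sketch of how the sentiment and word n-gram features could be computed (the repository's own feature modules are the authoritative implementation), assuming NLTK's VADER implementation and scikit-learn's TfidfVectorizer:

```python
# Illustrative sketch only: sentiment and word uni-gram feature extraction.
# Assumes NLTK's VADER (requires nltk.download('vader_lexicon')) and scikit-learn.
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["example text one", "example text two"]  # placeholder documents

# VADER sentiment intensity scores (neg, neu, pos, compound) per document.
sia = SentimentIntensityAnalyzer()
sentiment = np.array([list(sia.polarity_scores(t).values()) for t in texts])

# TF-IDF weights of word uni-grams (the n-gram range here is an assumption).
tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 1))
ngrams = tfidf.fit_transform(texts).toarray()

# Feature sets can be used individually or concatenated into a combination.
combined = np.hstack([sentiment, ngrams])
```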
The following three models are trained on top of these feature sets:

- Logistic Regression (see the training sketch after this list)
- Convolutional Neural Network
- Support Vector Machine
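As a minimal sketch of how one of these models could be trained on such a feature matrix (the actual training is driven by run.py together with the params and hyperparams files), assuming a precomputed feature matrix and binary sexist/non-sexist labels:

```python
# Minimal sketch: Logistic Regression on a precomputed feature matrix.
# The feature matrix and labels below are placeholders, not the Samory et al. data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 50)            # placeholder feature matrix
y = np.random.randint(0, 2, size=200)  # placeholder labels: 1 = sexist, 0 = non-sexist

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```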
This example code trains Logistic Regression on top of sentiment and uni-gram features. To replicate the results, run the scripts listed in 'experiments/scripts.txt'.
```bash
export PARAMS_FILE='experiments/params.json'
export HYPERPARAMS_FILE='experiments/hyperparams.json'

python run.py \
  --params_file=$PARAMS_FILE \
  --hyperparams_file=$HYPERPARAMS_FILE
```
This example code extracts the pre-trained BERT embeddings.
```bash
export DATA_FILE=/path/data.csv

python run_bert_feature_extraction.py \
  --data_file=$DATA_FILE
```
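For reference, a minimal sketch of how document-level BERT embeddings can be obtained, assuming the Hugging Face transformers library and mean pooling over the last hidden layer (run_bert_feature_extraction.py is the repository's own implementation and may pool differently):

```python
# Illustrative sketch: extracting document embeddings from pre-trained BERT.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

texts = ["example text one", "example text two"]  # placeholder documents
encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded)

# Mean-pool the last hidden states, ignoring padding tokens.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (2, 768) for bert-base-uncased
```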