Bias-Classification

Dataset

In this project, the dataset used is Jigsaw Unintended Bias in Toxicity Classification, available on Kaggle (https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).

Data Files

  • train.csv
  • test.csv
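
Both files carry a comment_text column; train.csv additionally holds the fractional target toxicity score and identity columns such as male and female. A minimal stdlib sketch of the row layout, using a tiny in-memory sample rather than the real files (the sample values are invented; the column names follow the Kaggle data description):

```python
import csv
import io

# In-memory sample mimicking the layout of train.csv. The real file
# has many more columns and rows; values here are illustrative only.
sample = """id,target,comment_text,male,female
1,0.8,"you are terrible",0.0,1.0
2,0.0,"have a nice day",1.0,0.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# The competition treats target >= 0.5 as toxic.
toxic = [r for r in rows if float(r["target"]) >= 0.5]
print(len(rows), len(toxic))  # 2 1
```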

Notebooks

In this project, gender is chosen as the identity dimension for identifying bias.

  • Data_Preparation.ipynb: In this notebook, we prepare the data so that it can be used in BERT_Data-Classification.ipynb and help us examine the bias in the dataset.
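
A common preparation step for this dataset is binarizing the fractional toxicity score at 0.5 and flagging comments annotated with a gender identity. The sketch below illustrates that step; the column names follow the Kaggle data description, but the helper and thresholds are assumptions, not necessarily what the notebook does:

```python
# Identity columns for the gender dimension, per the Kaggle data
# description. The 0.5 thresholds are an assumed convention.
GENDER_COLS = ["male", "female", "transgender", "other_gender"]

def prepare(row, threshold=0.5):
    """Hypothetical helper: binarize toxicity and flag gender mentions."""
    return {
        "comment_text": row["comment_text"],
        "toxic": int(float(row["target"]) >= threshold),
        "gender_mention": int(
            any(float(row.get(c) or 0.0) >= threshold for c in GENDER_COLS)
        ),
    }

example = prepare({"comment_text": "hi", "target": "0.7", "female": "1.0"})
print(example)  # {'comment_text': 'hi', 'toxic': 1, 'gender_mention': 1}
```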

  • BERT_Data-Classification.ipynb: In this notebook, we perform text classification by fine-tuning a BERT-based model.
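
The fine-tuning itself lives in the notebook; what BERT-style models expect as input (fixed-length input_ids plus a matching attention_mask, produced in practice by a tokenizer such as Hugging Face's BertTokenizer) can be illustrated with a stdlib-only sketch. The token ids below are made up:

```python
# Pad or truncate token-id sequences to a fixed length and build the
# attention mask, as BERT-style models require. PAD_ID = 0 matches
# BERT's [PAD] token; the ids passed in are illustrative.
PAD_ID = 0

def pad_to_max_len(token_ids, max_len=8):
    ids = list(token_ids)[:max_len]            # truncate long inputs
    mask = [1] * len(ids)                      # 1 = real token
    ids += [PAD_ID] * (max_len - len(ids))     # pad short inputs
    mask += [0] * (max_len - len(mask))        # 0 = padding
    return ids, mask

ids, mask = pad_to_max_len([101, 2023, 2003, 102])  # hypothetical ids
print(ids)   # [101, 2023, 2003, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```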

  • bias-toxicity-classification.ipynb: In this notebook, we perform toxicity classification using Logistic Regression and a single-layer LSTM architecture.
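
The notebook uses library implementations; to make the Logistic Regression half concrete, here is a stdlib-only toy version trained on bag-of-words counts. The tiny corpus, learning rate, and epoch count are illustrative, not the notebook's settings:

```python
import math

# Toy corpus: (text, label) with 1 = toxic. Illustrative only.
docs = [("you are awful", 1), ("awful terrible stuff", 1),
        ("have a great day", 0), ("great to see you", 0)]

vocab = sorted({w for text, _ in docs for w in text.split()})

def featurize(text):
    words = text.split()
    return [words.count(w) for w in vocab]  # bag-of-words counts

w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):                        # plain gradient descent
    for text, y in docs:
        x = featurize(text)
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))      # sigmoid
        g = p - y                           # gradient of the log loss
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

def predict(text):
    z = sum(wi * xi for wi, xi in zip(w, featurize(text))) + b
    return 1 if z > 0 else 0

print([predict(t) for t, _ in docs])  # [1, 1, 0, 0]
```

The data is linearly separable (only toxic examples contain "awful"), so gradient descent fits the training set exactly; real runs would of course hold out a test split.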

Strategy:

  • Importing libraries
  • Data cleaning
  • Exploratory data analysis
  • Data splitting
  • Training a Logistic Regression model
  • Training an LSTM (single LSTM layer architecture)
  • Comparing overall AUC and the designed bias AUC metrics
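
The final comparison step relies on ROC AUC, which can be computed from the rank statistic below (a stdlib sketch; the competition's full metric additionally averages subgroup-restricted AUCs, which is omitted here):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs the scores order correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0 (perfect ranking)
print(roc_auc([1, 0, 1, 0], [0.2, 0.9, 0.6, 0.4]))  # 0.25
```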