From f976f874e880816d666647982e0e737ca890aaae Mon Sep 17 00:00:00 2001
From: Keshav Arora <119474193+CoderOMaster@users.noreply.github.com>
Date: Thu, 1 Feb 2024 18:36:48 +0530
Subject: [PATCH] Create README.md

---
 Toxic Comment Analysis/README.md | 66 ++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 Toxic Comment Analysis/README.md

diff --git a/Toxic Comment Analysis/README.md b/Toxic Comment Analysis/README.md
new file mode 100644
index 000000000..4985e49d5
--- /dev/null
+++ b/Toxic Comment Analysis/README.md
@@ -0,0 +1,66 @@
+# TOXIC COMMENT ANALYSIS
+
+## GOAL
+Develop a machine learning model that classifies a comment as toxic or non-toxic.
+
+## DATASET
+The dataset is available on Kaggle: https://www.kaggle.com/datasets/devkhant24/toxic-comment
+
+## MODELS USED
+- Naive Bayes
+- Random Forest
+- CatBoost
+- Decision Tree
+- Bidirectional LSTM
+- RNN
+- Logistic Regression
+
+## LIBRARIES
+- Pandas
+- NumPy
+- TensorFlow
+- Seaborn
+- Matplotlib
+- Scikit-Learn
+- os
+- re
+- math
+- Beautiful Soup
+- NLTK
+- spaCy
+
+## IMPLEMENTATION
+1. Loaded the dataset.
+2. Converted the data into a standard CSV file and renamed the columns for convenience.
+3. Cleaned and preprocessed the comments to remove emojis, symbols, links, etc.
+4. Labelled a comment as toxic when its anger-intensity score is greater than 0.55.
+5. Tokenized the comments to convert them into sequences.
+6. Trained models with the algorithms listed above.
+
+Minimal code sketches illustrating steps 3-6 appear in the CODE SKETCHES section at the end of this README.
+
+## MODELS AND ACCURACIES
+
+| Model               | Accuracy |
+| ------------------- | :------: |
+| Naive Bayes         |   0.77   |
+| Random Forest       |   0.76   |
+| CatBoost            |   0.74   |
+| Logistic Regression |   0.77   |
+| Decision Tree       |   0.73   |
+| RNN                 |   0.69   |
+| Bidirectional LSTM  |   0.68   |
+
+## VISUALISATION
+
+![Alt Text](./Images/1.png)
+
+![Alt Text](./Images/2.png)
+
+![Alt Text](./Images/3.png)
+
+## CONCLUSION
+
+Naive Bayes and Logistic Regression achieve the best accuracy (0.77 each) at detecting whether a comment is toxic.
+
+## NAME
+
+Keshav Arora
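+
+## CODE SKETCHES
+
+The snippets below are minimal sketches of the pipeline, not the project's actual code. This first one covers steps 3 and 4: cleaning the raw text and deriving the toxic label from the 0.55 intensity threshold. The file name `toxic_comments.csv` and the column names `comment` and `intensity` are assumptions about the dataset's schema.
+
+```python
+import re
+
+import pandas as pd
+from bs4 import BeautifulSoup
+
+
+def clean_text(text: str) -> str:
+    text = BeautifulSoup(text, "html.parser").get_text()  # strip HTML remnants
+    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # strip links
+    text = re.sub(r"[^a-zA-Z\s]", " ", text)              # strip emojis, symbols, digits
+    return re.sub(r"\s+", " ", text).strip().lower()      # collapse whitespace, lowercase
+
+
+df = pd.read_csv("toxic_comments.csv")                    # assumed file name
+df["comment"] = df["comment"].astype(str).apply(clean_text)
+df["toxic"] = (df["intensity"] > 0.55).astype(int)        # step 4: 1 = toxic
+```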
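+
+This sketch covers step 5 and the Bidirectional LSTM from step 6, continuing from the `df` above. The vocabulary size, sequence length, layer widths, and epoch count are illustrative guesses rather than the values used in this project.
+
+```python
+from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Embedding
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+from tensorflow.keras.preprocessing.text import Tokenizer
+
+# Step 5: tokenize the comments and pad them to fixed-length sequences.
+tokenizer = Tokenizer(num_words=20000, oov_token="<OOV>")
+tokenizer.fit_on_texts(df["comment"])
+padded = pad_sequences(tokenizer.texts_to_sequences(df["comment"]), maxlen=100)
+
+# Step 6 (deep-learning branch): a small Bidirectional LSTM classifier.
+model = Sequential([
+    Embedding(input_dim=20000, output_dim=64),
+    Bidirectional(LSTM(64)),
+    Dense(1, activation="sigmoid"),  # probability that the comment is toxic
+])
+model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
+model.fit(padded, df["toxic"].values, epochs=3, validation_split=0.2)
+```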
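+
+The last sketch covers the classical models from step 6, shown here for the two best scorers from the table above. Feeding them TF-IDF features is an assumption; the hyperparameters are scikit-learn defaults.
+
+```python
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import accuracy_score
+from sklearn.model_selection import train_test_split
+from sklearn.naive_bayes import MultinomialNB
+
+X_train, X_test, y_train, y_test = train_test_split(
+    df["comment"], df["toxic"], test_size=0.2, random_state=42
+)
+
+# Vectorize the cleaned comments with TF-IDF (fit on the training split only).
+vec = TfidfVectorizer(max_features=20000)
+X_train_tfidf = vec.fit_transform(X_train)
+X_test_tfidf = vec.transform(X_test)
+
+for name, clf in [("Naive Bayes", MultinomialNB()),
+                  ("Logistic Regression", LogisticRegression(max_iter=1000))]:
+    clf.fit(X_train_tfidf, y_train)
+    print(name, accuracy_score(y_test, clf.predict(X_test_tfidf)))
+```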