Merge pull request #564 from CoderOMaster/main

TOXIC COMMENT ANALYSIS
abhisheks008 · Feb 2, 2024 · 931a47b
2 parents adc341e + f976f87
commit 931a47b
Showing 7 changed files with 3,643 additions and 0 deletions.
diff --git a/Toxic Comment Analysis/Dataset/EI-reg-En-anger-train.txt b/Toxic Comment Analysis/Dataset/EI-reg-En-anger-train.txt
diff --git a/Toxic Comment Analysis/Images/1.png b/Toxic Comment Analysis/Images/1.png
diff --git a/Toxic Comment Analysis/Images/2.png b/Toxic Comment Analysis/Images/2.png
diff --git a/Toxic Comment Analysis/Images/3.png b/Toxic Comment Analysis/Images/3.png
diff --git a/Toxic Comment Analysis/Models/Model.ipynb b/Toxic Comment Analysis/Models/Model.ipynb
diff --git a/Toxic Comment Analysis/README.md b/Toxic Comment Analysis/README.md
@@ -0,0 +1,66 @@
+# TOXIC COMMENT ANALYSIS
+
+## GOAL
+Develop a machine learning model that classifies whether a comment is toxic.
+
+## DATASET
+Explore https://www.kaggle.com/datasets/devkhant24/toxic-comment
+
+## MODELS USED
+- Naive Bayes
+- Random Forest
+- CatBoost
+- Decision Tree
+- Bidirectional LSTM
+- RNN
+- Logistic Regression
+
+## LIBRARIES
+- pandas
+- NumPy
+- TensorFlow
+- Seaborn
+- Matplotlib
+- scikit-learn
+- os
+- re
+- math
+- Beautiful Soup
+- NLTK
+- spaCy
+
+## IMPLEMENTATION
+1. Loaded the dataset.
+2. Converted it to a standard CSV file and renamed the columns for readability.
+3. Cleaned and preprocessed the text to remove emojis, symbols, links, etc.
+4. Labeled a comment as toxic when its anger intensity score is greater than 0.55.
+5. Tokenized the comments and converted them to sequences.
+6. Trained models with the algorithms listed above.
+
+## Models and Accuracies
+
+| Model               | Accuracy |
+| ------------------- |:--------:|
+| Naive Bayes         | 0.77     |
+| Random Forest       | 0.76     |
+| CatBoost            | 0.74     |
+| Logistic Regression | 0.77     |
+| Decision Tree       | 0.73     |
+| RNN                 | 0.69     |
+| Bidirectional LSTM  | 0.68     |
+
+## VISUALISATION
+
+![Alt Text](./Images/1.png)
+
+![Alt Text](./Images/2.png)
+
+![Alt Text](./Images/3.png)
+
+## CONCLUSION
+
+Naive Bayes and Logistic Regression achieve the highest accuracy (0.77) of the models tested, while the RNN and Bidirectional LSTM score lowest (0.69 and 0.68).
+
+## NAME
+
+Keshav Arora
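
Note on the README's IMPLEMENTATION section: steps 3 to 5 (cleaning, labeling at the 0.55 threshold, and sequence conversion) could be sketched roughly as below. This is a minimal standard-library illustration, not the code from `Model.ipynb`. The function names, the regular expressions, the sample rows, the vocabulary size, and the sequence length are all assumptions made for the example. Only the 0.55 threshold comes from the README.

```python
import re
from collections import Counter

# From the README: a comment is toxic when its anger intensity is > 0.55.
TOXIC_THRESHOLD = 0.55

# Assumed cleaning rules; the notebook may use different patterns.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # drops emojis, digits, punctuation


def clean_text(text):
    """Lowercase, strip links and mentions, keep only letters and spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def label_toxic(intensity):
    """Return 1 (toxic) when intensity exceeds the threshold, else 0."""
    return int(intensity > TOXIC_THRESHOLD)


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to ids; 0 is padding, 1 is unknown."""
    counts = Counter(word for t in texts for word in t.split())
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def to_padded_sequence(text, vocab, max_len=8):
    """Convert a cleaned comment to a fixed-length list of ids (post-padded)."""
    ids = [vocab.get(w, 1) for w in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical (text, anger intensity) rows standing in for the dataset.
rows = [
    ("@user This is OUTRAGEOUS!!! \U0001F621 see https://example.com", 0.81),
    ("Slightly annoyed by the delay, nothing major.", 0.30),
]

cleaned = [clean_text(text) for text, _ in rows]
labels = [label_toxic(score) for _, score in rows]
vocab = build_vocab(cleaned)
sequences = [to_padded_sequence(text, vocab) for text in cleaned]
```

Because the comparison is strict, a score of exactly 0.55 is labeled non-toxic here, which matches the README's "> 0.55" wording.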
diff --git a/Toxic Comment Analysis/requirements.txt b/Toxic Comment Analysis/requirements.txt
@@ -0,0 +1,4 @@
+pandas==1.3.3
+matplotlib==3.4.3
+numpy==1.21.2
+catboost==1.2.2
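
Note on the README's accuracy table: a comparison of the two best classical models might be produced along the following lines with scikit-learn, which the README lists under LIBRARIES but which is not pinned in `requirements.txt`. This is a hedged sketch, not the notebook's code. The TF-IDF features, the 75/25 split, the `random_state`, and the toy comments are assumptions for illustration, so the printed accuracies will not match the table.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical cleaned comments and toxic (1) / non-toxic (0) labels.
texts = [
    "i am furious about this terrible service",
    "this makes me so angry and outraged",
    "absolutely livid at this awful decision",
    "raging mad about the horrible response",
    "a bit annoyed but it is fine",
    "mildly irritated by the short delay",
    "not a big deal just a small issue",
    "slightly bothered but nothing serious",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed split; stratify keeps both classes in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracies = {}
for name, classifier in models.items():
    # Each pipeline vectorizes the raw text, then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is learned from the training split only, which avoids leaking test data into the features.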