Merge pull request #564 from CoderOMaster/main
TOXIC COMMENT ANALYSIS
Showing 7 changed files with 3,643 additions and 0 deletions.
1,702 changes: 1,702 additions & 0 deletions
Toxic Comment Analysis/Dataset/EI-reg-En-anger-train.txt
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# TOXIC COMMENT ANALYSIS | ||
|
||
## GOAL | ||
Develop a machine learning model that classifies whether a comment is toxic. | ||
|
||
## DATASET | ||
Explore https://www.kaggle.com/datasets/devkhant24/toxic-comment | ||
|
||
## MODELS USED | ||
- Naive Bayes | ||
- Random Forest | ||
- CatBoost | ||
- Decision Tree | ||
- Bidirectional LSTM | ||
- RNN | ||
- Logistic Regression | ||
|
||
## LIBRARIES | ||
- Pandas | ||
- NumPy | ||
- TensorFlow | ||
- Seaborn | ||
- Matplotlib | ||
- Scikit-Learn | ||
- os (Python standard library) | ||
- re (Python standard library) | ||
- math (Python standard library) | ||
- Beautiful Soup | ||
- NLTK | ||
- spaCy | ||
|
||
## IMPLEMENTATION | ||
1. Loaded the dataset. | ||
2. Converted it into a standard CSV file and renamed the columns for ease of use. | ||
3. Cleaned and preprocessed the text to remove emojis, symbols, links, etc. | ||
4. Labeled a comment as toxic if its anger intensity score is greater than 0.55. | ||
5. Tokenized the text to convert it into sequences. | ||
6. Trained models with various algorithms. | ||
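
Steps 3 to 5 above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from this commit: the function names, the regular expressions, the reserved ids (0 for padding, 1 for unknown words) and the `max_len` parameter are all assumptions made for the example. Only the 0.55 threshold comes from the README.

```python
import re

# Threshold from step 4 of the README; everything else here is illustrative.
TOXIC_THRESHOLD = 0.55

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase, then drop links and anything that is not a-z or whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    # Removes emojis, digits, punctuation and other symbols in one pass.
    text = NON_ALPHA_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def is_toxic(intensity, threshold=TOXIC_THRESHOLD):
    """Return 1 if the anger intensity is strictly above the threshold, else 0."""
    return int(intensity > threshold)


def build_vocab(texts):
    """Map each distinct word to an integer id (0 = padding, 1 = unknown)."""
    vocab = {}
    for text in texts:
        for token in text.split():
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def to_sequence(text, vocab, max_len):
    """Convert cleaned text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(token, 1) for token in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


cleaned = clean_text("Check THIS out!!! 😡 https://example.com/x #angry")
vocab = build_vocab([cleaned])
sequence = to_sequence("this is angry", vocab, max_len=5)
```

Note that a score of exactly 0.55 is labeled non-toxic here, because the README states the rule as a strict "greater than". The neural models would consume the padded integer sequences, while the classical models would more typically use bag-of-words or TF-IDF features built from the same cleaned text.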
|
||
## MODELS AND ACCURACIES | ||
|
||
| Model | Accuracy | | ||
| ----------------- |:----------:| | ||
| Naive Bayes | 0.77 | | ||
| Random Forest | 0.76 | | ||
| CatBoost | 0.74 | | ||
| Logistic Regression| 0.77 | | ||
| Decision Tree | 0.73 | | ||
| RNN | 0.69 | | ||
| Bidirectional LSTM | 0.68 | | ||
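
The README does not say how these accuracies were measured. Assuming accuracy means the fraction of held-out comments whose predicted label matches the thresholded label, it can be computed as in the sketch below. The helper names are hypothetical, and `majority_baseline` is added only as a reference point: it is the accuracy of always predicting the most frequent class, which is useful context when the 0.55 threshold yields unbalanced classes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    _, count = Counter(y_true).most_common(1)[0]
    return count / len(y_true)


# Toy labels invented for illustration; they are not taken from the dataset.
toy_true = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_baseline = majority_baseline(toy_true)
```

A model's accuracy is only informative to the extent that it exceeds this baseline on the same test split.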
|
||
**VISUALISATION** | ||
|
||
![Alt Text](./Images/1.png) | ||
|
||
![Alt Text](./Images/2.png) | ||
|
||
![Alt Text](./Images/3.png) | ||
|
||
**CONCLUSION** | ||
|
||
Naive Bayes and Logistic Regression achieve the best accuracy (0.77) in detecting whether a comment is toxic. | ||
|
||
**NAME** | ||
|
||
Keshav Arora |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
pandas==1.3.3 | ||
matplotlib==3.4.3 | ||
numpy==1.21.2 | ||
catboost==1.2.2 |