stackaway/Crime_classification

Crime Classification

Objective

This project aims to develop machine learning models that classify text as 'Sexist' or 'Racist'. Using Natural Language Processing (NLP) techniques and supervised learning algorithms, the models are meant to detect instances of sexism and racism in text-based content.

Idea

The provided dataset covers several kinds of cyber crime. We focused on two of its files, twitter_sexism_parsed_dataset.csv and twitter_racism_parsed_dataset.csv, and trained a separate Long Short-Term Memory (LSTM) network on each. Preprocessing consisted of tokenization and lemmatization. After training, we saved both models.
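
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the regex tokenizer, the `toy_lemmatize` stand-in, the vocabulary layout (0 for padding, 1 for unknown words) and all function names are assumptions made for the example. In practice a real lemmatizer, such as the `lemmatize` method of NLTK's `WordNetLemmatizer`, could be passed in as the `lemmatize` argument.

```python
import re
from typing import Callable, Dict, List

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def toy_lemmatize(token: str) -> str:
    """Stand-in lemmatizer (assumption): only strips a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text: str, lemmatize: Callable[[str], str] = toy_lemmatize) -> List[str]:
    """Tokenize, then lemmatize every token."""
    return [lemmatize(tok) for tok in tokenize(text)]


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Assign integer ids to tokens; 0 is reserved for padding, 1 for unknown."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


docs = [preprocess("The cats sat"), preprocess("Dogs bark")]
vocab = build_vocab(docs)
print(encode(preprocess("cats bark loudly"), vocab, max_len=5))  # [3, 6, 1, 0, 0]
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer followed by an LSTM. The README does not state the sequence length or vocabulary size the notebooks used.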

Implementation

We created two notebooks, sexism_classifier.ipynb and racism_classifier.ipynb, which train and save the LSTM models for sexism and racism classification respectively. Run these two notebooks first so that the trained models are saved. Then run crime_classification.ipynb, which loads the saved models and classifies new text data.
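
As a rough sketch of this train, save, load and classify workflow, the code below shows one way the pieces could fit together. It is not taken from the notebooks. The layer sizes, the 0.5 threshold, the function names, and the idea that each model outputs a single probability are all assumptions. The Keras imports are placed inside the functions so that `classify_text` can be tried without Keras installed.

```python
from typing import Callable, Dict, List, Sequence


def build_lstm_classifier(vocab_size: int, embed_dim: int = 64, lstm_units: int = 64):
    """Binary LSTM text classifier; layer sizes are illustrative assumptions."""
    import keras  # imported here so the rest of the sketch runs without Keras

    model = keras.Sequential([
        # mask_zero=True makes the LSTM ignore padding id 0
        keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim, mask_zero=True),
        keras.layers.LSTM(lstm_units),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def load_predictor(model_path: str, encode: Callable[[str], Sequence[int]]) -> Callable[[str], float]:
    """Load a saved model and wrap it as a text -> probability function."""
    import numpy as np
    import keras

    model = keras.models.load_model(model_path)

    def predict(text: str) -> float:
        batch = np.array([encode(text)])  # shape (1, sequence_length)
        return float(model.predict(batch, verbose=0)[0][0])

    return predict


def classify_text(text: str,
                  predictors: Dict[str, Callable[[str], float]],
                  threshold: float = 0.5) -> List[str]:
    """Return every label whose predictor scores the text at or above threshold."""
    return [label for label, predict in predictors.items() if predict(text) >= threshold]
```

In a training notebook, the model returned by `build_lstm_classifier` would be fitted with `model.fit(...)` and written out with `model.save(...)`. The classification notebook would then build one predictor per saved file with `load_predictor` and pass them to `classify_text` as, for example, `{"Sexist": sexism_predictor, "Racist": racism_predictor}`. The README does not give the file names or format of the saved models.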

Versions

  • TensorFlow version: 2.16.1
  • Pandas version: 2.2.1
  • NumPy version: 1.23.5
  • NLTK version: 3.8.1
  • Keras version: 3.0.5
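
To check an environment against the versions listed above, a small script along these lines may help. It is a sketch that uses only the standard library. The package names are assumed to match the names the packages are installed under, and `find_mismatches` is a hypothetical helper, not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions as listed in this README.
EXPECTED = {
    "tensorflow": "2.16.1",
    "pandas": "2.2.1",
    "numpy": "1.23.5",
    "nltk": "3.8.1",
    "keras": "3.0.5",
}


def find_mismatches(expected: Dict[str, str],
                    get_version: Callable[[str], str] = version
                    ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            found: Optional[str] = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```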

Model Performance

  • sexism_classifier.ipynb has attained an accuracy of 0.8572 and a loss of 0.4666.
  • racism_classifier.ipynb has attained an accuracy of 0.9081 and a loss of 0.3188.
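
For reference, the sketch below shows how accuracy and loss figures of this kind are typically computed for a binary classifier. The README does not state which loss function or which data split these numbers come from. Binary cross-entropy and a 0.5 decision threshold are assumptions here, and the sample labels and probabilities are made up for illustration.

```python
import math
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Made-up example: four texts, three classified correctly.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.8]
print(accuracy(labels, probs))  # 0.75
print(binary_cross_entropy(labels, probs))
```

Because the loss depends on how confident each prediction is, a model can have high accuracy and still show a noticeable loss if some of its predictions are confidently wrong.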

Output Image

After running crime_classification.ipynb, the following output image is obtained:

[Image: Output Image]

Conclusion

The crime classification system shows that machine learning techniques can identify instances of sexism and racism in text. The accuracies reported under Model Performance (0.8572 for sexism and 0.9081 for racism) suggest that such approaches have potential for addressing cyber crimes involving hate speech and discriminatory content.

Churnika S Mundas
