Spam Classifier

I have developed a spam classifier in Python that classifies emails as spam or ham using a Multilayer Perceptron (MLP).

🌟 Overview

I used the Apache SpamAssassin public dataset to train and test an ML-based classification model built on a Multilayer Perceptron, chosen for its high precision and recall. To run this project, you only need the dependencies (see below). No extra files are needed, as the Jupyter notebook downloads all the required files.

💾 Project Files Description

This project includes one executable file and three result files, described below.

Executable Files:

spam-classifier-optimized.ipynb - A Jupyter Notebook containing all the functions required for training, testing, and classifying the emails.

Result Files:

evaluation.txt - Contains the evaluation results table and the confusion matrix for the spam and ham classes.
spam_classifier_best.sav - Contains the weights of the best-performing model.
confusion_matrix.png - Confusion matrix of the final result.

📚 Multilayer Perceptron

The Perceptron is one of the simplest artificial neural network (ANN) architectures, invented in 1957 by Frank Rosenblatt. It is based on an artificial neuron called a threshold logic unit (TLU), sometimes called a linear threshold unit (LTU). Its inputs and output are numbers (rather than binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs, z = w₁x₁ + w₂x₂ + ... + wₙxₙ = xᵀw, then applies a step function to that sum and outputs the result: h_w(x) = step(z).
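As a rough illustration of the TLU computation described above, here is a minimal pure-Python sketch. The function names `step` and `tlu`, the optional `bias` argument, and the convention that the step function returns 1 when z is exactly 0 are assumptions made for this example; they are not taken from the notebook.

```python
from typing import Sequence


def step(z: float) -> int:
    """Heaviside step function (assumed convention: step(0) == 1)."""
    return 1 if z >= 0 else 0


def tlu(x: Sequence[float], w: Sequence[float], bias: float = 0.0) -> int:
    """Threshold logic unit: weighted sum of the inputs, then a step.

    The bias term is an assumption added for convenience; with the
    default of 0.0 this is exactly h_w(x) = step(x^T w).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias  # z = x^T w (+ bias)
    return step(z)
```

For instance, with inputs `[2.0, 1.0]` and weights `[0.5, -0.5]` the weighted sum is 0.5, so the unit outputs 1; with inputs `[1.0, 2.0]` and weights `[0.5, -1.0]` the sum is -1.5, so it outputs 0.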

An MLP is composed of one (passthrough) input layer, one or more layers of TLUs, called hidden layers, and one final layer of TLUs called the output layer. The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.
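To make the layered structure concrete, the sketch below wires three TLUs into a tiny MLP (two inputs, one hidden layer of two units, one output unit) that computes XOR. The weights are hand-picked for illustration and each bias is written as a separate number instead of a bias neuron; this is a toy example, not the architecture or the trained weights used in the notebook.

```python
from typing import List, Sequence


def step(z: float) -> int:
    """Heaviside step function (assumed convention: step(0) == 1)."""
    return 1 if z >= 0 else 0


def layer(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[int]:
    """Fully connected layer of TLUs: one weight row and one bias per unit."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (hypothetical) parameters.
HIDDEN_W = [[1.0, 1.0],   # unit 1 fires only when both inputs are 1 (AND)
            [1.0, 1.0]]   # unit 2 fires when at least one input is 1 (OR)
HIDDEN_B = [-1.5, -0.5]
OUTPUT_W = [[-1.0, 1.0]]  # output fires for "OR but not AND"
OUTPUT_B = [-0.5]


def mlp_xor(x1: int, x2: int) -> int:
    """Forward pass: input layer -> hidden layer -> output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]
```

A single TLU cannot represent XOR because the two classes are not linearly separable, which is why the hidden layer is needed here.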

📋 Stages in development

Every stage described here has been followed in the attached Jupyter Notebook.

Download the dataset.
Prepare the data:
- Remove all email headers (such as sender details, receiver details, subject, and date)
- Convert the whole email to lowercase
- Replace every URL in the email with the word 'URL'
- Replace every number in the email with the word 'NUM'
- Remove all punctuation from the email
Split the data into two sets: train and test.
Convert the resulting text into a bag-of-words representation (a vector of counts of the words that appear in the training instance).
Train and evaluate the MLP model on recall, precision, and ROC.
Fine-tune the MLP classifier.
Evaluate it on the test set.
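The data preparation and bag-of-words stages above can be sketched with the Python standard library alone. The helper names (`preprocess`, `build_vocabulary`, `to_count_vector`) and the regular expressions are assumptions chosen for this illustration and may differ from what the notebook actually does; the train/test split, model training, and evaluation stages are not shown.

```python
import re
import string
from collections import Counter
from email import message_from_string
from typing import Dict, Iterable, List

# Assumed patterns; the notebook may detect URLs and numbers differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\d+(?:\.\d+)?")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(raw_email: str) -> str:
    """Apply the preparation steps in the order listed above."""
    # Parsing splits headers from the body, so the headers are dropped.
    msg = message_from_string(raw_email)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart message: join the parts whose payload is plain text.
        body = " ".join(
            part.get_payload()
            for part in msg.walk()
            if isinstance(part.get_payload(), str)
        )
    text = body.lower()
    text = URL_RE.sub(" URL ", text)      # placeholders stay uppercase
    text = NUM_RE.sub(" NUM ", text)
    text = text.translate(PUNCT_TABLE)    # strip ASCII punctuation
    return " ".join(text.split())         # normalize whitespace


def build_vocabulary(train_texts: Iterable[str]) -> Dict[str, int]:
    """Map each word seen in the training texts to a column index."""
    words = sorted({word for text in train_texts for word in text.split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(text: str, vocab: Dict[str, int]) -> List[int]:
    """Bag-of-words vector; words outside the vocabulary are ignored."""
    counts = Counter(text.split())
    return [counts.get(word, 0) for word in vocab]
```

Building the vocabulary from the training set only, as done here, keeps information from the test set out of the features. The resulting count vectors are what an MLP classifier would then be trained on.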
