This project includes 1 executable file and 3 output files, described below:
- spam-classifier-optimized.ipynb - A Jupyter Notebook containing all the functions required for training, testing, and classifying the emails.
- evaluation.txt - Contains the evaluation results table and the confusion matrix for the spam and ham classes.
- spam_classifier_best.sav - Contains the weights of the most optimized model.
- confusion_matrix.png - Confusion Matrix of the final result.
The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on an artificial neuron called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU). The inputs and output of a TLU are numbers (instead of binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs, $z = w_1 x_1 + \dots + w_n x_n = \mathbf{x}^\top \mathbf{w}$, then applies a step function to that sum and outputs the result: $h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z)$.
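The TLU computation can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the notebook: the function names `step` and `tlu`, the explicit bias term `b`, and the choice of a Heaviside step that returns 1 for z >= 0 are all assumptions made for the example.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, otherwise 0 (assumed convention)."""
    return (np.asarray(z) >= 0).astype(int)

def tlu(x, w, b=0.0):
    """Threshold logic unit: weighted sum of the inputs followed by a step.

    x may be a single input vector or a 2-D array with one instance per row.
    The bias b is an addition for this sketch; with b = 0 this is step(x^T w).
    """
    z = np.dot(x, w) + b
    return step(z)

# Hypothetical weights that make the unit behave like a logical AND gate.
w_and = np.array([1.0, 1.0])
b_and = -1.5
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(tlu(inputs, w_and, b_and))
```

With these hand-picked weights the weighted sum is only non-negative when both inputs are 1, so the unit fires for the last row alone.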
A Multilayer Perceptron (MLP) is composed of one (passthrough) input layer, one or more layers of TLUs, called hidden layers, and one final layer of TLUs called the output layer. The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.
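To make the layer structure concrete, here is a small sketch of a forward pass through an MLP built from TLUs: one input layer, one hidden layer of two units, and one output unit. The weights are hand-picked for illustration so that the network computes XOR; they are not taken from the notebook, and the names `add_bias`, `W_hidden`, `W_output`, and `mlp_forward` are hypothetical.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, otherwise 0."""
    return (z >= 0).astype(int)

def add_bias(a):
    """Append a constant 1 to every row, playing the role of the bias neuron."""
    return np.hstack([a, np.ones((a.shape[0], 1))])

# Rows: x1, x2, bias. Columns: hidden units h1 and h2.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0],
                     [-1.5, -0.5]])

# Rows: h1, h2, bias. Single output unit.
W_output = np.array([[-1.0],
                     [1.0],
                     [-0.5]])

def mlp_forward(X):
    """Input layer -> hidden TLU layer -> output TLU layer."""
    hidden = step(add_bias(X) @ W_hidden)
    return step(add_bias(hidden) @ W_output).ravel()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(mlp_forward(X))
```

Each layer except the output layer gets a bias input through `add_bias`, and every unit in one layer is connected to every unit in the next, which mirrors the fully connected structure described above. A single TLU cannot represent XOR, so the example also shows why the hidden layer is needed.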
Every stage described here has been followed in the attached Jupyter Notebook.
- Download the dataset.
- Prepare the data
  - Remove all the email headers (such as sender details, receiver details, subject, and date)
  - Convert the whole email to lowercase
  - Replace every URL in the email with the word 'URL'
  - Replace every number in the email with the word 'NUM'
  - Remove all punctuation from the email
- Split the data into two sets: train and test.
- Convert the resulting text into a bag-of-words representation (a vector of counts of all words that appear in the training instances)
- Train and evaluate the MLP model using recall, precision, and ROC
- Fine-tune the MLP classifier
- Evaluate it on the test set
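The stages above could be sketched roughly as follows with the standard library and Scikit-Learn. This is an illustrative outline under stated assumptions, not the notebook's actual code: the helper names `preprocess_email` and `train_and_evaluate`, the regular expressions, the label convention (1 = spam, 0 = ham), the 80/20 split, and the MLP hyperparameters are all hypothetical choices.

```python
import re
import string
from email import message_from_string

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

def preprocess_email(raw_email):
    """Strip headers, lowercase, replace URLs and numbers, drop punctuation."""
    message = message_from_string(raw_email)
    # Parsing separates the headers from the body; keep only the leaf parts.
    parts = [str(p.get_payload()) for p in message.walk() if not p.is_multipart()]
    text = " ".join(parts).lower()
    text = re.sub(r"(https?://|www\.)\S+", " URL ", text)  # assumed URL pattern
    text = re.sub(r"\d+(\.\d+)?", " NUM ", text)           # assumed number pattern
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def train_and_evaluate(raw_emails, labels, random_state=42):
    """Split, vectorize, train an MLP, and report test-set metrics."""
    texts = [preprocess_email(e) for e in raw_emails]
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=random_state
    )
    model = make_pipeline(
        # lowercase=False keeps the 'URL' and 'NUM' placeholders intact.
        CountVectorizer(lowercase=False),
        # Hypothetical hyperparameters; the notebook's tuned values may differ.
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                      random_state=random_state),
    )
    model.fit(X_train, y_train)  # the vocabulary is learned from the training set only
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the spam class
    metrics = {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    return model, metrics
```

Wrapping the vectorizer and the classifier in one pipeline means the bag-of-words vocabulary is fitted on the training split alone, so no information from the test set leaks into training. The fine-tuning stage is not shown; one common option would be to pass the same pipeline to a grid search over the MLP's hyperparameters.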
- NumPy v1.16.2
- Scikit-Learn v0.20.3
- Matplotlib v3.0.2
- Joblib v0.13.2
Harsh Singh Jadon