A spam classifier that labels emails as spam or ham (not spam), built with machine learning, natural language processing (NLP), and Python.

Workflow of the project:
- Data cleaning (using RegEx)
- Tokenization (using Word Tokenization)
- Removing Stop words
- Lemmatization (using WordNet)
- Vectorization (using TF-IDF)
- Label Encoding
- Naïve Bayes
- Random Forest
- Support Vector Machine
- k-Nearest Neighbors
- Cross Validation Scores
- Accuracy on Training and Testing datasets
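
The text-preparation steps in the workflow (cleaning, tokenization, stop-word removal, TF-IDF) can be illustrated with a minimal standard-library sketch. This is not the project's code: the stop-word list, the sample emails, and the helper names `clean`, `tokenize`, and `tf_idf` are all assumptions made for illustration, and lemmatization is left out because it needs the WordNet data.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real project would
# normally use a fuller list, for example the one shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "for", "you", "your"}


def clean(text):
    """Lowercase the text and replace every run of non-letters with a space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def tokenize(text):
    """Split the cleaned text on whitespace and drop stop words."""
    return [tok for tok in clean(text).split() if tok not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical sample emails, not taken from the project's dataset.
emails = [
    "WIN a FREE prize now!!! Click here",
    "Meeting moved to 3pm, see you there",
    "Free entry: claim your prize today",
]
vectors = tf_idf(emails)
```

Terms that occur in fewer emails get a larger weight, so in the first email "win" outweighs "free", which also appears in the third.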

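The modelling steps in the workflow (label encoding, the four classifiers, cross-validation, and train/test accuracy) could be wired together roughly as follows with scikit-learn. This is a sketch under assumptions, not the project's implementation: the toy emails, the hyperparameters (`n_neighbors=3`, `kernel="linear"`, `cv=3`, `test_size=0.25`), and the choice of `MultinomialNB` as the Naive Bayes variant are all illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Hypothetical toy data; the real project would load an email dataset.
spam = [
    "win a free prize now", "claim your free cash reward", "cheap loans approved today",
    "free entry in our prize draw", "urgent winner claim your reward", "exclusive offer buy cheap pills",
    "you won cash click now", "limited offer free gift card",
]
ham = [
    "meeting moved to friday afternoon", "please review the attached report", "lunch tomorrow at noon",
    "can you send the project notes", "thanks for the update yesterday", "see you at the team meeting",
    "the report is due next week", "call me when you are home",
]
texts = spam + ham
labels = ["spam"] * len(spam) + ["ham"] * len(ham)

# Label encoding: classes are sorted alphabetically, so ham -> 0, spam -> 1.
y = LabelEncoder().fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Support Vector Machine": SVC(kernel="linear"),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
}

results = {}
for name, clf in models.items():
    # TF-IDF sits inside the pipeline so it is refit on each training fold.
    pipe = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    cv_mean = cross_val_score(pipe, X_train, y_train, cv=3).mean()
    pipe.fit(X_train, y_train)
    results[name] = {
        "cv": cv_mean,
        "train": pipe.score(X_train, y_train),
        "test": pipe.score(X_test, y_test),
    }

for name, scores in results.items():
    print(f"{name}: cv={scores['cv']:.2f} train={scores['train']:.2f} test={scores['test']:.2f}")
```

Keeping the vectorizer inside the pipeline means the cross-validation scores are computed without the validation fold influencing the TF-IDF vocabulary or weights. With a dataset this small the printed scores are not meaningful; the point is only the structure of the comparison.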