SMS Spam Detect

Try the App.

The Problem

Spam detection is an old and ongoing problem. I get spam texts every day and wonder how they slip past my spam filter. I flag them daily, and in doing so I help train what should by now be a state-of-the-art spam detector, since I have a Google phone. Shouldn't the filter catch an SMS that is so clearly spam?

In this project I tackle this old problem with a small corpus (download the SMS Spam Collection from this UCI Machine Learning Repository) and classical ML algorithms, aiming for explainability. The model reaches 99% accuracy during evaluation (see more evaluation metrics and tests in this notebook), yet because the training data is small, I expect it to generalize poorly despite all the tests.
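
Because the classes are imbalanced, accuracy alone can look better than it is. The collection contains 5,574 messages, 747 of them spam, so a model that labels everything as ham already scores about 87% accuracy while catching no spam at all. The minimal sketch below, in plain Python with a hypothetical helper `metrics_from_counts`, computes precision, recall, and F1 from confusion-matrix counts, treating spam as the positive class. It is illustrative and not taken from the project's notebook.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Binary metrics with spam as the positive class (hypothetical helper)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero, e.g. when the model never predicts spam.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Majority-class baseline: every message labeled ham (4,827 ham, 747 spam).
baseline = metrics_from_counts(tp=0, fp=0, fn=747, tn=4827)
print(f"accuracy={baseline.accuracy:.3f} recall={baseline.recall:.3f}")
# prints: accuracy=0.866 recall=0.000
```

Reporting recall and precision next to accuracy makes it clear whether a high score comes from catching spam or simply from the ham majority.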

So I deploy the model in an app to see how it performs in the wild, on unseen data, and to understand the challenge fully.

(Screenshot: the app's homepage)

The App

Hosted on Heroku, the app consists of a simple homepage (above) with a form that accepts a text input, and a results page (below) that offers a detailed look at everything that happens behind the scenes to turn this text into a prediction of whether it is spam or not.

(Screenshot: top of the results page)
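
A results page like this can be driven by a function that records every intermediate stage rather than only the final label. The sketch below is an assumption about how such a trace might look, not the app's actual code: the `trace_preprocessing` name, the regex tokenizer, and the tiny stopword list are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would likely
# use a fuller one, such as NLTK's English stopwords corpus.
STOPWORDS = {"a", "an", "and", "are", "for", "is", "now", "of", "or", "the", "to", "you", "your"}


def trace_preprocessing(text: str) -> list[tuple[str, object]]:
    """Return (step name, value) pairs so each stage can be shown on a results page."""
    steps: list[tuple[str, object]] = [("raw", text)]
    lowered = text.lower()
    steps.append(("lowercased", lowered))
    # Keep runs of letters, digits, and apostrophes as tokens; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9']+", lowered)
    steps.append(("tokens", tokens))
    content = [t for t in tokens if t not in STOPWORDS]
    steps.append(("without stopwords", content))
    # Bag of words: token -> count, the kind of feature a classical model consumes.
    steps.append(("bag of words", dict(Counter(content))))
    return steps


for name, value in trace_preprocessing("WIN a FREE prize now! Text WIN to 80086"):
    print(f"{name:>17}: {value}")
```

Returning an ordered list of named steps keeps the template logic trivial: the page can loop over the pairs and render each one.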

The app is meant to demystify machine learning (or "AI", as it is commonly called), since it is often nothing more than a series of transformations, however complex and probabilistic, that turn inputs into outputs: a text becomes a 1 or a 0.
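
To make "a text becomes a 1 or a 0" concrete, here is a from-scratch multinomial Naive Bayes, a classical model whose per-token evidence is easy to inspect. This is a hedged sketch, not the app's actual model: the tokenizer, the six toy messages, and the class name `NaiveBayesSpam` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpam:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; 1 = spam, 0 = ham."""

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesSpam":
        self.counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in (0, 1)}
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def token_log_ratio(self, token: str) -> float:
        """log P(token | spam) - log P(token | ham); positive values push toward spam."""
        v = len(self.vocab)
        p_spam = (self.counts[1][token] + 1) / (self.totals[1] + v)
        p_ham = (self.counts[0][token] + 1) / (self.totals[0] + v)
        return math.log(p_spam / p_ham)

    def explain(self, text: str) -> list[tuple[str, float]]:
        """Per-token evidence; tokens never seen in training are skipped."""
        return [(t, self.token_log_ratio(t)) for t in tokenize(text) if t in self.vocab]

    def predict(self, text: str) -> int:
        # Log-odds = prior log-odds + sum of per-token log-likelihood ratios.
        log_odds = self.log_prior[1] - self.log_prior[0]
        log_odds += sum(ratio for _, ratio in self.explain(text))
        return int(log_odds > 0)


# Six hypothetical training messages; the real project trains on the SMS Spam Collection.
texts = [
    "WIN a free prize now, text WIN to claim",
    "URGENT you have won a cash prize call now",
    "Free entry to win tickets text now",
    "are we still meeting for lunch today",
    "can you call me when you get home",
    "thanks for the notes see you at class",
]
labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesSpam().fit(texts, labels)
print(model.predict("Free prize! Text now"))   # 1
print(model.predict("can we meet for lunch"))  # 0
print(sorted(model.explain("Free prize! Text now"), key=lambda pair: -pair[1]))
```

Summing per-token log-likelihood ratios is what makes this kind of model explainable: each word's push toward spam or ham can be listed next to the final 1 or 0.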

Machines are not intelligent. As Michael I. Jordan, one of the field's leading figures, comments in this episode of the Lex Fridman Podcast, the "I" in "AI" is a misnomer. We have yet to fully comprehend how humans think, let alone understand whether machines think at all and, if so, how that might differ from human thinking.

Business Applications

This app employs both Natural Language Processing (NLP) and Supervised Machine Learning, which apply to businesses in a variety of ways. Unstructured text makes up an ever-growing share of the data on the internet compared with structured data such as tables. Text data often sits untapped in databases, as front-facing apps continuously capture user comments in open text fields.

Insights can be extracted from text with NLP and various analytic methods, whether through machine learning or through simpler designs refined by iteration. This project's framework for text processing and classification can be extended and adapted to other classification tasks involving text data.
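
As one illustration of carrying the same text-to-label idea over to another task, such as sorting the user comments mentioned above by topic, here is a TF-IDF nearest-centroid classifier that works with any set of labels. It is a different, deliberately simple technique from the spam model and is not part of this project; the class name, the topic labels, and the example comments are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit length (an empty vector is returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


class CentroidClassifier:
    """TF-IDF nearest-centroid classifier that accepts any set of string labels."""

    def fit(self, texts: list[str], labels: list[str]) -> "CentroidClassifier":
        docs = [Counter(tokenize(t)) for t in texts]
        n = len(docs)
        # Document frequency: iterating a Counter yields each distinct term once.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF, the same formula as scikit-learn's default (smooth_idf=True).
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        sums: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            sums[label].update(self.vectorize(doc))
        self.centroids = {label: normalize(vec) for label, vec in sums.items()}
        return self

    def vectorize(self, doc: Counter) -> dict[str, float]:
        # Terms outside the training vocabulary have no IDF weight and are dropped.
        return normalize({t: tf * self.idf[t] for t, tf in doc.items() if t in self.idf})

    def predict(self, text: str) -> str:
        vec = self.vectorize(Counter(tokenize(text)))

        def cosine(label: str) -> float:
            # Both vectors are unit length, so the dot product is the cosine similarity.
            centroid = self.centroids[label]
            return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

        return max(self.centroids, key=cosine)


# Hypothetical user comments and topic labels.
comments = [
    "the app crashes when I open settings",
    "error message after the latest update crash",
    "I was charged twice on my card",
    "please refund the extra charge on my bill",
    "love the new design great job",
    "great app works perfectly thanks",
]
topics = ["bug", "bug", "billing", "billing", "praise", "praise"]

clf = CentroidClassifier().fit(comments, topics)
print(clf.predict("app crash after update"))  # bug
print(clf.predict("refund my card charge"))   # billing
```

Nothing in the classifier assumes two classes, so swapping in new labels and training texts is all it takes to point it at a different task.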

Acknowledgements

This journey into NLP and ML took months of learning, including building my own understanding of the inner workings of various models I never ended up deploying. I am indebted to the numerous tutorials, blogs, and videos I read and watched along the way. Below they are listed from most to least influential:

Data Science Dojo's Introduction To Text Analytics With R by David Langer
Aurélien Géron's Classification Notebook
Scikit-Learn's API Docs
Chayan Kathuria's tutorial Build & Deploy a Spam Classifier app on Heroku Cloud in 10 minutes!
Analytics Vidhya's Introduction to Topic Modeling and Latent Semantic Analysis
Prof. Steve Brunton's YouTube lectures on Singular Value Decomposition
Kevin Arvai's tutorial Fine Tuning a Classifier in Scikit-Learn
Cole Brendel's article Quickly Compare Multiple Models
Josh Starmer's StatQuest YouTube channel

BigBangData/SMS_SpamDetect