SMS_spam_prediction_ML

Introduction

This repository contains code for analyzing and classifying SMS messages as spam or ham (non-spam). The code is divided into different sections to perform data cleaning, exploratory data analysis (EDA), data preprocessing, and model building.

Contents

Data Cleaning: Loads the dataset, drops unnecessary columns, and handles missing values.
EDA (Exploratory Data Analysis): Explores the structure and characteristics of the dataset. This includes the distribution of spam vs. ham messages, the distribution of message lengths, and word clouds for both spam and ham messages.
Data Preprocessing: Prepares the text before it is fed into the model. Steps include lowercasing, tokenization, stemming, and vectorization using a TF-IDF (Term Frequency-Inverse Document Frequency) representation.
Model Building: Builds and evaluates three Naive Bayes classifiers (Gaussian, Multinomial, and Bernoulli). The models are trained on the preprocessed text data and evaluated using accuracy, precision, and a confusion matrix.
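
The preprocessing and model-building steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `transform_text`, the regex-based tokenizer (used here as a stand-in for an NLTK tokenizer so that no extra data download is needed), and the tiny made-up corpus are all assumptions.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

stemmer = PorterStemmer()


def transform_text(text: str) -> str:
    """Lowercase, split into alphanumeric tokens, and stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(stemmer.stem(token) for token in tokens)


# Made-up messages for illustration only (1 = spam, 0 = ham).
train_texts = [
    "Win a free prize now, claim your cash reward",
    "Free entry: text WIN to claim your prize",
    "Urgent! Your account has won a cash bonus",
    "Are we still meeting for lunch tomorrow?",
    "Sorry I missed your call, talk later",
    "Can you send me the notes from class?",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["Claim your free cash prize", "See you at lunch tomorrow"]
test_labels = [1, 0]

# Fit TF-IDF on the training messages only, then reuse it for the test set.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform([transform_text(t) for t in train_texts]).toarray()
X_test = vectorizer.transform([transform_text(t) for t in test_texts]).toarray()

# GaussianNB needs dense input, so all three models share the dense arrays.
models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    predictions = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(test_labels, predictions),
        "precision": precision_score(test_labels, predictions, zero_division=0),
    }
    print(name, scores[name])
```

With a corpus this small the scores mean little; the point is the shape of the workflow: one text transform, one fitted vectorizer, and the same train/test arrays for every classifier so the comparison is like for like.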

Usage

Data: The dataset used for this analysis is stored in a CSV file named "spam.csv".
Requirements: Make sure the required Python libraries are installed: pandas, numpy, scikit-learn, nltk, wordcloud, and seaborn, plus streamlit for running app.py.
Execution: Run the notebook cells sequentially in a Python environment such as Jupyter Notebook to execute each section. To launch the app, run app.py with Streamlit (streamlit run app.py), for example from the PyCharm terminal.
Model Deployment: Once the model is built, it is saved using pickle so that it can be deployed in other applications.
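
As a rough sketch of the deployment step, the snippet below pickles a fitted vectorizer and classifier and then reloads them to classify a new message. The file names match `vectorizer.pkl` and `model.pkl` in this repository, but the choice of `MultinomialNB`, the helper `predict_label`, the toy training data, and the temporary output directory are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data for illustration only (1 = spam, 0 = ham).
texts = [
    "You have won a free prize, claim now",
    "Free cash reward, click here to claim",
    "Hey, are you coming to dinner tonight?",
    "Reminder: meeting tomorrow in the conference room",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(features, labels)

# Save both objects: the model is only usable with the vectorizer it was fit with.
out_dir = Path(tempfile.mkdtemp())  # hypothetical location for this sketch
with open(out_dir / "vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)
with open(out_dir / "model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, for example inside an app, load them back and predict.
with open(out_dir / "vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(out_dir / "model.pkl", "rb") as f:
    loaded_model = pickle.load(f)


def predict_label(message: str) -> str:
    """Return 'spam' or 'ham' for a single message."""
    message_features = loaded_vectorizer.transform([message])
    return "spam" if loaded_model.predict(message_features)[0] == 1 else "ham"


print(predict_label("Claim your free prize now"))
```

Any text cleaning applied before training would need to be applied to incoming messages as well, and pickle files should only be loaded from trusted sources.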

Example Messages

Spam Messages:
- "Congratulations! You've won $1 million in the international lottery. Claim now!"
- "You have been selected to win a free iPhone! Click here to claim your prize."
Ham Messages:
- "Hey, sorry I missed dinner. Hope you are not mad at me."
- "Reminder: Meeting tomorrow at 2 PM in the conference room."

Performance Evaluation

The model's performance is evaluated using accuracy and precision, along with the ROC (Receiver Operating Characteristic) curve.
The confusion matrix provides insights into the model's performance in classifying spam and ham messages.
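
To make these metrics concrete, here is a small worked example in plain Python. The label lists are invented for illustration and are not results from this project; scikit-learn's `confusion_matrix`, `accuracy_score`, and `precision_score` compute the same quantities.

```python
# Invented labels for illustration only (1 = spam, 0 = ham).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham kept as ham
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham wrongly flagged as spam
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
confusion = [[tn, fp], [fn, tp]]

accuracy = (tp + tn) / len(pairs)
# Precision: of the messages flagged as spam, the fraction that really were spam.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print("confusion matrix:", confusion)
print("accuracy:", accuracy)
print("precision:", round(precision, 3))
```

Precision is worth watching for a spam filter because a false positive means a legitimate message gets flagged as spam.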

Folders and files

- README.md
- app.py
- model.pkl
- sms-spam-detection (2).ipynb
- spam.csv
- vectorizer.pkl

Vasudha-01/SMS_spam_prediction_ML
