Hello all. This project is my experiment with the Naive Bayes classifier. I implemented it by following Simplilearn's YouTube lecture, "Text Classification Using Naive Bayes | Naive Bayes Algorithm In Machine Learning". What I added on my own is deploying the trained Naive Bayes classifier with Flask, so that a user can try it through a front-end web application.
The development was done entirely in a Jupyter notebook, and the deployment was done in VSCode.
The dataset used for this project is 20 Newsgroups, loaded with fetch_20newsgroups from "sklearn.datasets". As the name suggests, it contains articles classified into 20 newsgroups, listed below.
- alt.atheism
- comp.graphics
- comp.os.ms-windows.misc
- comp.sys.ibm.pc.hardware
- comp.sys.mac.hardware
- comp.windows.x
- misc.forsale
- rec.autos
- rec.motorcycles
- rec.sport.baseball
- rec.sport.hockey
- sci.crypt
- sci.electronics
- sci.med
- sci.space
- soc.religion.christian
- talk.politics.guns
- talk.politics.mideast
- talk.politics.misc
- talk.religion.misc
The dataset was split into a training set and a test set.
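As a rough sketch of the loading step: the loader can return a predefined train/test split through its `subset` argument. This assumes that predefined split was the one used in this project. The helper name `load_newsgroups` is made up for illustration.

```python
from sklearn.datasets import fetch_20newsgroups


def load_newsgroups():
    """Return the predefined train and test splits of 20 Newsgroups.

    The first call downloads the data and caches it locally,
    so it needs an internet connection once.
    """
    train = fetch_20newsgroups(subset="train")
    test = fetch_20newsgroups(subset="test")
    return train, test
```

After calling `train, test = load_newsgroups()`, the articles are in `train.data` (a list of strings), the integer labels in `train.target`, and the 20 group names in `train.target_names`.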
First, to extract features from the data, we vectorized each article in the training set with "TfidfVectorizer", imported from sklearn.feature_extraction.text. It converts each article into a vector of TF-IDF weights, where a term scores higher the more often it appears in that article and the rarer it is across the whole collection.
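To get a feel for what the vectorizer produces, here is a small sketch on three made-up sentences (not the real dataset), using the default settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents, invented for illustration only.
docs = [
    "the rocket reached orbit",
    "the team won the hockey game",
    "the rocket launch was delayed",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document
vocab = vectorizer.vocabulary_      # maps each term to its column index

# Weights of the first document, as a dense array for easy inspection.
first_doc = X[0].toarray().ravel()
for term in ("the", "rocket", "orbit"):
    print(term, round(first_doc[vocab[term]], 3))
```

In the first sentence, "the" (present in every document) gets the lowest weight, "rocket" (present in two) a higher one, and "orbit" (present in only one) the highest.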
Then we used the "MultinomialNB" algorithm, imported from sklearn.naive_bayes, which handles multi-class problems such as this one and works well with word-frequency features. We trained the model by fitting MultinomialNB on the training dataset.
We defined a pipeline with "make_pipeline", imported from sklearn.pipeline. We passed TfidfVectorizer first and then MultinomialNB, so that vectorization and classification always run in that order, both during training and during prediction.
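The pipeline idea can be sketched as follows. To keep the example small and quick to run, it trains on four made-up sentences instead of the real training set, and all hyperparameters are left at their defaults, which is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "the shuttle launch reached orbit",
    "nasa plans a new rocket launch",
    "the goalie stopped the puck",
    "the hockey team won in overtime",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

# Step 1 vectorizes the raw text, step 2 classifies the resulting vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# The pipeline accepts raw strings, so no manual vectorizing is needed.
prediction = model.predict(["rocket launch delayed"])[0]
print(prediction)
```

Because the vectorizer lives inside the pipeline, the same fitted vocabulary is reused automatically at prediction time.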
We used the trained model to make predictions on the test data and visualized the confusion matrix. The model's accuracy on the test set was 77.39%.
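A minimal sketch of the evaluation step is shown below. The label lists are made up to keep the numbers easy to check by hand; in the project they would be the true test labels and the model's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions, for illustration only.
y_true = ["sci.space", "sci.space", "rec.autos", "rec.autos", "sci.med"]
y_pred = ["sci.space", "rec.autos", "rec.autos", "rec.autos", "sci.med"]
labels = ["rec.autos", "sci.med", "sci.space"]

# Fraction of predictions that match the true label (4 of 5 here).
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in `labels` order.
matrix = confusion_matrix(y_true, y_pred, labels=labels)

print(accuracy)
print(matrix)
```

Off-diagonal entries show which classes get confused: here one sci.space article was predicted as rec.autos. The matrix can then be handed to any plotting library to draw a heatmap.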
Using the Joblib library, we saved the model as a .pkl file. To deploy it, we wrote a web application in Python with Flask, a micro web framework, developed in VSCode, which serves as the back end. It takes input from the front-end web page (developed using HTML5), makes a prediction, and returns the result in a readable format, which is displayed on the same page.
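The save, load, and serve steps could look roughly like the sketch below. It is not the project's actual app: the file name, the `/predict` route, and the `text` form field are hypothetical, a tiny toy model stands in for the real one, and the response is JSON rather than a rendered HTML page so that the example stays self-contained.

```python
import os
import tempfile

import joblib
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the trained pipeline (two made-up sentences).
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(
    ["rocket launch reached orbit", "hockey team won the game"],
    ["sci.space", "rec.sport.hockey"],
)

# Hypothetical file name; the whole pipeline is saved in one file.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "nb_model_demo.pkl")
joblib.dump(model, MODEL_PATH)

app = Flask(__name__)
loaded_model = joblib.load(MODEL_PATH)  # load once, at start-up


@app.route("/predict", methods=["POST"])  # hypothetical route
def predict():
    # "text" is a hypothetical form field name sent by the front end.
    text = request.form.get("text", "")
    if not text.strip():
        return jsonify(error="no text supplied"), 400
    label = loaded_model.predict([text])[0]
    return jsonify(category=str(label))
```

Loading the model once at start-up, rather than inside the view function, avoids re-reading the file on every request. Calling `app.run(debug=True)` starts Flask's built-in development server for local testing.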
We deployed this web application on Heroku using Gunicorn, a Python WSGI (Web Server Gateway Interface) HTTP server.
Link: https://nb-text-classification.herokuapp.com/
Bayes Theorem lecture by Krish Naik
Indepth intuition of a Naive Bayes Classifier by Krish Naik
Application of Naïve Bayes Classifier on Text Data (NLP) - The intuition by Krish Naik
Deploying a web application in Heroku using Heroku CLI