Dataset link: Download the dataset from here. There are two files, one for real news and one for fake news (both in English), containing 23481 “fake” articles and 21417 “real” articles in total.
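Since the two classes arrive as separate files, they need to be combined into one labeled table before modeling. The following is a minimal sketch of one way to do that, not necessarily how this project did it. The helper name `combine_and_label`, the `label` column, the 1 = fake / 0 = real encoding, and the file names in the comment are all assumptions made for illustration; the tiny frames stand in for the real files.

```python
import pandas as pd


def combine_and_label(fake_df: pd.DataFrame, real_df: pd.DataFrame) -> pd.DataFrame:
    """Stack the two frames and add a binary 'label' column (assumed: 1 = fake, 0 = real)."""
    fake = fake_df.assign(label=1)
    real = real_df.assign(label=0)
    combined = pd.concat([fake, real], ignore_index=True)
    # Shuffle so the two classes are not stored as two contiguous blocks.
    return combined.sample(frac=1, random_state=42).reset_index(drop=True)


# Stand-in data. With the downloaded files this might instead be
# pd.read_csv("Fake.csv") and pd.read_csv("True.csv") (file names assumed).
fake_df = pd.DataFrame({"text": ["aliens endorse candidate", "miracle cure hidden by doctors"]})
real_df = pd.DataFrame({"text": ["parliament passes budget bill"]})

news = combine_and_label(fake_df, real_df)
print(news["label"].value_counts().to_dict())
```

Keeping the label in the same frame as the text makes it straightforward to split the data into training and test sets later without the two getting out of sync.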
OBJECTIVE:
To compare three machine learning classification algorithms on the ‘Fake and real news’ dataset and select the model that performs best at distinguishing fake news from real news.
• Python libraries used:
- NumPy
- Pandas
- scikit-learn
- Matplotlib
- NLTK
- WordCloud
• ML algorithms used:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
RESULT:
In this project, the modeling process consisted of vectorizing the corpus stored in the “text” column, applying TF-IDF weighting, and then fitting a classification algorithm. Of the three models compared, the Decision Tree Classifier achieved the highest accuracy on this dataset (99.57%). That is, on this dataset the model classifies news as fake or real with 99.57% accuracy.
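The vectorize, TF-IDF, classify sequence described above can be sketched with a scikit-learn `Pipeline`. This is an illustrative sketch under stated assumptions, not the project's actual code: the toy corpus, the helper name `evaluate_models`, the 70/30 split, and all hyperparameters are hypothetical, and the accuracies it prints say nothing about the 99.57% figure reported for the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy corpus standing in for the "text" column (hypothetical data).
topics = ["election", "vaccine", "budget", "climate", "border",
          "economy", "senate", "trade", "health", "energy"]
fake_texts = [f"shocking secret {t} hoax exposed you will not believe" for t in topics]
real_texts = [f"officials said the {t} report was published on monday" for t in topics]
texts = fake_texts + real_texts
labels = [1] * len(fake_texts) + [0] * len(real_texts)  # assumed: 1 = fake, 0 = real


def evaluate_models(texts, labels):
    """Fit each candidate model in a count -> TF-IDF -> classifier pipeline
    and return its accuracy on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=0
    )
    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        pipeline = Pipeline([
            ("vect", CountVectorizer()),    # raw text -> token counts
            ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
            ("clf", model),                 # classifier on TF-IDF features
        ])
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


scores = evaluate_models(texts, labels)
for name, acc in scores.items():
    print(f"{name}: {acc:.4f}")
```

Wrapping the vectorizer and TF-IDF step in the same pipeline as the classifier means the vocabulary and IDF weights are learned from the training split only, which avoids leaking test data into the features.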