Fake news is false or misleading information presented as news. It can cause serious harm because people, and even governments, may act on it, and some of those actions can lead to severe losses. Detecting whether published news is fake is therefore an important task, especially now that anyone can broadcast news to the world in a matter of seconds. In this project, we address the fake news detection problem by classifying a given news text as real or fake.
Our dataset is taken from a Kaggle challenge. It has two columns: the news text and a binary label (0 or 1) indicating whether the text is fake news. We represent the text using several embedding techniques, from basic ones such as TF-IDF to more sophisticated ones such as Word2Vec and BERT. On top of these embeddings, we build various machine learning classifiers such as Logistic Regression, Support Vector Machines, and Random Forest. We also combine multiple classifiers into an ensemble using a voting scheme. Finally, we interpret our results and discuss possible improvements.
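To make two of the building blocks above concrete, the following is a minimal, dependency-free sketch of TF-IDF weighting and hard (majority) voting. It is an illustration under stated assumptions, not the project's actual code: the function names `tfidf` and `majority_vote`, the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the convention that label 1 means fake are all chosen here for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: lowercase whitespace tokenization and an unsmoothed IDF.
    Library implementations typically add smoothing and normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Toy usage with made-up texts (not taken from the dataset).
docs = ["breaking shocking news", "local news report"]
vectors = tfidf(docs)

# Hypothetical predictions from three classifiers for one article,
# assuming 1 = fake and 0 = real.
label = majority_vote([1, 0, 1])
```

A term that appears in every document, such as "news" in the toy corpus, receives weight zero, while terms unique to one document receive positive weight. Using an odd number of classifiers avoids ties in the binary vote.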