Skip to content

Pregraduate project. Evaluation of machine learning algorithms of scikit-learn at a dataset of newspaper articles.

License

Notifications You must be signed in to change notification settings

jim-spyropoulos/NLP-in-Neswpaper-articles

Repository files navigation

NLP in Neswpaper articles

Pregraduate university team project for the class of Data Mining. The project had three goals:

  1. Given a train_set.csv of newspaper's articles, we had to plot a wordcloud for each article category.
  2. Implementation of K-means clustering algorithm for the given dataset.
  3. Evaluation of scikit-learn's classification algorithms (Multinomial Naive Bayes, Bernouli, KNN, SVM, Random Forest) via 10-fold-cross-validation and accuracy and roc-plot metrics. We chose the best versions of the above algorithms
    and combined them in a new, voting classifier to assign labels to the articles of the test_set.csv .

We converted articles to vectors using a pipeline of count_vectorizer, tfidf transformer and svdl. Impementations of thoses algorithms were provided by scikit-learn.

About

Pregraduate project. Evaluation of machine learning algorithms of scikit-learn at a dataset of newspaper articles.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages