GitHub - jim-spyropoulos/NLP-in-Neswpaper-articles: Undergraduate project. Evaluation of scikit-learn's machine learning algorithms on a dataset of newspaper articles.

NLP in Newspaper articles

Undergraduate university team project for the Data Mining class. The project had three goals:

Given a train_set.csv of newspaper articles, plot a word cloud for each article category.
Implement the K-means clustering algorithm for the given dataset.
Evaluate scikit-learn's classification algorithms (Multinomial Naive Bayes, Bernoulli Naive Bayes, KNN, SVM, Random Forest) using 10-fold cross-validation, with accuracy and ROC plots as metrics. We chose the best-performing versions of these algorithms and combined them in a new voting classifier to assign labels to the articles of test_set.csv.
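The evaluation and voting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in Classification.py: the toy corpus, the category names, the hyperparameters and the use of a plain `TfidfVectorizer` for features are all assumptions made so the example runs on its own.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for train_set.csv.
sports = ["team wins match goal", "coach praises team",
          "player scores goal", "match ends in draw"]
tech = ["new phone chip released", "software update fixes bug",
        "startup builds chip", "phone software launch"]
texts = sports * 5 + tech * 5            # 20 documents per category
labels = ["Sports"] * 20 + ["Technology"] * 20

# 10-fold cross-validation, stratified so every fold contains both categories.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# The five classifier families named in the README (settings are assumptions).
candidates = {
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="linear"),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over the 10 folds for each candidate.
scores = {}
for name, clf in candidates.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    fold_scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
    scores[name] = fold_scores.mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Combine the candidates in a majority-vote ensemble and label unseen text.
voter = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(estimators=list(candidates.items()), voting="hard"),
)
voter.fit(texts, labels)
predicted = voter.predict(["the coach praises the player"])
print(predicted)
```

With hard voting, each classifier casts one vote and the majority label wins. An odd number of voters avoids ties in the two-category toy setup; with more categories, as in a real news dataset, ties are still possible.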

We converted articles to vectors using a pipeline of CountVectorizer, TfidfTransformer and SVD. Implementations of these algorithms were provided by scikit-learn.
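A vectorization pipeline of this kind might look like the sketch below. It assumes the SVD step corresponds to scikit-learn's `TruncatedSVD`, which accepts the sparse matrices produced by the TF-IDF step. The sample articles, the step names and `n_components=2` are illustrative choices, and `KMeans` from scikit-learn is used only as a stand-in consumer of the vectors; the project's own clustering code may differ.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Hypothetical sample articles standing in for the contents of train_set.csv.
articles = [
    "team wins match with late goal",
    "coach praises team after match",
    "player scores goal in final",
    "match ends in draw for team",
    "new phone chip released today",
    "software update fixes phone bug",
    "startup builds faster chip",
    "phone software launch delayed",
]

# Raw text -> token counts -> TF-IDF weights -> low-dimensional dense vectors.
vectorizer = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
])

vectors = vectorizer.fit_transform(articles)
print(vectors.shape)  # (8, 2): one 2-dimensional vector per article

# The reduced vectors can then be handed to a clustering step.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(cluster_ids)
```

SVD output can contain negative values, so classifiers that require non-negative features, such as `MultinomialNB`, would need the count or TF-IDF matrix rather than the SVD-reduced vectors.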

Classification.py
LICENSE.md
README.md
clustering.py
clustering_KMeans.csv
method_for_test_file.py
test_set.csv
train_set.csv
wordcloud.py
