ozkansener/Information-Quality-of-Webdocuments-with-Machine-Learning

Automatically Assessing the Information Quality of Webdocuments with Machine Learning

Read the research paper: Research Paper

Abstract

Web users who search for information via a search engine have very little insight into the Information Quality (IQ) of Web documents. We trained a multiple-output regression model to estimate the IQ of Web documents from historical data in which features and assessments were collected. To retrieve IQ scores, we automatically collect a document's features and predict its IQ based on the patterns our algorithm learned. The model for semi-automatically assessing the IQ of Web documents was inspired by the work of Ceolin et al. Compared to their framework, ours can also retrieve documents on a given topic of interest to the user in a comparable manner, provides descriptive insight into the content of the Web document, and increases the responsiveness of the information.
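To make the idea of multiple-output regression concrete, here is a minimal sketch in plain Python. It is not the model from the paper: it assumes ordinary least squares with one weight column per IQ dimension, and the feature names (word count, link count) and target names (accuracy, completeness) are hypothetical placeholders chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ w = b for w using Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] so all target columns are solved at once.
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_multi_output(features: Matrix, targets: Matrix) -> Matrix:
    """Least-squares fit via the normal equations; row 0 of the result is the bias."""
    x = [[1.0] + row for row in features]  # prepend a constant bias feature
    d, k = len(x[0]), len(targets[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[j] for r, t in zip(x, targets)) for j in range(k)]
           for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights: Matrix, row: List[float]) -> List[float]:
    """Return one predicted score per target for a single feature row."""
    x = [1.0] + row
    return [sum(xi * w[j] for xi, w in zip(x, weights))
            for j in range(len(weights[0]))]


# Hypothetical training data: [word_count_in_thousands, link_count]
FEATURES = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
# Hypothetical assessments: [accuracy, completeness]
TARGETS = [[1.0, 0.5], [3.0, -0.5], [1.0, 3.5], [3.0, 2.5], [5.0, 1.5]]

weights = fit_multi_output(FEATURES, TARGETS)
scores = predict(weights, [2.0, 2.0])  # one score per IQ dimension
```

The point of the sketch is that a single fit produces all IQ dimensions at once, so one feature vector collected from a Web document yields a full vector of quality scores.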

What it has

  • Multiple Target Regression Machine Learning
  • Multi-core Processor Crawler nested with Multi-Threaded Crawler (Mixed Concurrency and Parallelism)
  • Search by search engine
  • Web Server
  • Fault recovery
  • Pivot-Grid
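The nesting of a multi-core crawler with a multi-threaded crawler can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual crawler: the function names (`fetch`, `crawl_chunk`, `split_into_chunks`, `crawl`) are hypothetical, each process handles one chunk of URLs, and each chunk is downloaded by a thread pool. A failed download is recorded as `None` so one bad URL does not stop the crawl, which is a very simple form of fault tolerance.

```python
import concurrent.futures as cf
from functools import partial
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_chunk(urls: List[str],
                fetcher: Callable[[str], str] = fetch,
                threads: int = 8) -> Dict[str, Optional[str]]:
    """Thread level: fetch a chunk of URLs concurrently (I/O-bound work)."""
    results: Dict[str, Optional[str]] = {}
    with cf.ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in cf.as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None  # record the failure and keep crawling
    return results


def split_into_chunks(urls: List[str], n: int) -> List[List[str]]:
    """Distribute URLs round-robin over at most n non-empty chunks."""
    return [urls[i::n] for i in range(n) if urls[i::n]]


def crawl(urls: List[str], processes: int = 2,
          threads: int = 8) -> Dict[str, Optional[str]]:
    """Process level: one worker process per chunk, threads inside each."""
    merged: Dict[str, Optional[str]] = {}
    worker = partial(crawl_chunk, threads=threads)
    with cf.ProcessPoolExecutor(max_workers=processes) as pool:
        for partial_result in pool.map(worker, split_into_chunks(urls, processes)):
            merged.update(partial_result)
    return merged
```

When running this as a script, call `crawl([...])` from inside an `if __name__ == "__main__":` guard, since process pools may re-import the main module on some platforms. Threads suit the download step because it waits on the network, while separate processes let parsing or feature extraction use several CPU cores.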

Future Work

  • Personalized Content and Information Quality based Recommendation System
  • Crawl the entire Web and restructure it
  • Advanced Text Mining

Requirements:

  • Ubuntu
  • Python 3
  • Apache 2.4
  • Flask
  • Gunicorn
  • init.d process

Usage:

In our other repository you will find how we configured the tool, along with some suggestions: Qupid

We blocked the open ports with our internal firewall.

Questions: ozkansener@gmail.com

Vrije Universiteit Amsterdam