Automatically Assessing the Information Quality of Webdocuments with Machine Learning

Read the research paper: Researchpaper.pdf (in this repository)

Abstract

Web users who search for information through a search engine have very little insight into the Information Quality (IQ) of Web documents. We trained a multiple-output regression model to estimate the IQ of Web documents from historical data in which features and assessments were collected. To obtain IQ scores for new documents, we automatically collect their features and predict the IQ from the patterns the model learned. The model for semi-automatically assessing the IQ of Web documents was inspired by the work of Ceolin et al. Compared to their framework, ours can also retrieve documents on a topic of interest to the user and present them in a comparable manner, provides descriptive insight into the content of each Web document, and improves responsiveness.
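To make the idea of multiple-output regression concrete, here is a minimal sketch in plain Python. It is not the model shipped in this repository: it assumes an ordinary least-squares linear model, and the feature and target meanings in the comments (for example, word count or an accuracy score) are hypothetical. All function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; add more varied training rows")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def with_bias(features: Matrix) -> Matrix:
    # Prepend a constant 1.0 so the model learns an intercept per target.
    return [[1.0] + list(row) for row in features]


def fit(features: Matrix, targets: Matrix) -> Matrix:
    """Least-squares weights, one column per target (normal equations)."""
    x = with_bias(features)
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, targets))


def predict(weights: Matrix, features: Matrix) -> Matrix:
    return matmul(with_bias(features), weights)


# Hypothetical data: two document features per row (e.g. scaled word count,
# scaled link count) and two IQ scores per row (e.g. accuracy, completeness).
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
train_y = [[1 + 2 * a + 3 * b, -1 + 0.5 * a - b] for a, b in train_x]

weights = fit(train_x, train_y)
scores = predict(weights, [[3.0, 2.0]])  # one row in, two IQ scores out
```

The point of the sketch is that a single fit yields one prediction per quality dimension for each document, instead of training a separate model for each score.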

What it has

Multiple Target Regression Machine Learning
Multi-core processor crawler nested with a multi-threaded crawler (mixed concurrency and parallelism)
Search via a search engine
Web Server
Fault recovery
Pivot-Grid
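The nested crawler and fault recovery items above can be sketched as follows. This is an illustration under stated assumptions, not the code in app.py: it assumes one process per CPU core, each running its own thread pool, and treats fault recovery as catching per-URL errors so that one failed page does not stop the crawl. The names `fetch`, `safe_fetch`, `crawl_batch`, `split`, and `crawl_all` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
from typing import Callable, Dict, List
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Download one page (assumed default; any callable str -> str works)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def safe_fetch(fetcher: Callable[[str], str], url: str) -> Dict[str, object]:
    # Fault recovery: record the failure and keep crawling the other URLs.
    try:
        return {"url": url, "ok": True, "body": fetcher(url)}
    except Exception as exc:
        return {"url": url, "ok": False, "error": repr(exc)}


def crawl_batch(urls: List[str],
                fetcher: Callable[[str], str] = fetch,
                threads: int = 8) -> List[Dict[str, object]]:
    """Inner level: threads overlap the network waits within one process."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(partial(safe_fetch, fetcher), urls))


def split(urls: List[str], parts: int) -> List[List[str]]:
    """Deal URLs round-robin into at most `parts` non-empty batches."""
    return [urls[i::parts] for i in range(parts) if urls[i::parts]]


def crawl_all(urls: List[str], processes: int = 4,
              threads: int = 8) -> List[Dict[str, object]]:
    """Outer level: one batch per process, so parsing can use several cores."""
    batches = split(urls, processes)
    with ProcessPoolExecutor(max_workers=processes) as pool:
        results = pool.map(partial(crawl_batch, threads=threads), batches)
        return [page for batch in results for page in batch]
```

When running this as a script, call `crawl_all` from inside an `if __name__ == "__main__":` guard, because process pools may re-import the main module on some platforms.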

Future Work

Personalized recommendation system based on content and information quality
Crawl the entire Web and restructure it
Advanced Text Mining

Requirements:

Ubuntu
Python 3
Apache 2.4
Flask
Gunicorn
init.d process

Usage:

Our other repository, Qupid, describes how we configured the tool and gives some suggestions.

We blocked the open ports with our internal Firewall.

Questions: ozkansener@gmail.com

Vrije Universiteit Amsterdam
