Web_Crawler

Simple web crawler

1. Objective

In this project I use a simple web crawler to measure aspects of a crawl, study its characteristics, download the crawled web pages, and gather their metadata, all from pre-selected news websites.
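One way to picture "measuring aspects of a crawl" is a running tally of fetch outcomes. The sketch below is a hypothetical illustration, not this project's actual code. The class `CrawlStats`, its methods, and the example.com URLs are all made up for the example, and it assumes only 2xx HTTP status codes count as successful fetches.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper that tallies fetch outcomes during a crawl.
class CrawlStats {
    private int attempted = 0;
    private int succeeded = 0;
    private final Set<String> urls = new HashSet<>();
    // TreeMap keeps status codes in ascending order for readable reports.
    private final Map<Integer, Integer> statusCounts = new TreeMap<>();

    // Record one fetch attempt for a URL together with its HTTP status code.
    void recordFetch(String url, int statusCode) {
        attempted++;
        urls.add(url);
        // Assumption: only 2xx responses count as successful fetches.
        if (statusCode >= 200 && statusCode < 300) {
            succeeded++;
        }
        statusCounts.merge(statusCode, 1, Integer::sum);
    }

    int attempted() { return attempted; }
    int succeeded() { return succeeded; }
    int nonSuccess() { return attempted - succeeded; }
    int uniqueUrls() { return urls.size(); }
    Map<Integer, Integer> statusCounts() { return statusCounts; }

    public static void main(String[] args) {
        CrawlStats stats = new CrawlStats();
        stats.recordFetch("https://www.example.com/", 200);
        stats.recordFetch("https://www.example.com/world", 200);
        stats.recordFetch("https://www.example.com/old-story", 404);
        stats.recordFetch("https://www.example.com/moved", 301);
        // prints: attempted=4 succeeded=2 nonSuccess=2 unique=4
        System.out.println("attempted=" + stats.attempted()
                + " succeeded=" + stats.succeeded()
                + " nonSuccess=" + stats.nonSuccess()
                + " unique=" + stats.uniqueUrls());
        // prints: {200=2, 301=1, 404=1}
        System.out.println(stats.statusCounts());
    }
}
```

In a real crawl, `recordFetch` would be called once per fetched URL. Whether redirects (3xx) count as failures or get followed and counted separately is a design choice the sketch leaves open.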

2. Preliminaries

To begin, I use crawler4j, an existing open-source Java web crawler library hosted on GitHub. For complete details on downloading and compiling it, see https://github.com/yasserg/crawler4j. For help installing Eclipse and crawler4j, also see http://www-scf.usc.edu/~csci572/2017Spring/hw2/Crawler4jinstallation.pdf
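To restrict a crawl to a pre-selected news site, a crawler4j crawler is typically a `WebCrawler` subclass that overrides `shouldVisit(Page referringPage, WebURL url)` (the signature in recent crawler4j versions) and checks `url.getURL()`. The sketch below isolates that filtering logic in plain Java so it runs without crawler4j on the classpath. The class `NewsUrlFilter`, the site prefix, and the list of skipped file extensions are hypothetical choices for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical URL filter: the kind of check a crawler4j WebCrawler
// subclass might perform inside its shouldVisit override.
class NewsUrlFilter {
    // Assumed list of non-HTML resources to skip; adjust per site.
    private static final Pattern EXCLUDED = Pattern.compile(
            ".*\\.(css|js|json|ico|gif|jpe?g|png|svg|mp3|mp4|zip|gz)$");

    private final String sitePrefix;

    NewsUrlFilter(String sitePrefix) {
        this.sitePrefix = sitePrefix.toLowerCase();
    }

    // True if the URL is on the target site and is not a static asset.
    boolean shouldVisit(String url) {
        String href = url.toLowerCase();
        // Ignore any query string or fragment when checking the extension.
        String path = href.split("[?#]", 2)[0];
        return href.startsWith(sitePrefix) && !EXCLUDED.matcher(path).matches();
    }

    public static void main(String[] args) {
        NewsUrlFilter filter = new NewsUrlFilter("https://www.example.com/");
        System.out.println(filter.shouldVisit("https://www.example.com/world/story.html")); // true
        System.out.println(filter.shouldVisit("https://www.example.com/static/logo.PNG"));  // false
        System.out.println(filter.shouldVisit("https://www.other.com/world"));              // false
        System.out.println(filter.shouldVisit("https://www.example.com/app.js?v=3"));       // false
    }
}
```

Lower-casing both the prefix and the URL keeps the comparison case-insensitive. Because the lowered string is only used for matching, the crawler would still fetch the original URL unchanged.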