BalawalSultan/Java-Web-Scraper-App

Simple Web Scraper

This is a simple Java program that extracts the images, internal hyperlinks, and external hyperlinks from an HTML file.

Installation

Use Maven to prepare the application.

mvn install compile package

Once the command finishes, Maven generates a folder called target that contains two jar files: “WebScraper-1.0-SNAPSHOT.jar” and “WebScraper.jar”. The one we need is WebScraper.jar, which can be copied to and run from any location.

Usage

For the application to distinguish between internal and external hyperlinks, the HTML file to scrape must be named after the website's domain followed by .html (for example, unibz.it.html for the domain unibz.it). Run the web scraper with the following command:

java -jar WebScraper.jar "unibz.it.html"
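
The file-naming rule above suggests how internal and external links can be told apart: the domain is read from the file name, and each link's host is compared against it. The following is only a minimal sketch of that idea, not the code in this repository. The class name LinkClassifierSketch and all of its methods are hypothetical, it uses a regular expression instead of a real HTML parser, and it assumes that relative links and subdomains count as internal.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real WebScraper.jar may work differently.
class LinkClassifierSketch {

    // Captures the href value of <a> tags. A regex is a simplification;
    // a real HTML parser handles far more edge cases.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // "unibz.it.html" -> "unibz.it"
    static String domainFromFileName(String fileName) {
        String lower = fileName.toLowerCase();
        return lower.endsWith(".html")
                ? lower.substring(0, lower.length() - ".html".length())
                : lower;
    }

    // Returns every href value found in the given HTML text, in document order.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    // Assumption: a link is internal if it has no host (relative path or
    // fragment) or if its host is the domain or one of its subdomains.
    static boolean isInternal(String href, String domain) {
        try {
            URI uri = new URI(href.trim());
            String host = uri.getHost();
            if (host == null) {
                // Relative links have no scheme; mailto:, tel: and similar do.
                return uri.getScheme() == null;
            }
            host = host.toLowerCase();
            return host.equals(domain) || host.endsWith("." + domain);
        } catch (URISyntaxException e) {
            return false; // Unparsable links are treated as external here.
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java LinkClassifierSketch <domain>.html");
            return;
        }
        Path file = Paths.get(args[0]);
        String domain = domainFromFileName(file.getFileName().toString());
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        for (String href : extractHrefs(html)) {
            String kind = isInternal(href, domain) ? "internal" : "external";
            System.out.println(kind + "  " + href);
        }
    }
}
```

Under these assumptions, a file named unibz.it.html would have links to www.unibz.it and paths such as /en/ reported as internal, and links to any other host reported as external. Image extraction is left out of the sketch.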

License

MIT
