🌏 Scrape the world!
This library screen-scrapes data from HTML and injects it into POJOs using annotations.
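@WebScraper(url = "http://foo.com/bar.html")
public class Baz {
    // @Target here is this library's annotation, not java.lang.annotation.Target
    // text of the DIV in the 2nd cell of each table row
    @Target(value = "//TABLE//TR/TD[2]/DIV/text()")
    String artist;
    // link text in the 4th cell
    @Target(value = "//TABLE//TR/TD[4]/A/text()")
    String title;
    // link target (href attribute) in the 4th cell
    @Target(value = "//TABLE//TR/TD[4]/A/@href")
    String url;
}

// elsewhere, e.g. inside a method:
List<Baz> bazs = WebScraper.Util.scrape(Baz.class);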
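To illustrate what the annotations above do, here is a minimal, JDK-only sketch of annotation-driven injection. It is not this library's implementation: the annotation XPathTarget, the class MiniScraper, and the index-wise mapping (the i-th match of each field fills the i-th object) are assumptions, and it parses well-formed XHTML with javax.xml rather than real-world HTML.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the library's @Target, renamed so it does not
// clash with java.lang.annotation.Target (used fully qualified below).
@Retention(RetentionPolicy.RUNTIME)
@java.lang.annotation.Target(ElementType.FIELD)
@interface XPathTarget {
    String value(); // XPath expression selecting one node per row
}

// Mirrors the Baz example, using the stand-in annotation.
class Baz {
    @XPathTarget("//TABLE//TR/TD[2]/DIV/text()")
    String artist;
    @XPathTarget("//TABLE//TR/TD[4]/A/text()")
    String title;
    @XPathTarget("//TABLE//TR/TD[4]/A/@href")
    String url;
}

class MiniScraper {
    // Parses well-formed XHTML, evaluates each annotated field's XPath, and
    // writes the i-th matched node's value into the i-th result object.
    static <T> List<T> scrape(Class<T> type, String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<T> results = new ArrayList<>();
            for (Field field : type.getDeclaredFields()) {
                XPathTarget target = field.getAnnotation(XPathTarget.class);
                if (target == null) {
                    continue; // fields without the annotation are left alone
                }
                NodeList nodes = (NodeList) xpath.evaluate(
                        target.value(), doc, XPathConstants.NODESET);
                field.setAccessible(true);
                for (int i = 0; i < nodes.getLength(); i++) {
                    // Grow the result list so there is an object for row i.
                    while (results.size() <= i) {
                        results.add(type.getDeclaredConstructor().newInstance());
                    }
                    // Text and attribute nodes both expose their content via getNodeValue().
                    field.set(results.get(i), nodes.item(i).getNodeValue());
                }
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("scrape failed", e);
        }
    }
}
```

Mapping matches by index keeps the sketch short, but it assumes every field matches once per row; a row with a missing cell would shift later values onto the wrong object.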
- InputHandler ... applies any processing before parsing
- Parser
    - XPathParser ... default
    - HtmlXPathParser ... for the original purpose (HTML scraping)
    - SaxonXPathParser ... for huge XML files
    - JsonPathParser ... for JSON responses
- Parser#foreach() ... iterates over results like a Java collection stream
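The pipeline described in the list (pre-process the input, parse it, then iterate over the matches) can be sketched with the JDK alone. RawInputHandler and XPathForeachParser are hypothetical names, not this library's InputHandler or Parser API. As an example of pre-processing, the handler rewrites &nbsp;, which an XML parser rejects as an undeclared entity, into the numeric reference &#160;; foreach then passes each XPath match to a callback.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-parse hook: rewrites the raw input before it reaches the parser.
interface RawInputHandler extends UnaryOperator<String> {}

// Hypothetical parser that passes every XPath match to a callback.
class XPathForeachParser {
    private final RawInputHandler handler;

    XPathForeachParser(RawInputHandler handler) {
        this.handler = handler;
    }

    void foreach(String input, String expression, Consumer<String> action) {
        try {
            String prepared = handler.apply(input); // step 1: pre-process
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(prepared))); // step 2: parse
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                action.accept(nodes.item(i).getNodeValue()); // step 3: hand each match over
            }
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, new XPathForeachParser(s -> s.replace("&nbsp;", "&#160;")).foreach(page, "//LI/text()", System.out::println) prints each list item's text. The callback saves the caller from building an intermediate list, though this DOM-based sketch still holds the whole document in memory.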
- Scraping composers from the JASRAC database for iTunes
- Scraping JSON for deep learning proofreading
- Amazon purchase history
- Amazon yourstore collection
- Google suggest
- Yahoo! Japan proofreading
- Tidy version (deleted: garbled text)
- InputHandler without cache
- argument injection into WebScraper#url (proposed syntax):
      @WebScraper(url = "http://foo.com?bar={bar}")
      public static class Result {
          // ...
      }
      List<Result> data = WebScraper.Util.scrape(Result.class, @UrlParam(bar) args[0]);
- JSON parser
- CSS selector
- integrate serdes
- @WebScraper#encoding()
- @Target
    - add an exception handler, or a second/third option
- xml2xpath
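One way the {bar} placeholder in the argument-injection item could be filled is plain template expansion. This is a sketch under assumptions: parameters are passed by name in a Map (the item itself passes them positionally through a proposed @UrlParam), UrlTemplate.expand is a hypothetical helper, and values are form-encoded with java.net.URLEncoder (Java 10 or later for the Charset overload), so a space becomes "+".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that fills {name} placeholders in a URL template.
class UrlTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    static String expand(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("no value for {" + name + "}");
            }
            // Form-encode the value for use in a query string (space becomes '+').
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
            // quoteReplacement is defensive: URLEncoder output never contains '$' or '\'.
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, UrlTemplate.expand("http://foo.com?bar={bar}", Map.of("bar", "a b&c")) returns "http://foo.com?bar=a+b%26c". Failing fast on a missing value avoids silently requesting a URL that still contains a literal {bar}.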