Home
Sergey Bronnikov edited this page Jan 9, 2020 · 25 revisions
- https://curl.haxx.se/libcurl/c/crawler.html
- https://www.monkey.org/~provos/crawl/
- http://www.aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html
- swish-e http://www.esa.org/tiee/search/html/spider.html
- sphinx http://sphinxsearch.com/docs/current.html
- Apache Solr + Nutch
- Xapian - Aksenov writes that this engine is already dead and that only two open-source engines remain: sphinx and lucene
- harvest http://harvest.sourceforge.net/harvest/doc/index.html
- https://wiby.me/about.html
- https://github.com/Stazer/Crawler
- Solr 5+ DOES in fact now do web crawling! http://lucene.apache.org/solr/
- Nutch - http://lucene.apache.org/nutch/
- Websphinx - http://www.cs.cmu.edu/~rcm/websphinx/
- JSpider - http://j-spider.sourceforge.net/
- Heritrix - http://crawler.archive.org/
- Web-archive https://webarchive.jira.com/wiki/spaces/Heritrix/overview
- http://www.crawl-anywhere.com/
- Simple web crawler
- Manticore
- wget as a crawler and sphinx as search engine
- Go: https://github.com/PuerkitoBio/gocrawl
- Go: https://git.autistici.org/ale/crawl/
- Go: https://github.com/temoto/heroshi
- Sphinx
  $ pkg_add sphinx--pgsql
  $ /usr/local/bin/searchd -h
- Lucene, Solr, Sphinx, Xapian, Indri
- FamilySearch https://familysearch.org/developers/
- JavaScript https://github.com/FamilySearch/fs-js-lite
- https://web.archive.org/web/20160324121529/http://codavr.ru/
- http://tutorialzine.com/2010/09/google-powered-site-search-ajax-jquery/
- https://developers.google.com/custom-search/docs/tutorial/implementingsearchbox?hl=ru
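The "Simple web crawler" entry and the asyncio crawler chapter linked above revolve around the same basic loop:

- fetch a page,
- extract its links,
- queue the ones not seen yet.

Below is a minimal synchronous sketch of that loop, using only the Python standard library. It rests on a few assumptions:

- The names `crawl`, `fetch_url` and `LinkParser` are hypothetical and for illustration only.
- Pages are assumed to be UTF-8.
- Only links on the starting host are followed.
- robots.txt and rate limiting are ignored, although a real crawler should respect both.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch=fetch_url):
    """Breadth-first crawl restricted to the start URL's host.

    Returns the URLs that were fetched successfully, in visit order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so a page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            # Links to other hosts (and mailto: etc., which have no netloc) are skipped.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For example, `crawl("https://example.com/", max_pages=10)` should return up to ten same-host URLs in breadth-first order. The `fetch` parameter is injectable, so the link-following logic can be tested offline or paired with a different downloader, such as wget or libcurl.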