MingyiZhang/simple-distributed-crawler-library

Simple Distributed Web Crawler Library

A simple distributed web crawler library that is written in Go.

The library is implemented completely from scratch. As a Golang practice project, it focuses mainly on the distributed structure. Users need to implement their own web parsers, as shown in the examples.

It is the capstone project of imooc's Golang course.

Architecture

As a distributed web crawler, it contains several components

Components communicate with each other using JSON-RPC.
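To illustrate what JSON-RPC communication between two components can look like in Go, here is a minimal sketch built only on the standard library packages `net/rpc` and `net/rpc/jsonrpc`. The `Item` type, the `ItemSaver` service, and the helper functions `startServer` and `saveItem` are hypothetical names chosen for this example; they are not taken from this library's code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
)

// Item is a hypothetical payload passed between components.
type Item struct {
	URL   string
	Title string
}

// ItemSaver is a hypothetical service exposed over JSON-RPC.
type ItemSaver struct{}

// Save follows the method shape required by net/rpc:
// an exported method taking (args, *reply) and returning error.
func (s *ItemSaver) Save(item Item, saved *bool) error {
	if item.URL == "" {
		return errors.New("item has no URL")
	}
	*saved = true
	return nil
}

// startServer registers the service and serves JSON-RPC on a free local port.
func startServer() (net.Listener, error) {
	server := rpc.NewServer()
	if err := server.RegisterName("ItemSaver", &ItemSaver{}); err != nil {
		return nil, err
	}
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := l.Accept()
			if err != nil {
				return // listener was closed
			}
			go server.ServeCodec(jsonrpc.NewServerCodec(conn))
		}
	}()
	return l, nil
}

// saveItem dials the server and calls ItemSaver.Save over JSON-RPC.
func saveItem(addr string, item Item) (bool, error) {
	client, err := jsonrpc.Dial("tcp", addr)
	if err != nil {
		return false, err
	}
	defer client.Close()
	var saved bool
	err = client.Call("ItemSaver.Save", item, &saved)
	return saved, err
}

func main() {
	l, err := startServer()
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	saved, err := saveItem(l.Addr().String(), Item{URL: "https://example.com", Title: "Example"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("saved:", saved)
}
```

In a real deployment the server and the client would run in separate processes, possibly on different machines; they share one process here only to keep the sketch runnable on its own.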

Algorithm

The crawler uses breadth-first search to scrape a website.
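As a rough illustration of breadth-first crawling, the sketch below walks an in-memory link graph level by level using a FIFO queue and a visited set. The names `Fetcher` and `CrawlBFS` and the `maxDepth` parameter are assumptions made for this example and do not describe the library's actual API; a real crawler would replace the in-memory graph with HTTP requests and a parser.

```go
package main

import "fmt"

// Fetcher returns the links found on a page. It is a hypothetical
// stand-in for an HTTP fetch followed by link extraction.
type Fetcher func(url string) []string

// CrawlBFS visits pages reachable from seed in breadth-first order,
// following links at most maxDepth hops away, and returns the visit order.
func CrawlBFS(seed string, maxDepth int, fetch Fetcher) []string {
	type item struct {
		url   string
		depth int
	}
	visited := map[string]bool{seed: true}
	queue := []item{{seed, 0}}
	var order []string

	for len(queue) > 0 {
		// Take the oldest entry first: this is what makes the search breadth-first.
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)

		if cur.depth == maxDepth {
			continue // do not expand pages at the depth limit
		}
		for _, link := range fetch(cur.url) {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, item{link, cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	// In-memory link graph used instead of real HTTP requests.
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d", "a"},
	}
	fetch := func(u string) []string { return graph[u] }
	fmt.Println(CrawlBFS("a", 2, fetch)) // prints [a b c d]
}
```

Marking a URL as visited when it is enqueued, rather than when it is dequeued, keeps the same page from being queued twice when several pages link to it.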

Examples

There are two simple examples included:

TODO

  • separate service for saving data
  • separate service for parsing web data
  • frontend for displaying search results
  • use testcontainers in tests
  • separate service for checking duplication
  • Kubernetes deployment
  • gRPC and Protobuf version
