ARCHITECTURE.md

Vietlott-data architecture

The project is quite simple: all source code lives in /src.

A good starting point is /src/cli, which shows the available commands.

product config

I tried to make adding a new product as easy as possible via a config-first approach.

The base config is vietlott.config.products.ProductConfig, with settings that mostly work for all Vietlott products.
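As a rough illustration of the config-first idea, here is a minimal sketch in which shared defaults live in one base object and each product overrides only what differs. The class name `ProductConfigSketch`, its fields, and the product names are all hypothetical; they are not the actual attributes of vietlott.config.products.ProductConfig.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProductConfigSketch:
    """Hypothetical stand-in for a product config; fields are illustrative only."""

    name: str
    page_size: int = 8         # assumed: number of results per listing page
    use_cookies: bool = False  # cookies are disabled for all products


# Shared defaults live in one base instance.
BASE = ProductConfigSketch(name="base")

# Each product overrides only the settings that differ from the base.
PRODUCTS = {
    "product_a": replace(BASE, name="product_a"),
    "product_b": replace(BASE, name="product_b", page_size=6),
}


def get_config(name: str) -> ProductConfigSketch:
    """Look up a product config by name; raises KeyError for unknown products."""
    return PRODUCTS[name]
```

With this shape, adding a product is one more entry in `PRODUCTS` rather than new crawling code.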

Key points:

  • cookies used to be needed to crawl, but not anymore (disabled for all products)
  • data on the website is paginated, so fetching is designed around pages (as is the mechanism for detecting and back-filling missing data in missing.py)
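The page-oriented back-fill idea can be sketched as follows. This is not the logic in missing.py; it assumes, purely for illustration, that draws carry consecutive integer ids and that page 0 of the listing holds the newest draws. The function names are hypothetical.

```python
from typing import Iterable, List


def find_missing_ids(seen_ids: Iterable[int]) -> List[int]:
    """Return the ids absent between the smallest and largest id seen."""
    ids = set(seen_ids)
    if not ids:
        return []
    return [i for i in range(min(ids), max(ids) + 1) if i not in ids]


def pages_to_refetch(missing_ids: Iterable[int], latest_id: int, page_size: int) -> List[int]:
    """Map missing ids to listing page indexes.

    Assumption: page 0 holds ids latest_id down to latest_id - page_size + 1,
    page 1 holds the next page_size ids, and so on.
    """
    return sorted({(latest_id - i) // page_size for i in missing_ids})
```

For example, with stored ids 1, 2, 4 and 7, the gaps are 3, 5 and 6; with a page size of 2 and latest id 7, those fall on pages 2, 1 and 0, so only those pages need to be fetched again.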

runner

The project uses GitHub Actions, configured to run the crawl daily on a schedule and push the results back to the repository itself, so no server is required.

To make development easier (for me), the binary file sets PYTHONPATH to /src, but it can and should use the installed CLI entry points:

[project.scripts]
vietlott-crawl = "vietlott.cli.crawl:crawl"
vietlott-missing = "vietlott.cli.crawl:detect_missing_data"
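Each entry above has the form `module.path:function`: installing the package creates a command that imports the module and calls the named function. The small sketch below only shows how such a spec string splits into its two parts; `split_entry_point` is a hypothetical helper for illustration, not part of the project or of any packaging tool.

```python
from typing import Tuple


def split_entry_point(spec: str) -> Tuple[str, str]:
    """Split a 'module.path:function' spec into (module, function)."""
    module, sep, attr = spec.partition(":")
    module, attr = module.strip(), attr.strip()
    if not sep or not module or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    return module, attr


# The two scripts declared above, copied verbatim from the config.
SCRIPTS = {
    "vietlott-crawl": "vietlott.cli.crawl:crawl",
    "vietlott-missing": "vietlott.cli.crawl:detect_missing_data",
}

RESOLVED = {name: split_entry_point(spec) for name, spec in SCRIPTS.items()}
```

Both commands resolve to functions in the same module, vietlott.cli.crawl.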