The project is quite simple: all sources are in `/src`.
A good place to start is `/src/cli`, which shows which commands are available.
I tried to make adding a new product as easy as possible through a config-first approach.
The base config is `vietlott.config.products.ProductConfig`, whose default settings work for almost every Vietlott product.
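To illustrate the config-first idea, here is a minimal sketch of how a base config with shared defaults lets a new product be added by overriding only what differs. `ProductConfigSketch`, its fields (`path`, `use_cookies`, `page_size`), the product keys and the paths are all hypothetical; they are not the actual attributes of `vietlott.config.products.ProductConfig`.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for vietlott.config.products.ProductConfig;
# the real class and its fields may differ.
@dataclass(frozen=True)
class ProductConfigSketch:
    name: str
    path: str                  # hypothetical: page path of the product's results
    use_cookies: bool = False  # cookies are no longer needed, so off by default
    page_size: int = 10        # hypothetical default shared by most products


# Base config holding the defaults that most products share
BASE = ProductConfigSketch(name="base", path="")

# Adding a product means overriding only the fields that differ from the base
PRODUCTS = {
    "power_655": replace(BASE, name="power_655", path="/results/655"),
    "mega_645": replace(BASE, name="mega_645", path="/results/645"),
}
```

Because the config is frozen and derived with `dataclasses.replace`, each product inherits every default it does not explicitly override.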
Key points:
- cookies used to be needed for crawling, but not anymore (disabled for all products)
- data on the website is paginated, so fetching is designed around pages (as is the mechanism in `missing.py` that detects missing data and back-fills it)
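Because results are paginated, gaps can be detected locally and then mapped back to only the pages that need re-fetching. The sketch below is a hypothetical illustration of that idea, not the actual logic in `missing.py`. It assumes that draws have consecutive integer IDs and that the site lists the newest draws first with a fixed page size; the function names are made up for illustration.

```python
from __future__ import annotations

from typing import Iterable


def find_missing_ids(known_ids: set[int]) -> list[int]:
    """Return draw IDs between the lowest and highest known ID that are absent."""
    if not known_ids:
        return []
    lo, hi = min(known_ids), max(known_ids)
    return [i for i in range(lo, hi + 1) if i not in known_ids]


def pages_to_refetch(missing_ids: Iterable[int], latest_id: int, page_size: int) -> list[int]:
    """Map missing draw IDs to 0-based page indices.

    Assumes (hypothetically) newest-first pages of a fixed size, so page 0
    holds latest_id down to latest_id - page_size + 1.
    """
    return sorted({(latest_id - i) // page_size for i in missing_ids})


known = {1, 2, 3, 5, 6, 9, 10}
missing = find_missing_ids(known)  # [4, 7, 8]
pages = pages_to_refetch(missing, latest_id=10, page_size=5)  # [0, 1]
```

In this sketch only pages 0 and 1 are fetched again, rather than re-crawling the whole history.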
The project uses GitHub Actions, scheduled to run daily, to crawl the data and push the results back to this repository, so no server is required.
To make development easier (for me), the binary file sets `PYTHONPATH` to `/src`, but it can (and should) use the installed CLI instead, as declared in `pyproject.toml`:
```toml
[project.scripts]
vietlott-crawl = "vietlott.cli.crawl:crawl"
vietlott-missing = "vietlott.cli.crawl:detect_missing_data"
```
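Each entry in `[project.scripts]` maps a command name to a `module:function` target. Once the package is installed (for example with `pip install -e .`), the package is importable from site-packages, so the generated `vietlott-crawl` wrapper can import `vietlott.cli.crawl` and call `crawl` without any `PYTHONPATH` tweaks. The helper below is a simplified, hypothetical illustration of that lookup; installers generate their own small wrapper script rather than using a function like this.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a "package.module:attr" target, as used in [project.scripts].

    Simplified sketch: it only performs the import and attribute lookup that
    a generated console script does before calling the target.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. "Class.method"
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# With the package installed, this would return the same function that the
# vietlott-crawl command runs:
# crawl = resolve_entry_point("vietlott.cli.crawl:crawl")
```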