A web scraper and deposit system data pipeline!
- Scrape data from URLs, HTML files, and XML files
- Let users deposit their data through a deposit system
- Data pipeline for extracting, cleaning, and storing data in a database
- Backend: Python, Django, PostgreSQL
- Infrastructure: Terraform, Google Cloud Compute Instance
- Deployment: Nginx via a Linux Bash script
- Run the backend:
chmod +x ./scripts/run_backend.sh && ./scripts/run_backend.sh
- Run Pytest:
.venv/bin/pytest -rP
- Run Pytest Coverage:
.venv/bin/pytest --cov=backend
- Check Docs Coverage:
.venv/bin/interrogate -v backend
- Check Docs Style:
.venv/bin/pydocstyle backend
- Show Docs Locally:
.venv/bin/mkdocs serve --dev-addr 127.0.0.1:9000
- Deploy Docs to GitHub Pages:
.venv/bin/mkdocs gh-deploy
- Create a GCP project and get the project ID
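For example, with the gcloud CLI (the project ID my-pipeline-project is a placeholder, pick your own):
gcloud projects create my-pipeline-project
gcloud config set project my-pipeline-project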
- Create a GCP Storage bucket and get the bucket name
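For example, with gsutil (bucket name and location are placeholders; versioning is a sensible extra if the bucket will hold Terraform state):
gsutil mb -l EU gs://my-pipeline-tfstate
gsutil versioning set on gs://my-pipeline-tfstate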
- Download a service account key file and rename it to infrastructure/.gcp_creds.json
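For example, if the key belongs to a dedicated service account created with the gcloud CLI (the account name and project ID are placeholders, and the account still needs IAM roles for Compute and Storage):
gcloud iam service-accounts create terraform
gcloud iam service-accounts keys create infrastructure/.gcp_creds.json --iam-account=terraform@my-pipeline-project.iam.gserviceaccount.com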
- Copy infrastructure/.backend.hcl.sample and rename it to infrastructure/.backend.hcl
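For example (the new file presumably holds the Terraform GCS backend settings, so fill in the bucket name from the step above):
cp infrastructure/.backend.hcl.sample infrastructure/.backend.hcl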
- Copy infrastructure/.secrets.auto.tfvars.sample and rename it to infrastructure/.secrets.auto.tfvars
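For example (then edit the new file and fill in the required variable values, e.g. the project ID):
cp infrastructure/.secrets.auto.tfvars.sample infrastructure/.secrets.auto.tfvars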
- Generate an SSH Key.
- Create the folder infrastructure/.ssh and copy id_rsa.pub and id_rsa inside it, as in the example below
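For example, generating the key pair directly into that folder (the key type, size, and comment are placeholders):
mkdir -p infrastructure/.ssh
ssh-keygen -t rsa -b 4096 -f infrastructure/.ssh/id_rsa -C "deploy@pipeline"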
- Create an alias for the terraform command:
alias TF='docker compose -f infrastructure/.docker-compose.yml run --rm terraform'
- terraform init
TF init -backend-config=.backend.hcl
- terraform apply gcp
TF apply -target="module.gcp" --auto-approve
- terraform destroy gcp
TF destroy -target="module.gcp" --auto-approve
- terraform output gcp
TF output gcp
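Assuming the gcp output includes the instance's external IP address, you can then SSH into the machine with the key generated earlier (user and IP are placeholders):
ssh -i infrastructure/.ssh/id_rsa <USER>@<EXTERNAL_IP>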