
Product price comparison scraper & webapp w/ clustering algos (2022)


ea-ae/soodud


Soodud

Soodud is a webapp that scrapes product data from online stores with Python, then uses hierarchical cluster analysis implemented in C++ to group comparable products across stores. The resulting product clusters are stored in PostgreSQL, served by a Django REST API, and displayed by a React & TailwindCSS frontend.
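To make the matching idea concrete, here is a minimal sketch of hierarchical (agglomerative) clustering over product names. It is an illustration only, written in Python for brevity: the project's real implementation is in C++, and the token-based Jaccard similarity, the single-linkage rule, the 0.5 threshold, and the function names are all assumptions rather than details taken from this repository.

```python
from itertools import combinations


def tokens(name: str) -> frozenset[str]:
    """Lowercase a product name and split it into a set of word tokens."""
    return frozenset(name.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of tokens two names have in common (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(names: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage agglomerative clustering (hypothetical helper).

    Every name starts in its own cluster. The two most similar clusters are
    merged repeatedly until no remaining pair reaches the threshold.
    """
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best_score, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Cluster similarity = best similarity between any two members.
            score = max(
                jaccard(tokens(x), tokens(y))
                for x in clusters[i]
                for y in clusters[j]
            )
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None or best_score < threshold:
            break  # nothing left that is similar enough to merge
        i, j = best_pair  # i < j, so deleting index j leaves i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Given listings from two stores such as "Rye bread 400 g" and "rye bread sliced 400 g", the sketch places them in one cluster because they share four of five distinct tokens. A production matcher would likely also weigh quantities, brands, and prices.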

CI/CD is implemented through GitHub Actions and Docker Compose. Nginx & fail2ban are used to compress/cache/serve static files, provide rate limiting, and detect malicious bots. All commits are run through flake8 and other pre-commit checks.


Setup

  1. Clone the project.
  2. Create a valid .env file based on .env.example.
  3. To contribute, first install the required Git commit hooks with cd django && pipenv run pre-commit install.
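One way to check step 2 is to compare the variable names in .env against those in .env.example. The sketch below assumes both files use plain KEY=value lines with optional # comments. The helper names are hypothetical and not part of this repository.

```python
from pathlib import Path


def env_keys(path: Path) -> set[str]:
    """Return the variable names defined in a KEY=value style env file."""
    keys = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example: Path, actual: Path) -> set[str]:
    """Names present in the example file but absent from the real one."""
    return env_keys(example) - env_keys(actual)
```

Calling missing_keys(Path(".env.example"), Path(".env")) from the project root returns an empty set when every variable from the example is defined. It only checks that names exist, not that their values are valid.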

Development

  1. Install dependencies using cd client && npm install --dev and cd django && pipenv install.
  2. Build the C++ project and move clustering/out/clustering.(so|pyd) into the django/data/stores/ directory.
  3. Start the webpack dev server using cd client && npm run server.
  4. Start the Python virtual environment with cd django && pipenv shell.
  5. Start the Django dev server with tools/start_server.sh.
  6. To scrape new product data and form updated product clusters, run tools/run_service.sh launch and tools/run_service.sh match respectively.
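Step 2 above can be scripted. The following is a minimal sketch, assuming the C++ build has already produced clustering.so or clustering.pyd under clustering/out/ and that django/data/stores/ exists. The function name is hypothetical, and the sketch copies the file rather than moving it so the build output stays in place.

```python
import shutil
from pathlib import Path


def install_clustering_module(repo_root: Path) -> list[Path]:
    """Copy clustering/out/clustering.(so|pyd) into django/data/stores/."""
    out_dir = repo_root / "clustering" / "out"
    dest_dir = repo_root / "django" / "data" / "stores"
    copied = []
    for suffix in (".so", ".pyd"):
        built = out_dir / f"clustering{suffix}"
        if built.is_file():
            # copy2 preserves file metadata and returns the destination path.
            copied.append(Path(shutil.copy2(built, dest_dir / built.name)))
    if not copied:
        raise FileNotFoundError(f"no clustering.so or clustering.pyd in {out_dir}")
    return copied
```

Running install_clustering_module(Path(".")) from the repository root returns the paths that were written, or raises FileNotFoundError if the C++ project has not been built yet.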

Production

  1. If this is your initial configuration, temporarily disable HTTPS in nginx/nginx.conf by commenting out the include.
  2. Run Docker Compose with tools/compose.sh.
  3. Create a new cronjob with tools/cron.txt as a reference. This will ensure that the product database is updated once a day.
