tilburgsciencehub/music-to-scrape

Music-to-scrape

We’re music-to-scrape, a fictitious music streaming service with a real website and API. Built for educational purposes, you can use us to learn web scraping!

Getting started

Head over to https://music-to-scrape.org to view our live website, or directly check out our API documentation.

Running this project

Using Docker

Starting up the frontend and backend

The easiest way to run our project is using Docker.

  • Install Docker and clone this repository.
  • Open a terminal at the repository's root directory and run docker compose build, followed by docker compose up.
  • Wait a bit for the website and API to launch. If the build fails, Docker has likely not been allocated enough memory (the build takes about 6 GB of memory).
  • Once the containers are running, you can access the website and API locally at these addresses:
    • API: http://localhost:8080 (localhost typically resolves to 127.0.0.1)
    • Front end: http://localhost:8000
  • Press Ctrl + C in the terminal to quit.
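
Because the containers take a while to come up, it can help to check from a script whether the two local ports are accepting connections yet. The following is a minimal sketch, not part of the project: the function names `is_listening` and `wait_for_services` are made up for illustration, and the check only confirms that something is listening on the port, not that the site itself is healthy.

```python
import socket
import time
from urllib.parse import urlsplit

# Local addresses from the list above.
SERVICES = {
    "api": "http://localhost:8080",
    "frontend": "http://localhost:8000",
}


def is_listening(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


def wait_for_services(services, attempts=30, delay=2.0):
    """Poll every service until all accept connections or attempts run out."""
    status = {name: False for name in services}
    for attempt in range(attempts):
        for name, url in services.items():
            if not status[name]:
                status[name] = is_listening(url)
        if all(status.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return status
```

Calling `wait_for_services(SERVICES)` after `docker compose up` returns a dictionary such as `{"api": True, "frontend": True}` once both ports respond.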

Configuring server for public access and HTTPS traffic

If you're running this project publicly, it's worthwhile to configure HTTPS access on your server. Follow the notes here.

TLDR:

  • If you have already built the image, it's enough to start it with docker compose up -d.
  • If you want to rebuild, use docker compose build first.
  • Unsure whether a Docker container with the site is already running? Check with docker ps; stop unnecessary containers using docker stop CONTAINERID.
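
The docker ps check can also be scripted. This is a rough sketch, assuming the docker CLI is on your PATH; the helper names `parse_ps_output` and `running_containers` are hypothetical, and the `--format` template simply asks docker ps for tab-separated ID, image, and status columns.

```python
import subprocess

# Go-template string passed to `docker ps --format`: one tab-separated row per container.
PS_FORMAT = "{{.ID}}\t{{.Image}}\t{{.Status}}"


def parse_ps_output(output):
    """Turn tab-separated `docker ps` rows into a list of dictionaries."""
    containers = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container_id, image, status = line.split("\t", 2)
        containers.append({"id": container_id, "image": image, "status": status})
    return containers


def running_containers():
    """Run `docker ps` and return the running containers (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_output(result.stdout)
```

Each returned dictionary carries the container ID you would pass to docker stop.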

Manual setup (i.e., not using Docker)

Install packages and simulate data

  • Clone this repository
  • Ensure you have R installed, and run simulate.R in src/simulate to generate the fictitious data.
  • Install required Python packages
pip install fastapi
pip install fastapi_utils
pip install sqlalchemy
pip install pydantic
pip install uvicorn
pip install gunicorn
pip install flask
pip install flask_sqlalchemy
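
To confirm that the installs succeeded before starting the servers, you could check that each package is importable. A minimal sketch, assuming the import names match the package names listed above; `missing_packages` is a hypothetical helper, not part of the project.

```python
import importlib.util

# Package names from the pip commands above, assumed to match their import names.
REQUIRED = [
    "fastapi",
    "fastapi_utils",
    "sqlalchemy",
    "pydantic",
    "uvicorn",
    "gunicorn",
    "flask",
    "flask_sqlalchemy",
]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_packages(REQUIRED)` means every package was found in the active environment.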

Start the API

  • Open terminal
  • Go to the sql_app folder inside the repository
  • Run the following command: uvicorn main:app --port 8080
    • If you want to stop the FastAPI server, press Ctrl + C in the terminal.
    • If you want to check the documentation, go to the following address once uvicorn has started:
      • http://127.0.0.1:8080/docs
      • uvicorn will show you which link is used when running the application.
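
Besides the interactive /docs page, FastAPI applications serve a machine-readable schema at /openapi.json by default. The sketch below assumes that default has not been changed in this project, and uses it to list the available endpoints; `fetch_openapi` and `list_endpoints` are illustrative names.

```python
import json
import urllib.request

# Local API address from the uvicorn command above.
API_BASE = "http://127.0.0.1:8080"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi(base_url=API_BASE, timeout=5.0):
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json route)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dictionary."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)
```

With the API running, `list_endpoints(fetch_openapi())` gives a quick overview of what there is to query.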

Start the front end

  • Open terminal
  • Go to the flask_app folder inside the repository
  • Run the following command: gunicorn app:app --bind 127.0.0.1:8000
  • If you want to stop the Flask server, press Ctrl + C in the terminal.
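
Once the front end is running, a first scraping test could fetch the start page and read its title. This is a minimal sketch using only the standard library; `TitleParser`, `extract_title`, and `fetch_front_page_title` are hypothetical names, and nothing is assumed about the page beyond it being HTML.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text of an HTML string, or "" if absent."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_front_page_title(url="http://127.0.0.1:8000", timeout=5.0):
    """Download the local front page and return its title."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

The parsing is kept separate from the download so that `extract_title` can be tried on any HTML string without the server running.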

Changing the data

  • Open the simulate.R file within the src/simulate folder
  • Make your adjustments
  • Run the complete file from top to bottom; the databases will then be updated.

Acknowledgements