Skip to content

Latest commit

 

History

History
67 lines (52 loc) · 4.04 KB

README.md

File metadata and controls

67 lines (52 loc) · 4.04 KB

Global Short-Term Rentals - a Data Warehouse and Dashboard

The global short-term rentals project is a full data pipeline and warehouse. A dashboard allows for the exploration of the impact of short-term rental listings (Airbnb) on housing. Data is pulled from three separate public datasets and consists of over 35 million records from 2015-2020. The tools used include Python, Snowflake, DBT, and Metabase.

Demo Video2

Architecture:

Overview

  • Python scripts perform API calls and web scraping for extraction and loading.
  • Snowflake is used as a cloud-based data warehouse.
  • DBT is used for transformations, data testing, and data documentation.
  • Metabase hosted on Heroku is used for data visualizations.

Data Sources

ETL and Data Model

Data Flow and Star Schema

DBT Models

DBT was used for the transformation layer. DBT allows for automated analytics tools like:

  • Revision control of transformation tables and views.
  • Automated testing, including the testing of source data.
  • Automated documentation creation. Here's the documentation for this project.

DBT Models Lineage Graph

Visualization

A video demonstration is available.

Screen shots:

Dashboard 1 Dashboard 2

Environment Setup:

To reproduce the warehouse, you will need to have these cloud-based accounts set up prior to running the remaining steps:

  1. You will need to have a Kaggle account and authenticate using an API token. For more information, read Kaggle's API documentation.
  2. You will need a Snowflake account with sysadmin privileges. They offer a 30-day trial with up to $400 in free usage. This project uses well under that, and no credit card is required until the limits are reached.

To setup the warehouse:

  1. Download the repository and create a virtual environment.
  2. Install the required dependencies with pip install -r requirements.txt.
  3. Copy .env.example to a file called .env at the root level of the project. Add in relevant Snowflake and dbt environment variables.
  4. If using a bash shell, run set -o allexport; source .env; set +o allexport to export credentials and other environment variables. You'll need to make adjustments on a Windows machine.

Extract, load, and transform data:

  1. Setup users, databases, and warehouses on Snowflake. Use src/create_warehouse.sql as a guide.
  2. To extract datasets (scraping and API calls), run python src/extract.py. The script may take a few hours.
  3. To load data, run python src/load.py. Again, depending on your connection, will take awhile.
  4. To install dbt dependencies, from dbt subdirectory run dbt deps.
  5. To transform data, from dbt subdirectory run dbt run.
  6. To test your data, from dbt subdirectory run dbt test.
  7. To generate documentation, from dbt subdirectory run dbt docs generate, and dbt docs serve for locally hosted documentation.

Visualization:

  1. Download and install Metabase on the platform of your choice (I used Heroku).
  2. Follow setup instructions for an admin user and add your Snowflake credentials.
  3. Create a new dashboard.
  4. Create questions for each card based on the files in visualizations/, adjusting the visualization component.

Author:

A personal project by Rebecca Sanjabi.