ETL Weather Forecast

CI Coverage Status Python 3.6

Introduction

A Python ETL pipeline that crawls weather forecast data, transforms it, and loads it into MySQL.

Authors: Bijeck


Technologies

  • Mock
  • MySQL
  • CI Python Code Validation
  • Dimensional Model
  • Pytest

Requirements

This project uses a number of open source projects to work properly:

  • MySQL - runs SQL queries and stores the data
  • Python - the main programming language of the project
  • MySQL Workbench - used to work with the database and view the data

You also need a RapidAPI account with a subscription to the Weather Map API.


Project Folder

  • mock: contains mock data for testing
  • src: contains source files
  • src/etl: contains ETL files
  • tests: contains test files
  • tests/etl: contains ETL test files
  • config.json: contains the configuration for the MySQL server and the X-RapidAPI-Key for the Weather Map API
  • requirements.txt: lists the required Python packages
  • .github/workflows/python-app.yml: workflow file for running CI on GitHub
  • database.sql: database script
  • weather_schema.png: weather database schema diagram

Create Environment

Make sure virtualenv is installed. If it is not, install it with:

pip install virtualenv

After unzipping the project, create a virtual environment:

cd ETL_SuMP

virtualenv venv

Then activate the virtual environment so the packages can be installed into it:

# For Mac or Linux
source venv/bin/activate

# For Windows
venv\Scripts\activate.bat

Installation

Install the Python packages the project needs:

pip install -r requirements.txt

Configuration

Configure your MySQL server in config.json:

Key       Value
host      localhost
user      root
password  yourpassword
database  databasename

Add your API key for the Weather Map API to config.json so the application can run:

Key             Value
X-RapidAPI-Key  key
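
As a rough illustration of how these settings might be read, the sketch below loads config.json and checks that the keys listed above are present. It assumes a flat JSON object holding all five keys; the actual layout of config.json in this project may differ, and load_config is a hypothetical helper, not a function from the repository.

```python
import json

# Keys named in the tables above. A flat layout is an assumption.
REQUIRED_KEYS = ("host", "user", "password", "database", "X-RapidAPI-Key")


def load_config(path="config.json"):
    """Read the JSON config and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return config
```

Failing at load time tends to give a clearer message than a connection or HTTP error later in the run.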

Run Project

Create the database and tables:

python src/db.py

Run the project:

python src/main.py

Enter your location to fetch data:

Enter your location: london

The data for your location continues to be fetched every 30 seconds.

Note: You can terminate the project by pressing Ctrl + C.
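
The fetch-and-wait behaviour described above could be sketched roughly as follows. This is not the code in src/main.py; poll and its parameters are hypothetical names, and the sleep function is injectable only so the loop can be exercised without waiting.

```python
import time


def poll(fetch, interval_seconds=30, sleep=time.sleep):
    """Call fetch() repeatedly, pausing between calls, until Ctrl + C."""
    runs = 0
    try:
        while True:
            fetch()
            runs += 1
            sleep(interval_seconds)
    except KeyboardInterrupt:
        # Ctrl + C raises KeyboardInterrupt in the main thread.
        print("Stopped by user.")
    return runs
```

Catching KeyboardInterrupt lets the loop exit cleanly instead of ending with a traceback.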


Run Tests

Run all tests:

pytest -v

Run only the tests that match a keyword (examples: get, extract, transform, error):

pytest -k keywords -v
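
For orientation, a test in this style might look like the sketch below, which uses unittest.mock to stand in for the API client. The extract function and the client object are hypothetical and are not taken from the project's tests. Because the test name contains "extract", pytest -k extract would select it.

```python
from unittest.mock import Mock


def extract(client, location):
    """Hypothetical extract step: ask the client for one location's weather."""
    return client.get(location)


def test_extract_returns_client_payload():
    # The mock replaces the real API client, so no network call is made.
    client = Mock()
    client.get.return_value = {"name": "london"}
    assert extract(client, "london") == {"name": "london"}
    client.get.assert_called_once_with("london")
```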

Run the tests under coverage to collect coverage data for the whole project:

coverage run -m pytest

Slowly Changing Dimensions

SCD Type 1

  • Applied to the city_dim table.
  • The attributes of an existing record are overwritten by those of a new record with the same city_id.
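
A minimal sketch of the Type 1 overwrite is shown below. It uses an in-memory SQLite database only so that it runs without a MySQL server; the name and country columns and the upsert_city helper are assumptions, and the project's real schema is defined in database.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_dim (city_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)


def upsert_city(conn, city_id, name, country):
    """SCD Type 1: overwrite attributes in place, keeping no history."""
    updated = conn.execute(
        "UPDATE city_dim SET name = ?, country = ? WHERE city_id = ?",
        (name, country, city_id),
    ).rowcount
    if updated == 0:
        # No row with this city_id yet, so insert it.
        conn.execute(
            "INSERT INTO city_dim (city_id, name, country) VALUES (?, ?, ?)",
            (city_id, name, country),
        )
    conn.commit()


upsert_city(conn, 1, "London", "UK")
upsert_city(conn, 1, "London", "GB")  # same city_id: the row is overwritten
rows = conn.execute("SELECT city_id, name, country FROM city_dim").fetchall()
```

After both calls the table still holds a single row for city_id 1, carrying the most recent attributes.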

SCD Type 2

  • Applied to the weather_fact table.
  • Each record has a current_flag column that marks the current weather of a city.
  • When new weather data is inserted, its current_flag is set to Y and the previous record's flag is set to N, so the historical weather data of a city is kept.
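
The flag handling described above can be sketched as follows, again using in-memory SQLite as a stand-in for MySQL. Only weather_fact, city_id, and current_flag come from this README; the id and temperature columns and the insert_weather helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_fact ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "city_id INTEGER, temperature REAL, current_flag TEXT)"
)


def insert_weather(conn, city_id, temperature):
    """Flag the city's previous current row as N, then insert the new row as Y."""
    conn.execute(
        "UPDATE weather_fact SET current_flag = 'N' "
        "WHERE city_id = ? AND current_flag = 'Y'",
        (city_id,),
    )
    conn.execute(
        "INSERT INTO weather_fact (city_id, temperature, current_flag) "
        "VALUES (?, ?, 'Y')",
        (city_id, temperature),
    )
    conn.commit()


insert_weather(conn, 1, 12.5)
insert_weather(conn, 1, 14.0)
history = conn.execute(
    "SELECT temperature, current_flag FROM weather_fact ORDER BY id"
).fetchall()
```

Filtering on current_flag = 'Y' then returns the latest reading per city, while the rows flagged N remain available as history.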

Note

You can use the database.sql file to create the database; it already contains data.

Note: Replace the database name in the file with the name you prefer.