# ETL Assignment Guideline

Step 1: Ingesting Dummy Data into the MySQL DB

  • The SQL file can be found in the sql_flies folder
  • File name: mysqlsampledatabase.sql
  • Execute the file in an IDE or MySQL client to create the tables and data. The database will be named classicmodels.
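After the script has run, a quick way to confirm that the classicmodels database was created is to list its tables from Python. This is only a minimal sanity-check sketch, assuming pymysql is installed and that the host, user and password placeholders are replaced with your local MySQL settings.

import pymysql

# Connect to the freshly created sample database (credentials are placeholders)
conn = pymysql.connect(host="localhost", user="root", password="your_password",
                       database="classicmodels")
with conn.cursor() as cur:
    cur.execute("SHOW TABLES")
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()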

Step 2: Create the Corresponding Tables for ETL in the Postgres DB

  • The SQL file can be found in the sql_flies folder
  • File name: postgres_db_ddl.sql
  • All the DDL for the tables is given there
  • A log table is also included in order to keep track of the script's runs.
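The DDL file is normally run with a Postgres client such as psql, but it can also be executed from Python. A minimal sketch, assuming psycopg2 is installed, the connection details are replaced with your own, and postgres_db_ddl.sql contains only plain DDL statements:

import psycopg2

# Connect to the destination Postgres database (credentials are placeholders)
conn = psycopg2.connect(host="localhost", port=5432, dbname="your_warehouse",
                        user="postgres", password="your_password")
with open("sql_flies/postgres_db_ddl.sql") as f:
    ddl = f.read()
with conn, conn.cursor() as cur:
    cur.execute(ddl)   # psycopg2 can run several semicolon-separated statements in one call
conn.close()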

Step 3: Writing the ETL Framework (Using Python)

  • First create a project folder
  • Open a command line inside it and run pipenv shell
  • If pipenv is not installed, install it with pip first:
pip install pipenv
  • After the virtual environment is created, install all the required libraries by running
pip install -r requirements.txt

How the ETL works

  • There is a credential file (JSON) in the credential folder:
{
    "src_credential": {
        "database": "",
        "host": "",
        "port": "",
        "user_name": "",
        "password": ""
    },
    "dest_credential": {
        "database": "",
        "host": "",
        "port": "",
        "user_name": "",
        "password": ""
    },
    "mail_credential": {
        "user_mail": "",
        "mail_app_password": "",
        "to_user": "minhajislam95@gmail.com",
        "CC_user": [""]
    }
}
  • It contains JSON data where the source and destination credentials need to be inserted.

  • Credentials for sending mail also need to be provided.

  • In the mail_app_password field either the user's password or an app password can be given.
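The snippet below shows one way the credential file might be read and turned into live connections. It is a sketch only: the file name credential.json inside the credential folder is an assumption, and pymysql/psycopg2 are used here for the MySQL source and Postgres destination respectively.

import json
import pymysql
import psycopg2

# Load the credential file (the exact file name is an assumption)
with open("credential/credential.json") as f:
    cred = json.load(f)

src = cred["src_credential"]
dest = cred["dest_credential"]

# Source connection: the MySQL classicmodels database
src_conn = pymysql.connect(host=src["host"], port=int(src["port"]),
                           user=src["user_name"], password=src["password"],
                           database=src["database"])

# Destination connection: the Postgres data warehouse
dest_conn = psycopg2.connect(host=dest["host"], port=int(dest["port"]),
                             user=dest["user_name"], password=dest["password"],
                             dbname=dest["database"])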

  • There is another JSON file named table_names.json:

{
    "src_DB_name":"",
    "src_table_name":"",
    "src_target_column":"",
    "dest_schema_name":"",
    "dest_table_name":"",
    "dest_target_column":"",
    "status":""
}
  • This file contains multiple JSON objects like the one above. Objects can be added to or removed from this file if necessary.

  • It contains information on how to access the source tables and the destination tables.

  • The status can be either 'active' or 'inactive'. Only tables with active status will fetch data from the source and insert it into the destination.
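A sketch of how the active/inactive filter might be applied, reusing the src_conn and dest_conn connections from the previous sketch. The column and table names come straight from table_names.json, so real code should validate them before formatting them into SQL.

import json

with open("table_names.json") as f:
    table_configs = json.load(f)          # expected to be a list of objects as shown above

for cfg in table_configs:
    if cfg.get("status") != "active":     # skip inactive tables
        continue
    select_sql = "SELECT {} FROM {}.{}".format(
        cfg["src_target_column"], cfg["src_DB_name"], cfg["src_table_name"])
    insert_sql = "INSERT INTO {}.{} ({}) VALUES (%s)".format(
        cfg["dest_schema_name"], cfg["dest_table_name"], cfg["dest_target_column"])

    with src_conn.cursor() as src_cur, dest_conn.cursor() as dest_cur:
        src_cur.execute(select_sql)
        rows = src_cur.fetchall()          # each row is a 1-tuple, matching the single %s
        dest_cur.executemany(insert_sql, rows)
    dest_conn.commit()
    print("{}: inserted {} rows".format(cfg["dest_table_name"], len(rows)))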

  • [Mail System] - The mail system is used to alert the corresponding people with a log file after the ETL process is finished (see the sketch after this list).

  • [Log File] - The log file records how many rows were inserted during that specific hour.
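A minimal sketch of the mail alert, assuming Gmail SMTP (the example recipient above is a Gmail address) and a log file named etl.log; both are assumptions, and mail_credential is taken from the credential file loaded earlier.

import smtplib
from email.message import EmailMessage

mail = cred["mail_credential"]

msg = EmailMessage()
msg["Subject"] = "ETL run finished"
msg["From"] = mail["user_mail"]
msg["To"] = mail["to_user"]
msg["Cc"] = ", ".join(mail["CC_user"])
msg.set_content("The hourly ETL run has finished. The log file is attached.")

# Attach the log file (the file name etl.log is an assumption)
with open("etl.log", "rb") as f:
    msg.add_attachment(f.read(), maintype="text", subtype="plain", filename="etl.log")

# Gmail requires an app password when two-factor authentication is enabled
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(mail["user_mail"], mail["mail_app_password"])
    server.send_message(msg)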

Step 4: Setting up an Hourly Cron Job in Crontab (OS: CentOS 7)

  • Move the ETL script into a directory
  • Log in as the root user and install Python and the other required libraries from requirements.txt
  • Install cron if it is not installed:
yum -y install cronie
  • To edit the crontab, run:
crontab -e
  • Configure the crontab entry. Inside the crontab write:
00 0-23 * * * python_interpreter_location location_of_script/main.py
example:
00 0-23 * * * bin/python3 ETL_script/main.py
  • Here 00 0-23 means the script will run at minute 0 of every hour.
  • To get the Python interpreter location, run:
which python

Step 5: Adding a BI Tool to the Data Warehouse (Power BI)

  • Download and install Power BI from the Microsoft Store
  • Click File -> Get Data to get started
  • Select PostgreSQL database from the list
  • Fill in the required fields to connect to the data source
  • Finally, select the necessary tables and load them

Step 6: Adding the Project to GitHub

  • Install Git Bash if it is not installed.
  • Create a new repository, then follow the commands to push the files.
  • Run the following commands:
git init
git add .
git commit -m "first commit"
git remote add origin the_repositorylink
git push -u origin main
