# ETL Assignment Guideline
- First, download the sample database from the MySQL Tutorial website (https://www.mysqltutorial.org/mysql-sample-database.aspx).
- The SQL file, `mysqlsampledatabase.sql`, is in the `sql_flies` folder.
- Execute it in an IDE to create the tables and data. The database will be named `classicmodels`.
- The PostgreSQL DDL file, `postgres_db_ddl.sql`, is also in the `sql_flies` folder. It contains the DDL for all the destination tables.
- It also defines a log table that keeps track of the script's runs.
- First, create a project folder.
- Open a command line in that folder and run `pipenv shell` to create and activate a virtual environment.
- If pipenv is not installed, first install it with pip and then run the command above: `pip install pipenv`
- After creating the virtual environment, install all the libraries by running `pip install -r requirements.txt`.
- The credential folder contains a JSON file in which the source and destination database credentials must be filled in. Each database entry has a `"user_name"` and a `"password"` field.
- Credentials for sending mail must also be provided. The `"mail_app_password"` field accepts either the user's password or an app password.
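A minimal sketch of how the credentials file might be loaded and checked before the ETL run starts. The guide only names the `user_name`, `password` and `mail_app_password` fields; the top-level `source`/`destination`/`mail` keys, the mail `user_name` field and the function name `load_credentials` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical layout: only the field names "user_name", "password" and
# "mail_app_password" come from the guide; the section keys are assumed.
REQUIRED_FIELDS = {
    "source": ("user_name", "password"),
    "destination": ("user_name", "password"),
    "mail": ("user_name", "mail_app_password"),
}


def load_credentials(path):
    """Load the credentials JSON and fail early if any field is left empty."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    for section, fields in REQUIRED_FIELDS.items():
        for field in fields:
            value = creds.get(section, {}).get(field, "")
            if not value:
                raise ValueError(f"missing credential: {section}.{field}")
    return creds
```

Usage might look like `creds = load_credentials("credential/credentials.json")`, where the file name is again an assumption. Failing at load time gives a clear error instead of a connection failure halfway through the run.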
- Another JSON file, `table_names.json`, contains multiple JSON objects; objects can be added to or removed from this file as needed. They describe how to access the source tables and the destination tables.
- Each table has a status of either 'active' or 'inactive'. Only tables with 'active' status are fetched from the source and inserted into the destination.
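The 'active'/'inactive' rule can be sketched as a simple filter. This assumes `table_names.json` holds a JSON array of objects with a `"status"` key; the `"source_table"` and `"destination_table"` keys in the example are hypothetical placeholders for however the real file describes table access.

```python
import json

# Hypothetical file content: only the "status" values come from the guide.
EXAMPLE = """
[
  {"source_table": "customers", "destination_table": "customers", "status": "active"},
  {"source_table": "payments", "destination_table": "payments", "status": "inactive"}
]
"""


def active_tables(entries):
    """Keep only entries whose status is exactly 'active'; the rest are skipped."""
    return [entry for entry in entries if entry.get("status") == "active"]


if __name__ == "__main__":
    for entry in active_tables(json.loads(EXAMPLE)):
        print(entry["source_table"], "->", entry["destination_table"])
```

Toggling a table then only requires editing its status in the JSON file, with no change to the script itself.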
- [Mail System]: After the ETL process finishes, the mail system sends the log file to alert the relevant people.
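A minimal sketch of the mail step using Python's standard `smtplib` and `email` modules. The function names and subject line are hypothetical, and the Gmail SMTP host/port defaults are an assumption: the mention of app passwords suggests a provider such as Gmail, but the guide does not name one.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_log_email(sender, recipients, log_path):
    """Build a message with the ETL log file attached as text/plain."""
    log_path = Path(log_path)
    msg = EmailMessage()
    msg["Subject"] = f"ETL run finished ({log_path.name})"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The ETL process has finished. The log file is attached.")
    # A str payload becomes a text/plain part; the filename marks it as an attachment.
    msg.add_attachment(log_path.read_text(encoding="utf-8"), filename=log_path.name)
    return msg


def send_log_email(msg, user_name, app_password, host="smtp.gmail.com", port=465):
    """Send the message over SMTP with implicit TLS.

    The Gmail host/port defaults are an assumption; use your provider's values.
    """
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user_name, app_password)
        server.send_message(msg)
```

Building the message separately from sending it keeps the formatting testable without a live mail server.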
- [Log File]: The log file records how many rows were inserted during that specific hour.
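One way to obtain the per-hour row counts is to have the copy step return how many rows it inserted. The sketch below is not the script's actual code: `copy_rows`, `log_line` and the log line format are hypothetical, and in-memory `sqlite3` databases stand in for MySQL and PostgreSQL so it runs anywhere. Any DB-API 2.0 driver should work, given the right parameter marker (`%s` for psycopg2).

```python
import sqlite3
from datetime import datetime


def copy_rows(src_conn, dst_conn, select_sql, dst_table, columns, placeholder="?"):
    """Copy the rows returned by `select_sql` into `dst_table`.

    Works with DB-API 2.0 connections. `placeholder` is the destination
    driver's parameter marker: "?" for sqlite3, "%s" for psycopg2.
    Returns the number of rows inserted, i.e. the figure written to the log.
    """
    src_cur = src_conn.cursor()
    src_cur.execute(select_sql)
    rows = src_cur.fetchall()
    if rows:
        marks = ", ".join([placeholder] * len(columns))
        # Table and column names come from the config file, not from user input.
        insert_sql = f"INSERT INTO {dst_table} ({', '.join(columns)}) VALUES ({marks})"
        dst_conn.cursor().executemany(insert_sql, rows)
        dst_conn.commit()
    return len(rows)


def log_line(table, count, run_at):
    """Format one hourly log entry, e.g. '2024-01-01 14:00 customers: 2 rows inserted'."""
    return f"{run_at:%Y-%m-%d %H}:00 {table}: {count} rows inserted"


if __name__ == "__main__":
    # In-memory sqlite3 databases stand in for MySQL (source) and PostgreSQL (destination).
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "A"), (2, "B")])
    dst.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    n = copy_rows(src, dst, "SELECT id, name FROM customers", "customers", ["id", "name"])
    print(log_line("customers", n, datetime.now()))
```

Which rows count as "new this hour" is left to `select_sql` (for example a `WHERE` clause on a timestamp column, if the source tables have one).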
- Move the ETL script into a directory.
- Log in as the root user, install Python, and install the required libraries from `requirements.txt`.
- Install cron if it is not already installed (on yum-based systems the package is named `cronie`): `yum -y install cronie`
- To edit the crontab, run `crontab -e`.
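- Configure the cron file. Inside it, add a line of the form `00 0-23 * * * python_file_location location_of_script/main.py`, for example `00 0-23 * * * bin/python3 ETL_script/main.py`. Cron runs jobs from the user's home directory, so relative paths are resolved from there; absolute paths are safer.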
- Here `00 0-23` means minute 00 of every hour from 0 to 23, i.e. the script runs once at the start of every hour.
- To get the Python location, run `which python` (or `which python3` if Python 3 is installed under that name).
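As a small worked example of the schedule `00 0-23 * * *`, the sketch below computes the next run time strictly after a given moment. The function name is hypothetical; the shorter form `0 * * * *` expresses the same schedule, since `*` in the hour field also covers 0-23.

```python
from datetime import datetime, timedelta


def next_hourly_run(now):
    """First firing time strictly after `now` for the schedule `00 0-23 * * *`.

    Minute field 00 and hour field 0-23 (every hour) mean the job fires at
    hh:00, so the next run is the top of the following hour. Cron uses the
    system's local time; `now` is assumed to be in that same time zone.
    """
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


if __name__ == "__main__":
    print(next_hourly_run(datetime(2024, 1, 1, 14, 37)))  # 2024-01-01 15:00:00
```

So the job fires 24 times a day, and a run that starts at 23:00 is followed by one at 00:00 the next day.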
- Download and install Power BI Desktop from the Microsoft Store.
- Click File -> Get Data -> Get Data to get started.
- Select PostgreSQL database from the list.
- Fill in the required fields to connect to the data source.
- Finally, select the necessary tables and load them.
- Install Git Bash if it is not already installed.
- Create a new repository, then run the following commands to push the files:
  - `git init`
  - `git add .`
  - `git commit -m "first commit"`
  - `git branch -M main`
  - `git remote add origin the_repositorylink`
  - `git push -u origin main`
- Link to my repository: https://github.com/minhaislam/ETL_script