# ETL Assignment Guideline
- First, download the dummy data from this website: https://www.mysqltutorial.org/mysql-sample-database.aspx
- The SQL file can be found in the sql_flies folder; the file name is mysqlsampledatabase.sql.
- Execute the file in an IDE to create the tables and data. The name of the database will be classicmodels.
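- If you prefer the command line over an IDE, the same file can be loaded with the mysql client (assuming you run it from the repository root):
mysql -u root -p < sql_flies/mysqlsampledatabase.sql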
- The PostgreSQL DDL file can be found in the sql_flies folder; the file name is postgres_db_ddl.sql.
- All the DDL statements for the destination tables are given there.
- A log table is also included in order to keep track of the script's runs.
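- The DDL can be run with the psql client as well (the database name below is a placeholder; substitute your own):
psql -U postgres -d your_database -f sql_flies/postgres_db_ddl.sql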
- First, create a folder for the project.
- Open a command line and run pipenv shell.
- If pipenv is not installed, install the library using pip by running:
pip install pipenv
- After creating the virtual environment, install all the libraries by running:
pip install -r requirements.txt
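- Putting the environment steps together, a typical first-time setup looks like this (the folder name is only an example):
mkdir ETL_script && cd ETL_script
pip install pipenv
pipenv shell
pip install -r requirements.txt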
- There is a JSON file in the credential folder:
```json
{
    "src_credential": {
        "database": "",
        "host": "",
        "port": "",
        "user_name": "",
        "password": ""
    },
    "dest_credential": {
        "database": "",
        "host": "",
        "port": "",
        "user_name": "",
        "password": ""
    },
    "mail_credential": {
        "user_mail": "",
        "mail_app_password": "",
        "to_user": "minhajislam95@gmail.com",
        "CC_user": [""]
    }
}
```
- It contains JSON data where the source and destination credentials need to be inserted.
- Credentials for sending mail also need to be provided.
- In the mail_app_password field, either the user's password or an app password can be given.
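- As a rough illustration of how these credentials are consumed, here is a minimal sketch that loads the file and opens the two database connections. The file path credential/credential.json and the connector libraries (mysql-connector-python and psycopg2) are assumptions, not necessarily what the actual script uses:

```python
import json

import mysql.connector  # source database (classicmodels) is MySQL
import psycopg2         # destination database is PostgreSQL

# Load the credential template shown above; the path is an assumption.
with open("credential/credential.json") as f:
    cred = json.load(f)

src = cred["src_credential"]
dest = cred["dest_credential"]

# Open the source and destination connections.
src_conn = mysql.connector.connect(
    database=src["database"], host=src["host"], port=src["port"],
    user=src["user_name"], password=src["password"],
)
dest_conn = psycopg2.connect(
    dbname=dest["database"], host=dest["host"], port=dest["port"],
    user=dest["user_name"], password=dest["password"],
)
```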
- There is another JSON file named table_names.json:
```json
{
    "src_DB_name": "",
    "src_table_name": "",
    "src_target_column": "",
    "dest_schema_name": "",
    "dest_table_name": "",
    "dest_target_column": "",
    "status": ""
}
```
- This file contains multiple JSON objects; objects can be added to or removed from it as necessary.
- It contains information on how to access the source tables and how to access the destination tables.
- The status can be either 'active' or 'inactive'. Only tables with active status will have data fetched from the source and inserted into the destination (see the sketch below).
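- A minimal sketch of how the status flag might be applied, assuming the file holds a JSON array of such objects (the real script's logic may differ):

```python
import json

# Load the list of table mappings; the path is an assumption.
with open("table_names.json") as f:
    tables = json.load(f)

# Only entries marked 'active' take part in the ETL run.
active_tables = [t for t in tables if t["status"] == "active"]

for t in active_tables:
    print(t["src_table_name"], "->",
          t["dest_schema_name"] + "." + t["dest_table_name"])
```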
- [Mail System] - The mail system is used to alert the corresponding people with a log file after the ETL process has finished.
- [Log File] - The log file records how many rows were inserted in that specific hour.
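- A minimal sketch of the alert mail, assuming a Gmail-style SMTP server, the mail_credential keys from the template above, and a log file named etl.log (the real script may build the message differently):

```python
import json
import smtplib
from email.message import EmailMessage

with open("credential/credential.json") as f:
    mail = json.load(f)["mail_credential"]

msg = EmailMessage()
msg["Subject"] = "ETL process finished"
msg["From"] = mail["user_mail"]
msg["To"] = mail["to_user"]
msg["Cc"] = ", ".join(mail["CC_user"])
msg.set_content("The ETL process has finished. The log file is attached.")

# Attach the log file produced by the run; the file name is an assumption.
with open("etl.log", "rb") as f:
    msg.add_attachment(f.read(), maintype="text", subtype="plain",
                       filename="etl.log")

# smtp.gmail.com works with Gmail app passwords; adjust for other providers.
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(mail["user_mail"], mail["mail_app_password"])
    server.send_message(msg)
```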
- Move the ETL script into a directory on the server.
- Log in as the root user and install Python and the other corresponding libraries from requirements.txt.
- Install cron if it is not installed (on yum-based systems the package is usually named cronie):
yum -y install cronie
- To edit the crontab, write:
crontab -e
- Configure the cron file. Inside the cron file write:
00 0-23 * * * python_location location_of_script/main.py
example (cron generally needs absolute paths):
00 0-23 * * * /usr/bin/python3 /opt/ETL_script/main.py
- Here 00 0-23 means the script will run at the first minute of every hour.
- To get the Python location, run:
which python
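- To keep the hourly output for debugging, the cron entry can redirect stdout and stderr to a file (the paths below are only examples):
00 0-23 * * * /usr/bin/python3 /opt/ETL_script/main.py >> /var/log/etl_cron.log 2>&1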
- Download and install Power BI from the Microsoft Store.
- Click on File -> Get Data -> Get Data to get started.
- Select PostgreSQL database from the list.
- Fill in the required fields to connect to the data source.
- Finally, select the necessary tables and load them.
- Install Git Bash if not installed.
- Create a new repository, then follow the commands below to push the files.
- To push the project, run:
git init
git add .
git commit -m "first commit"
git remote add origin the_repository_link
git push -u origin main
- Link to my repository: https://github.com/minhaislam/ETL_script