
Build an ETL data pipeline using PySpark to migrate data from PostgreSQL to SQL Server.


ArkanNibrastama/Data-Migration-PostgreSQL-to-SQLServer


Data Migration from PostgreSQL to SQL Server using PySpark

This project designs, implements, and executes an ETL pipeline in PySpark that migrates data from a PostgreSQL database to a SQL Server database. The pipeline is intended to handle large volumes of data and to preserve data integrity throughout the migration. It consists of three steps: extracting data from PostgreSQL, transforming it to match the SQL Server schema, and loading it into SQL Server. The goal is for the SQL Server database to accurately reflect the PostgreSQL source, with minimal data loss and minimal disruption to ongoing operations.
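The three steps described above map onto Spark's built-in JDBC data source: spark.read.format("jdbc") for extraction and DataFrame.write.format("jdbc") for loading. The following is a minimal sketch under stated assumptions, not the project's actual code: the function names, the column-to-type mapping, and the optional partition settings are hypothetical, and it assumes the PostgreSQL and Microsoft SQL Server JDBC drivers are available on Spark's classpath (for example via spark-submit --jars or the spark.jars.packages setting).

```python
"""Minimal extract -> transform -> load sketch for PostgreSQL -> SQL Server.

Hypothetical helper names; assumes both JDBC drivers are on Spark's classpath.
"""


def cast_exprs(column_types):
    """Build Spark SQL expressions that cast each column to a target type.

    column_types maps column name -> Spark SQL type, e.g. {"id": "INT"}.
    It stands in (hypothetically) for the real SQL Server target schema.
    """
    return [f"CAST(`{name}` AS {sql_type}) AS `{name}`"
            for name, sql_type in column_types.items()]


def extract(spark, pg_url, table, user, password, partition=None):
    """Extract: read one PostgreSQL table into a DataFrame over JDBC."""
    reader = (spark.read.format("jdbc")
              .option("url", pg_url)
              .option("dbtable", table)
              .option("user", user)
              .option("password", password)
              .option("driver", "org.postgresql.Driver"))
    if partition is not None:
        # Split the read into parallel queries on a numeric/date/timestamp column.
        # The bounds only set the partition stride; they do not filter rows.
        reader = (reader.option("partitionColumn", partition["column"])
                  .option("lowerBound", partition["lower"])
                  .option("upperBound", partition["upper"])
                  .option("numPartitions", partition["num"]))
    return reader.load()


def transform(df, column_types):
    """Transform: keep only the mapped columns and cast them to target types."""
    return df.selectExpr(*cast_exprs(column_types))


def load(df, mssql_url, table, user, password, mode="append"):
    """Load: write the DataFrame into a SQL Server table over JDBC."""
    (df.write.format("jdbc")
       .option("url", mssql_url)
       .option("dbtable", table)
       .option("user", user)
       .option("password", password)
       .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
       .mode(mode)
       .save())
```

Copying one table would chain the three calls, e.g. load(transform(extract(spark, pg_url, "public.orders", pg_user, pg_pwd), {"id": "INT", "amount": "DECIMAL(10,2)"}), mssql_url, "dbo.orders", ms_user, ms_pwd); the table and column names there are made up. mode="append" assumes the target table already exists with the intended schema; with "overwrite", Spark drops and recreates the table unless the truncate option is set to true.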

Architecture

[Figure: Architecture]
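The project description also asks for data integrity during the migration. A minimal post-load check, assuming each table has key columns that identify a row, could compare row counts and count source keys that never reached the target. This is an illustrative addition, not the project's documented method; the helper names and key_cols are hypothetical, and DataFrame.exceptAll requires Spark 2.4 or later.

```python
"""Post-load integrity check (hypothetical helpers, not from the original project)."""


def migration_report(source_count, target_count, missing_key_count):
    """Summarize a source-vs-target comparison as a plain dict."""
    return {
        "source_rows": source_count,
        "target_rows": target_count,
        "row_difference": source_count - target_count,
        "missing_keys": missing_key_count,
        "ok": source_count == target_count and missing_key_count == 0,
    }


def check_integrity(source_df, target_df, key_cols):
    """Compare the PostgreSQL source DataFrame with the loaded SQL Server table.

    key_cols: list of column names that identify a row (assumed present in both).
    """
    source_keys = source_df.select(*key_cols)
    target_keys = target_df.select(*key_cols)
    # exceptAll keeps duplicates, so repeated keys are compared as a multiset.
    missing = source_keys.exceptAll(target_keys).count()
    return migration_report(source_df.count(), target_df.count(), missing)
```

Here target_df would be read back from SQL Server with a JDBC reader pointed at the target table. Each count() triggers a fresh JDBC read, so on very large tables caching the key DataFrames may reduce the load on both databases.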

Run Locally

  • Clone the project

    git clone https://github.com/ArkanNibrastama/Data-Migration-PostgreSQL-to-SQLServer-use-PySpark.git
  • Create a database in PostgreSQL and import the data from the dataset folder

  • Install all the dependencies

    pip install -r requirements.txt
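  • Fill in the placeholder variables with your own values, for example:

    uid = '{YOUR USER ID ON POSTGRESQL}'
    pwd = '{YOUR PASSWORD}'
    host = 'localhost'
    port = '5432'  # default PostgreSQL port
    db = '{YOUR DB NAME}'
    driver = "org.postgresql.Driver"  # PostgreSQL JDBC driver class
    # Build the JDBC URL from the variables above (use port instead of a hardcoded 5432).
    # URL-encode the user and password if they contain special characters such as &.
    url = f"jdbc:postgresql://{host}:{port}/{db}?user={uid}&password={pwd}"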
  • Finally, you can run the program on your local computer
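The configuration step above embeds the credentials in the PostgreSQL URL. As an alternative under stated assumptions, the sketch below reads settings from environment variables and builds URLs for both databases, leaving the user and password to be passed as separate user and password JDBC options in Spark. The environment variable names and helper functions are hypothetical; 5432 and 1433 are the default PostgreSQL and SQL Server ports.

```python
"""Connection settings from environment variables (hypothetical variable names)."""
import os


def postgres_jdbc_url(host, port, db):
    """PostgreSQL JDBC URL without credentials (pass them as Spark options)."""
    return f"jdbc:postgresql://{host}:{port}/{db}"


def sqlserver_jdbc_url(host, port, db, trust_server_certificate=False):
    """Microsoft SQL Server JDBC URL; connection properties are ';'-separated."""
    url = f"jdbc:sqlserver://{host}:{port};databaseName={db};encrypt=true"
    if trust_server_certificate:
        # Skips certificate validation: only for local development.
        url += ";trustServerCertificate=true"
    return url


def load_settings(env=None):
    """Collect connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    trust = env.get("MSSQL_TRUST_CERT", "false").lower() == "true"
    return {
        "pg_url": postgres_jdbc_url(env.get("PG_HOST", "localhost"),
                                    env.get("PG_PORT", "5432"),
                                    env["PG_DB"]),
        "pg_user": env["PG_USER"],
        "pg_password": env["PG_PASSWORD"],
        "mssql_url": sqlserver_jdbc_url(env.get("MSSQL_HOST", "localhost"),
                                        env.get("MSSQL_PORT", "1433"),
                                        env["MSSQL_DB"],
                                        trust_server_certificate=trust),
        "mssql_user": env["MSSQL_USER"],
        "mssql_password": env["MSSQL_PASSWORD"],
    }
```

These values would then be passed to the JDBC reader and writer as the url, user, and password options, which keeps secrets out of the source file.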

Full explanation

For a fuller explanation of this repository, see my LinkedIn post about this project, Data Migration : PostgreSQL to SQL Server using PySpark.
