Allocate resource and scheduling based on route, traffic, transport type information.
Building the data pipeline.
- Generate data
- Generate data using
resources/data-generator.py
script toraw-folder
and copy it towork-folder
.
- Generate data using
- Design data pipeline
- Build data model
- Ingest data, Extract and Load into Data warehouse
- The flat file data first go into the staging area, where preprocessing with SSIS and put to MSSQL.
- Load data onto Snowflake with the transformation
- The SSIS will execute
src/snowflake/putFile.bat
to runput
command on Snowflake (load data to the internal stage on Snowflake). src/snowflake/loadData.py
will run after that to connect Snowflake endpoint tocopy
data from internal stage to destination table.
- The SSIS will execute
- Visualize your data
- Data on the dimensional tables is visualized on the Power BI dashboard.
- Install Snowsql and add the following setting to
config
file (Windows:config
file located atC:\Users\<user-name>\.snowsql
):[connections.loadingdb] accountname = <accountname> # Ask the repo owner for the accountname username = <username> # Ask the repo owner for the username password = <password> # Ask the repo owner for the password warehousename = COMPUTE_LOADING dbname = PUBLICTRANSPORTATION schemaname = PUBLIC rolename = sysadmin
- Install python dependencies:
pip install -r requirements.txt
. Make sure you can import libraries usingcmd
. We recommend installing Anaconda and addconda
environment to windowsPATH
. - Clone the repo:
git clone https://github.com/dhuy237/fa-project-1-team-7.git
- Generate data:
python resources/data-generator.py
- Login into MSSQL and run init_mssql.sql
- Authen SnowSQL and run init_snowflake.sql
- Open
src/snowflake/putFile.bat
:- Change the
-f
file path to the folder containsrc/snowflake/ssis2snow.sql
file.
- Change the
- Open
src/snowflake/ssis2snow.sql
:- Change all the path to the place that contains
resources/work-folder
, keep all the.csv
name after the path.
- Change all the path to the place that contains
- Open
src/snowflake/loadData.py
:- Change
SRA_PATH
: Generate RSA key with the following command (Snowflake document), you can change the directory to a temp folder so that it does not overwrite your existing key:
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
- Send the
rsa_key.pub
file to the repo owner to add to your Snowflake user account. - Change
LOG_PATH
: Change your path to the place containsresources/logs
.
- Change
- Open SSIS solution:
- Update
localhost.PublicTransportation
connection to the local database on SSMS. - Update
variables
:DataPath
: change your path to the place containsresources/work-folder
.ErrorPath
: change your path to the place containsresources/logs
. The error log will be stored here if there is any error during execution.ExecutePath
: change your path to the place containssrc/snowflake
.PythonExecutionPath
: change your path to the place containspython.exe
. If you use Anaconda, the path should look like thisC:\Users\<user-name>\Anaconda3\python.exe
.
- Execute the package.
- Update
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project.
- Create your Feature Branch (
git checkout -b feature/Feature
). - Commit your Changes (
git commit -m 'Add some feature'
). - Push to the Branch (
git push origin feature/Feature
). - Open a Pull Request.
- Huy Tran (dhuy237) - d.huy723@gmail.com
- Quyen Mai (mtquyen) - maithiquyen124@gmail.com