
This repository presents a simple streaming data pipeline to get statistics per vehicle from a data source.


MBrugnaroto/Trips-Data-Pipeline


Trips Data Pipeline

This repository presents a simple pipeline that computes statistics per vehicle from a data source. It gives the user a view, per vehicle and per month, of the total trips made, total kilometers traveled, total moving time, and total stopped time.
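As a rough illustration of the aggregation described above, here is a minimal Python sketch. It is not the repository's ETL code: the record fields (`vehicle`, `start`, `km`, `moving_s`, `stopped_s`) and the sample rows are hypothetical and chosen only to show the per-vehicle, per-month grouping.

```python
from collections import defaultdict
from datetime import datetime


def monthly_vehicle_stats(trips):
    """Aggregate trip records into totals keyed by (vehicle, "YYYY-MM")."""
    totals = defaultdict(
        lambda: {"trips": 0, "km": 0.0, "moving_s": 0, "stopped_s": 0}
    )
    for trip in trips:
        # Derive the month bucket from the trip start timestamp.
        month = datetime.fromisoformat(trip["start"]).strftime("%Y-%m")
        row = totals[(trip["vehicle"], month)]
        row["trips"] += 1
        row["km"] += trip["km"]
        row["moving_s"] += trip["moving_s"]
        row["stopped_s"] += trip["stopped_s"]
    return dict(totals)


# Made-up sample records (hypothetical field names, not the real schema).
sample = [
    {"vehicle": "A", "start": "2022-01-03T08:00:00", "km": 12.5, "moving_s": 1500, "stopped_s": 300},
    {"vehicle": "A", "start": "2022-01-20T17:30:00", "km": 7.5, "moving_s": 900, "stopped_s": 120},
    {"vehicle": "B", "start": "2022-02-01T09:15:00", "km": 30.0, "moving_s": 2400, "stopped_s": 600},
]
stats = monthly_vehicle_stats(sample)
```

Each key in `stats` is a `(vehicle, month)` pair, which mirrors the "per vehicle and per month" view the pipeline aims to provide.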

infra-diagram

Requirements

  • AWS Account (sign up)
  • Linux System
  • SED
  • Make
  • Docker
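Before running the setup, it may help to confirm that the command-line tools listed above are installed. The following is a small sketch, not part of the repository; the executable names `sed`, `make`, and `docker` are assumed to match the requirements, and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executable names assumed from the requirements list.
REQUIRED_TOOLS = ("sed", "make", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The check only looks for executables on `PATH`; it does not verify versions or whether the Docker daemon is running.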

Skeleton

├── /env
|   ├── /airflow
|   |   ├── /logs
|   |   ├── /plugins
|   ├── /app
|   |   ├── /statistic_per_vehicle
|   |   |   ├── /extractor
|   |   |   ├── /loader
|   |   |   ├── /sender
|   ├── /datasource
|   ├── /datawarehouse
|   ├── /kafka
|   |   ├── /connectors
|   |   ├── /libs
|   |   ├── /sink/config
|   |   ├── /source/config
|   ├── /kibana
|   |   ├── /pgsync
├── /imgs
├── /source
|   ├── /app
|   |   |   ├── /dags
|   |   |   ├── /etls
|   |   |   |   ├── /statistic_per_vehicle
|   |   |   |   |   ├── querys
|   |   |   ├── /services
|   |   |   |   ├── /email_sender
|   ├── /data
|   |   |   ├── /statistic_per_vehicle

S3 - Folders Structure

s3-structure

How to Run:

Setting up the environment takes several steps, so follow them carefully.

  1. Create an S3 bucket called trip-statistics (Creating a bucket) with all public access allowed - AWS Region: us-east-1
  2. Fill in your email, AWS access key, and AWS secret access key in the yourconfig.sh file (it's in the root directory of the repository)
  3. Finally, with a terminal open in the root directory of the repository, run the following command:
$ make
  • If you want to clean up your environment, run:
$ make clean

Note: You can also start the environment components individually using the make command. Keep in mind that some components depend on others.

With the environment up, you can access Airflow to trigger the statistic per vehicle DAG.

analytics-dashboard

  • In your browser, access the portal through the following URL:
localhost:8080
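As an alternative to the web UI, a DAG run can in principle be triggered through Airflow's stable REST API. The sketch below only builds the request; it assumes that basic authentication is enabled for the API in this environment (not confirmed by this README), uses the Airflow credentials listed in this README as defaults, and uses `statistic_per_vehicle` as a hypothetical DAG id, since the real id is not stated here.

```python
import base64
import json
import urllib.request


def build_trigger_request(dag_id, base_url="http://localhost:8080",
                          user="airflow", password="airflow"):
    """Build a POST request that asks Airflow to start a new DAG run."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        data=json.dumps({"conf": {}}).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "statistic_per_vehicle" is a hypothetical DAG id; check the Airflow UI for the real one.
request = build_trigger_request("statistic_per_vehicle")

# To actually send it (requires the environment to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

If the API rejects the request with an authentication error, the API auth backend is likely configured differently, and triggering the DAG from the web UI remains the documented path.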

Environment information:

  • URI to access the data warehouse (postgresdb):
localhost:3307
  • Data warehouse login:
user: postgres
password: postgres
  • Database:
mobi7_code_interview
  • Table:
consumer_statistics
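The connection details above can be combined into a single PostgreSQL connection URI. This is a minimal sketch, assuming the third-party `psycopg2` package is installed for the query step and that `consumer_statistics` is reachable through the default search path; the helper names are hypothetical, and the query uses `SELECT *` because the table's columns are not listed here.

```python
from urllib.parse import quote


def warehouse_uri(user="postgres", password="postgres", host="localhost",
                  port=3307, database="mobi7_code_interview"):
    """Build a libpq-style connection URI from the values in this README."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


QUERY = "SELECT * FROM consumer_statistics LIMIT 10;"


def fetch_sample(uri=None):
    """Fetch a few rows from the statistics table (needs psycopg2 and a running environment)."""
    import psycopg2  # third-party; imported lazily so the URI helper works without it

    conn = psycopg2.connect(uri or warehouse_uri())
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchall()
    finally:
        conn.close()
```

Calling `warehouse_uri()` with no arguments yields the URI for the local environment; `fetch_sample()` only works once the containers are up.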

NOTE: The Airflow grid does not show up on the platform. The reason is explained in this GitHub thread, and the fix is expected in Airflow version 2.3.2. If you want the Airflow grid to appear, log in to the platform using the following credentials:

  • Airflow login:
user: airflow
password: airflow
  • Analytics dashboard:

analytics-dashboard

NOTE: You can create your own Kibana dashboard at the following URL:

localhost:5601
