Ingest data from an API with Python, store the raw data in Google Cloud Storage, build a data pipeline with Apache Spark, load the data into Google BigQuery, then create a dashboard with Google Looker Studio.
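The first step, ingesting data from an API with Python, can be sketched as below. This is a minimal sketch assuming a JSON API; the endpoint URL and output filename are hypothetical placeholders, not part of the original notes.

```python
"""Sketch of the ingest step: pull JSON records from an (assumed)
API with requests and shape them into a DataFrame with pandas."""
import pandas as pd
import requests

API_URL = "https://example.com/api/orders"  # hypothetical endpoint


def records_to_frame(records):
    """Turn a list of JSON records into a pandas DataFrame."""
    return pd.DataFrame.from_records(records)


def ingest(url: str = API_URL) -> pd.DataFrame:
    """Fetch JSON from the API and return it as a DataFrame."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return records_to_frame(resp.json())


if __name__ == "__main__":
    # Write the raw file locally; it gets uploaded to GCS in the next step.
    ingest().to_csv("raw_data.csv", index=False)
```

The raw CSV produced here is what gets uploaded to the Cloud Storage bucket in the steps below.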
- Create Project
- Create Bucket (with either the UI or the gsutil CLI)
- Select a region: Singapore (asia-southeast1)
- Leave the other settings at their defaults
- Open the bucket and upload the file (via the UI, the gsutil CLI, or the Python SDK)
gsutil documentation: https://cloud.google.com/storage/docs/gsutil
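The Python SDK upload mentioned above can be sketched with the official google-cloud-storage client. The bucket name, file name, and `raw/` prefix below are assumptions for illustration; the client import is deferred inside the function so the path helper stays usable without GCP credentials installed.

```python
def destination_blob_name(prefix: str, filename: str) -> str:
    """Build the object path inside the bucket, e.g. 'raw/raw_data.csv'."""
    return f"{prefix.rstrip('/')}/{filename}"


def upload_to_gcs(bucket_name: str, source_file: str, prefix: str = "raw"):
    """Upload a local file to a GCS bucket.

    Requires `pip install google-cloud-storage` and application
    default credentials (e.g. `gcloud auth application-default login`).
    """
    from google.cloud import storage  # deferred: needs the GCP SDK installed

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name(prefix, source_file))
    blob.upload_from_filename(source_file)


# Example call (bucket name is a placeholder):
# upload_to_gcs("my-raw-data-bucket", "raw_data.csv")
```

The equivalent gsutil CLI command would be `gsutil cp raw_data.csv gs://my-raw-data-bucket/raw/`.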
Cloud Composer is a fully managed Apache Airflow service.
- Go to Cloud Composer in Google Cloud Console
- Create a Cloud Composer 1 environment
- Name the environment
- Location (choose the region nearest to you): us-central1
- Machine type: n1-standard-2 (7.5 GB RAM)
- Disk size: 20 GB
- Image version: select the latest
- Python version: 3
- Leave the other settings at their defaults
- Add packages to Cloud Composer
- Add three packages: pymysql, requests, pandas
- After the Cloud Composer environment finishes creating, go to the Airflow web server
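The three packages installed above (pymysql, requests, pandas) are typically used inside the DAG's task callables. A minimal sketch of such callables follows; the endpoint, database credentials, and `transactions` table are hypothetical, the pymysql import is deferred to task run time, and the Airflow DAG/operator wiring itself is omitted.

```python
"""Sketch of task callables for an Airflow DAG on Cloud Composer,
using the three packages added to the environment above."""
import pandas as pd
import requests


def extract_from_api(url: str = "https://example.com/api/rates"):
    """Pull JSON records from an (assumed) API with requests."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()


def extract_from_mysql(host: str, user: str, password: str, db: str):
    """Read a (hypothetical) transactions table with pymysql."""
    import pymysql  # deferred: available on Composer once the package is added

    conn = pymysql.connect(host=host, user=user, password=password, db=db)
    try:
        return pd.read_sql("SELECT * FROM transactions", conn)
    finally:
        conn.close()


def merge_records(api_records, db_frame: pd.DataFrame) -> pd.DataFrame:
    """Combine both sources into one DataFrame with pandas."""
    api_frame = pd.DataFrame.from_records(api_records)
    return pd.concat([db_frame, api_frame], ignore_index=True)
```

In a DAG these functions would be wired to PythonOperator tasks and the merged output written to GCS for the downstream Spark and BigQuery steps.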