Home
| Name | Description |
|---|---|
| API | Use the provider's API to download the data. |
| API Push | The provider pushes position data through API Ingest. |
| NAF MAIL | Receive emails from the provider via our SMTP server; the positions arrive in the email body or as an attachment. |
| WEBSITE DOWNLOAD | Scrape or download data from the provider's website. |
There is a client on our side that consumes the provider's API and stores the positions in GCS buckets (a minimal sketch follows the list). This client can live in two places:
- the pipe-vms-{country} repository, triggered from the ingestion DAG, or
- a VM (vms-services-which-requires-extrenal-ip) that has a static public IP assigned, which is white-listed in the provider's firewalls.
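
For illustration only, here is a minimal sketch of such a client, assuming a hypothetical provider endpoint, an env-var token, and the raw-data bucket convention described in the GCS section below; none of these names come from the actual repositories:

```python
import json
import os
from datetime import datetime, timezone

import requests
from google.cloud import storage

# Hypothetical endpoint and auth scheme; the real client lives in the
# pipe-vms-{country} repos or on the static-IP VM mentioned above.
PROVIDER_URL = "https://provider.example.com/api/positions"

def download_and_store(bucket_name: str) -> None:
    """Download the latest positions and back them up, unmodified, to GCS."""
    token = os.environ["PROVIDER_API_TOKEN"]  # assumed env var, not real config
    resp = requests.get(
        PROVIDER_URL, headers={"Authorization": f"Bearer {token}"}, timeout=60
    )
    resp.raise_for_status()
    blob_name = datetime.now(timezone.utc).strftime("%Y-%m-%d/positions-%H%M%S.json")
    storage.Client().bucket(bucket_name).blob(blob_name).upload_from_string(
        json.dumps(resp.json()), content_type="application/json"
    )

download_and_store("gfw-raw-data-vms-xyz-central")  # naming convention: see GCS section
```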
The provider pushes position records to our API Ingest endpoints. API Ingest is responsible for validating each position record and storing it in a GCS bucket under gs://api-ingest/received/{appId}/{YYYY-MM-DD}T{HH:mm:ss.ms}Z-{positionId}.json.
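
For illustration, a minimal sketch of how that object name could be built; the helper name and the millisecond handling are assumptions, not taken from the API Ingest code:

```python
from datetime import datetime, timezone

def received_object_name(app_id: str, position_id: str, received_at: datetime) -> str:
    """Build the object name received/{appId}/{timestamp}-{positionId}.json,
    with the timestamp in UTC at millisecond precision (assumed)."""
    ts = received_at.astimezone(timezone.utc)
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
    return f"received/{app_id}/{stamp}-{position_id}.json"

# e.g. received/my-app/2024-05-01T12:30:45.123Z-abc123.json
print(received_object_name("my-app", "abc123", datetime.now(timezone.utc)))
```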
The SMTP server code can be found in the smtp-server repository, and its configuration in the deploy repository (due to the sensitive information, this repo is not public). The SMTP server is hosted in the skytruth-pelagos-production project.
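
As a sketch only (this is not the actual smtp-server implementation), extracting a report that arrives either in the body or as an attachment could look like this with Python's standard email package:

```python
import email
from email import policy

def extract_positions_payload(raw_message: bytes) -> str:
    """Return the position report from an incoming email, whether the
    provider put it in the message body or in an attachment."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Prefer an attachment when one is present.
    for part in msg.iter_attachments():
        content = part.get_content()
        return content if isinstance(content, str) else content.decode("utf-8")
    # Otherwise fall back to the plain-text body.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""
```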
GCS
The raw data is stored, without manipulation, as a backup in a GCS bucket created for the given country in the skytruth-pelagos-production project. The bucket naming convention is gfw-raw-data-vms-[ISO-3-COUNTRY]-[zone], where zone is always central. We are not using multi-region buckets.
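
For illustration, the naming convention as a small helper (the lowercasing reflects GCS's requirement that bucket names be lowercase; the helper itself is hypothetical):

```python
def raw_data_bucket(iso3: str, zone: str = "central") -> str:
    """Build the raw-data bucket name: gfw-raw-data-vms-[ISO-3-COUNTRY]-[zone].
    zone is always "central"; multi-region buckets are not used."""
    return f"gfw-raw-data-vms-{iso3.lower()}-{zone}"

assert raw_data_bucket("CHL") == "gfw-raw-data-vms-chl-central"
```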
We use Terraform to create the buckets, and we are switching to one Service Account per country for accessing the data in those buckets.
BQ
We store the raw data in BQ datasets in world-fishing-827 with the prefix VMS-[Country Name]. (There are two legacy exceptions that we have never renamed because of the time it would take: Peru_VMS and KKP_Indonesia.) The data from these datasets is used as the input for the pipes.
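
Purely to illustrate the convention and its two exceptions, a hypothetical resolver (the helper and its keys are assumptions, not real pipeline code; only the two legacy dataset names come from this page):

```python
# The two legacy datasets named above; everything else follows the prefix convention.
LEGACY_DATASETS = {"Peru": "Peru_VMS", "Indonesia": "KKP_Indonesia"}

def vms_dataset(country_name: str) -> str:
    """Resolve the BQ dataset in world-fishing-827 for a country."""
    return LEGACY_DATASETS.get(country_name, f"VMS-{country_name}")

assert vms_dataset("Peru") == "Peru_VMS"
```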
All pipelines run in the Composer production instance, and the DAGs are located in the composer-dags-production repository.
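
For orientation, a minimal sketch of what a per-country DAG declaration looks like in Airflow; the dag_id, schedule, and tasks here are illustrative, not copied from composer-dags-production:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pipe_vms_example_country",  # illustrative name only
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Placeholder tasks standing in for the real ingestion and normalization steps.
    fetch = BashOperator(task_id="fetch_positions", bash_command="echo fetch")
    normalize = BashOperator(task_id="normalize_positions", bash_command="echo normalize")
    fetch >> normalize
```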
The VMS pipeline repositories follow the convention pipe-vms-[country-name]. For some countries normalization is done with a BQ transformation, for others in Dataflow:
- BQ transformations can be found in /assets/*normalized*.sql, e.g. /assets/fetch-normalized-vms.sql.j2.
- Dataflow transformations can be found in /pipe_vms_[country-name]/*normalize* (a minimal Beam sketch follows below).
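
For illustration, a minimal sketch of a Dataflow (Apache Beam) normalization step; the field names are assumptions, not the real schema:

```python
import apache_beam as beam

class NormalizePosition(beam.DoFn):
    """Sketch of a normalization step: cast raw fields to a uniform shape.
    The raw and output field names here are assumed, not the real schema."""
    def process(self, record):
        yield {
            "vessel_id": str(record["vessel_id"]),
            "timestamp": record["timestamp"],
            "lat": float(record["lat"]),
            "lon": float(record["lon"]),
        }

# Usage inside a pipeline:
#   normalized = raw_positions | beam.ParDo(NormalizePosition())
```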