
VMS

VMS Ingestion Types

| Name | Description |
| --- | --- |
| API Pull | We use the provider's API to download the data. |
| API Push | The provider pushes position data to us through API Ingest. |
| NAF MAIL | We receive emails from the provider via our SMTP server, with the data in the body or as an attachment. |
| WEBSITE DOWNLOAD | We scrape or download the data from the provider's website. |

API Pull

There is a client on our side that consumes the provider's API and stores the positions in GCS buckets (a minimal sketch follows the list below). This client can live in one of two places:

  • the pipe-vms-{country} repository, triggered from the ingestion DAG, or
  • a VM, vms-services-which-requires-extrenal-ip, which has a static public IP that is whitelisted in the provider's firewall.
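Here is a sketch of what such a pull client can look like, assuming a hypothetical provider endpoint and bucket name; the real clients in the pipe-vms-{country} repositories are more involved.

```python
# Minimal sketch of an API-pull client. The endpoint URL and bucket name
# below are hypothetical placeholders, not the real configuration.
import json
from datetime import datetime, timezone

import requests
from google.cloud import storage

PROVIDER_URL = "https://provider.example.com/v1/positions"  # hypothetical
BUCKET_NAME = "gfw-raw-data-vms-xyz-central"                # hypothetical


def pull_positions(date: str) -> None:
    """Download one day of positions and back them up to GCS."""
    response = requests.get(PROVIDER_URL, params={"date": date}, timeout=60)
    response.raise_for_status()

    bucket = storage.Client().bucket(BUCKET_NAME)
    blob = bucket.blob(f"positions/{date}.json")
    blob.upload_from_string(
        json.dumps(response.json()), content_type="application/json"
    )


if __name__ == "__main__":
    pull_positions(datetime.now(timezone.utc).strftime("%Y-%m-%d"))
```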

API Push

The provider pushes position records to our API Ingest endpoints. API Ingest is responsible for validating each position record and storing it in a GCS bucket at gs://api-ingest/received/{appId}/{YYYY-MM-DD}T{HH:mm:ss.ms}Z-{positionId}.json
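A minimal sketch of what such an ingest endpoint could look like, assuming Flask and a hypothetical position schema; the actual API Ingest service and its validation rules are not described here.

```python
# Hedged sketch of an API Ingest push endpoint. The required fields are an
# assumed schema, not the real validation logic.
from datetime import datetime, timezone

from flask import Flask, jsonify, request
from google.cloud import storage

app = Flask(__name__)
REQUIRED_FIELDS = {"positionId", "lat", "lon", "timestamp"}  # assumed schema


@app.post("/positions/<app_id>")
def receive_position(app_id: str):
    record = request.get_json(silent=True) or {}
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return jsonify({"error": f"missing fields: {sorted(missing)}"}), 400

    # Build the object name following the convention documented above.
    received_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
    blob_name = f"received/{app_id}/{received_at}Z-{record['positionId']}.json"

    storage.Client().bucket("api-ingest").blob(blob_name).upload_from_string(
        request.get_data(), content_type="application/json"
    )
    return jsonify({"status": "stored", "object": blob_name}), 201
```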

NAF MAIL

The SMTP server code can be found in the smtp-server repository, and its configuration in the deploy repository (because of the sensitive information it contains, that repo is not public). The SMTP server is hosted in the skytruth-pelagos-production project.
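For illustration, a sketch of parsing a NAF position report pulled from an email, assuming the common //-delimited NAF layout (//SR// ... //ER//) with two-letter field codes; the real parser in smtp-server may handle more fields and edge cases.

```python
# Hedged sketch of a NAF record parser; field codes and the sample message
# below are illustrative, not taken from a real provider feed.
def parse_naf(message: str) -> dict[str, str]:
    """Split a //-delimited NAF record into field-code/value pairs."""
    fields = {}
    for token in message.strip().strip("/").split("//"):
        if not token or token in ("SR", "ER"):  # start/end-of-record markers
            continue
        code, _, value = token.partition("/")
        fields[code] = value
    return fields


example = "//SR//TM/POS//RC/ABC123//LT/+12.345//LG/-034.567//DA/20230817//TI/1200//ER//"
print(parse_naf(example))
# {'TM': 'POS', 'RC': 'ABC123', 'LT': '+12.345', 'LG': '-034.567',
#  'DA': '20230817', 'TI': '1200'}
```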

Raw Data

GCS

The raw data is stored, without any manipulation, as a backup in a GCS bucket created for the given country in the skytruth-pelagos-production project. The bucket naming convention is gfw-raw-data-vms-[ISO-3-COUNTRY]-[zone], where zone is always central; we are not using multi-region buckets.
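A tiny sketch of that naming convention; the lowercasing is an assumption based on GCS requiring lowercase bucket names, and the country code is a placeholder.

```python
# Hypothetical helper illustrating the bucket naming convention above.
def raw_vms_bucket(iso3: str, zone: str = "central") -> str:
    # GCS bucket names must be lowercase.
    return f"gfw-raw-data-vms-{iso3.lower()}-{zone}"


print(raw_vms_bucket("PER"))  # gfw-raw-data-vms-per-central
```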

We use Terraform to create the buckets, and we are switching to one Service Account per country for access to the data in those buckets.

BQ

We store the raw data in BQ datasets in world-fishing-827 with the prefix VMS-[Country Name]. (There are two legacy exceptions, Peru_VMS and KKP_Indonesia, which we have never renamed because of the time it would take.) The data in these datasets is used as the input for the pipes.
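As a quick way to see these datasets, here is a sketch using the BigQuery Python client, assuming credentials with access to world-fishing-827 and matching on the prefix mentioned above.

```python
# Hedged sketch: list the raw VMS datasets in world-fishing-827.
from google.cloud import bigquery

client = bigquery.Client(project="world-fishing-827")
vms_datasets = [
    ds.dataset_id
    for ds in client.list_datasets()
    if ds.dataset_id.startswith("VMS")
]
print(vms_datasets)
```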

VMS Pipeline Orchestration

All pipelines run in the Composer production instance, and the DAGs are located in the composer-dags-production repository.
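For orientation, a minimal sketch of what a per-country DAG can look like; the DAG ID, task names, schedule, and commands are hypothetical, and the real DAGs in composer-dags-production use the project's own operators.

```python
# Hedged sketch of a per-country VMS ingestion DAG for Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pipe_vms_xyz",  # hypothetical country code
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch = BashOperator(
        task_id="fetch_raw_positions",
        bash_command="echo 'pull provider data into GCS'",
    )
    normalize = BashOperator(
        task_id="normalize_positions",
        bash_command="echo 'run BQ/Dataflow normalization'",
    )
    fetch >> normalize  # fetch first, then normalize
```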

VMS Normalization Pipelines

The VMS pipeline repositories follow the naming convention pipe-vms-[country-name]. For some countries the normalization is done with a BQ transformation; for others it is done in Dataflow.

BQ transformations can be found in /assets/*normalized*.sql, e.g. /assets/fetch-normalized-vms.sql.j2. Dataflow transformations can be found in /pipe_vms_[country-name]/*normalize*
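To make the BQ route concrete, a sketch of running a normalization query with the Python client; the dataset, table names, and column mapping are hypothetical stand-ins for the Jinja-templated .sql.j2 files in the pipe repos.

```python
# Hedged sketch of a BQ-side normalization step with assumed table names.
from google.cloud import bigquery

NORMALIZE_SQL = """
SELECT
  TRIM(UPPER(vessel_id))          AS ssvid,
  TIMESTAMP(position_timestamp)   AS timestamp,
  SAFE_CAST(latitude  AS FLOAT64) AS lat,
  SAFE_CAST(longitude AS FLOAT64) AS lon
FROM `world-fishing-827.VMS_Example.raw_positions`  -- hypothetical dataset
"""

client = bigquery.Client(project="world-fishing-827")
job = client.query(
    NORMALIZE_SQL,
    job_config=bigquery.QueryJobConfig(
        destination="world-fishing-827.VMS_Example.normalized_positions",
        write_disposition="WRITE_TRUNCATE",
    ),
)
job.result()  # wait for the query to finish
```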

BQ conventions