This repository has been archived by the owner on Feb 2, 2022. It is now read-only.

Augmenter Importer specification

Rémy Greinhofer edited this page Apr 19, 2019 · 5 revisions

Goal

The goal of the augmenters is to automatically augment the raw data sets with extra information.

For example, the geocoding augmenter adds coordinates to a fatality entry.

How it works

An augmenter reads a data set, uses the content to perform operations or to query services, and returns the augmented data.
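The flow described above can be sketched roughly as follows. This is only an illustration, not ScrAPD's actual code: the field names (`case`, `location`, `latitude`, `longitude`), the sample values, and the `fake_geocoder` lookup are all hypothetical stand-ins for a real data set and a real geocoding service.

```python
import json


def augment(entries, lookup):
    """Return a new list of entries, adding coordinates where the lookup succeeds.

    `lookup` stands in for a call to an external service (hypothetical).
    """
    augmented = []
    for entry in entries:
        updated = dict(entry)  # copy, so the raw entry is left untouched
        coords = lookup(entry.get("location", ""))
        if coords is not None:
            updated["latitude"], updated["longitude"] = coords
        augmented.append(updated)
    return augmented


def fake_geocoder(address):
    """Hypothetical in-memory replacement for a geocoding service."""
    known = {"100 Example St": (30.0, -97.0)}
    return known.get(address)


# Made-up sample data; real entries may use different keys.
raw = [
    {"case": "19-0000001", "location": "100 Example St"},
    {"case": "19-0000002", "location": "Unknown"},
]
result = augment(raw, fake_geocoder)
print(json.dumps(result, indent=2))
```

Entries the service cannot resolve are passed through unchanged, so the output always contains as many entries as the input.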

Conventions

General

  • Augmenters must have the option to be piped together:
    cat fatalities-2019-raw.json | scrapd-augmenter-geocoding-geocensus | scrapd-augmenter-geocoding-tamu
  • Augmenters must have the option to update the data in place:
    python scrapd-augmenter-geocoding-geocensus.py -i fatalities-all-augmented.json
  • Augmenters must be able to read from stdin.
  • Augmenters must be written in Python or Go.
  • Augmenters written in Python should not have any external dependencies beyond those ScrAPD already uses.
  • Augmenters must have unit and integration tests.
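One possible way to satisfy the piping, stdin, and in-place conventions in a Python augmenter is sketched below. It assumes that `-i` takes the path of the file to update, which is one reading of the example command above; the `augment` placeholder, the `run` helper, and the sample entries are hypothetical.

```python
import argparse
import io
import json
import os
import sys
import tempfile


def augment(entries):
    """Placeholder augmentation (hypothetical): flag each entry as processed."""
    return [dict(entry, augmented=True) for entry in entries]


def run(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Augment the file given with -i in place, or filter stdin to stdout."""
    parser = argparse.ArgumentParser(description="Example augmenter skeleton")
    # Assumption: -i is followed by the file to rewrite.
    parser.add_argument("-i", "--in-place", metavar="FILE",
                        help="update FILE in place instead of reading stdin")
    args = parser.parse_args(argv)

    if args.in_place:
        with open(args.in_place, encoding="utf-8") as f:
            entries = json.load(f)
        with open(args.in_place, "w", encoding="utf-8") as f:
            json.dump(augment(entries), f, indent=2)
        return 0

    # Default mode: read the data set from stdin, write the result to stdout,
    # so that several augmenters can be chained with shell pipes.
    json.dump(augment(json.load(stdin)), stdout, indent=2)
    stdout.write("\n")
    return 0


# Simulate a two-stage pipe with in-memory streams (made-up sample data).
first = io.StringIO()
run([], stdin=io.StringIO('[{"case": "19-0000001"}]'), stdout=first)
second = io.StringIO()
run([], stdin=io.StringIO(first.getvalue()), stdout=second)
print(second.getvalue())

# Simulate the in-place mode on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fatalities.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"case": "19-0000002"}], f)
    run(["-i", path])
    with open(path, encoding="utf-8") as f:
        in_place_result = json.load(f)
```

In a real script, `run(sys.argv[1:])` would be called from the entry point. Keeping the streams as parameters also makes the tool easy to cover with unit tests, since no real pipe or terminal is needed.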

Input

  • A JSON file representing the data set.
  • The top-level structure of the file is a list of objects.
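A minimal sketch of how an augmenter might check this input shape before doing any work. The `load_dataset` helper and its error messages are hypothetical; the specification only states that the data set is a JSON list of objects.

```python
import json


def load_dataset(text):
    """Parse JSON text and verify that it is a list of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("the data set must be a JSON list")
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
    return data


# Made-up sample entry for illustration.
dataset = load_dataset('[{"case": "19-0000001"}]')

# A bare object (not wrapped in a list) is rejected.
try:
    load_dataset('{"case": "19-0000001"}')
    rejected = False
except ValueError:
    rejected = True
```

Failing early on a malformed data set keeps a broken file from propagating through a chain of piped augmenters.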

Output

  • The augmented data set, either printed on screen (stdout) or written back to the input file in place.

Naming

  • The general format is: scrapd-{type}-{operation_or_datatype}-{service}.
    • type: tool type (augmenter or importer)
    • operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
    • service: the name of the service used to perform the operation or retrieve the data
  • All the components of the name must be in lowercase.

Examples

  • scrapd-augmenter-geocoding-geocensus
  • scrapd-augmenter-geocoding-tamu
  • scrapd-importer-dataset-apd
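The naming convention could be checked mechanically along these lines. The `parse_tool_name` helper is hypothetical, and the allowed character set (lowercase letters, digits, and underscores within each component) is an assumption; the specification only requires lowercase.

```python
import re

# scrapd-{type}-{operation_or_datatype}-{service}
# Assumption: components contain only lowercase letters, digits, and underscores.
NAME_PATTERN = re.compile(r"scrapd-(augmenter|importer)-([a-z0-9_]+)-([a-z0-9_]+)")


def parse_tool_name(name):
    """Split a tool name into its components, or return None if it is invalid."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    tool_type, operation_or_datatype, service = match.groups()
    return {
        "type": tool_type,
        "operation_or_datatype": operation_or_datatype,
        "service": service,
    }


for candidate in (
    "scrapd-augmenter-geocoding-geocensus",
    "scrapd-importer-dataset-apd",
    "scrapd-Augmenter-geocoding-tamu",  # rejected: not all lowercase
):
    print(candidate, "->", parse_tool_name(candidate))
```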