Augmenter Importer specification

Goal

The goal of the augmenters is to automatically augment the raw data sets with extra information.

An example is the geocoding augmenter which adds coordinates to a fatality entry.

How it works

An augmenter generates the data used to augment a raw data set. It produces a file containing a set of extra data that will be injected into the raw data set. This file is called an "augmentation".

The workflow is similar to database migrations: augmentations are generated first, then applied to the raw data set. "Augmentations" are applied using the scrapd-merger tool.

Conventions

General

Augmenters must be written in Python or Go.
Augmenters should not have any external dependency other than what ScrAPD uses (if written in Python).
Augmenters must have unit as well as integration tests (if applicable).
Augmenters must generate a file containing the data to be injected into a fatality case.
A file containing the data to be injected is called an augmentation.
An augmentation is applied using the scrapd-merger.
- The key for matching the information is "Case".
- If "Case" is not found, the entry is ignored.
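The matching rule above can be sketched in a few lines of Python. This is only an illustration of the "Case" convention, not the scrapd-merger implementation; the function name `apply_augmentation` is hypothetical, and the sketch assumes both the data set and the augmentation are already loaded as lists of dictionaries.

```python
import copy


def apply_augmentation(dataset, augmentation):
    """Return a copy of `dataset` with augmentation fields merged in by "Case".

    Hypothetical helper: illustrates the convention only, not scrapd-merger.
    """
    # Index the augmentation by "Case"; entries without a "Case" key are ignored.
    by_case = {entry["Case"]: entry for entry in augmentation if "Case" in entry}

    merged = []
    for record in dataset:
        updated = copy.deepcopy(record)
        extra = by_case.get(record.get("Case"))
        if extra:
            # Inject the extra fields into the matching fatality entry.
            updated.update(extra)
        merged.append(updated)
    return merged
```

Augmentation entries whose "Case" matches nothing in the data set are never looked up, so they are dropped as well. The input list is left unmodified, which keeps the raw data set intact if an augmentation has to be regenerated.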

Options

The augmenters must implement the following flags:

Include entries without results (they are excluded by default)
- -e, --empties
Include entries which don't match an existing entry
- -x, --extras
Provide existing augmentations to avoid reprocessing entries that were previously processed
- --augmentations [augmentation_1, ...]
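For an augmenter written in Python, the three flags could be declared with the standard `argparse` module. The sketch below is one possible layout, not a required one; the function name `build_parser` and the help strings are assumptions, and the specification does not say how the input file is passed, so no argument is defined for it here.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags required by the specification."""
    parser = argparse.ArgumentParser(description="Hypothetical augmenter CLI.")
    parser.add_argument(
        "-e", "--empties",
        action="store_true",
        help="include entries without results (excluded by default)",
    )
    parser.add_argument(
        "-x", "--extras",
        action="store_true",
        help="include entries which don't match an existing entry",
    )
    parser.add_argument(
        "--augmentations",
        nargs="*",
        default=[],
        metavar="AUGMENTATION",
        help="existing augmentations, used to skip already processed entries",
    )
    return parser
```

With `nargs="*"`, several augmentation files can follow a single `--augmentations` flag, and omitting the flag yields an empty list.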

Input

A JSON file representing the raw data set.
The format is a list of objects.

Output

The output is written to the screen or to a file.
The format is a list of objects:

[
  {
    "Case": "19-0400694",
    "Latitude": 30.303625,
    "Longitude": -97.67139
  },
  {
    "Case": "19-0370320",
    "Latitude": 30.243967,
    "Longitude": -97.764366
  }
]
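As a rough sketch of how an augmenter might build this output, the function below walks the raw data set and emits one object per case. Everything here is illustrative: `augment` and its `lookup` parameter are hypothetical names, and representing an entry without results as an object holding only "Case" is an assumption, since the specification does not define that shape. The `--extras` behavior is not covered.

```python
def augment(dataset, lookup, include_empties=False, already_processed=()):
    """Build an augmentation (a list of objects) from a raw data set.

    `lookup` is a hypothetical callable that takes a record and returns a
    dict of extra fields, or None/{} when the service has no result.
    """
    seen = set(already_processed)
    output = []
    for record in dataset:
        case = record.get("Case")
        # Skip records that cannot be matched later or were already processed.
        if case is None or case in seen:
            continue
        result = lookup(record)
        if result:
            output.append({"Case": case, **result})
        elif include_empties:
            # Assumption: an "empty" entry carries only its "Case" key.
            output.append({"Case": case})
    return output
```

Passing the case numbers from earlier augmentations as `already_processed` mirrors the purpose of the `--augmentations` flag: the external service is not queried again for those entries.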

Augmenters naming

The general format is: scrapd-{type}-{operation_or_datatype}-{service}.
- type: tool type (augmenter or importer)
- operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
- service: the name of the service used to perform the operation or retrieve the data
All the components of the name must be in lower case.

Examples

scrapd-augmenter-geocoding-geocensus.py
scrapd-augmenter-geocoding-tamu.py
scrapd-importer-dataset-apd.py
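The naming format can be checked mechanically. The following sketch parses a tool file name into its components; `parse_tool_name` is a hypothetical helper, and it assumes each component contains only lowercase letters and digits (the specification only requires lower case).

```python
import os
import re

# Assumption: each name component uses only lowercase letters and digits.
TOOL_NAME = re.compile(
    r"scrapd-(?P<type>augmenter|importer)"
    r"-(?P<operation_or_datatype>[a-z0-9]+)"
    r"-(?P<service>[a-z0-9]+)"
)


def parse_tool_name(filename):
    """Return the name components as a dict, or None if the name is invalid."""
    # Drop the file extension (for example ".py") before matching.
    stem, _extension = os.path.splitext(filename)
    match = TOOL_NAME.fullmatch(stem)
    return match.groupdict() if match else None
```

A name containing upper-case characters, or a type other than augmenter or importer, yields `None`.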

Augmentation naming

The naming convention for augmentations is very similar to the one for augmenters, except that it MUST include the year.

The general format is: augmentation-{operation_or_datatype}-{service}-{year}.
- operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
- service: the name of the service used to perform the operation or retrieve the data
All the components of the name must be in lower case.

Examples

augmentation-geocoding-geocensus-2017.json
augmentation-geocoding-tamu-2017.json
augmentation-import-apd-2017.json
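An augmenter could derive the augmentation file name from its own components rather than hard-coding it. This is a minimal sketch: `augmentation_filename` is a hypothetical helper, and the `.json` extension is taken from the examples above rather than from the format string.

```python
def augmentation_filename(operation_or_datatype, service, year):
    """Build an augmentation file name following the convention above."""
    for component in (operation_or_datatype, service):
        # The convention requires every component to be in lower case.
        if not component or component != component.lower():
            raise ValueError(
                f"component must be non-empty and lower case: {component!r}"
            )
    return f"augmentation-{operation_or_datatype}-{service}-{int(year)}.json"
```

For instance, `augmentation_filename("geocoding", "tamu", 2017)` returns `augmentation-geocoding-tamu-2017.json`.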
