This repository has been archived by the owner on Feb 2, 2022. It is now read-only.
Augmenter Importer specification
Rémy Greinhofer edited this page May 11, 2019 · 5 revisions
The goal of the augmenters is to automatically augment the raw data sets with extra information.
An example is the geocoding augmenter which adds coordinates to a fatality entry.
An augmenter is in charge of generating data to augment a raw data set. It generates a file containing a set of extra data that will be injected into the raw data set. This file is called an "augmentation".
The way it works is very similar to generating database migrations and applying them. "Augmentations" are applied using the scrapd-merger tool.
- Augmenters must be written in Python or Go.
- Augmenters should not have any external dependency other than what ScrAPD uses (if written in Python).
- Augmenters must have unit as well as integration tests (if applicable).
- Augmenters must generate a file containing the data to be injected into a fatality case.
- A file containing the data to be injected is called an augmentation.
- An augmentation is applied using the scrapd-merger.
  - The key for matching the information is "Case".
  - If "Case" is not found, the entry is ignored.
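The matching rule above can be illustrated with a minimal sketch. This is not the actual scrapd-merger implementation: the function name `apply_augmentation` is hypothetical, and the sketch assumes that "not found" covers both an augmentation entry lacking a "Case" key and a "Case" value that is absent from the raw data set.

```python
import copy


def apply_augmentation(dataset, augmentation):
    """Return a copy of `dataset` with matching augmentation fields merged in.

    Hypothetical helper: entries are matched on the "Case" key only.
    """
    # Index the augmentation by "Case"; entries without that key are ignored.
    by_case = {item["Case"]: item for item in augmentation if "Case" in item}

    merged = []
    for entry in dataset:
        updated = copy.deepcopy(entry)  # leave the raw data set untouched
        case = entry.get("Case")
        extra = by_case.get(case) if case is not None else None
        if extra:
            updated.update(extra)
        merged.append(updated)
    return merged
```

Augmentation entries whose "Case" does not appear in the raw data set are simply never looked up, so they have no effect on the result.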
The augmenters must implement the following flags:
- Include entries without results (they are excluded by default): -e, --empties
- Include entries which don't match an existing entry: -x, --extras
- Provide existing augmentations to avoid reprocessing entries that were previously processed: --augmentations [augmentation_1, ...]
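As an illustration, the three flags could be declared with Python's standard `argparse` module. This is only a sketch: the flag names come from the list above, while the function name `build_parser`, the help strings, and the choice of `store_true` and `nargs="*"` are assumptions rather than part of the specification.

```python
import argparse


def build_parser():
    """Build a command-line parser exposing the flags required of an augmenter."""
    parser = argparse.ArgumentParser(description="Hypothetical augmenter CLI.")
    parser.add_argument(
        "-e", "--empties",
        action="store_true",
        help="include entries without results (excluded by default)",
    )
    parser.add_argument(
        "-x", "--extras",
        action="store_true",
        help="include entries which don't match an existing entry",
    )
    parser.add_argument(
        "--augmentations",
        nargs="*",
        default=[],
        metavar="AUGMENTATION",
        help="existing augmentation files, used to skip already processed entries",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this layout both boolean flags default to off and `--augmentations` defaults to an empty list, which matches the "excluded by default" wording for empties.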
- A JSON file representing the raw data set.
- The format is a list of objects.
- On-screen or in a file
- The format is a list of objects:
[
  {
    "Case": "19-0400694",
    "Latitude": 30.303625,
    "Longitude": -97.67139
  },
  {
    "Case": "19-0370320",
    "Latitude": 30.243967,
    "Longitude": -97.764366
  }
]
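A minimal sketch of how an augmenter might produce output in this shape follows. The names `build_augmentation`, `write_augmentation`, and the `lookup` callable are hypothetical, as is the interpretation that an entry "without results" is one for which the lookup returns nothing.

```python
import json
import sys


def build_augmentation(entries, lookup, include_empties=False):
    """Build a list of objects keyed by "Case" from a raw data set.

    `lookup` is a hypothetical callable that takes an entry and returns a
    dict of extra fields, or None when the service has no result.
    """
    augmentation = []
    for entry in entries:
        case = entry.get("Case")
        if case is None:
            continue  # nothing to match on later, so skip the entry
        result = lookup(entry)
        if not result and not include_empties:
            continue  # entries without results are excluded by default
        augmentation.append({"Case": case, **(result or {})})
    return augmentation


def write_augmentation(augmentation, path=None):
    """Write the augmentation as JSON on screen, or to `path` when given."""
    text = json.dumps(augmentation, indent=2) + "\n"
    if path is None:
        sys.stdout.write(text)
    else:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Keeping the lookup as a parameter lets the same skeleton serve different services, and makes it easy to substitute a stub in unit tests.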
- The general format is: scrapd-{type}-{operation_or_datatype}-{service}.
  - type: tool type (augmenter or importer)
  - operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
  - service: the name of the service used to perform the operation or retrieve the data
- All the components of the name must be in lower case
- scrapd-augmenter-geocoding-geocensus.py
- scrapd-augmenter-geocoding-tamu.py
- scrapd-importer-dataset-apd.py
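The tool naming convention can be expressed as a small helper. This is a sketch only: the function name `tool_name` is hypothetical, and restricting each component to lower-case letters and digits is an assumption drawn from the examples, since the convention itself only requires lower case.

```python
import re

TOOL_TYPES = ("augmenter", "importer")
# Assumption: components consist of lower-case letters and digits only.
COMPONENT = re.compile(r"[a-z0-9]+")


def tool_name(tool_type, operation_or_datatype, service):
    """Compose a name of the form scrapd-{type}-{operation_or_datatype}-{service}."""
    if tool_type not in TOOL_TYPES:
        raise ValueError(f"unknown tool type: {tool_type!r}")
    for component in (operation_or_datatype, service):
        if not COMPONENT.fullmatch(component):
            raise ValueError(f"invalid name component: {component!r}")
    return f"scrapd-{tool_type}-{operation_or_datatype}-{service}"


if __name__ == "__main__":
    print(tool_name("augmenter", "geocoding", "geocensus") + ".py")
```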
The naming convention for the augmentations is very similar to that of the augmenters, EXCEPT that it MUST include the year:
- The general format is: augmentation-{operation_or_datatype}-{service}-{year}.
  - operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
  - service: the name of the service used to perform the operation or retrieve the data
- All the components of the name must be in lower case
- augmentation-geocoding-geocensus-2017.json
- augmentation-geocoding-tamu-2017.json
- augmentation-import-apd-2017.json
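For illustration, an augmentation file name can be checked and split into its components with a regular expression. The function name `parse_augmentation_name` is hypothetical, and the four-digit year, the `.json` extension, and the lower-case alphanumeric components are assumptions inferred from the examples above.

```python
import re

# Assumptions: lower-case alphanumeric components, 4-digit year, .json extension.
AUGMENTATION_NAME = re.compile(
    r"augmentation-(?P<operation_or_datatype>[a-z0-9]+)"
    r"-(?P<service>[a-z0-9]+)"
    r"-(?P<year>\d{4})\.json"
)


def parse_augmentation_name(filename):
    """Return the name components as a dict, or None if the name does not conform."""
    match = AUGMENTATION_NAME.fullmatch(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    return parts


if __name__ == "__main__":
    print(parse_augmentation_name("augmentation-geocoding-tamu-2017.json"))
```

A name that omits the year or uses upper-case letters does not match and yields `None`.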