The hierarchy builder is a service that forms part of the dataset import process. It requires a 'full' hierarchy to be available for the dataset you are importing
Run the dp-hierarchy-builder
service:
make debug
The import scripts rely on the CLI tool for the database you're targeting:
cypher-shell
for Neo4j / Cypher importgremlin.sh
for Neptune / Gremlin imports
You may also need an SSH tunnel to the database if it's not running locally.
Example import for the CPIH full hierarchy: (replace the filename for other hierarchy import files)
Cypher:
cypher-shell < import-scripts/cypher/cpih1dim1aggid.cypher
Gremlin:
./gremlin-import.sh import-scripts/gremlin/cpih1dim1aggid.grm
For Cypher imports, you can use additional flags if running against an environment other than localhost:
cypher-shell -u USER -p PASSWORD -a bolt://localhost:7687 < .....
Scripts for updating and debugging Kafka can be found here(dp-data-tools)
Environment variable | Default | Description |
---|---|---|
BIND_ADDR | :22700 | The host and port to bind to |
KAFKA_ADDR | localhost:9092 | A list of Kafka host addresses |
KAFKA_VERSION | 1.0.2 |
The kafka version that this service expects to connect to |
KAFKA_OFFSET_OLDEST | true | sets kafka offset to oldest if true |
KAFKA_SEC_PROTO | unset | if set to TLS , kafka connections will use TLS [1] |
KAFKA_SEC_CLIENT_KEY | unset | PEM for the client key [1] |
KAFKA_SEC_CLIENT_CERT | unset | PEM for the client certificate [1] |
KAFKA_SEC_CA_CERTS | unset | CA cert chain for the server cert [1] |
KAFKA_SEC_SKIP_VERIFY | false | ignores server certificate issues if true [1] |
CONSUMER_GROUP | dp-hierarchy-builder | The name of the Kafka consumer group |
CONSUMER_TOPIC | observations-imported | The name of the topic to consumes messages from |
PRODUCER_TOPIC | hierarchy-built | The name of the topic to produces messages to |
ERROR_PRODUCER_TOPIC | import-error | The name of the topic to send error messages to |
GRACEFUL_SHUTDOWN_TIMEOUT | time.Second * 10 | Time time to wait when gracefully shutting down before closing |
HEALTHCHECK_INTERVAL | 30s | The time between doing health checks |
HEALTHCHECK_CRITICAL_TIMEOUT | 90s | The time taken for the health changes from warning state to critical due to subsystem check failures |
Plus the graph database vars from dp-graph - namely GRAPH_DRIVER_TYPE
and GRAPH_ADDR
Notes:
Environment variable | Default | Description |
---|---|---|
GRAPH_DRIVER_TYPE | "" | string identifier for the implementation to be used (e.g. 'neptune' or 'mock') |
GRAPH_ADDR | "" | address of the database matching the chosen driver type (web socket) |
NEPTUNE_TLS_SKIP_VERIFY | false | flag to skip TLS certificate verification, should only be true when run locally |
NEPTUNE_TLS_SKIP_VERIFY
to true. See our Neptune guide for more details.
The /healthcheck
endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the HEALTHCHECK_INTERVAL
environment variable.
On a development machine a request to the health check endpoint can be made by:
curl localhost:22700/healthcheck
There are a number of utility applications for manual tasks (found under the cmd directory):
- v4-transformer - take a V4 file and create a full hierarchy input file / cypher script
- geography-transformer - take a geography input CSV file and output a full hierarchy input file / cypher script
- hierarchy-transformer - take a hierarchy input CSV file and generate cypher script
- builder - builds an instance hierarchy from a full hierarchy
- producer - produces a Kafka message for the dp-hierarchy-builder process to consume
- Manually create instance hierarchy for CPIH - note you will have to replace the value for 'instance-id'
go run cmd/builder/main.go --instance-id 27a4019f-6491-4876-bbdd-1439a40e5bb9 --dimension-name aggregate --code-list-id e44de4c4-d39e-4e2f-942b-3ca10584d078
- Manually create instance hierarchy for mid-year-pop-est - note you will have to replace the value for 'instance-id'
go run cmd/builder/main.go --instance-id 34b8c139-a1fe-45b1-95e2-e77df3682256 --dimension-name geography --code-list-id mid-year-pop-geography
If running one of the above commands against an environment, you can specify the neo4j URL with the flag (replacing USER, PASSSWORD, host, and port as required):
--neo-url="bolt://USER:PASSWORD@localhost:7687"
make FILE=./cmd/hierarchy-transformer/hierarchy.csv generate-full
output is written to ./cmd/hierarchy-transformer/output
make FILE=./cmd/geography-transformer/WD16_LAD16_CTY16_OTH_UK_LU.csv generate-full-from-geography
output is written to ``./cmd/geography-transformer/output`
codelistid=cpih1dim1aggid
go run cmd/code-label-parent-transformer/main.go --file import-scripts/code-label-parent-csv/$codelistid.csv --code-list-id $codelistid --output import-scripts/$codelistid.csv`
go run cmd/producer/main.go --instance-id '58004716-a2d4-4dd1-a6c3-6accab30ad2a' --code-list-id 'cpih1dim1aggid' --dimension-name 'aggregate'
See CONTRIBUTING for details.
Copyright © 2016-2021, Office for National Statistics (https://www.ons.gov.uk)
Released under MIT license, see LICENSE for details.