Provides a component that can act as a health check for high availability deployments. The HealthCheck component
returns a summary status of either healthy
, degraded
, or unhealthy
for the endpoint.
The intended semantics of these statuses
map to the corresponding Envoy statuses, see
the Envoy documentation for more details,
but in short degraded
maps to "endpoint is impaired, do not use unless you have no other choice" and unhealthy
maps to "endpoint is hard
down/should not be used under any circumstances".
The returned status is the worst of the status values from the individual sub-checks:
-
The datastore clusters' own health statuses. The datastore clusters reflect their status as green/yellow/red. See the Elasticsearch documentation for details on the meaning of these statuses.
green
maps tohealthy
,yellow
todegraded
, andred
tounhealthy
.
-
The recency of data present in ElasticGraph indices. The HealthCheck configuration specifies the expected "max recency" for items within an index.
- If no records have been indexed within the specified period, the HealthCheck component will consider the index to be in a
degraded
status.
- If no records have been indexed within the specified period, the HealthCheck component will consider the index to be in a
As mentioned above, the returned status is the worst status of these two checks. E.g. if the datastore cluster(s) are all green
, but a recency check fails, the
overall status will be degraded
. If the recency checks pass, but at least one datastore cluster is red
, an unhealthy
status will be returned.
To use, simply register the EnvoyExtension
when defining your schema:
require(envoy_extension_path = "elastic_graph/health_check/envoy_extension")
schema.register_graphql_extension ElasticGraph::HealthCheck::EnvoyExtension,
defined_at: envoy_extension_path,
http_path_segment: "/_status"
These checks are configurable. The following configuration will be used as an example:
health_check:
clusters_to_consider: ["widgets-cluster"]
data_recency_checks:
Widget:
timestamp_field: createdAt
expected_max_recency_seconds: 30
clusters_to_consider
configures the first check (datastore cluster health), and specifies which clusters' health status is monitored.data_recency_checks
configures the second check (data recency), and configures the recency check described above. In this example, if no new "Widgets" are indexed for thirty seconds (perhaps because of an infrastructure issue), adegraded
status will be returned.- Note that this setting is most appropriate for types where you expect a steady stream of indexing (and where the absence of new records is indicative of some kind of failure).
A given ElasticGraph GraphQL endpoint does not necessarily have access to all datastore clusters - more specifically, the endpoint will only have access
to clusters present in the datastore.clusters
configuration map.
If a health check is configured for either a cluster or type that the GraphQL endpoint does not have access to, the respective check will be skipped. This is appropriate, as since the GraphQL endpoint does not have access to the cluster/type, the cluster's/type's health is immaterial.
For example, with the following configuration:
datastore:
clusters:
widgets-cluster: { ... }
# components-cluster: { ... } ### Not available, commented out.
health_check:
clusters_to_consider: ["widgets-cluster", "components-cluster"]
data_recency_checks:
Component:
timestamp_field: createdAt
expected_max_recency_seconds: 10
Widget:
timestamp_field: createdAt
expected_max_recency_seconds: 30
... the components-cluster
datastore status health check will be skipped, as will the Component recency check. However the widgets-cluster
/Widget
health
checks will proceed as normal.