The Hydra platform streamlines data streaming by hiding the underlying implementation behind a simple REST API.
A set of common utility classes shared across the entire Hydra ecosystem, including the hydra-spark and dispatch modules.
Core, shared traits and classes that define both the ingestion and transport protocols in Hydra. This module is a dependency of any ingestor or transport implementation.
This also includes Avro schema resolution, validation, and management.
The ingestion implementation, including HTTP endpoints, actors, communication protocols, and ingestor registration.
A Transport implementation that replicates messages into Kafka.
Hydra is built using SBT. To build Hydra, run:
sbt clean compile
We have a development MSK and Schema Registry cluster running in the eplur-staging AWS account. Access to this cluster is granted via IAM to the exp_adapt_dvs_set role.
- Create a .env from the example.env template.
- Update the .env file with your AWS Credentials. Those can be gathered in AWS Identity Center.
- Use the Makefile to build and deploy Hydra Publish into a local Docker container.
To verify that Hydra has started, open this resource:
http://localhost:8088/health
You should see something like:
{
"BuildInfo": {
"builtAtMillis": "1639518050466",
"name": "hydra-ingest",
"scalaVersion": "2.12.11",
"version": "0.11.3.979",
"sbtVersion": "1.3.13",
"builtAtString": "2021-12-14 21:40:50.466"
},
"ConsumerGroupisActive": {
"ConsumerGroupName": "v2MetadataConsumer",
"State": true
}
}
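The same check can be scripted. The following is a minimal Python sketch, assuming Hydra is reachable at `http://localhost:8088` and that the response has the shape shown above; the helper names `consumer_group_active` and `fetch_health` are illustrative and not part of Hydra.

```python
import json
from urllib.request import urlopen

# Assumption: Hydra runs locally on port 8088, as in the example above.
HEALTH_URL = "http://localhost:8088/health"


def consumer_group_active(health: dict) -> bool:
    """Return True only if the payload reports an active consumer group."""
    group = health.get("ConsumerGroupisActive", {})
    return group.get("State") is True


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the /health response (needs a running Hydra)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Offline demonstration using a trimmed copy of the sample response above.
sample = json.loads("""
{
  "BuildInfo": {"name": "hydra-ingest", "version": "0.11.3.979"},
  "ConsumerGroupisActive": {"ConsumerGroupName": "v2MetadataConsumer", "State": true}
}
""")
print(consumer_group_active(sample))  # True
```

Against a running instance, `consumer_group_active(fetch_health())` performs the same check on the live response.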
Path | HTTP Method | Description |
---|---|---|
/health | GET | A summary overview of the overall health of the system. Includes health checks for Kafka. |
/metrics | GET | A collection of JVM-related memory and thread management metrics, including deadlocked threads, garbage collection run times, etc. |
Path | HTTP Method | Description |
---|---|---|
/schemas | GET | Returns a list of all schemas managed by Hydra. |
/schemas/[NAME] | GET | Returns information for the latest version of a schema. |
/schemas/[NAME]?schema | GET | Returns only the JSON for the latest schema. |
/schemas/[NAME]/versions/ | GET | Returns all versions for a schema. |
/schemas/[NAME]/versions/[VERSION] | GET | Returns metadata for a specific version of a schema. |
/schemas | POST | Registers a new schema with the registry. Use this for both new and existing schemas; for existing schemas, compatibility will be checked prior to registration. |
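
As a rough illustration of how the GET paths in this table fit together, here is a small Python sketch that builds the URLs. The base URL and the `schema_url` helper are assumptions made for the example, and the schema name used is only a placeholder.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: Hydra runs locally on port 8088.
BASE_URL = "http://localhost:8088"


def schema_url(
    name: str,
    version: Optional[int] = None,
    schema_only: bool = False,
    base_url: str = BASE_URL,
) -> str:
    """Build a schema endpoint URL from the paths listed in the table.

    If a version is given it takes precedence and schema_only is ignored.
    """
    subject = quote(name, safe="")
    if version is not None:
        return f"{base_url}/schemas/{subject}/versions/{version}"
    url = f"{base_url}/schemas/{subject}"
    return f"{url}?schema" if schema_only else url


print(schema_url("tech.my-first-topic"))
print(schema_url("tech.my-first-topic", schema_only=True))
print(schema_url("tech.my-first-topic", version=1))
```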
Path | HTTP Method | Description |
---|---|---|
/v2/topics | GET | A list of all the registered topics currently managed by Hydra. |
/v2/topics/[NAME] | GET | Get the current schema for requested topic. |
/v2/topics/[NAME] | POST | Create or update custom topics. Also registers key and value schemas in Schema Registry if applicable. |
/v2/topics/[NAME] | DELETE | Delete topics. Requires authentication. |
Path | HTTP Method | Description |
---|---|---|
/v2/topics/[NAME]/records | POST | Creates a new record in the specified topic. |
"If you want to make a Kafka topic from scratch, you must first invent the universe." -Carl Sagan
We are using this topic to test:
{
"streamType": "Entity",
"deprecated": false,
"dataClassification": "InternalUseOnly",
"contact": {
"email": "john.doe@email.com"
},
"createdDate": "2022-01-25T12:00:00Z",
"notes": "Here are some notes.",
"parentSubjects": [],
"teamName": "team-john-doe",
"schemas": {
"key": {
"type": "record",
"name": "key",
"namespace": "",
"fields": [
{
"name": "id",
"type": {
"type":"string",
"logicalType":"uuid"
},
"doc":"This is a doc field."
}
]
},
"value": {
"type": "record",
"name": "val",
"namespace": "dvs.data_platform.dvs_sandbox",
"fields": [
{
"name": "myValue",
"type": "string",
"doc":"It is my value."
}
]
}
}
}
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8088/v2/topics/tech.my-first-topic' -d '{
"streamType": "Entity",
"deprecated": false,
"dataClassification": "InternalUseOnly",
"contact": {
"email": "john.doe@email.com"
},
"createdDate": "2022-01-25T12:00:00Z",
"notes": "Here are some notes.",
"parentSubjects": [],
"teamName": "team-john-doe",
"schemas": {
"key": {
"type": "record",
"name": "Test2",
"namespace": "",
"fields": [
{
"name": "id",
"type": {
"type":"string",
"logicalType":"uuid"
},
"doc":"This is a doc field."
}
]
},
"value": {
"type": "record",
"name": "Test2",
"namespace": "dvs.data_platform.dvs_sandbox",
"fields": [
{
"name": "myValue",
"type": "string",
"doc":"It is my value."
}
]
}
}
}'
You should see something like this:
OK
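For scripted setups, the same request can be assembled in Python. This is a sketch only: it assumes the `/v2/topics/[NAME]` POST endpoint from the table above and a local Hydra on port 8088, and the helpers `record_schema` and `build_create_topic_request` are hypothetical names, not part of Hydra.

```python
import json
from urllib.request import Request

# Assumption: Hydra runs locally on port 8088.
BASE_URL = "http://localhost:8088"


def record_schema(name: str, namespace: str, fields: list) -> dict:
    """Build an Avro record schema as a plain dictionary."""
    return {"type": "record", "name": name, "namespace": namespace, "fields": fields}


def build_create_topic_request(topic: str, metadata: dict, base_url: str = BASE_URL) -> Request:
    """Prepare (but do not send) the POST request that creates a topic."""
    return Request(
        url=f"{base_url}/v2/topics/{topic}",
        data=json.dumps(metadata).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Metadata mirroring the curl example above.
metadata = {
    "streamType": "Entity",
    "deprecated": False,
    "dataClassification": "InternalUseOnly",
    "contact": {"email": "john.doe@email.com"},
    "createdDate": "2022-01-25T12:00:00Z",
    "notes": "Here are some notes.",
    "parentSubjects": [],
    "teamName": "team-john-doe",
    "schemas": {
        "key": record_schema("Test2", "", [
            {"name": "id",
             "type": {"type": "string", "logicalType": "uuid"},
             "doc": "This is a doc field."},
        ]),
        "value": record_schema("Test2", "dvs.data_platform.dvs_sandbox", [
            {"name": "myValue", "type": "string", "doc": "It is my value."},
        ]),
    },
}

request = build_create_topic_request("tech.my-first-topic", metadata)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running Hydra instance.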
curl -X POST -d '{"key": {"id":"7db11b7a-4560-4a86-b00b-6f380bfb1564"}, "value":{"myValue":"someValue"}}' -H 'Content-Type: application/json' 'http://localhost:8088/v2/topics/tech.my-first-topic/records'
A sample Hydra response is:
{"offset":0,"partition":6}
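A Python equivalent of the curl call might look like the sketch below. It assumes a local Hydra on port 8088 and a response body shaped like the sample above; `PublishResult`, `build_record_request`, `parse_publish_response`, and `publish` are names invented for this example.

```python
import json
from dataclasses import dataclass
from urllib.request import Request, urlopen

# Assumption: Hydra runs locally on port 8088.
BASE_URL = "http://localhost:8088"


@dataclass(frozen=True)
class PublishResult:
    """Where Kafka stored the record, as reported in the response body."""
    offset: int
    partition: int


def build_record_request(topic: str, key: dict, value: dict, base_url: str = BASE_URL) -> Request:
    """Prepare (but do not send) a POST to /v2/topics/[NAME]/records."""
    body = json.dumps({"key": key, "value": value}).encode("utf-8")
    return Request(
        url=f"{base_url}/v2/topics/{topic}/records",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_publish_response(body: str) -> PublishResult:
    """Decode a response body such as {"offset":0,"partition":6}."""
    parsed = json.loads(body)
    return PublishResult(offset=parsed["offset"], partition=parsed["partition"])


def publish(topic: str, key: dict, value: dict) -> PublishResult:
    """Send the record to a running Hydra and return the parsed result."""
    with urlopen(build_record_request(topic, key, value)) as response:
        return parse_publish_response(response.read().decode("utf-8"))


# Offline demonstration with the sample response shown above.
print(parse_publish_response('{"offset":0,"partition":6}'))
# PublishResult(offset=0, partition=6)
```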
Hydra validates payloads against the underlying schema. For instance:
curl -X POST -d '{"key": {"id":"123"}, "value":{"myValue":"someValue"}}' -H 'Content-Type: application/json' 'http://localhost:8088/v2/topics/tech.my-first-topic/records'
Should return:
hydra.avro.convert.StringToGenericRecord$InvalidLogicalTypeError: Invalid logical type.
Expected UUID but received 123
[http://schema-registry:8081/subjects/tech.my-first-topic-key/versions/latest/schema]
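Clients can catch this kind of error before sending a record. The sketch below uses Python's standard `uuid` module to pre-check the key; it is a client-side approximation, and Hydra's own validation rules may differ in detail. The function name `is_canonical_uuid` is illustrative.

```python
import uuid


def is_canonical_uuid(value: str) -> bool:
    """Return True if value is a UUID in the hyphenated 8-4-4-4-12 form.

    uuid.UUID also accepts braces and unhyphenated hex, so the parsed
    value is compared against the input to reject those looser spellings.
    """
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_canonical_uuid("7db11b7a-4560-4a86-b00b-6f380bfb1564"))  # True
print(is_canonical_uuid("123"))  # False
```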
You may have noticed that the examples above use "/v2" endpoints. Most /v2 endpoints have /v1 counterparts, but we highly recommend /v2 because the /v1 endpoints are deprecated. If you need a "/v1" endpoint, simply remove the "/v2" segment from the path.
We used to highly recommend checking out the project documentation here, but then we forgot about it for two years. We might get around to updating it in the future.
There you can find the "latest" documentation about the Hydra project, including examples, API endpoints, and a lot more info on how to get started.
This README file only contains basic definitions and set up instructions.
Contributions via GitHub Pull Request are welcome.
Profiling software provided by YourKit.
YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.
Try using the gitter chat link above!
Apache 2.0, see LICENSE.md