Recap reads and writes schemas from web services, databases, and schema registries in a standard format.
⭐️ If you like this project, please give it a star! It helps the project get more visibility.
Format | Read | Write |
---|---|---|
Avro | ✅ | ✅ |
BigQuery | ✅ | |
Confluent Schema Registry | ✅ | |
Hive Metastore | ✅ | |
JSON Schema | ✅ | ✅ |
MySQL | ✅ | |
PostgreSQL | ✅ | |
Protobuf | ✅ | ✅ |
Snowflake | ✅ | |
SQLite | ✅ | |
Install Recap and all of its optional dependencies:
pip install 'recap-core[all]'
You can also select specific dependencies:
pip install 'recap-core[avro,kafka]'
See pyproject.toml for a list of optional dependencies.
Recap comes with a command-line interface that can list and read schemas from external systems.
List the children of a URL:
recap ls postgresql://user:pass@host:port/testdb
[
"pg_toast",
"pg_catalog",
"public",
"information_schema"
]
Keep drilling down:
recap ls postgresql://user:pass@host:port/testdb/public
[
"test_types"
]
Read the schema for the test_types table as a Recap struct:
recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
"type": "struct",
"fields": [
{
"type": "int64",
"name": "test_bigint",
"optional": true
}
]
}
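The JSON printed by `recap schema` can be consumed by any JSON parser. The following is a minimal sketch, assuming the output above has been captured as a string; `summarize_struct` is a hypothetical helper for illustration and is not part of Recap.

```python
import json

# Output copied from the `recap schema` example above.
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def summarize_struct(schema_json: str) -> list[tuple[str, str, bool]]:
    """Return (name, type, optional) for each field of a struct schema.

    Hypothetical helper: it only understands the flat struct layout
    shown in the example, not nested or unnamed fields.
    """
    schema = json.loads(schema_json)
    if schema.get("type") != "struct":
        raise ValueError("expected a struct schema")
    return [
        (field["name"], field["type"], field.get("optional", False))
        for field in schema.get("fields", [])
    ]
```

Calling `summarize_struct(SCHEMA_JSON)` yields one tuple per field, which is convenient for quick checks in scripts or tests.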
Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.
Start the server at http://localhost:8000:
recap serve
List the schemas in a PostgreSQL database:
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]
And read a schema:
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
The gateway fetches schemas from external systems in real time and returns them as Recap schemas.
An OpenAPI schema is available at http://localhost:8000/docs.
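The curl calls above can also be made from Python with only the standard library. This is a rough sketch, assuming `recap serve` is running on localhost:8000 and that the `/gateway/ls/` and `/gateway/schema/` paths behave as in the curl examples; `gateway_url` and `gateway_get` are hypothetical helper names, not part of Recap.

```python
import json
from urllib.request import urlopen

# Assumption: the gateway is served locally by `recap serve`.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str) -> str:
    """Build a gateway URL; `action` is "ls" or "schema"."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported action: {action}")
    return f"{GATEWAY}/{action}/{system_url}"


def gateway_get(action: str, system_url: str):
    """Call the gateway and decode its JSON response (needs a running server)."""
    with urlopen(gateway_url(action, system_url)) as response:
        return json.load(response)
```

With a running server, `gateway_get("ls", "postgresql://user:pass@host:port/testdb")` should return the same list as the first curl example.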
You can store schemas in Recap's schema registry.
Start the server at http://localhost:8000:
recap serve
Put a schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
http://localhost:8000/registry/some_schema
Get the schema (and version) from the registry:
curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
Put a new version of the schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
http://localhost:8000/registry/some_schema
List schema versions:
curl http://localhost:8000/registry/some_schema/versions
[1,2]
Get a specific version of the schema:
curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
The registry uses fsspec to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the registry docs for more details.
An OpenAPI schema is available at http://localhost:8000/docs.
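The registry calls can be scripted the same way. The sketch below assumes the endpoints, content type, and `[schema, version]` response shape shown in the curl examples; the helper names are hypothetical, and nothing is sent until the request is passed to `urlopen`.

```python
import json
from urllib.request import Request

# Assumption: the registry is served locally by `recap serve`.
REGISTRY = "http://localhost:8000/registry"


def build_put_request(name: str, schema: dict) -> Request:
    """Build the POST request that stores `schema` under `name`."""
    return Request(
        f"{REGISTRY}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def parse_registry_response(body: str) -> tuple[dict, int]:
    """Split a `[schema, version]` response body into its two parts."""
    schema, version = json.loads(body)
    return schema, version
```

To send the request, pass it to `urllib.request.urlopen(build_put_request("some_schema", schema))` while the server is running.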
Recap has recap.converters and recap.clients packages.
- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.
Read a schema from PostgreSQL:
from recap.clients import create_client
# Keep the result so it can be passed to a converter
with create_client("postgresql://user:pass@host:port/testdb") as c:
    struct = c.schema("testdb", "public", "test_types")
Convert the schema to Avro, Protobuf, and JSON schemas:
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter
avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
Transpile schemas from one format to another:
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter
json_schema = """
{
"type": "object",
"$id": "https://recap.build/person.schema.json",
"properties": {
"name": {"type": "string"}
}
}
"""
# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
Store schemas in Recap's schema registry:
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType
storage = RegistryStorage("file:///tmp/recap-registry-storage")
# Put a schema in the registry
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)]),
)
# Get the schema back from the registry
storage.get("postgresql://localhost:5432/testdb/public/test_table")
# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")
# List all schemas in the registry
schemas = storage.ls()
Recap's gateway and registry are also available as a Docker image:
docker run \
-p 8000:8000 \
-e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
ghcr.io/recap-build/recap:latest
See Recap's Docker documentation for more details.
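Shell quoting around the `RECAP_URLS` value is easy to get wrong, so it can help to generate the command. This is a small sketch, assuming (based on the example above) that `RECAP_URLS` takes a JSON list of URLs; `docker_command` is a hypothetical helper, not part of Recap.

```python
import json
import shlex

IMAGE = "ghcr.io/recap-build/recap:latest"


def docker_command(urls: list[str]) -> str:
    """Return a shell-safe `docker run` command for the given URLs.

    Assumes RECAP_URLS is a JSON-encoded list, as in the example above.
    """
    env_value = "RECAP_URLS=" + json.dumps(urls)
    args = ["docker", "run", "-p", "8000:8000", "-e", env_value, IMAGE]
    # shlex.join quotes any argument that contains shell metacharacters.
    return shlex.join(args)
```

Printing `docker_command(["postgresql://user:pass@localhost:5432/testdb"])` gives a one-line command that can be pasted into a POSIX shell.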
See Recap's type spec for details on Recap's type system.
Recap's documentation is available at recap.build.