Skip to content

Commit

Permalink
Merge branch 'main' into zio-schema-docs
Browse files Browse the repository at this point in the history
  • Loading branch information
khajavi committed Sep 9, 2023
2 parents d464145 + aa0770e commit a64a823
Show file tree
Hide file tree
Showing 21 changed files with 847 additions and 253 deletions.
85 changes: 34 additions & 51 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,93 +10,76 @@

## Introduction

ZIO Schema helps us to solve some of the most common problems in distributed computing, such as serialization, deserialization, and data migration.

It turns a compiled-time construct (the type of a data structure) into a runtime construct (a value that can be read, manipulated, and composed at runtime). A schema is a structure of a data type. ZIO Schema reifies the concept of structure for data types. It makes a high-level description of any data type and makes them first-class values.
Schema is a structure of a data type. ZIO Schema reifies the concept of structure for data types. It makes a high-level description of any data type and makes them as first-class values.

Creating a schema for a data type helps us to write codecs for that data type. So this library can be a host of functionalities useful for writing codecs and protocols like JSON, Protobuf, CSV, and so forth.

## What Problems Does ZIO Schema Solve?

With schema descriptions that can be automatically derived for case classes and sealed traits, _ZIO Schema_ will be going to provide powerful features for free:
With schema descriptions that can be automatically derived for case classes and sealed traits, _ZIO Schema_ will be going to provide powerful features for free (Note that the project is in the development stage and all these features are not supported yet):

1. Metaprogramming without macros, reflection, or complicated implicit derivations.
1. Creating serialization and deserialization codecs for any supported protocol (JSON, Protobuf, etc.)
2. Deriving standard type classes (`Eq`, `Show`, `Ordering`, etc.) from the structure of the data
3. Default values for data types
2. Automate ETL (Extract, Transform, Load) pipelines
1. Diffing: diffing between two values of the same type
2. Patching: applying a diff to a value to update it
3. Migration: migrating values from one type to another
3. Computations as data: Not only we can turn types into values, but we can also turn computations into values. This opens up a whole new world of possibilities concerning distributed computing.
- Codecs for any supported protocol (JSON, protobuf, etc.), so data structures can be serialized and deserialized in a principled way
- Diffing, patching, merging, and other generic-data-based operations
- Migration of data structures from one schema to another compatible schema
- Derivation of arbitrary type classes (`Eq`, `Show`, `Ord`, etc.) from the structure of the data

When our data structures need to be serialized, deserialized, persisted, or transported across the wire, then _ZIO Schema_ lets us focus on data modeling and automatically tackle all the low-level, messy details for us.

_ZIO Schema_ is used by a growing number of ZIO libraries, including [ZIO Flow](https://zio.dev/zio-flow), [ZIO Redis](https://zio-redis), [ZIO SQL](https://zio.dev/zio-sql) and [ZIO DynamoDB](https://zio.dev/zio-dynamodb).
_ZIO Schema_ is used by a growing number of ZIO libraries, including _ZIO Flow_, _ZIO Redis_, _ZIO Web_, _ZIO SQL_ and _ZIO DynamoDB_.

## Installation

In order to use this library, we need to add the following lines in our `build.sbt` file:

```scala
libraryDependencies += "dev.zio" %% "zio-schema" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-avro" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-bson" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-json" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-msg-pack" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-protobuf" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-thrift" % "0.4.13"
libraryDependencies += "dev.zio" %% "zio-schema-zio-test" % "0.4.13"

// Required for the automatic generic derivation of schemas
libraryDependencies += "dev.zio" %% "zio-schema-derivation" % "0.4.13"
// Required for automatic generic derivation of schemas
libraryDependencies += "dev.zio" %% "zio-schema-derivation" % "0.4.13",
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value % "provided"
```

## Example

In this simple example first, we create a schema for `Person` and then run the _diff_ operation on two instances of the `Person` data type, and finally, we encode a Person instance using _Protobuf_ protocol:
In this simple example first, we create a schema for `Person` and then run the _diff_ operation on two instances of the `Person` data type, and finally we encode a Person instance using _Protobuf_ protocol:

```scala
import zio._
import zio.stream._
import zio.schema.codec.{BinaryCodec, ProtobufCodec}
import zio.console.putStrLn
import zio.schema.codec.ProtobufCodec._
import zio.schema.{DeriveSchema, Schema}
import zio.stream.ZStream
import zio.{Chunk, ExitCode, URIO}

final case class Person(name: String, age: Int)

final case class Person(name: String, age: Int, id: String)
object Person {
implicit val schema: Schema[Person] = DeriveSchema.gen
val protobufCodec: BinaryCodec[Person] = ProtobufCodec.protobufCodec
implicit val schema: Schema[Person] = DeriveSchema.gen[Person]
}

object Main extends ZIOAppDefault {
def run =
ZStream
.succeed(Person("John", 43))
.via(Person.protobufCodec.streamEncoder)
.runCollect
.flatMap(x =>
Console.printLine(s"Encoded data with protobuf codec: ${toHex(x)}")
)

def toHex(chunk: Chunk[Byte]): String =
chunk.map("%02X".format(_)).mkString
}
```
Person.schema

Here is the output of running the above program:
import zio.schema.syntax._

```scala
Encoded data with protobuf codec: 0A044A6F686E102B
Person("Alex", 31, "0123").diff(Person("Alex", 31, "124"))

def toHex(chunk: Chunk[Byte]): String =
chunk.toArray.map("%02X".format(_)).mkString

zio.Runtime.default.unsafe.run(
ZStream
.succeed(Person("Thomas", 23, "2354"))
.transduce(
encoder(Person.schema)
)
.runCollect
.flatMap(x => putStrLn(s"Encoded data with protobuf codec: ${toHex(x)}"))
).getOrThrowFiberFailure()
```


## Resources

- [Zymposium - ZIO Schema](https://www.youtube.com/watch?v=GfNiDaL5aIM) by John A. De Goes, Adam Fraser, and Kit Langton (May 2021)
- [ZIO SCHEMA: A Toolkit For Functional Distributed Computing](https://www.youtube.com/watch?v=lJziseYKvHo&t=481s) by Dan Harris (Functional Scala 2021)
- [Creating Declarative Query Plans With ZIO Schema](https://www.youtube.com/watch?v=ClePN4P9_pg) by Dan Harris (ZIO World 2022)
- [Describing Data...with free applicative functors (and more)](https://www.youtube.com/watch?v=oRLkb6mqvVM) by Kris Nuttycombe (Scala World) on the idea behind the [xenomorph](https://github.com/nuttycom/xenomorph) library
- [Zymposium - ZIO Schema](https://www.youtube.com/watch?v=GfNiDaL5aIM) by John A. De Goes, Adam Fraser and Kit Langton (May 2021)

## Documentation

Expand Down
4 changes: 2 additions & 2 deletions tests/shared/src/test/scala-2/zio/schema/MetaSchemaSpec.scala
Original file line number Diff line number Diff line change
Expand Up @@ -289,12 +289,12 @@ object MetaSchemaSpec extends ZIOSpecDefault {
check(SchemaGen.anyCaseClassSchema) { schema =>
assert(MetaSchema.fromSchema(schema).toSchema)(hasSameSchemaStructure(schema))
}
},
} @@ TestAspect.ignore, //annotations are missing in the meta schema
test("sealed trait") {
check(SchemaGen.anyEnumSchema) { schema =>
assert(MetaSchema.fromSchema(schema).toSchema)(hasSameSchemaStructure(schema))
}
},
} @@ TestAspect.ignore, //annotations are missing in the meta schema
test("recursive type") {
check(SchemaGen.anyRecursiveType) { schema =>
assert(MetaSchema.fromSchema(schema).toSchema)(hasSameSchemaStructure(schema))
Expand Down
129 changes: 95 additions & 34 deletions zio-schema-avro/shared/src/main/scala/zio/schema/codec/AvroCodec.scala
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,13 @@ import zio.{ Chunk, Unsafe, ZIO }

object AvroCodec {

implicit def schemaBasedBinaryCodec[A](implicit schema: Schema[A]): BinaryCodec[A] =
new BinaryCodec[A] {
trait ExtendedBinaryCodec[A] extends BinaryCodec[A] {
def encodeGenericRecord(value: A)(implicit schema: Schema[A]): GenericData.Record
def decodeGenericRecord(value: GenericRecord)(implicit schema: Schema[A]): Either[DecodeError, A]
}

implicit def schemaBasedBinaryCodec[A](implicit schema: Schema[A]): ExtendedBinaryCodec[A] =
new ExtendedBinaryCodec[A] {

val avroSchema: SchemaAvro =
AvroSchemaCodec.encodeToApacheAvro(schema).getOrElse(throw new Exception("Avro schema could not be generated."))
Expand Down Expand Up @@ -59,6 +64,12 @@ object AvroCodec {
decode(chunk).map(Chunk(_))
)
}

override def encodeGenericRecord(value: A)(implicit schema: Schema[A]): GenericData.Record =
encodeValue(value, schema).asInstanceOf[GenericData.Record]

override def decodeGenericRecord(value: GenericRecord)(implicit schema: Schema[A]): Either[DecodeError, A] =
decodeValue(value, schema)
}

private def decodeValue[A](raw: Any, schema: Schema[A]): Either[DecodeError, A] = schema match {
Expand Down Expand Up @@ -209,16 +220,31 @@ object AvroCodec {
private def decodeCaseClass1[A, Z](raw: Any, schema: Schema.CaseClass1[A, Z]) =
decodeValue(raw, schema.field.schema).map(schema.defaultConstruct)

private def decodeEnum[Z](raw: Any, cases: Schema.Case[Z, _]*): Either[DecodeError, Any] = {
val generic = raw.asInstanceOf[GenericData.Record]
val enumCaseName = generic.getSchema.getFullName
val enumCaseValue = generic.get("value")
private def decodeEnum[Z](raw: Any, cases: Schema.Case[Z, _]*): Either[DecodeError, Any] =
raw match {
case enums: GenericData.EnumSymbol =>
decodeGenericEnum(enums.toString, None, cases: _*)
case gr: GenericData.Record =>
val enumCaseName = gr.getSchema.getFullName
if (gr.hasField("value")) {
val enumCaseValue = gr.get("value")
decodeGenericEnum[Z](enumCaseName, Some(enumCaseValue), cases: _*)
} else {
decodeGenericEnum[Z](enumCaseName, None, cases: _*)
}
case _ => Left(DecodeError.MalformedFieldWithPath(Chunk.single("Error"), s"Unknown enum: $raw"))
}

private def decodeGenericEnum[Z](
enumCaseName: String,
enumCaseValue: Option[AnyRef],
cases: Schema.Case[Z, _]*
): Either[DecodeError, Any] =
cases
.find(_.id == enumCaseName)
.map(s => decodeValue(enumCaseValue, s.schema))
.map(s => decodeValue(enumCaseValue.getOrElse(s), s.schema))
.toRight(DecodeError.MalformedFieldWithPath(Chunk.single("Error"), s"Unknown enum value: $enumCaseName"))
.flatMap(identity)
}

private def decodeRecord[A](value: A, schema: Schema.Record[_]) = {
val record = value.asInstanceOf[GenericRecord]
Expand Down Expand Up @@ -454,41 +480,41 @@ object AvroCodec {
else decodeValue(value, schema).map(Some(_))

private def encodeValue[A](a: A, schema: Schema[A]): Any = schema match {
case Schema.Enum1(_, c1, _) => encodeEnum(a, c1)
case Schema.Enum2(_, c1, c2, _) => encodeEnum(a, c1, c2)
case Schema.Enum3(_, c1, c2, c3, _) => encodeEnum(a, c1, c2, c3)
case Schema.Enum4(_, c1, c2, c3, c4, _) => encodeEnum(a, c1, c2, c3, c4)
case Schema.Enum5(_, c1, c2, c3, c4, c5, _) => encodeEnum(a, c1, c2, c3, c4, c5)
case Schema.Enum6(_, c1, c2, c3, c4, c5, c6, _) => encodeEnum(a, c1, c2, c3, c4, c5, c6)
case Schema.Enum1(_, c1, _) => encodeEnum(schema, a, c1)
case Schema.Enum2(_, c1, c2, _) => encodeEnum(schema, a, c1, c2)
case Schema.Enum3(_, c1, c2, c3, _) => encodeEnum(schema, a, c1, c2, c3)
case Schema.Enum4(_, c1, c2, c3, c4, _) => encodeEnum(schema, a, c1, c2, c3, c4)
case Schema.Enum5(_, c1, c2, c3, c4, c5, _) => encodeEnum(schema, a, c1, c2, c3, c4, c5)
case Schema.Enum6(_, c1, c2, c3, c4, c5, c6, _) => encodeEnum(schema, a, c1, c2, c3, c4, c5, c6)
case Schema.Enum7(_, c1, c2, c3, c4, c5, c6, c7, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7)
case Schema.Enum8(_, c1, c2, c3, c4, c5, c6, c7, c8, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8)
case Schema.Enum9(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9)
case Schema.Enum10(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)
case Schema.Enum11(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11)
case Schema.Enum12(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12)
case Schema.Enum13(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13)
case Schema.Enum14(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14)
case Schema.Enum15(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15)
case Schema.Enum16(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16)
case Schema.Enum17(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17)
case Schema.Enum18(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18)
case Schema.Enum19(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19)
case Schema
.Enum20(_, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, _) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20)
encodeEnum(schema, a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20)
case Schema.Enum21(
_,
c1,
Expand All @@ -514,7 +540,31 @@ object AvroCodec {
c21,
_
) =>
encodeEnum(a, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21)
encodeEnum(
schema,
a,
c1,
c2,
c3,
c4,
c5,
c6,
c7,
c8,
c9,
c10,
c11,
c12,
c13,
c14,
c15,
c16,
c17,
c18,
c19,
c20,
c21
)
case Schema.Enum22(
_,
c1,
Expand Down Expand Up @@ -542,6 +592,7 @@ object AvroCodec {
_
) =>
encodeEnum(
schema,
a,
c1,
c2,
Expand Down Expand Up @@ -580,9 +631,10 @@ object AvroCodec {
case Schema.Optional(schema, _) => encodeOption(schema, a)
case Schema.Tuple2(left, right, _) =>
encodeTuple2(left.asInstanceOf[Schema[Any]], right.asInstanceOf[Schema[Any]], a)
case Schema.Either(left, right, _) => encodeEither(left, right, a)
case Schema.Lazy(schema0) => encodeValue(a, schema0())
case Schema.CaseClass0(_, _, _) => encodePrimitive((), StandardType.UnitType)
case Schema.Either(left, right, _) => encodeEither(left, right, a)
case Schema.Lazy(schema0) => encodeValue(a, schema0())
case Schema.CaseClass0(_, _, _) =>
encodeCaseClass(schema, a, Seq.empty: _*) //encodePrimitive((), StandardType.UnitType)
case Schema.CaseClass1(_, f, _, _) => encodeCaseClass(schema, a, f)
case Schema.CaseClass2(_, f0, f1, _, _) => encodeCaseClass(schema, a, f0, f1)
case Schema.CaseClass3(_, f0, f1, f2, _, _) => encodeCaseClass(schema, a, f0, f1, f2)
Expand Down Expand Up @@ -926,11 +978,20 @@ object AvroCodec {
record
}

private def encodeEnum[Z](value: Z, cases: Schema.Case[Z, _]*): Any = {
private def encodeEnum[Z](schemaRaw: Schema[Z], value: Z, cases: Schema.Case[Z, _]*): Any = {
val schema = AvroSchemaCodec
.encodeToApacheAvro(schemaRaw)
.getOrElse(throw new Exception("Avro schema could not be generated for Enum."))
val fieldIndex = cases.indexWhere(c => c.deconstructOption(value).isDefined)
if (fieldIndex >= 0) {
val subtypeCase = cases(fieldIndex)
encodeValue(subtypeCase.deconstruct(value), subtypeCase.schema.asInstanceOf[Schema[Any]])
if (schema.getType == SchemaAvro.Type.ENUM) {
GenericData.get.createEnum(schema.getEnumSymbols.get(fieldIndex), schema)
} else {

encodeValue(subtypeCase.deconstruct(value), subtypeCase.schema.asInstanceOf[Schema[Any]])

}
} else {
throw new Exception("Could not find matching case for enum value.")
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -517,8 +517,8 @@ object AvroSchemaCodec extends AvroSchemaCodec {
}

def hasAvroEnumAnnotation(annotations: Chunk[Any]): Boolean = annotations.exists {
case AvroAnnotations.avroEnum => true
case _ => false
case AvroAnnotations.avroEnum() => true
case _ => false
}

def wrapAvro(schemaAvro: SchemaAvro, name: String, marker: AvroPropMarker): SchemaAvro = {
Expand Down
Loading

0 comments on commit a64a823

Please sign in to comment.