A fast, lightweight, POJO-driven Apache Avro serialization/deserialization library.
The library was designed with the following concerns in mind:
- POJO-driven: This is particularly useful for existing applications that expose a set of plain Java objects as their API. We tried the Avro core implementation, but it adds an extra layer of indirection/de-referencing. Moreover, the mapping between Record objects/dictionaries and the existing API can quickly turn into tedious boilerplate to maintain.
- Generics and complex types: Existing Java API models often leverage the typing features that the core language offers. We need a library that supports any type, including generic types.
- Fast: Latency is a critical requirement for modern applications and real-time event processors. High performance is therefore a key criterion for adopting a library.
- Low memory footprint: Relieving pressure on the heap by minimizing the objects created during serialization/deserialization helps applications meet their SLAs. We need the lowest possible memory footprint and an API for re-using existing objects when applicable.
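The object re-use idea in the last point can be sketched as a decode method that fills a caller-supplied instance instead of allocating a new one. The `ReusingCodec` interface, `Point` class and `PointCodec` below are hypothetical names for illustration only; this is not Avrodite's API, and the toy wire format (two big-endian ints) is not Avro.

```java
import java.nio.ByteBuffer;

// Hypothetical codec contract for illustration; NOT Avrodite's actual API.
interface ReusingCodec<T> {
    byte[] encode(T value);

    // Decodes into the caller-supplied instance when one is given,
    // so hot paths can avoid allocating a new T per message.
    T decode(byte[] data, T reuse);
}

// Hypothetical mutable POJO used by the sketch.
final class Point {
    int x;
    int y;
}

// Toy codec: two big-endian 32-bit ints (a made-up format, not Avro).
final class PointCodec implements ReusingCodec<Point> {
    @Override
    public byte[] encode(Point p) {
        return ByteBuffer.allocate(8).putInt(p.x).putInt(p.y).array();
    }

    @Override
    public Point decode(byte[] data, Point reuse) {
        // Only allocate a Point when the caller did not supply one.
        Point target = (reuse != null) ? reuse : new Point();
        ByteBuffer buf = ByteBuffer.wrap(data);
        target.x = buf.getInt();
        target.y = buf.getInt();
        return target;
    }
}
```

In a hot loop, a consumer would keep one `Point` and pass it to every `decode` call, so no new `Point` is created per message (this simple sketch still allocates a small `ByteBuffer` wrapper).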
Currently, the library fulfills most of its design objectives and performs better than many widely adopted libraries. It has full support for Java generics and complex types (we stress-tested it with fields such as `List<Map<String, Event<Map<String, String>, Model<A,B>>>>`, and it passes).
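Supporting fields like the one above requires build-time introspection that sees through type erasure. With plain Java reflection, one way to do that is to walk `Field.getGenericType()` through `ParameterizedType`. The sketch below is an assumption about how such introspection could work, not Avrodite's actual implementation; `Event`, `Holder` and `GenericIntrospector` are hypothetical, with `Integer` standing in for `Model<A,B>`.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

// Hypothetical generic model type, for illustration only.
class Event<K, V> {}

// Hypothetical bean with a deeply nested generic field.
class Holder {
    List<Map<String, Event<Map<String, String>, Integer>>> events;
}

final class GenericIntrospector {
    // Recursively renders a reflected Type, keeping every type argument.
    static String describe(Type type) {
        if (type instanceof ParameterizedType) {
            ParameterizedType pt = (ParameterizedType) type;
            StringBuilder sb = new StringBuilder(describe(pt.getRawType()));
            sb.append('<');
            Type[] args = pt.getActualTypeArguments();
            for (int i = 0; i < args.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append(describe(args[i]));
            }
            return sb.append('>').toString();
        }
        if (type instanceof Class) {
            return ((Class<?>) type).getSimpleName();
        }
        // Type variables, wildcards and generic arrays: use the JDK's own rendering.
        return type.getTypeName();
    }

    // Field declarations keep their generic signature, unlike runtime instances.
    static String describeField(Class<?> owner, String fieldName) {
        try {
            return describe(owner.getDeclaredField(fieldName).getGenericType());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + fieldName, e);
        }
    }
}
```

`GenericIntrospector.describeField(Holder.class, "events")` returns `List<Map<String, Event<Map<String, String>, Integer>>>`, showing that the full nested type arguments are available from the field declaration even though they are erased from instances.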
In terms of performance, Avrodite performs 3x-4x better than its closest competitor (Protocol Buffers) and between 5x and 15x better than the Avro core Record API (see the benchmark results below).
In terms of usability, interacting with the Avrodite API looks something like this:
```java
// API configuration typically happens in your dependency injection layer
Avrodite<AvroStandard, AvroCodec<?>> avrodite = AvroStandardV19.avrodite()
        .build();
// Once you have an Avrodite instance, you can get a codec for a given target class:
AvroCodec<Model> codec = avrodite.getCodec(Model.class);
// Serialization and deserialization are then trivial
byte[] avroData = codec.encode(new Model());
Model decodedModel = codec.decode(avroData);
// When needed, you can get the schema from the codec instance:
Schema schema = codec.getSchema();
```
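Since configuration happens once (for example in a dependency injection layer), it makes sense to build each codec once and reuse it. The hypothetical `CodecRegistry` below sketches one way to cache one codec per target class with `ConcurrentHashMap.computeIfAbsent`; it is an illustration under that assumption and does not describe how `Avrodite.getCodec` is actually implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry, for illustration only: one cached codec per target class.
final class CodecRegistry<C> {
    private final Map<Class<?>, C> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, C> factory;

    // The factory is whatever builds a codec for a class (e.g. a generated-class lookup).
    CodecRegistry(Function<Class<?>, C> factory) {
        this.factory = factory;
    }

    // Builds the codec on the first request for a class, then returns the cached instance.
    C getCodec(Class<?> target) {
        return cache.computeIfAbsent(target, factory);
    }
}
```

With this design, repeated lookups are a hash-map read, and thread safety comes from `ConcurrentHashMap` rather than explicit locking.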
The library is modular and was designed with the idea of supporting other serialization formats in the future (JSON, for example). It currently consists of the following modules:
- avrodite-api: The global API that projects depend on for serialization/deserialization.
- avrodite-avro: An extension and implementation of the public API that provides Avro-specific APIs (for example, access to Schema instances).
- avrodite-tools: Mainly a compile-time/build-phase dependency that contains the logic for introspecting your beans and plain-object APIs.
- avrodite-tools-avro: An avrodite-tools plugin that generates custom classes for the Avro binary format.
- avrodite-avro-maven-plugin: A Maven plugin that abstracts the avrodite-tools stack away from you and generates your codec classes during the build phase.
- avrodite-avro-benchmarks: A test module that benchmarks the Avro implementation against other libraries.
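To give an idea of what generated codecs deal with, the sketch below hand-writes an encoder/decoder for a hypothetical `Model` with an `int id` and a `String name`, following the Avro binary encoding rules: record fields are written in schema order, `int`/`long` values use zig-zag variable-length encoding, and strings are a length followed by UTF-8 bytes. `Model` and `ModelAvroCodec` are illustrative names; this is not the code that avrodite-tools-avro actually generates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical POJO; field order follows a hypothetical schema:
// {"type":"record","name":"Model","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"}]}
final class Model {
    int id;
    String name;
}

// Hand-written sketch of a codec a generator could emit; not Avrodite's generated code.
final class ModelAvroCodec {
    byte[] encode(Model m) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, m.id);                          // Avro int: zig-zag varint
        byte[] utf8 = m.name.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);                   // Avro string: byte length as a long...
        out.write(utf8, 0, utf8.length);               // ...followed by the UTF-8 bytes
        return out.toByteArray();
    }

    Model decode(byte[] data) {
        int[] pos = {0};                               // read cursor shared with readLong
        Model m = new Model();
        m.id = (int) readLong(data, pos);
        int len = (int) readLong(data, pos);
        m.name = new String(data, pos[0], len, StandardCharsets.UTF_8);
        pos[0] += len;
        return m;
    }

    private static void writeLong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);                 // zig-zag: small magnitudes -> small values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));      // 7 payload bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    private static long readLong(byte[] data, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);                   // undo zig-zag
    }
}
```

Encoding `id = 1, name = "ab"` produces the four bytes `02 04 61 62`: `02` is the zig-zag encoding of 1, `04` the zig-zag encoding of the length 2, followed by the UTF-8 bytes of `ab`.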
Refer to this document for detailed results.
Throughput (higher is better):
Framework | T1 throughput [ ops/ms ] | T1 relative perf. | T2 throughput [ ops/ms ] | T2 relative perf. |
---|---|---|---|---|
avrodite | 1858 | 100.00% | 2208 | 100.00% |
protocolBuffers | 585 | 31.47% | 589 | 26.67% |
avroCoreNoHydration | 132 | 7.08% | 501 | 22.67% |
avroCoreWithHydration | 102 | 5.48% | 239 | 10.82% |
jacksonAvro | 87 | 4.68% | 158 | 7.14% |
jacksonJSON | 88 | 4.74% | 89 | 4.02% |
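The speedup factors quoted earlier can be derived from this table by dividing avrodite's throughput by another framework's throughput in the same test. The helper below (hypothetical names, using only the numbers from the table) does that arithmetic.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative helper; the figures are copied from the throughput table above.
final class ThroughputComparison {
    static final double AVRODITE_T1 = 1858; // ops/ms
    static final double AVRODITE_T2 = 2208; // ops/ms

    // Other frameworks' {T1, T2} throughput in ops/ms, in table order.
    static final Map<String, double[]> OTHERS = new LinkedHashMap<>();
    static {
        OTHERS.put("protocolBuffers", new double[] {585, 589});
        OTHERS.put("avroCoreNoHydration", new double[] {132, 501});
        OTHERS.put("avroCoreWithHydration", new double[] {102, 239});
        OTHERS.put("jacksonAvro", new double[] {87, 158});
        OTHERS.put("jacksonJSON", new double[] {88, 89});
    }

    // How many times more ops/ms avrodite completes than the other framework.
    static double speedup(double avroditeOpsPerMs, double otherOpsPerMs) {
        return avroditeOpsPerMs / otherOpsPerMs;
    }

    // One line per framework, e.g. "protocolBuffers: T1 3.2x, T2 3.7x".
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, double[]> e : OTHERS.entrySet()) {
            double[] t = e.getValue();
            sb.append(String.format(Locale.ROOT, "%s: T1 %.1fx, T2 %.1fx%n",
                    e.getKey(), speedup(AVRODITE_T1, t[0]), speedup(AVRODITE_T2, t[1])));
        }
        return sb.toString();
    }
}
```

For protocolBuffers this yields about 3.2x on T1 (1858 / 585) and 3.7x on T2 (2208 / 589).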
Heap allocation rate (lower is better):
Framework | T1 heap allocation rate [ bytes/op ] | T1 relative to avrodite | T2 heap allocation rate [ bytes/op ] | T2 relative to avrodite |
---|---|---|---|---|
avrodite | 1024 | 100.00% | 1024 | 100.00% |
avroCoreNoHydration | 2248 | 219.53% | 2248 | 219.53% |
avroCoreWithHydration | 4840 | 472.66% | 4840 | 472.66% |
protocolBuffers | 6088 | 594.53% | 6088 | 594.53% |
jacksonJSON | 9472 | 925.00% | 9472 | 925.00% |
jacksonAvro | 16264 | 1588.28% | 16296 | 1591.41% |
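In this table, each relative column is a framework's allocation divided by avrodite's allocation in the same test, expressed as a percentage; for example, (2248 / 1024) * 100 ≈ 219.53% for avroCoreNoHydration. The hypothetical helper below reproduces that column.

```java
import java.util.Locale;

// Illustrative helper reproducing the relative columns of the allocation table.
final class AllocationRatio {
    // Framework bytes/op divided by avrodite bytes/op, as a percentage with two decimals.
    static String relativePercent(long frameworkBytesPerOp, long avroditeBytesPerOp) {
        double pct = 100.0 * frameworkBytesPerOp / avroditeBytesPerOp;
        return String.format(Locale.ROOT, "%.2f%%", pct);
    }
}
```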
The following features are planned for the future:
- Enum support (easy).
- Avro union types (useful when you have a supertype with several subtypes) (easy/medium).
- Mapping model API evolution to schema migration (most likely hard).
- Support for other serialization formats: a good portion of the library (type introspection, codec compilation) can be re-used to support other formats such as JSON (average difficulty, depends on the format).
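For reference, the Avro specification defines the binary encoding of the two planned types: an enum is written as the zero-based position of its symbol (an `int`), and a union is written as the zero-based branch index (a `long`) followed by the value encoded according to that branch. The hand-written sketch below applies those rules to a hypothetical `Color` enum and a `["null", "string"]` union; the class and method names are illustrative and do not describe how Avrodite will implement these features. It also assumes the Java enum's declaration order matches the schema's symbol order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of Avro binary rules for enums and unions; not Avrodite code.
final class AvroPlannedTypes {
    // Hypothetical enum; assumes declaration order == schema "symbols" order.
    enum Color { RED, GREEN, BLUE }

    // Avro enum: zero-based symbol position, written like an Avro int.
    static byte[] encodeEnum(Enum<?> value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, value.ordinal());
        return out.toByteArray();
    }

    // Avro union ["null", "string"]: branch index as a long, then that branch's value.
    static byte[] encodeNullableString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);                      // branch 0 = "null", which has no payload
        } else {
            writeLong(out, 1);                      // branch 1 = "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length);            // string length...
            out.write(utf8, 0, utf8.length);        // ...then UTF-8 bytes
        }
        return out.toByteArray();
    }

    // Zig-zag + base-128 varint, as used for Avro int and long.
    private static void writeLong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }
}
```

For example, `Color.BLUE` encodes to the single byte `04`, a null value in the union encodes to `00`, and `"a"` encodes to `02 02 61`.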
Copyright (c) 2020 Yassine Echabbi
Licensed under the Apache License, Version 2.0