feat: implement mismatched model id + improvement
vincent-herlemont committed Sep 4, 2023
1 parent 5f2f339 commit 9a3a673
Showing 13 changed files with 188 additions and 131 deletions.
4 changes: 4 additions & 0 deletions Cargo.toml
Expand Up @@ -35,5 +35,9 @@ harness = false
name = "overhead_on_bincode"
harness = false

[[bench]]
name = "prepend_bytes"
harness = false

[build-dependencies]
skeptic = "0.13"
77 changes: 39 additions & 38 deletions README.md
Expand Up @@ -15,7 +15,7 @@ See [concepts](#concepts) for more details.
versions of the data model.
- **Data Consistency**: Ensure that we process the expected data model.
- **Flexibility**: You can use any serialization format you want. More details [here](#setup-your-serialization-format).
- **Performance**: A minimal overhead. More details [here](#performance).
- **Performance**: A minimal overhead (encode: ~20 ns, decode: ~40 ps). More details [here](#performance).

## Usage

Expand Down Expand Up @@ -71,10 +71,31 @@ When not to use it?
- You need to have a human-readable format. (You can use a human-readable format like JSON wrapped in a native model,
but you have to unwrap it to see the data correctly.)

# Status
## Status

Early development. Not ready for production.

## Concepts

In order to understand how the native model works, you need to understand the following concepts.

- **Identity** (`id`): The identity is the unique identifier of the model. It is used to identify the model and
prevent decoding a model into the wrong Rust type.
- **Version** (`version`): The version is the version of the model. It is used to check the compatibility between two
models.
- **Encode**: The encode is the process of converting a model into a byte array.
- **Decode**: The decode is the process of converting a byte array into a model.
- **Downgrade**: The downgrade is the process of converting a model into a previous version of the model.
- **Upgrade**: The upgrade is the process of converting a model into a newer version of the model.

Under the hood, the native model is a thin wrapper around serialized data. The `id` and the `version` are each encoded as a [`little_endian::U32`](https://docs.rs/zerocopy/latest/zerocopy/byteorder/little_endian/type.U32.html). Together they take 8 bytes, which are added at the beginning of the data.

```
+------------------+------------------+------------------------------------+
| ID (4 bytes) | Version (4 bytes)| Data (indeterminate-length bytes) |
+------------------+------------------+------------------------------------+
```
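
The layout above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation (which uses `zerocopy`); the helper names `wrap` and `unwrap` here are hypothetical.

```rust
/// Hypothetical helper: prepend an 8-byte header (id, then version),
/// each written as a little-endian u32, to the serialized body.
fn wrap(id: u32, version: u32, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + body.len());
    out.extend_from_slice(&id.to_le_bytes());
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(body);
    out
}

/// Hypothetical helper: split the header off again.
/// Returns `None` when the input is shorter than the 8-byte header.
fn unwrap(data: &[u8]) -> Option<(u32, u32, &[u8])> {
    if data.len() < 8 {
        return None;
    }
    let mut id_bytes = [0u8; 4];
    let mut version_bytes = [0u8; 4];
    id_bytes.copy_from_slice(&data[0..4]);
    version_bytes.copy_from_slice(&data[4..8]);
    Some((
        u32::from_le_bytes(id_bytes),
        u32::from_le_bytes(version_bytes),
        &data[8..],
    ))
}

fn main() {
    let wrapped = wrap(1, 2, b"payload");
    assert_eq!(wrapped.len(), 8 + 7);

    let (id, version, body) = unwrap(&wrapped).unwrap();
    assert_eq!((id, version), (1, 2));
    assert_eq!(body, &b"payload"[..]);
}
```

Reading the header back only touches the first 8 bytes, which is why the decode side can stay independent of the body size.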

## Setup your serialization format

First, you need to set up your serialization format. You can use any serialization format.
Expand Down Expand Up @@ -146,46 +167,26 @@ struct Cord {

Full example [here](tests/example/example_define_model.rs).

# Concepts
## Performance

In order to understand how the native model works, you need to understand the following concepts.
Native model has been designed to have a minimal and constant overhead: the overhead is the same
whatever the size of the data. Under the hood, we use the [zerocopy](https://docs.rs/zerocopy/latest/zerocopy/) crate
to avoid unnecessary copies.

- **Identity** (`id`): The identity is the unique identifier of the model. It is used to identify the model and
prevent decoding a model into the wrong type.
- **Version** (`version`): The version is the version of the model. It is used to check the compatibility between two
models.
- **Encode**: The encode is the process of converting a model into a byte array.
- **Decode**: The decode is the process of converting a byte array into a model.
- **Downgrade**: The downgrade is the process of converting a model into a previous version of the model.
- **Upgrade**: The upgrade is the process of converting a model into a newer version of the model.
👉 To get the total encode/decode time, add the time of your serialization format to this overhead.

Under the hood, the native model is a thin wrapper around serialized data. The `id` and the `version` are each encoded as a [`little_endian::U32`](https://docs.rs/zerocopy/latest/zerocopy/byteorder/little_endian/type.U32.html). Together they take 8 bytes, which are added at the beginning of the data.
Summary:
- **Encode**: ~20 ns
- **Decode**: ~40 ps

```
+------------------+------------------+------------------------------------+
| ID (4 bytes) | Version (4 bytes)| Data (indeterminate-length bytes) |
+------------------+------------------+------------------------------------+
```

# Performance

This crate is in an early stage of development, so the performance should be improved in the future.
The goal is to have a minimal and constant overhead for all data sizes. It uses the [zerocopy](https://docs.rs/zerocopy/latest/zerocopy/) crate to avoid unnecessary copies.

Current performance:
- Encode time: the overhead grows linearly with the data size.
- Decode time: the overhead is ~162 ps for all data sizes.


| data size | encode time (ns/ps/µs/ms) | decode time (ps) |
|:---------------------:|:--------------------------:|:----------------:|
| 1 B | 40.093 ns - 40.510 ns | 161.87 ps - 162.02 ps |
| 1 KiB (1024 B) | 116.45 ns - 116.83 ns | 161.85 ps - 162.08 ps |
| 1 MiB (1048576 B) | 66.697 µs - 67.634 µs | 161.87 ps - 162.18 ps |
| 10 MiB (10485760 B) | 1.5670 ms - 1.5843 ms | 162.40 ps - 163.52 ps |
| 100 MiB (104857600 B) | 63.778 ms - 64.132 ms | 162.71 ps - 165.10 ps |
| data size | encode time (ns) | decode time (ps) |
|:--------------------:|:---------------------:|:-----------------------:|
| 1 B | 19.769 ns - 20.154 ns | 40.526 ps - 40.617 ps |
| 1 KiB | 19.597 ns - 19.971 ns | 40.534 ps - 40.633 ps |
| 1 MiB | 19.662 ns - 19.910 ns | 40.508 ps - 40.632 ps |
| 10 MiB | 19.591 ns - 19.980 ns | 40.504 ps - 40.605 ps |
| 100 MiB | 19.669 ns - 19.867 ns | 40.520 ps - 40.644 ps |

Benchmark of the native model overhead [here](benches/overhead.rs).

To know how much time it takes to encode/decode your data, you need to add this overhead to the time of your serialization format.

10 changes: 5 additions & 5 deletions benches/overhead.rs
Expand Up @@ -14,7 +14,7 @@ fn native_model_decode_body<T: Decode>(data: Vec<u8>) -> Result<T, bincode::erro
#[native_model(id = 1, version = 1)]
struct Data(Vec<u8>);

fn wrapper(data: &mut Vec<u8>) {
fn wrap(data: &mut Vec<u8>) {
native_model::wrapper::native_model_encode(data, 1, 1);
}

Expand All @@ -31,16 +31,16 @@ fn criterion_benchmark(c: &mut Criterion) {

// encode
let data = Data(vec![1; nb_bytes]);
let encode_body = native_model_encode_body(&data).unwrap();
let mut encode_body = native_model_encode_body(&data).unwrap();
group.bench_function(BenchmarkId::new("encode", nb_bytes), |b| {
b.iter(|| wrapper(&mut encode_body.clone()))
b.iter(|| wrap(&mut encode_body))
});

// decode
let data = Data(vec![1; nb_bytes]);
let encode_body = native_model::encode(&data).unwrap();
let mut encode_body = native_model::encode(&data).unwrap();
group.bench_function(BenchmarkId::new("decode", nb_bytes), |b| {
b.iter(|| unwrap(&mut encode_body.clone()))
b.iter(|| unwrap(&mut encode_body))
});
}
}
Expand Down
28 changes: 28 additions & 0 deletions benches/prepend_bytes.rs
@@ -0,0 +1,28 @@
/// Found a way to prepend bytes at the beginning of a Vec<u8> with a constant overhead.
use bincode::{Decode, Encode};
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("encode");

// 1 byte, 1KB, 1MB, 10MB, 100MB
for nb_bytes in [1, 1024, 1024 * 1024, 10 * 1024 * 1024, 100 * 1024 * 1024].into_iter() {
group.throughput(criterion::Throughput::Bytes(nb_bytes as u64));

let header: Vec<u8> = vec![0; 4];
let mut data: Vec<u8> = vec![1; nb_bytes];
group.bench_function(BenchmarkId::new("prepend_bytes", nb_bytes), |b| {
b.iter(|| {
// Fastest way to prepend bytes to data
let mut header = header.clone();
header.append(&mut data);
// prepend bytes to data
// let mut header = header.clone();
// header.extend_from_slice(&data);
});
});
}
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
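
One detail of `Vec::append` is worth keeping in mind when reading a prepend built on it: `append` moves every element out of its argument and leaves that vector empty. The sketch below uses a hypothetical `prepend` helper (not part of the crate) to show the effect when the same buffer is passed in twice, as happens when a buffer is declared outside a loop or benchmark closure and reused inside it.

```rust
/// Hypothetical helper: build a new buffer holding `header` followed by
/// the contents of `body`. `Vec::append` moves the elements out of `body`,
/// so `body` is empty when this function returns.
fn prepend(header: &[u8], body: &mut Vec<u8>) -> Vec<u8> {
    let mut out = header.to_vec();
    out.append(body);
    out
}

fn main() {
    let mut body = vec![1u8; 4];

    let first = prepend(&[0u8; 4], &mut body);
    assert_eq!(first, vec![0, 0, 0, 0, 1, 1, 1, 1]);
    assert!(body.is_empty());

    // A second call on the same `body` has nothing left to move,
    // so the result contains the header only.
    let second = prepend(&[0u8; 4], &mut body);
    assert_eq!(second, vec![0, 0, 0, 0]);
}
```

If each call should see the full payload, the payload has to be rebuilt or cloned before the call, and that cost then needs to be accounted for separately when measuring.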
2 changes: 1 addition & 1 deletion native_model_macro/src/lib.rs
Expand Up @@ -94,7 +94,7 @@ pub fn native_model(args: TokenStream, input: TokenStream) -> TokenStream {
let native_model_version_fn = generate_native_model_version(&attrs);
let native_model_encode_body_fn = generate_native_model_encode_body();
let native_model_encode_downgrade_body_fn = generate_native_model_encode_downgrade_body(&attrs);
let native_model_decode_body_fn = generate_native_model_decode_body();
let native_model_decode_body_fn = generate_native_model_decode_body(&attrs);
let native_model_decode_upgrade_body_fn = generate_native_model_decode_upgrade_body(&attrs);

let gen = quote! {
Expand Down
15 changes: 11 additions & 4 deletions native_model_macro/src/method/decode_body.rs
@@ -1,15 +1,22 @@
use crate::ModelAttributes;
use proc_macro2::TokenStream;
use quote::quote;

pub(crate) fn generate_native_model_decode_body() -> TokenStream {
pub(crate) fn generate_native_model_decode_body(attrs: &ModelAttributes) -> TokenStream {
let id = attrs.id.clone().expect("id is required");
let gen = quote! {
fn native_model_decode_body(data: Vec<u8>) -> Result<Self, native_model::DecodeBodyError> {
native_model_decode_body(data).map_err(|e| native_model::DecodeBodyError {
fn native_model_decode_body(data: Vec<u8>, id: u32) -> Result<Self, native_model::DecodeBodyError> {
println!("id: {}, {}", id, #id);
if id != #id {
return Err(native_model::DecodeBodyError::MismatchedModelId);
}

native_model_decode_body(data).map_err(|e| native_model::DecodeBodyError::DecodeError {
msg: format!("{}", e),
source: e.into(),
})
}
};

gen.into()
}
}
16 changes: 8 additions & 8 deletions native_model_macro/src/method/decode_upgrade_body.rs
Expand Up @@ -8,11 +8,11 @@ pub(crate) fn generate_native_model_decode_upgrade_body(attrs: &ModelAttributes)

let model_from_or_try_from = if let Some(from) = native_model_from {
quote! {
#from::native_model_decode_upgrade_body(data, x).map(|a| a.into())
#from::native_model_decode_upgrade_body(data, id, version).map(|a| a.into())
}
} else if let Some((try_from, error_try_from)) = native_model_try_from {
quote! {
let result = #try_from::native_model_decode_upgrade_body(data, x).map(|b| {
let result = #try_from::native_model_decode_upgrade_body(data, id, version).map(|b| {
b.try_into()
.map_err(|e: #error_try_from| native_model::UpgradeError {
msg: format!("{}", e),
Expand All @@ -24,22 +24,22 @@ pub(crate) fn generate_native_model_decode_upgrade_body(attrs: &ModelAttributes)
} else {
quote! {
Err(native_model::Error::UpgradeNotSupported {
from: x,
from: version,
to: Self::native_model_version(),
})
}
};

let gen = quote! {
fn native_model_decode_upgrade_body(data: Vec<u8>, x: u32) -> native_model::Result<Self> {
if x == Self::native_model_version() {
let result = Self::native_model_decode_body(data)?;
fn native_model_decode_upgrade_body(data: Vec<u8>, id: u32, version: u32) -> native_model::Result<Self> {
if version == Self::native_model_version() {
let result = Self::native_model_decode_body(data, id)?;
Ok(result)
} else if x < Self::native_model_version() {
} else if version < Self::native_model_version() {
#model_from_or_try_from
} else {
Err(native_model::Error::UpgradeNotSupported {
from: x,
from: version,
to: Self::native_model_version(),
})
}
Expand Down
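
The control flow that the generated `native_model_decode_upgrade_body` follows can be sketched without the macro. This is a simplified, std-only illustration: the `Error` enum, the constants, and both functions are hypothetical stand-ins, and the "older version" branch only re-checks the id instead of delegating to a previous model type and converting the result.

```rust
// Hypothetical, std-only stand-ins; these names are not the crate's API.
#[derive(Debug, PartialEq)]
enum Error {
    MismatchedModelId,
    UpgradeNotSupported { from: u32, to: u32 },
}

const MODEL_ID: u32 = 1;
const CURRENT_VERSION: u32 = 2;

/// Rejects the body when the id read from the header does not match
/// the id of the target model.
fn decode_body(data: Vec<u8>, id: u32) -> Result<Vec<u8>, Error> {
    if id != MODEL_ID {
        return Err(Error::MismatchedModelId);
    }
    Ok(data)
}

/// Dispatches on the version read from the header.
fn decode_upgrade_body(data: Vec<u8>, id: u32, version: u32) -> Result<Vec<u8>, Error> {
    if version == CURRENT_VERSION {
        // Same version: decode directly, checking the id first.
        decode_body(data, id)
    } else if version < CURRENT_VERSION {
        // Older version: the generated code would hand over to the previous
        // model type and convert the result; this sketch only checks the id.
        decode_body(data, id)
    } else {
        // Newer than this model knows about: no upgrade path.
        Err(Error::UpgradeNotSupported {
            from: version,
            to: CURRENT_VERSION,
        })
    }
}

fn main() {
    assert_eq!(decode_upgrade_body(vec![7], 1, 2), Ok(vec![7]));
    assert_eq!(
        decode_upgrade_body(vec![7], 9, 2),
        Err(Error::MismatchedModelId)
    );
    assert_eq!(
        decode_upgrade_body(vec![7], 1, 3),
        Err(Error::UpgradeNotSupported { from: 3, to: 2 })
    );
}
```

Passing `id` alongside `version` through the whole chain is what lets the check happen at whichever model type ends up decoding the bytes.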
2 changes: 1 addition & 1 deletion src/header.rs
Expand Up @@ -4,6 +4,6 @@ use zerocopy::{AsBytes, FromBytes, FromZeroes};
#[derive(FromZeroes, FromBytes, AsBytes, Debug)]
#[repr(C)]
pub struct Header {
pub(crate) type_id: U32,
pub(crate) id: U32,
pub(crate) version: U32,
}
13 changes: 9 additions & 4 deletions src/lib.rs
Expand Up @@ -56,10 +56,15 @@ pub type DecodeResult<T> = std::result::Result<T, DecodeBodyError>;

#[derive(Error, Debug)]
#[error("Decode body error: {msg}")]
pub struct DecodeBodyError {
pub msg: String,
#[source]
pub source: anyhow::Error,
pub enum DecodeBodyError {
#[error("Mismatched model id")]
MismatchedModelId,
#[error("Decode error: {msg}")]
DecodeError {
msg: String,
#[source]
source: anyhow::Error,
},
}

pub type EncodeResult<T> = std::result::Result<T, EncodeBodyError>;
Expand Down
16 changes: 10 additions & 6 deletions src/model.rs
Expand Up @@ -6,11 +6,11 @@ pub trait Model: Sized {

// --------------- Decode ---------------

fn native_model_decode_body(data: Vec<u8>) -> DecodeResult<Self>
fn native_model_decode_body(data: Vec<u8>, id: u32) -> DecodeResult<Self>
where
Self: Sized;

fn native_model_decode_upgrade_body(data: Vec<u8>, version: u32) -> Result<Self>
fn native_model_decode_upgrade_body(data: Vec<u8>, id: u32, version: u32) -> Result<Self>
where
Self: Sized;

Expand All @@ -19,9 +19,13 @@ pub trait Model: Sized {
Self: Sized,
{
let native_model = crate::Wrapper::deserialize(&data[..]).unwrap();
let source_id = native_model.get_id();
let source_version = native_model.get_version();
let result =
Self::native_model_decode_upgrade_body(native_model.value().to_vec(), source_version)?;
let result = Self::native_model_decode_upgrade_body(
native_model.value().to_vec(),
source_id,
source_version,
)?;
Ok((result, source_version))
}

Expand All @@ -40,7 +44,7 @@ pub trait Model: Sized {
Self: Sized,
{
let mut data = self.native_model_encode_body()?;
crate::native_model_encode(
let data = crate::native_model_encode(
&mut data,
Self::native_model_id(),
Self::native_model_version(),
Expand All @@ -54,7 +58,7 @@ pub trait Model: Sized {
{
let version = version.clone();
let mut data = self.native_model_encode_downgrade_body(version)?;
crate::native_model_encode(&mut data, Self::native_model_id(), version);
let data = crate::native_model_encode(&mut data, Self::native_model_id(), version);
Ok(data)
}
}
38 changes: 12 additions & 26 deletions src/wrapper.rs
Expand Up @@ -23,7 +23,11 @@ impl<T: ByteSlice> Wrapper<T> {
}

pub fn get_type_id(&self) -> u32 {
self.header.type_id.get()
self.header.id.get()
}

pub fn get_id(&self) -> u32 {
self.header.id.get()
}

pub fn get_version(&self) -> u32 {
Expand All @@ -33,40 +37,22 @@ impl<T: ByteSlice> Wrapper<T> {

impl<T: ByteSliceMut> Wrapper<T> {
pub fn set_type_id(&mut self, type_id: u32) {
self.header.type_id = U32::new(type_id);
self.header.id = U32::new(type_id);
}

pub fn set_version(&mut self, version: u32) {
self.header.version = U32::new(version);
}
}

pub fn native_model_encode(value: &mut Vec<u8>, type_id: u32, version: u32) {
pub fn native_model_encode(data: &mut Vec<u8>, type_id: u32, version: u32) -> Vec<u8> {
let header = Header {
type_id: U32::new(type_id),
id: U32::new(type_id),
version: U32::new(version),
};
let header = header.as_bytes();
value.reserve(header.len());
value.splice(..0, header.iter().cloned());

// Try to do with unsafe code to improve performance but benchmark shows that it's the same
//
// // Add header to the beginning of the vector
// unsafe {
// // get the raw pointer to the vector's buffer
// let ptr = value.as_mut_ptr();
//
// // move the existing elements to the right
// ptr.offset(header.len() as isize)
// .copy_from_nonoverlapping(ptr, value.len());
//
// // copy the elements from the header to the beginning of the vector
// ptr.copy_from_nonoverlapping(header.as_ptr(), header.len());
//
// // update the length of the vector
// value.set_len(value.len() + header.len());
// }
let mut header = header.as_bytes().to_vec();
header.append(data);
header
}

#[cfg(test)]
Expand All @@ -76,7 +62,7 @@ mod tests {
#[test]
fn native_model_deserialize_with_body() {
let mut data = vec![0u8; 8];
native_model_encode(&mut data, 200000, 100000);
let data = native_model_encode(&mut data, 200000, 100000);
assert_eq!(data.len(), 16);
let model = Wrapper::deserialize(&data[..]).unwrap();
assert_eq!(model.get_type_id(), 200000);
Expand Down