feat: implement mismatched model id + improvement

vincent-herlemont · Sep 4, 2023 · 9a3a673
1 parent 5f2f339
commit 9a3a673
Showing 13 changed files with 188 additions and 131 deletions.
diff --git a/Cargo.toml b/Cargo.toml
@@ -35,5 +35,9 @@ harness = false
 name = "overhead_on_bincode"
 harness = false
 
+[[bench]]
+name = "prepend_bytes"
+harness = false
+
 [build-dependencies]
 skeptic = "0.13"
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@ See [concepts](#concepts) for more details.
   versions of the data model.
 - **Data Consistency**: Ensure that we process the data expected model.
 - **Flexibility**: You can use any serialization format you want. More details [here](#setup-your-serialization-format).
-- **Performance**: A minimal overhead. More details [here](#performance).
+- **Performance**: A minimal overhead (encode: ~20 ns, decode: ~40 ps). More details [here](#performance).
 
 ## Usage
 
@@ -71,10 +71,31 @@ When not to use it?
 - You need to have a human-readable format. (You can use a human-readable format like JSON wrapped in a native model,
   but you have to unwrap it to see the data correctly.)
 
-# Status
+## Status
 
 Early development. Not ready for production.
 
+## Concepts
+
+In order to understand how the native model works, you need to understand the following concepts.
+
+- **Identity** (`id`): The unique identifier of the model. It identifies the model and prevents decoding
+  data into the wrong Rust type.
+- **Version** (`version`): The version of the model. It is used to check compatibility between two
+  models.
+- **Encode**: The process of converting a model into a byte array.
+- **Decode**: The process of converting a byte array into a model.
+- **Downgrade**: The process of converting a model into a previous version of the model.
+- **Upgrade**: The process of converting a model into a newer version of the model.
+
+Under the hood, the native model is a thin wrapper around serialized data. The `id` and the `version` are each encoded as a [`little_endian::U32`](https://docs.rs/zerocopy/latest/zerocopy/byteorder/little_endian/type.U32.html). Together they occupy 8 bytes, which are added at the beginning of the data.
+
+```
++------------------+------------------+------------------------------------+
+|     ID (4 bytes) | Version (4 bytes)| Data (indeterminate-length bytes)  |
++------------------+------------------+------------------------------------+
+```
+
 ## Setup your serialization format
 
 First, you need to set up your serialization format. You can use any serialization format.
@@ -146,46 +167,26 @@ struct Cord {
 
 Full example [here](tests/example/example_define_model.rs).
 
-# Concepts
+## Performance
 
-In order to understand how the native model works, you need to understand the following concepts.
+Native model is
+designed to have a minimal and constant overhead: the overhead is the same
+whatever the size of the data. Under the hood, it uses the [zerocopy](https://docs.rs/zerocopy/latest/zerocopy/) crate
+to avoid unnecessary copies.
 
-- **Identity**(`id`): The identity is the unique identifier of the model. It is used to identify the model and 
-  prevent to decode a model into the wrong type.
-- **Version**(`version`) The version is the version of the model. It is used to check the compatibility between two 
-  models.
-- **Encode**: The encode is the process of converting a model into a byte array.
-- **Decode**: The decode is the process of converting a byte array into a model.
-- **Downgrade**: The downgrade is the process of converting a model into a previous version of the model.
-- **Upgrade**: The upgrade is the process of converting a model into a newer version of the model.
+👉 To get the total encode/decode time, add the time taken by your serialization format.
 
-Under the hood, the native model is a thin wrapper around serialized data. The `id` and the `version` are twice encoded with a [`little_endian::U32`](https://docs.rs/zerocopy/latest/zerocopy/byteorder/little_endian/type.U32.html). That represents 8 bytes, that are added at the beginning of the data.
+Summary:
+- **Encode**: ~20 ns
+- **Decode**: ~40 ps
 
-```
-+------------------+------------------+------------------------------------+
-|     ID (4 bytes) | Version (4 bytes)| Data (indeterminate-length bytes)  |
-+------------------+------------------+------------------------------------+
-```
-
-# Performance
-
-This crate is in an early stage of development, so the performance should be improved in the future.
-The goal is to have a minimal and constant overhead for all data sizes. It uses the [zerocopy](https://docs.rs/zerocopy/latest/zerocopy/) crate to avoid unnecessary copies.
-
-Current performance:
-- Encode time: have overhead that evolves linearly with the data size.
-- Decode time: have overhead of ~162 ps for all data sizes.
-
-
-|       data size       | encode time (ns/ps/µs/ms) | decode time (ps) |
-|:---------------------:|:--------------------------:|:----------------:|
-|          1 B          | 40.093 ns - 40.510 ns      | 161.87 ps - 162.02 ps |
-|    1 KiB (1024 B)     | 116.45 ns - 116.83 ns      | 161.85 ps - 162.08 ps |
-|   1 MiB (1048576 B)   | 66.697 µs - 67.634 µs      | 161.87 ps - 162.18 ps |
-|  10 MiB (10485760 B)  | 1.5670 ms - 1.5843 ms      | 162.40 ps - 163.52 ps |
-| 100 MiB (104857600 B) | 63.778 ms - 64.132 ms      | 162.71 ps - 165.10 ps |
+|      data size       |   encode time (ns)    | decode time (ps)        |
+|:--------------------:|:---------------------:|:-----------------------:|
+|         1 B          | 19.769 ns - 20.154 ns | 40.526 ps - 40.617 ps   |
+|        1 KiB         | 19.597 ns - 19.971 ns | 40.534 ps - 40.633 ps   |
+|        1 MiB         | 19.662 ns - 19.910 ns | 40.508 ps - 40.632 ps   |
+|        10 MiB        | 19.591 ns - 19.980 ns | 40.504 ps - 40.605 ps   |
+|       100 MiB        | 19.669 ns - 19.867 ns | 40.520 ps - 40.644 ps   |
 
 Benchmark of the native model overhead [here](benches/overhead.rs).
 
-To know how much time it takes to encode/decode your data, you need to add this overhead to the time of your serialization format.
-
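The 8-byte header described in the README's Concepts section can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the crate relies on zerocopy's `little_endian::U32`, whereas `encode_header` and `decode_header` below are hypothetical helpers built on `u32::to_le_bytes`, and they copy the body for clarity.

```rust
/// Hypothetical helper: prepend an 8-byte header (id, then version, both
/// little-endian u32) to an already serialized body.
fn encode_header(id: u32, version: u32, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + body.len());
    out.extend_from_slice(&id.to_le_bytes());
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(body);
    out
}

/// Hypothetical helper: split a buffer into (id, version, body).
/// Returns None when the buffer is shorter than the 8-byte header.
fn decode_header(data: &[u8]) -> Option<(u32, u32, &[u8])> {
    if data.len() < 8 {
        return None;
    }
    let id = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    let version = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
    Some((id, version, &data[8..]))
}
```

Reading the `id` and `version` back only touches the first 8 bytes, which is consistent with a decode overhead that does not depend on the body size.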
diff --git a/benches/overhead.rs b/benches/overhead.rs
@@ -14,7 +14,7 @@ fn native_model_decode_body<T: Decode>(data: Vec<u8>) -> Result<T, bincode::erro
 #[native_model(id = 1, version = 1)]
 struct Data(Vec<u8>);
 
-fn wrapper(data: &mut Vec<u8>) {
+fn wrap(data: &mut Vec<u8>) {
     native_model::wrapper::native_model_encode(data, 1, 1);
 }
 
@@ -31,16 +31,16 @@ fn criterion_benchmark(c: &mut Criterion) {
 
         // encode
         let data = Data(vec![1; nb_bytes]);
-        let encode_body = native_model_encode_body(&data).unwrap();
+        let mut encode_body = native_model_encode_body(&data).unwrap();
         group.bench_function(BenchmarkId::new("encode", nb_bytes), |b| {
-            b.iter(|| wrapper(&mut encode_body.clone()))
+            b.iter(|| wrap(&mut encode_body))
         });
 
         // decode
         let data = Data(vec![1; nb_bytes]);
-        let encode_body = native_model::encode(&data).unwrap();
+        let mut encode_body = native_model::encode(&data).unwrap();
         group.bench_function(BenchmarkId::new("decode", nb_bytes), |b| {
-            b.iter(|| unwrap(&mut encode_body.clone()))
+            b.iter(|| unwrap(&mut encode_body))
         });
     }
 }

diff --git a/benches/prepend_bytes.rs b/benches/prepend_bytes.rs
@@ -0,0 +1,28 @@
+//! Benchmark of a way to prepend bytes at the beginning of a `Vec<u8>` with a constant overhead.
+use bincode::{Decode, Encode};
+use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
+
+fn criterion_benchmark(c: &mut Criterion) {
+    let mut group = c.benchmark_group("encode");
+
+    // 1 B, 1 KiB, 1 MiB, 10 MiB, 100 MiB
+    for nb_bytes in [1, 1024, 1024 * 1024, 10 * 1024 * 1024, 100 * 1024 * 1024].into_iter() {
+        group.throughput(criterion::Throughput::Bytes(nb_bytes as u64));
+
+        let header: Vec<u8> = vec![0; 4];
+        let mut data: Vec<u8> = vec![1; nb_bytes];
+        group.bench_function(BenchmarkId::new("prepend_bytes", nb_bytes), |b| {
+            b.iter(|| {
+                // Prepend by appending `data` to a copy of the header; note that `append` leaves `data` empty after the first iteration
+                let mut header = header.clone();
+                header.append(&mut data);
+                // prepend bytes to data
+                // let mut header = header.clone();
+                // header.extend_from_slice(&data);
+            });
+        });
+    }
+}
+
+criterion_group!(benches, criterion_benchmark);
+criterion_main!(benches);
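A note on what this benchmark measures: `Vec::append` moves every element out of its argument and leaves it empty, so after the first iteration `data` holds no bytes. The sketch below uses only the standard library and a hypothetical `prepend` helper to show that behaviour; it is an illustration and not part of the commit.

```rust
/// Hypothetical helper mirroring the benchmark body: build `header ++ data`
/// by appending `data` to a copy of the header.
fn prepend(header: &[u8], data: &mut Vec<u8>) -> Vec<u8> {
    let mut out = header.to_vec();
    // `append` moves the elements of `data` into `out` and leaves `data` empty.
    out.append(data);
    out
}
```

Calling `prepend` twice on the same `data` returns the header followed by the payload the first time and the header alone the second time. A benchmark that should see the full payload on every iteration would need to rebuild or clone its input per iteration, for example with criterion's `iter_batched`.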
diff --git a/native_model_macro/src/lib.rs b/native_model_macro/src/lib.rs
@@ -94,7 +94,7 @@ pub fn native_model(args: TokenStream, input: TokenStream) -> TokenStream {
     let native_model_version_fn = generate_native_model_version(&attrs);
     let native_model_encode_body_fn = generate_native_model_encode_body();
     let native_model_encode_downgrade_body_fn = generate_native_model_encode_downgrade_body(&attrs);
-    let native_model_decode_body_fn = generate_native_model_decode_body();
+    let native_model_decode_body_fn = generate_native_model_decode_body(&attrs);
     let native_model_decode_upgrade_body_fn = generate_native_model_decode_upgrade_body(&attrs);
 
     let gen = quote! {

diff --git a/native_model_macro/src/method/decode_body.rs b/native_model_macro/src/method/decode_body.rs
@@ -1,15 +1,22 @@
+use crate::ModelAttributes;
 use proc_macro2::TokenStream;
 use quote::quote;
 
-pub(crate) fn generate_native_model_decode_body() -> TokenStream {
+pub(crate) fn generate_native_model_decode_body(attrs: &ModelAttributes) -> TokenStream {
+    let id = attrs.id.clone().expect("id is required");
     let gen = quote! {
-        fn native_model_decode_body(data: Vec<u8>) -> Result<Self, native_model::DecodeBodyError> {
-            native_model_decode_body(data).map_err(|e| native_model::DecodeBodyError {
+        fn native_model_decode_body(data: Vec<u8>, id: u32) -> Result<Self, native_model::DecodeBodyError> {
+            // Reject the payload when the header id differs from this model's id.
+            if id != #id {
+                return Err(native_model::DecodeBodyError::MismatchedModelId);
+            }
+
+            native_model_decode_body(data).map_err(|e| native_model::DecodeBodyError::DecodeError {
                 msg: format!("{}", e),
                 source: e.into(),
             })
         }
     };
 
     gen.into()
-}
+}
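The id check generated above can be sketched outside the macro as follows. The names are assumptions made for illustration: `Note` is a hypothetical model with a fixed id of 1 whose body is plain UTF-8 text, and `DecodeBodyError` is a simplified stand-in for the commit's enum, written without `thiserror` or `anyhow`.

```rust
/// Simplified stand-in for the commit's `DecodeBodyError`.
#[derive(Debug, PartialEq)]
enum DecodeBodyError {
    MismatchedModelId,
    DecodeError { msg: String },
}

/// Hypothetical model with a fixed id of 1 whose body is raw UTF-8 text.
#[derive(Debug, PartialEq)]
struct Note(String);

impl Note {
    const ID: u32 = 1;

    /// Compare the header id with the model id before decoding the body.
    fn decode_body(data: Vec<u8>, id: u32) -> Result<Self, DecodeBodyError> {
        if id != Self::ID {
            return Err(DecodeBodyError::MismatchedModelId);
        }
        String::from_utf8(data)
            .map(Note)
            .map_err(|e| DecodeBodyError::DecodeError { msg: e.to_string() })
    }
}
```

Checking the id first means a payload written for another model is rejected without running the body decoder on it.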
diff --git a/native_model_macro/src/method/decode_upgrade_body.rs b/native_model_macro/src/method/decode_upgrade_body.rs
@@ -8,11 +8,11 @@ pub(crate) fn generate_native_model_decode_upgrade_body(attrs: &ModelAttributes)
 
     let model_from_or_try_from = if let Some(from) = native_model_from {
         quote! {
-            #from::native_model_decode_upgrade_body(data, x).map(|a| a.into())
+            #from::native_model_decode_upgrade_body(data, id, version).map(|a| a.into())
         }
     } else if let Some((try_from, error_try_from)) = native_model_try_from {
         quote! {
-            let result = #try_from::native_model_decode_upgrade_body(data, x).map(|b| {
+            let result = #try_from::native_model_decode_upgrade_body(data, id, version).map(|b| {
                 b.try_into()
                     .map_err(|e: #error_try_from| native_model::UpgradeError {
                         msg: format!("{}", e),
@@ -24,22 +24,22 @@ pub(crate) fn generate_native_model_decode_upgrade_body(attrs: &ModelAttributes)
     } else {
         quote! {
             Err(native_model::Error::UpgradeNotSupported {
-                from: x,
+                from: version,
                 to: Self::native_model_version(),
             })
         }
     };
 
     let gen = quote! {
-        fn native_model_decode_upgrade_body(data: Vec<u8>, x: u32) -> native_model::Result<Self> {
-            if x == Self::native_model_version() {
-                let result = Self::native_model_decode_body(data)?;
+        fn native_model_decode_upgrade_body(data: Vec<u8>, id: u32, version: u32) -> native_model::Result<Self> {
+            if version == Self::native_model_version() {
+                let result = Self::native_model_decode_body(data, id)?;
                 Ok(result)
-            } else if x < Self::native_model_version() {
+            } else if version < Self::native_model_version() {
                 #model_from_or_try_from
             } else {
                 Err(native_model::Error::UpgradeNotSupported {
-                    from: x,
+                    from: version,
                     to: Self::native_model_version(),
                 })
             }
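The version dispatch in the generated `native_model_decode_upgrade_body` can be sketched with two hand-written models. Everything here is hypothetical: `PointV1`, `PointV2`, their byte layouts, and the simplified `Error` enum are assumptions for illustration, and the `id` parameter added by this commit is left out to keep the sketch short.

```rust
/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum Error {
    UpgradeNotSupported { from: u32, to: u32 },
    Decode,
}

/// Hypothetical version 1 of a model: the body is a single byte.
#[derive(Debug, PartialEq)]
struct PointV1 {
    x: u8,
}

/// Hypothetical version 2: the body is a little-endian u32.
#[derive(Debug, PartialEq)]
struct PointV2 {
    x: u32,
}

impl From<PointV1> for PointV2 {
    fn from(old: PointV1) -> Self {
        PointV2 { x: u32::from(old.x) }
    }
}

impl PointV1 {
    const VERSION: u32 = 1;

    fn decode_upgrade_body(data: Vec<u8>, version: u32) -> Result<Self, Error> {
        if version == Self::VERSION {
            match data.as_slice() {
                [x] => Ok(PointV1 { x: *x }),
                _ => Err(Error::Decode),
            }
        } else {
            // No older model to upgrade from, and newer data is rejected.
            Err(Error::UpgradeNotSupported { from: version, to: Self::VERSION })
        }
    }
}

impl PointV2 {
    const VERSION: u32 = 2;

    fn decode_upgrade_body(data: Vec<u8>, version: u32) -> Result<Self, Error> {
        if version == Self::VERSION {
            match data.as_slice() {
                [a, b, c, d] => Ok(PointV2 { x: u32::from_le_bytes([*a, *b, *c, *d]) }),
                _ => Err(Error::Decode),
            }
        } else if version < Self::VERSION {
            // Older payload: decode as the previous model, then convert.
            PointV1::decode_upgrade_body(data, version).map(Into::into)
        } else {
            Err(Error::UpgradeNotSupported { from: version, to: Self::VERSION })
        }
    }
}
```

Decoding a version 1 payload as `PointV2` walks down to `PointV1` and converts the result, while a payload newer than the target model is rejected with `UpgradeNotSupported`.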

diff --git a/src/header.rs b/src/header.rs
@@ -4,6 +4,6 @@ use zerocopy::{AsBytes, FromBytes, FromZeroes};
 #[derive(FromZeroes, FromBytes, AsBytes, Debug)]
 #[repr(C)]
 pub struct Header {
-    pub(crate) type_id: U32,
+    pub(crate) id: U32,
     pub(crate) version: U32,
 }
diff --git a/src/lib.rs b/src/lib.rs
@@ -56,10 +56,15 @@ pub type DecodeResult<T> = std::result::Result<T, DecodeBodyError>;
 
 #[derive(Error, Debug)]
 #[error("Decode body error: {msg}")]
-pub struct DecodeBodyError {
-    pub msg: String,
-    #[source]
-    pub source: anyhow::Error,
+pub enum DecodeBodyError {
+    #[error("Mismatched model id")]
+    MismatchedModelId,
+    #[error("Decode error: {msg}")]
+    DecodeError {
+        msg: String,
+        #[source]
+        source: anyhow::Error,
+    },
 }
 
 pub type EncodeResult<T> = std::result::Result<T, EncodeBodyError>;

diff --git a/src/model.rs b/src/model.rs
@@ -6,11 +6,11 @@ pub trait Model: Sized {
 
     // --------------- Decode ---------------
 
-    fn native_model_decode_body(data: Vec<u8>) -> DecodeResult<Self>
+    fn native_model_decode_body(data: Vec<u8>, id: u32) -> DecodeResult<Self>
     where
         Self: Sized;
 
-    fn native_model_decode_upgrade_body(data: Vec<u8>, version: u32) -> Result<Self>
+    fn native_model_decode_upgrade_body(data: Vec<u8>, id: u32, version: u32) -> Result<Self>
     where
         Self: Sized;
 
@@ -19,9 +19,13 @@ pub trait Model: Sized {
         Self: Sized,
     {
         let native_model = crate::Wrapper::deserialize(&data[..]).unwrap();
+        let source_id = native_model.get_id();
         let source_version = native_model.get_version();
-        let result =
-            Self::native_model_decode_upgrade_body(native_model.value().to_vec(), source_version)?;
+        let result = Self::native_model_decode_upgrade_body(
+            native_model.value().to_vec(),
+            source_id,
+            source_version,
+        )?;
         Ok((result, source_version))
     }
 
@@ -40,7 +44,7 @@ pub trait Model: Sized {
         Self: Sized,
     {
         let mut data = self.native_model_encode_body()?;
-        crate::native_model_encode(
+        let data = crate::native_model_encode(
             &mut data,
             Self::native_model_id(),
             Self::native_model_version(),
@@ -54,7 +58,7 @@ pub trait Model: Sized {
     {
         let version = version.clone();
         let mut data = self.native_model_encode_downgrade_body(version)?;
-        crate::native_model_encode(&mut data, Self::native_model_id(), version);
+        let data = crate::native_model_encode(&mut data, Self::native_model_id(), version);
         Ok(data)
     }
 }
diff --git a/src/wrapper.rs b/src/wrapper.rs
@@ -23,7 +23,11 @@ impl<T: ByteSlice> Wrapper<T> {
     }
 
     pub fn get_type_id(&self) -> u32 {
-        self.header.type_id.get()
+        self.header.id.get()
+    }
+
+    pub fn get_id(&self) -> u32 {
+        self.header.id.get()
     }
 
     pub fn get_version(&self) -> u32 {
@@ -33,40 +37,22 @@ impl<T: ByteSlice> Wrapper<T> {
 
 impl<T: ByteSliceMut> Wrapper<T> {
     pub fn set_type_id(&mut self, type_id: u32) {
-        self.header.type_id = U32::new(type_id);
+        self.header.id = U32::new(type_id);
     }
 
     pub fn set_version(&mut self, version: u32) {
         self.header.version = U32::new(version);
     }
 }
 
-pub fn native_model_encode(value: &mut Vec<u8>, type_id: u32, version: u32) {
+pub fn native_model_encode(data: &mut Vec<u8>, type_id: u32, version: u32) -> Vec<u8> {
     let header = Header {
-        type_id: U32::new(type_id),
+        id: U32::new(type_id),
         version: U32::new(version),
     };
-    let header = header.as_bytes();
-    value.reserve(header.len());
-    value.splice(..0, header.iter().cloned());
-
-    // Try to do with unsafe code to improve performance but benchmark shows that it's the same
-    //
-    // // Add header to the beginning of the vector
-    // unsafe {
-    //     // get the raw pointer to the vector's buffer
-    //     let ptr = value.as_mut_ptr();
-    //
-    //     // move the existing elements to the right
-    //     ptr.offset(header.len() as isize)
-    //         .copy_from_nonoverlapping(ptr, value.len());
-    //
-    //     // copy the elements from the header to the beginning of the vector
-    //     ptr.copy_from_nonoverlapping(header.as_ptr(), header.len());
-    //
-    //     // update the length of the vector
-    //     value.set_len(value.len() + header.len());
-    // }
+    let mut header = header.as_bytes().to_vec();
+    header.append(data);
+    header
 }
 
 #[cfg(test)]
@@ -76,7 +62,7 @@ mod tests {
     #[test]
     fn native_model_deserialize_with_body() {
         let mut data = vec![0u8; 8];
-        native_model_encode(&mut data, 200000, 100000);
+        let data = native_model_encode(&mut data, 200000, 100000);
         assert_eq!(data.len(), 16);
         let model = Wrapper::deserialize(&data[..]).unwrap();
         assert_eq!(model.get_type_id(), 200000);