Various changes (hopefully improvements) #12
base: master
Conversation
The output length only needs to be half the input length.
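The sizing rule can be illustrated with a minimal sketch (`required_decode_len` is a hypothetical helper, not the crate's code): every decoded byte consumes exactly two hex digits of input.

```rust
// Sketch: each output byte of a hex decode consumes two input bytes,
// so the destination buffer only needs input_len / 2 bytes.
fn required_decode_len(input_len: usize) -> usize {
    input_len / 2
}

fn main() {
    assert_eq!(required_decode_len(8), 4); // "deadbeef" -> 4 bytes
    assert_eq!(required_decode_len(0), 0); // empty input decodes to empty output
    println!("ok");
}
```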
This uses criterion's benchmark groups to compare the relative performance of the different implementations at various byte lengths. The criterion reports now provide graphs that overlay the different implementations so you can easily see the relative performance.
Range patterns are deprecated, switch to inclusive ranges instead. Re-export the deprecated function `hex_to` in a separate use statement that allows deprecation warnings.
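For reference, the migration looks like this (`is_hex_digit` is a hypothetical helper for illustration): the deprecated exclusive `...` range pattern becomes the inclusive `..=` form.

```rust
// Sketch: `b'0'...b'9'` (deprecated range pattern) becomes `b'0'..=b'9'`.
fn is_hex_digit(b: u8) -> bool {
    match b {
        b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F' => true,
        _ => false,
    }
}

fn main() {
    assert!(is_hex_digit(b'a'));
    assert!(!is_hex_digit(b'g'));
    println!("ok");
}
```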
This isn't an error condition. A decoded empty slice is just an empty slice.
This also removes functions from the public API. hex_decode and hex_decode_unchecked are the only things exposed by default. hex_check_sse, hex_check_fallback, and hex_decode_fallback are only visible when compiled with the 'bench' feature, which means benches now need to specify --features=bench when running:

$ cargo bench --features=bench

This makes no changes to the actual implementation, and benchmarks confirm that.
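The feature-gating pattern might look something like the sketch below. The function names follow the PR description, but the decode body is a trivial placeholder, not the crate's real implementation.

```rust
// Sketch: the fallback decoder is `pub` only when built with
// `--features=bench`, so benchmarks can reach it while ordinary users
// only see the default public API.
#[cfg(feature = "bench")]
pub fn hex_decode_fallback(src: &[u8], dst: &mut [u8]) {
    decode_impl(src, dst)
}

#[cfg(not(feature = "bench"))]
pub(crate) fn hex_decode_fallback(src: &[u8], dst: &mut [u8]) {
    decode_impl(src, dst)
}

// Placeholder decode: two hex digits per output byte.
fn decode_impl(src: &[u8], dst: &mut [u8]) {
    for (i, pair) in src.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16).unwrap() as u8;
        let lo = (pair[1] as char).to_digit(16).unwrap() as u8;
        dst[i] = hi << 4 | lo;
    }
}

fn main() {
    let mut out = [0u8; 2];
    hex_decode_fallback(b"ff01", &mut out);
    assert_eq!(out, [0xff, 0x01]);
    println!("ok");
}
```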
Rather than checking to ensure all bytes are within valid ranges, this now checks to ensure no bytes are within invalid ranges. This is mostly to avoid a compiler warning when comparing an i32 to 0xffff, because we now compare the i32 to zero. The performance is identical.
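A scalar analogue of that inverted check, as I understand it: instead of asking "are all bytes valid?" and comparing a mask to 0xffff, build a mask of invalid bytes and compare it to zero. `any_invalid` is a hypothetical helper for illustration, not the crate's SSE code.

```rust
// Sketch: set one bit per invalid byte (as an SSE movemask would per lane),
// then test the mask against zero rather than 0xffff.
fn any_invalid(src: &[u8]) -> bool {
    let mut invalid_mask: i32 = 0;
    for (i, &b) in src.iter().enumerate() {
        let valid = matches!(b, b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F');
        if !valid {
            invalid_mask |= 1 << (i % 16); // one bit per 16-byte lane position
        }
    }
    invalid_mask != 0 // compare against zero, not 0xffff
}

fn main() {
    assert!(!any_invalid(b"deadbeef"));
    assert!(any_invalid(b"deadbexf"));
    println!("ok");
}
```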
…e pass over the input data. This speeds up checked decoding substantially without changing the performance of unchecked decoding. On my machine, throughput increases by 20% when decoding 2 bytes of data, and by 116% when decoding 4096 bytes.
Use a lookup table to determine the offset to add to each byte. This improves performance substantially:

decode/faster_hex/4096 thrpt increased 15%
decode/faster_hex_unchecked/4096 thrpt increased 54%
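One common variant of that idea, sketched below: a 256-entry table maps each ASCII byte directly to its hex value (with 0xff marking invalid digits), replacing per-byte branching. This is an illustrative reimplementation, not the PR's actual table.

```rust
// Sketch: build a 256-entry decode table once, then decode each byte
// with two table lookups and a shift instead of branches.
fn build_table() -> [u8; 256] {
    let mut t = [0xffu8; 256];
    for b in 0u16..256 {
        let v = match b as u8 {
            c @ b'0'..=b'9' => c - b'0',
            c @ b'a'..=b'f' => c - b'a' + 10,
            c @ b'A'..=b'F' => c - b'A' + 10,
            _ => continue, // leave 0xff as the invalid-digit marker
        };
        t[b as usize] = v;
    }
    t
}

fn main() {
    let table = build_table();
    // Decode "be" -> 0xbe using two lookups and a shift.
    let byte = table[b'b' as usize] << 4 | table[b'e' as usize];
    assert_eq!(byte, 0xbe);
    assert_eq!(table[b'g' as usize], 0xff); // invalid digit
    println!("ok");
}
```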
Compare the implementation against the hex crate.
The benches included unnecessary code that isn't available via the public API and was only benchmarked as a side effect of the decode benchmark.
Great work @ggriffiniii! I took a quick pass over it, but it's a big PR, so I need more time to do a proper review.
I'd propose to use AsMut to make the public API more generic, like this:
Yeah, this is a rather large change. By all means take as much time as you need to do a thorough review.
It could be changed to AsMut easily enough, but it doesn't seem worth it to me. The big benefit of AsRef<[u8]> is being able to accept &str and &String as well as byte slices and Vecs. The only standard things that implement AsMut are byte slices, Vecs, and Box<[u8]>, which already auto-deref to &mut [u8] without the explicit AsMut bound. I also favored having encode and decode allocate, with encode_to_slice and decode_to_slice being the more verbose options for those seeking to reuse allocations. The latter is far less common, so making it slightly more verbose makes sense and also follows a similar convention to other encoding crates like hex and base64.
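To make the convenience argument concrete, here is a minimal sketch of why an `AsRef<[u8]>` bound is attractive: one generic signature accepts &str, &[u8], and &Vec<u8> alike. This `encode` is a hypothetical wrapper for illustration, not the crate's implementation.

```rust
// Sketch: a single AsRef<[u8]>-bounded signature serves strings and
// byte buffers without separate entry points.
fn encode<I: AsRef<[u8]> + ?Sized>(src: &I) -> String {
    src.as_ref().iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    assert_eq!(encode("\x01\x02"), "0102");          // &str
    assert_eq!(encode(&[0xffu8, 0x00][..]), "ff00"); // &[u8]
    assert_eq!(encode(&vec![0xabu8]), "ab");         // &Vec<u8>
    println!("ok");
}
```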
While on the topic of public function signatures, I thought it would be useful to add a couple of additional ideas I had. I opted not to include them in this change just to keep the already large change from growing even further, but here it goes. I'll outline the proposals for encoding to keep the focus on the ideas, but in each case there is a matching analog for decoding that should probably be included if you want to pursue it.

Provide a convenient return value for encode_to_slice:

    pub fn encode_to_slice<'i, 'o, I>(src: &'i I, dst: &'o mut [u8]) -> Result<&'o str, Error>
    where
        I: AsRef<[u8]> + ?Sized;

The goal here would just be to provide a convenient &str of the encoded data. Users would be free to ignore the return value and the function would behave identically to how it does now, but for many users getting back a &str (that's already been validated as utf8) could be a nice convenience.

Provide a simpler API for buffer reuse:

    pub fn encode_with_buffer<'i, 'o, I>(src: &'i I, dst: &'o mut Vec<u8>) -> &'o str
    where
        I: AsRef<[u8]> + ?Sized;

or

    pub fn encode_with_buffer<I, B>(src: &I, dst: B) -> String
    where
        I: AsRef<[u8]> + ?Sized,
        B: Into<Vec<u8>>;

The goal with both of these options is to allow users to reuse buffers (allocations) when encoding multiple chunks of data. That's the biggest performance benefit offered by the to_slice functions, but these would allow the same performance characteristics without the potential errors, since the size of the output buffer would be managed internally. Both of these options would be simple wrappers around encode_to_slice.

These could all be pursued as subsequent PRs, but we could have the discussion here while the topic of the public API is already being discussed. The buffer reuse would be a semver-compatible change that could be added at any time; the return-value change to encode_to_slice would not be semver compatible, so a decision on that should be made prior to making another release on crates.io.
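A sketch of how the first `encode_with_buffer` variant could wrap a slice-based encoder: the caller owns the Vec, the function resizes it and returns a borrowed &str, so repeated calls reuse one allocation. The `encode_to_slice` here is a stand-in placeholder with the same shape as the crate's function, not its actual code.

```rust
// Placeholder slice encoder: writes two hex digits per input byte.
fn encode_to_slice(src: &[u8], dst: &mut [u8]) {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, b) in src.iter().enumerate() {
        dst[2 * i] = HEX[(b >> 4) as usize];
        dst[2 * i + 1] = HEX[(b & 0xf) as usize];
    }
}

// Sketch of the proposed buffer-reuse wrapper: the output buffer's size
// is managed internally, so no length-mismatch errors are possible.
fn encode_with_buffer<'o>(src: &[u8], dst: &'o mut Vec<u8>) -> &'o str {
    dst.resize(src.len() * 2, 0);
    encode_to_slice(src, dst);
    std::str::from_utf8(dst).expect("hex output is always ASCII")
}

fn main() {
    let mut buf = Vec::new();
    assert_eq!(encode_with_buffer(&[0xde, 0xad], &mut buf), "dead");
    assert_eq!(encode_with_buffer(&[0x01], &mut buf), "01"); // allocation reused
    println!("ok");
}
```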
I like the direction of the changes in this PR. The crate itself is great; I found it when I noticed how slow hex is. But I found that the documentation and
Looks like
With update
Hey y'all, any updates on this PR?
The sticking point for this PR now isn't timely processing; we can accept the algorithm changes, but for the API we don't want to break the original API directly. Adding new APIs rather than directly overwriting the originals would be more acceptable.
Please break the API. You could have the de facto hex encoding crate as
Here is a rather large change. It modifies the public-facing API as well as the internal implementation. The public API changes remove the hex_ prefix from function names and reduce what's exported by default. The only public functions exposed by default are now
encode, encode_to_slice, decode, decode_to_slice, decode_to_slice_unchecked
Each of these now also takes a generic input of anything that is AsRef<[u8]>, to be more convenient for users encoding/decoding strings as well as byte buffers. The various fallback and specialized implementations are only exposed when compiled with the "bench" feature. The benchmarks now require that feature to run.
When tested on my machine, an Intel(R) Xeon(R) W-2135 (Skylake), these changes result in the following benchmark results: