Rework Integer type for speed #254
Conversation
Apparently not much difference for speed. This reverts commit 1a171a9.
Thank you for your PR! I don't know if you saw this discussion the other week, but I have an alternative proposal for improving integer decode speeds. I think it will provide better performance in most cases, as it allows the use of smaller container types without the need for branching: since we know what integer type the user wants, we can error out if the bytes exceed the integer size. I think we can use something like this, though, for codec-internal numerics like lengths and similar.
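A rough sketch of that decode direction (the function name and error type here are invented for illustration): decode big-endian bytes directly into the target type and error out when the input exceeds its size, instead of branching on width at runtime.

```rust
// Hypothetical sketch: decode big-endian bytes straight into the integer type
// the caller asked for, erroring when the bytes exceed that type's size.
fn decode_be_into_u32(bytes: &[u8]) -> Result<u32, &'static str> {
    if bytes.len() > 4 {
        // A real codec would return its own error type here.
        return Err("integer does not fit in target type");
    }
    // Right-align the input into a fixed-size buffer, then convert.
    let mut buf = [0u8; 4];
    buf[4 - bytes.len()..].copy_from_slice(bytes);
    Ok(u32::from_be_bytes(buf))
}

fn main() {
    assert_eq!(decode_be_into_u32(&[0x01, 0x02]), Ok(0x0102));
    assert!(decode_be_into_u32(&[1, 2, 3, 4, 5]).is_err());
    println!("ok");
}
```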
Yeah, that will reduce branching. If I understand it correctly, we would need to rework the integer decode logic for every binary codec to get the best out of it? It is a bit more complex to add. We have a comparison point now to see the performance impact. However, it would be easier to see if we also optimize other allocations.

How about the encoding side? I assume it should be based on the trait as well. I tried one trait-based approach earlier, but there the main challenge was to provide a single output type for encoding. Since there is a need to branch the logic based on fixed-size slice vs. dynamic vector, I ended up using an enum, where it was easier.

I can try to continue towards that suggestion, unless @repnop has already got started.
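The fixed-size vs. dynamic branching mentioned here can be unified behind one return type with an enum. A minimal sketch (the type and variant names are made up, not rasn's actual API):

```rust
// Hypothetical sketch: one output type covering both a stack-allocated
// fixed-size encoding and a heap-allocated dynamic one.
enum EncodedBytes {
    // Fixed-size buffer plus the number of valid bytes (small integers).
    Fixed([u8; 16], usize),
    // Dynamically sized output (big integers).
    Dynamic(Vec<u8>),
}

impl AsRef<[u8]> for EncodedBytes {
    fn as_ref(&self) -> &[u8] {
        match self {
            EncodedBytes::Fixed(buf, len) => &buf[..*len],
            EncodedBytes::Dynamic(v) => v.as_slice(),
        }
    }
}

fn main() {
    let fixed = EncodedBytes::Fixed([0xAB; 16], 2);
    let dynamic = EncodedBytes::Dynamic(vec![1, 2, 3]);
    // Callers see one uniform &[u8] view regardless of the variant.
    assert_eq!(fixed.as_ref(), &[0xAB, 0xAB][..]);
    assert_eq!(dynamic.as_ref(), &[1, 2, 3][..]);
    println!("ok");
}
```

The enum avoids the problem that two different concrete return types (array vs. `Vec`) cannot share one non-generic function signature without boxing.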
Yes, but it should only be changing a couple of lines in each, as AFAIK all of the codecs eventually get to the point where an integer is a bag of big-endian bytes and convert it, so it would be just changing that part to use the trait.

Since we're using static dispatch, we can use

```rust
trait IntegerType {
    fn to_bytes_be(&self) -> impl AsRef<[u8]>;
}
```
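For a primitive type, an implementation of that trait could look roughly like this (a sketch only; it needs return-position `impl Trait` in traits, stable since Rust 1.75, and a real codec would strip redundant leading sign bytes for a minimal encoding):

```rust
trait IntegerType {
    fn to_bytes_be(&self) -> impl AsRef<[u8]>;
}

impl IntegerType for i64 {
    fn to_bytes_be(&self) -> impl AsRef<[u8]> {
        // Full-width big-endian bytes; arrays implement AsRef<[u8]>,
        // so no heap allocation is needed for primitive widths.
        self.to_be_bytes()
    }
}

fn main() {
    let n: i64 = 0x0102;
    assert_eq!(n.to_bytes_be().as_ref(), &[0, 0, 0, 0, 0, 0, 1, 2][..]);
    println!("ok");
}
```

Since each codec already reaches a "big-endian bytes" step, swapping that step for a call through this trait keeps the change local while letting monomorphization pick the right width statically.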
I was thinking that maybe we could avoid some of that code before that point, since the constraints are passed along: if for every codec we define our own implementation for different types, we could hardcode the ranges and encoding format for that specific constraint. At least in OER (probably it was bad design), it always calculates everything based on the constraint ranges, so some calculations could potentially be skipped (if that is even worth it).
Wow, that was too easy and seems to work well. I need to read more books...
Closing in favor of #256
Hi,

I have been working on improving the `Integer` type to get better performance.

Partial solution for #76
The current idea, after trying out many different approaches, is to make the `Integer` type an enum as follows. It uses the primitive type internally by default whenever possible to get the best performance, excluding cases where a big integer is used manually. Also, underflows and overflows are automatically converted to the big-integer variant.
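The shape of that enum, plus the overflow-to-big fallback, can be sketched as follows (a simplified illustration, not the PR's actual code; `BigIntPlaceholder` stands in for the real arbitrary-precision type):

```rust
// Swappable internal primitive width, per the PR's `PrimitiveInteger` idea.
type PrimitiveInteger = i128;

// Placeholder for an arbitrary-precision integer (e.g. a BigInt type).
#[derive(Debug, PartialEq)]
struct BigIntPlaceholder(Vec<u8>);

#[derive(Debug, PartialEq)]
enum Integer {
    Primitive(PrimitiveInteger),
    Big(BigIntPlaceholder),
}

impl Integer {
    // Addition that falls back to the big variant on overflow, mirroring
    // the "underflows or overflows are automatically converted" behavior.
    fn checked_add_primitive(a: PrimitiveInteger, b: PrimitiveInteger) -> Integer {
        match a.checked_add(b) {
            Some(sum) => Integer::Primitive(sum),
            // Real code would promote the operands to the big-integer type.
            None => Integer::Big(BigIntPlaceholder(vec![])),
        }
    }
}

fn main() {
    assert_eq!(Integer::checked_add_primitive(1, 2), Integer::Primitive(3));
    assert!(matches!(
        Integer::checked_add_primitive(PrimitiveInteger::MAX, 1),
        Integer::Big(_)
    ));
    println!("ok");
}
```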
`BigInt` could be easily changed to another type as well, if we would like to. I am not sure if I over-engineered this one, but I wanted to make it easier to drop the mandated `i128` type requirement in the future if decided, so that `rasn` could be used on targets which do not support that type, as it is a `#![no_std]` crate. Now, all the internal types can be changed just by changing the `PrimitiveInteger` constant:

- `PrimitiveInteger` is used instead of `i128` when working with constraints. Needs more work to remove the type completely.
- Constraints are based on `PrimitiveInteger`, so `i128` is dropped from there.
- The value range between `i64::MIN` and `i64::MAX` is 2x `i64::MAX` for UPER, and with the `i64` type this does not fit and will show a zero range. Not sure how unexpected this behavior is, but it is more likely than with `i128`, if someone does not use `i128`.

Overall, BER and OER will get around a 2x performance boost for integers. UPER will get less, as there are many other allocations in the code which contribute more to the current performance.
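The zero-range pitfall above comes down to the range width needing one more bit than the primitive itself provides: `i64::MAX - i64::MIN` is 2^64 - 1, which does not fit in an `i64`. A small illustration (not PR code; the function name is invented):

```rust
// The width of a constraint range, computed in the constraint's own type.
fn range_width_i64(min: i64, max: i64) -> Option<i64> {
    // checked_sub returns None when the width overflows i64 —
    // the "zero range" situation a UPER codec would then see.
    max.checked_sub(min)
}

fn main() {
    // A narrower constrained range is fine:
    assert_eq!(range_width_i64(0, 255), Some(255));
    // The full i64 range overflows, so the width is unrepresentable in i64;
    // computing it in i128 avoids this for i64-ranged constraints.
    assert_eq!(range_width_i64(i64::MIN, i64::MAX), None);
    println!("ok");
}
```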
OER can also get some significant boosts when allocations are optimized further.
On M2 Pro, overall improvement based on the benchmark mentioned in issue #244, when the `i128` default is used:

With `i64`, encoding is slightly faster, but decoding is slower. I tried different styles, but the enum one seems to get the best of both, and generics are not exposed in any API.
The downside of this approach is that when decoding data, it is not always certain what the inner `Integer` type is (primitive or big), if the type is not constrained.

I implemented `Add`, `Sub` and `Mul` operations for `Integer`, because it was easier for some codecs. Maybe it is better not to increase the number of them, and instead encourage using the internal type's ops, since adding more would also add maintenance burden.

The current code will need the next release of Rust (1.79, to be released on 13 June) to work. Otherwise, either one line of unsafe code or more complex code is required to bypass the borrow checker.
Also, it seems that it is not easy to automatically test the `i64` type by removing the `i128` feature, because `rasn-pkix` is a dev-dependency of the workspace, and it makes `rasn` tests compile with default features no matter what (unless it is removed, or features are changed for that crate). Maybe we don't need that bench with pkix?

Maybe @6d7a also wants to take a look, since it is a kind of big change.