Rework Integer type for speed #254
Conversation
Apparently not much difference for speed. This reverts commit 1a171a9.
Thank you for your PR! I don't know if you saw this discussion the other week, but I have an alternative proposal for improving integer decode speeds. I think it will provide better performance in most cases, as it allows the use of smaller container types without the need for branching: since we know what integer type the user wants, we can error out if the bytes exceed the integer size. I think we can use something like this, though, for codec-internal numerics like lengths and similar.
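A rough sketch of that decode direction (the function name and error type here are invented for illustration): decode big-endian bytes directly into the target type and error out when the input exceeds its size, instead of branching on width at runtime.

```rust
// Hypothetical sketch: decode big-endian bytes straight into the integer type
// the caller asked for, erroring when the bytes exceed that type's size.
fn decode_be_into_u32(bytes: &[u8]) -> Result<u32, &'static str> {
    if bytes.len() > 4 {
        // A real codec would return its own error type here.
        return Err("integer does not fit in target type");
    }
    // Right-align the input into a fixed-size buffer, then convert.
    let mut buf = [0u8; 4];
    buf[4 - bytes.len()..].copy_from_slice(bytes);
    Ok(u32::from_be_bytes(buf))
}

fn main() {
    assert_eq!(decode_be_into_u32(&[0x01, 0x02]), Ok(0x0102));
    assert!(decode_be_into_u32(&[1, 2, 3, 4, 5]).is_err());
    println!("ok");
}
```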
Yeah, that will reduce branching. If I understand it correctly, we would need to rework the integer decode logic for every binary codec to get the best out of it? It is a bit more complex to add. We have a comparison point now to see the performance impact. However, it would be easier to see if we also optimize other allocations.

How about the encoding side? I assume it should be based on the trait as well. I tried one trait-based approach earlier, but there the main challenge was to provide a single output type for encoding. Since there is a need to branch the logic based on fixed-size slice vs. dynamic vector, I ended up using an enum, where it was easier.

I can try to continue towards that suggestion, unless @repnop has already got started.
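The fixed-size vs. dynamic branching mentioned here can be unified behind one return type with an enum. A minimal sketch (the type and variant names are made up, not rasn's actual API):

```rust
// Hypothetical sketch: one output type covering both a stack-allocated
// fixed-size encoding and a heap-allocated dynamic one.
enum EncodedBytes {
    // Fixed-size buffer plus the number of valid bytes (small integers).
    Fixed([u8; 16], usize),
    // Dynamically sized output (big integers).
    Dynamic(Vec<u8>),
}

impl AsRef<[u8]> for EncodedBytes {
    fn as_ref(&self) -> &[u8] {
        match self {
            EncodedBytes::Fixed(buf, len) => &buf[..*len],
            EncodedBytes::Dynamic(v) => v.as_slice(),
        }
    }
}

fn main() {
    let fixed = EncodedBytes::Fixed([0xAB; 16], 2);
    let dynamic = EncodedBytes::Dynamic(vec![1, 2, 3]);
    // Callers see one uniform &[u8] view regardless of the variant.
    assert_eq!(fixed.as_ref(), &[0xAB, 0xAB][..]);
    assert_eq!(dynamic.as_ref(), &[1, 2, 3][..]);
    println!("ok");
}
```

The enum avoids the problem that two different concrete return types (array vs. `Vec`) cannot share one non-generic function signature without boxing.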
Yes, but it should only be changing a couple of lines in each, as AFAIK all of the codecs eventually get to the point where an integer is a bag of big-endian bytes and convert it, so it would be just changing that part to use the trait.

Since we're using static dispatch, we can use

```rust
trait IntegerType {
    fn to_bytes_be(&self) -> impl AsRef<[u8]>;
}
```
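For a primitive type, an implementation of that trait could look roughly like this (a sketch only; it needs return-position `impl Trait` in traits, stable since Rust 1.75, and a real codec would strip redundant leading sign bytes for a minimal encoding):

```rust
trait IntegerType {
    fn to_bytes_be(&self) -> impl AsRef<[u8]>;
}

impl IntegerType for i64 {
    fn to_bytes_be(&self) -> impl AsRef<[u8]> {
        // Full-width big-endian bytes; arrays implement AsRef<[u8]>,
        // so no heap allocation is needed for primitive widths.
        self.to_be_bytes()
    }
}

fn main() {
    let n: i64 = 0x0102;
    assert_eq!(n.to_bytes_be().as_ref(), &[0, 0, 0, 0, 0, 0, 1, 2][..]);
    println!("ok");
}
```

Since each codec already reaches a "big-endian bytes" step, swapping that step for a call through this trait keeps the change local while letting monomorphization pick the right width statically.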
I was thinking that maybe we could avoid some of that code before that point, since the constraints are passed along: if for every codec we define our own implementation for different types, we could hardcode the ranges and encoding format for that specific constraint. At least in OER (probably it was bad design), it always calculates everything based on the constraint ranges, so some calculations could potentially be skipped (if that is even worth it).
Wow, that was too easy and seems to work well. I need to read more books...
Closing in favor of #256
Hi,

I have been working on improving the `Integer` type to get better performance.

Partial solution for #76
The current idea, after trying out many different approaches, is to make the `Integer` type an enum as follows. It uses the primitive type internally by default whenever possible to get the best performance, excluding cases where a big integer is used manually. Also, underflows and overflows are automatically converted to the big-integer variant.
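The shape of that enum, plus the overflow-to-big fallback, can be sketched as follows (a simplified illustration, not the PR's actual code; `BigIntPlaceholder` stands in for the real arbitrary-precision type):

```rust
// Swappable internal primitive width, per the PR's `PrimitiveInteger` idea.
type PrimitiveInteger = i128;

// Placeholder for an arbitrary-precision integer (e.g. a BigInt type).
#[derive(Debug, PartialEq)]
struct BigIntPlaceholder(Vec<u8>);

#[derive(Debug, PartialEq)]
enum Integer {
    Primitive(PrimitiveInteger),
    Big(BigIntPlaceholder),
}

impl Integer {
    // Addition that falls back to the big variant on overflow, mirroring
    // the "underflows or overflows are automatically converted" behavior.
    fn checked_add_primitive(a: PrimitiveInteger, b: PrimitiveInteger) -> Integer {
        match a.checked_add(b) {
            Some(sum) => Integer::Primitive(sum),
            // Real code would promote the operands to the big-integer type.
            None => Integer::Big(BigIntPlaceholder(vec![])),
        }
    }
}

fn main() {
    assert_eq!(Integer::checked_add_primitive(1, 2), Integer::Primitive(3));
    assert!(matches!(
        Integer::checked_add_primitive(PrimitiveInteger::MAX, 1),
        Integer::Big(_)
    ));
    println!("ok");
}
```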
`BigInt` could be easily changed to another type as well, if we would like to. I am not sure if I over-engineered this one, but I wanted to make it easier to drop the mandated `i128` type requirement in the future if decided, so that `rasn` could be used on targets which do not support that type, as it is a `#![no_std]` crate. Now, all the internal types can be changed just by changing the `PrimitiveInteger` constant:

- `PrimitiveInteger` is used instead of `i128` when working with constraints. Needs more work to remove the type completely.
- Constraints are based on `PrimitiveInteger`, so `i128` is dropped from there.
- The value range between `i64::MIN` and `i64::MAX` is 2x `i64::MAX` for UPER, and with the `i64` type this does not fit and will show a zero range. Not sure how unexpected this behavior is, but it is more likely than with `i128`, if someone does not use `i128`.

Overall, BER and OER will get around a 2x performance boost for integers. UPER will get less, as there are many other allocations in the code which contribute more to the current performance.
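The zero-range pitfall above comes down to the range width needing one more bit than the primitive itself provides: `i64::MAX - i64::MIN` is 2^64 - 1, which does not fit in an `i64`. A small illustration (not PR code; the function name is invented):

```rust
// The width of a constraint range, computed in the constraint's own type.
fn range_width_i64(min: i64, max: i64) -> Option<i64> {
    // checked_sub returns None when the width overflows i64 —
    // the "zero range" situation a UPER codec would then see.
    max.checked_sub(min)
}

fn main() {
    // A narrower constrained range is fine:
    assert_eq!(range_width_i64(0, 255), Some(255));
    // The full i64 range overflows, so the width is unrepresentable in i64;
    // computing it in i128 avoids this for i64-ranged constraints.
    assert_eq!(range_width_i64(i64::MIN, i64::MAX), None);
    println!("ok");
}
```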
OER can also get some significant boosts when allocations are optimized further.
On M2 Pro, overall improvement based on the benchmark mentioned in issue #244, when the `i128` default is used:

With `i64`, encoding is slightly faster, but decoding is slower. I tried different styles, but the enum one seems to get the best of both, and generics are not exposed in any API.
The downside of this approach is that when decoding data, it is not always certain what the inner `Integer` type is (primitive or big), if the type is not constrained.

I implemented `Add`, `Sub` and `Mul` operations for `Integer`, because it was easier for some codecs. Maybe it is better not to increase the number of them, and instead encourage using the internal type's ops, since adding more would also add maintenance burden.

The current code will need the next release of Rust (1.79, to be released on 13 June) to work. Otherwise, either one line of unsafe code or more complex code is required to bypass the borrow checker.
Also, it seems that it is not easy to automatically test the `i64` type by removing the `i128` feature, because `rasn-pkix` is a dev-dependency of the workspace, and it makes `rasn` tests compile with default features no matter what (unless it is removed, or features are changed for that crate). Maybe we don't need that bench with pkix?

Maybe @6d7a also wants to take a look, since it is a kind of big change.