Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_integers_using_lower_nibble #127

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dkuku
Copy link

@dkuku dkuku commented Jul 13, 2023

Currently integers are matched by pattern matching on number between 48 and 57 and then substracting 48 ascii code - that is integer 0
As 48 is 0x30 then we can just pattern match that the upper half of the byte is 3 and take the lower half - only checking if it's less than 10.
I prepared a livebook that compares both ways and there is 4-5% speed difference.

Comparison: 
half byte        3.04 K
original         2.93 K - 1.04x slower +13.23 μs

https://gist.github.com/dkuku/9004bbf92cb3b81e8ed6fe211b9c7fc6
I'm not sure how to measure the compilation time but it also should be a bit faster due to generating less code.

@josevalim
Copy link
Member

This is very nice! Instead of adding a special combinator, could we perhaps have some heuristic? Then we could find more occasions to apply this automatically on top of ascii ranges?

@dkuku
Copy link
Author

dkuku commented Jul 14, 2023

@josevalim
I've been thinking about it but I don't think it makes much sense - integer is a really special case where we encode 0x3y to just y.
The only place where it can be useful would be converting a string to octal or binary number - it won't even help when converting from hex representation to hex word because alphabet encoding starts with 0x41 and 0x61 - so the lower boundary still needs to be checked x >0 and x < 7; x = x + 9. which is the same as x>40 and x < 47; x = x - 31
But we don't have combinators for any of these yet. Having another example that makes sense would make easier to design this. Unless you have some ideas where it can be used ?

@josevalim
Copy link
Member

The immediate heuristic I can think of is this:

  1. If the distance between both numbers is < 16 and the rem(first, 16) == 0, then apply it
  2. If the distance between both numbers is < 32 and the rem(first, 32) == 0, then apply it
  3. If the distance between both numbers is < 64 and the rem(first, 64) == 0, then apply it
  4. If the distance between both numbers is < 128 and the rem(first, 128) == 0, then apply it

It could be extended to 2, 4, and 8, but I am not sure if those will be worth it.

@dkuku
Copy link
Author

dkuku commented Jul 14, 2023

I did some testing and it has no point for strings. With integers the benefit is that we have the integer already converted 'for free' and don't have to subtract ?0. This are different return values from the function comparing to what was returned from ascii(?0..?9).
With partial matching on strings there are also other problems. I'm matching on some of the bytes but I need to return everything that was given to the function so there are 2 variables needed: if lets say I want to match on the string "1" I need something like this:
defp function full = <<3::4, part::4>> when part < 1, do: full
or I have to reconstruct the actual variable
defp function <<3::4, part::4>> when part < 1, do: <<3::4, part::4>>
this makes more sense because binary patterns cannot be matched in parallel so I can't match on part of the byte and also on the full byte in one go.
It is also significantly slower:

Comparison: 
original         3.49 K
half_byte        2.37 K - 1.47x slower +134.91 μs

Memory usage statistics:

Name         Memory usage
original        493.70 KB
half_byte       704.64 KB - 1.43x memory usage +210.94 KB

@josevalim
Copy link
Member

I see! Btw, which OTP version are you using for testing? 26 has more recent optimizations.

@dkuku
Copy link
Author

dkuku commented Jul 15, 2023

Yes I'm using 26. But I just tried it on 25 and there my implementation is 3 x slower, when I run the same livebook. So it has no point for anyone with older otp version, So it has no point for anyone with older otp version. Maybe this should wait until elixir requires otp26?

@josevalim
Copy link
Member

Yeah, given the low gain, we should probably hold this one for a while (and also confirm this will hold in future Erlang/OTP versions). :) Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants