Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster division by 1000 #1203

Merged
merged 10 commits into from
Feb 15, 2024
Merged

Faster division by 1000 #1203

merged 10 commits into from
Feb 15, 2024

Conversation

xtonik
Copy link

@xtonik xtonik commented Jan 25, 2024

Division by 1000 was replaced with more performant custom method using multiply and add. I suppose that this tiny method will be inlined, otherwise the code must be copied to places where is called.

Another similar simple optimization was also done - multiplication of int by constant 0.80 in integer (exactly in long) domain.

The validity is verified by attached tests. They are provided as a comparison to original implementation and thus they don't need to be executed again and again, so they are marked as a ignored not to cause running tests slowly - on my PC they take around 10 s.

No behavior changes were done, performance improvement only.

@pjfanning
Copy link
Member

@xtonik thanks for the changes but this code would need to be sigificantly faster in order to justify the extra complexity - at least for me.

@cowtowncoder
Copy link
Member

cowtowncoder commented Jan 25, 2024

@xtonik Is there some article or reference showing the idea behind technique? I can believe that using multiplication and shifting can be faster but.. how much? Is it worth the change?

Maybe you have links to other projects doing this as optimization? (I assume it's a well-known tehnique)


import static org.junit.Assert.assertEquals;

public class ByteQuadsCanonicalizerTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop this test, along with optimization.

@cowtowncoder
Copy link
Member

I think I am open to change, provided there's something published about technique (ideally it's a well-known technique that I wasn't familiar with). Let's drop 80% calculation as it's not worth optimizing.

It'd be good to have jmh-based micro-benchmark as well; maybe I'll modify

https://github.com/cowtowncoder/misc-micro-benchmarks

with it.

@cowtowncoder cowtowncoder changed the title Faster division Faster division by 1000 Jan 26, 2024
@cowtowncoder
Copy link
Member

Ok. So a quick test with misc-microbenchmarks gives these numbers with JDK 17:

Benchmark                      Mode  Cnt  Score   Error  Units
DivBy1000.perfHandOptimized   thrpt    5  5.022 ± 0.037  ops/s
DivBy1000.perfSimpleDivision  thrpt    5  3.566 ± 0.029  ops/s

that is, +50% throughput for operation itself. With JDK 8 bit less; and for some reason I don't quite know, no speedup with JDK 21 (maybe the "unoptimized" case is actually optimized by JDK under the hood).

Whether this translates into measurable improvement for number serialization is open question; I guess it shouldn't hurt.

Also: I modified unit test a bit by moving String concatenation out of busy loop and test over full int range takes only about 2 seconds. I'll probably tweak it a bit to have automated, always run part, and separate manually run full test.

@cowtowncoder
Copy link
Member

@xtonik I think we can make this happen, and although it's hard to know how meaningful for real use improvements are, change is still relatively small and tested (I modified unit test to run through 1/7 of full int range and leave on for CI).

But one thing before I can merge this is that if we don't have a CLA for you, we'd need one (just needs to be sent once before the first contribution). It's this one:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and usually easiest to print, fill & sign, scan/photo; then email cla at fasterxml dot com.
Once I get it I can proceed with last clean up and merge.

Looking forward to getting this merged, thank you in advance!

@xtonik
Copy link
Author

xtonik commented Jan 27, 2024

@xtonik I think we can make this happen, and although it's hard to know how meaningful for real use improvements are, change is still relatively small and tested (I modified unit test to run through 1/7 of full int range and leave on for CI).

But one thing before I can merge this is that if we don't have a CLA for you, we'd need one (just needs to be sent once before the first contribution). It's this one:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and usually easiest to print, fill & sign, scan/photo; then email cla at fasterxml dot com. Once I get it I can proceed with last clean up and merge.

Looking forward to getting this merged, thank you in advance!

Ok, I will subscribe it.

@xtonik
Copy link
Author

xtonik commented Jan 27, 2024

Ok. So a quick test with misc-microbenchmarks gives these numbers with JDK 17:

Benchmark                      Mode  Cnt  Score   Error  Units
DivBy1000.perfHandOptimized   thrpt    5  5.022 ± 0.037  ops/s
DivBy1000.perfSimpleDivision  thrpt    5  3.566 ± 0.029  ops/s

that is, +50% throughput for operation itself. With JDK 8 bit less; and for some reason I don't quite know, no speedup with JDK 21 (maybe the "unoptimized" case is actually optimized by JDK under the hood).

Whether this translates into measurable improvement for number serialization is open question; I guess it shouldn't hurt.

Also: I modified unit test a bit by moving String concatenation out of busy loop and test over full int range takes only about 2 seconds. I'll probably tweak it a bit to have automated, always run part, and separate manually run full test.

As you said, the compiler for new Java versions does more optimizations than older ones. The same situation (not remembering from which version) is probably for String.replace(String, String), when given string is one character long, then it is replaced by String.replace(char, char). This is what is about one of my prepared PR.

@xtonik
Copy link
Author

xtonik commented Jan 29, 2024

To get more familiar where the magic numbers came from, there are registered sequences A346495 and A346496 for this.

@cowtowncoder cowtowncoder added the cla-needed PR looks good (although may also require code review), but CLA needed from submitter label Feb 1, 2024
@xtonik
Copy link
Author

xtonik commented Feb 14, 2024

I have just sent my CLA to given email as requested.

If someone is interested and have some time, I have prepared another changes. As they are tiny, it takes couple of minutes to check if some of them is valuable to create PRs.

@cowtowncoder cowtowncoder added 2.17 Issues planned (at earliest) for 2.17 and removed cla-needed PR looks good (although may also require code review), but CLA needed from submitter labels Feb 15, 2024
@cowtowncoder cowtowncoder added this to the 2.17.0 milestone Feb 15, 2024
@cowtowncoder cowtowncoder merged commit b2e382a into FasterXML:2.17 Feb 15, 2024
5 checks passed
cowtowncoder added a commit that referenced this pull request Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.17 Issues planned (at earliest) for 2.17
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants