Improve performance in CharOperation #3386

Merged on Dec 11, 2024 (1 commit)

robstryker
Contributor

When the array contains no match for the characters to be replaced, the current replace method appears to run in O(n^2) time, which makes it very slow on large inputs.

Specifically, if the indexOf(...) call returns -1, there is clearly no more work left to do. Instead, the existing logic advances by one position and calls indexOf(...) again, and each of those calls loops over the rest of the array.
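To make the quadratic behaviour concrete, here is a minimal sketch, not the actual CharOperation code. The class `ReplaceScanSketch`, its simplified three-argument `indexOf`, and the call counter are hypothetical names introduced only for illustration. It compares the loop shape described above (advance by one and search again after a miss) with a loop that stops at the first miss.

```java
final class ReplaceScanSketch {
    // Hypothetical counter: records how often indexOf is invoked.
    static int indexOfCalls = 0;

    // Simplified stand-in for an array search: first index of needle at or after from, else -1.
    static int indexOf(char[] needle, char[] haystack, int from) {
        indexOfCalls++;
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Loop shape from the description: after a miss, move one position and search again.
    static int countRescanning(char[] array, char[] toBeReplaced) {
        if (toBeReplaced.length == 0) {
            return 0; // assumption: an empty pattern is treated as "no occurrences"
        }
        int count = 0;
        for (int i = 0; i < array.length;) {
            int index = indexOf(toBeReplaced, array, i);
            if (index == -1) {
                i++;
                continue;
            }
            count++;
            i = index + toBeReplaced.length;
        }
        return count;
    }

    // Same loop, but a miss ends the scan: nothing can match further right.
    static int countEarlyExit(char[] array, char[] toBeReplaced) {
        if (toBeReplaced.length == 0) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < array.length;) {
            int index = indexOf(toBeReplaced, array, i);
            if (index == -1) {
                break;
            }
            count++;
            i = index + toBeReplaced.length;
        }
        return count;
    }
}
```

For an array of n characters with no match, the first variant calls `indexOf` n times, each call scanning the remainder of the array, while the second calls it once.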

It seems not only needlessly complicated, but also inefficient.

This code seems to have originated in org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/pdom/util/CharArrayUtils.java, a file that was later renamed to org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/nd/util/CharArrayUtils.java.

The replace(etc) method in CharArrayUtils differs from that in CharOperation. The performance of the version in CharArrayUtils appears normal in comparison to that in CharOperation.

I can't tell how long ago this code was written, but it appears to me that simply using String functions performs comparably, so is there any reason for all this complicated code?
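As a rough illustration of the String-based alternative, a sketch might look like the following. This is an assumption about the general approach, not the code from this PR; `StringReplaceSketch` and `replaceViaString` are hypothetical names. Note that it allocates three temporary Strings plus the result array on every call.

```java
final class StringReplaceSketch {
    // Replaces every occurrence of toBeReplaced in array by going through java.lang.String.
    static char[] replaceViaString(char[] array, char[] toBeReplaced, char[] replacement) {
        if (toBeReplaced.length == 0) {
            // Assumption: an empty pattern means "nothing to replace".
            // String.replace would otherwise insert the replacement between all characters.
            return array;
        }
        String source = new String(array);
        String target = new String(toBeReplaced);
        String substitute = new String(replacement);
        // String.replace(CharSequence, CharSequence) replaces all literal occurrences.
        return source.replace(target, substitute).toCharArray();
    }
}
```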

@robstryker force-pushed the CharOperation_replace_perf branch from 4a3ee4d to fb37692 on December 4, 2024 17:53
next : for (int i = 0; i < max;) {
    int index = indexOf(toBeReplaced, array, true, i);
    if (index == -1) {
        i++;

Contributor

The actual bug is here: if index == -1, there is no hit, so there is no need to scan again from the next offset.

What I like about this impl here: it's garbage-free, whereas the proposed solution needs to encode the char arrays as ISO-8859-1 (Latin-1) or UTF-16 when it creates the Strings.

Do you have time to measure the pressure on the GC if your version of replace is used in tight loops?
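One possible way to get a first impression of allocation pressure is sketched below. It is only an assumption about how such a measurement could be done, not something used in this PR: it relies on the HotSpot-specific `com.sun.management.ThreadMXBean`, and `AllocationProbeSketch`, `allocatedBytes`, and `sink` are hypothetical names. A JMH benchmark run with its GC profiler would give more rigorous numbers.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbeSketch {
    // Results are written here so the JIT cannot discard the work being measured.
    static volatile char[] sink;

    // Returns the approximate number of bytes allocated by the current thread
    // while running work, or -1 if the JVM does not support the measurement.
    static long allocatedBytes(Runnable work) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
        if (!hotspot.isThreadAllocatedMemorySupported() || !hotspot.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long threadId = Thread.currentThread().getId();
        long before = hotspot.getThreadAllocatedBytes(threadId);
        work.run();
        return hotspot.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        char[] data = new char[10_000];
        // Tight loop around a String round trip, standing in for a String-based replace.
        long viaString = allocatedBytes(() -> {
            for (int i = 0; i < 1_000; i++) {
                sink = new String(data).replace("x", "y").toCharArray();
            }
        });
        System.out.println("bytes allocated via String round trip: " + viaString);
    }
}
```

The same probe can be wrapped around the array-based replace to compare the two variants under identical input.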

Contributor

Agreed, that sounds like a better fix. The measurements for both solutions should be compared.

Contributor Author

If you guys are happier with a smaller change, then I'm fine with it. I'll just abort the loop at that point and be done with it.

Contributor

Yes, please

Contributor

Using "break" instead of "continue" and removing the "next" label would be prettier, but I am OK with this fix.
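For reference, an array-based replace that uses a plain "break" and no label could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the merged code: `CharReplaceSketch`, its simplified `indexOf`, and the choice to return the input array unchanged when nothing matches are all hypothetical.

```java
import java.util.Arrays;

final class CharReplaceSketch {
    // Simplified search: first index of needle in haystack at or after from, else -1.
    static int indexOf(char[] needle, char[] haystack, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    static char[] replace(char[] array, char[] toBeReplaced, char[] replacement) {
        if (toBeReplaced.length == 0 || Arrays.equals(toBeReplaced, replacement)) {
            return array; // nothing to do
        }
        int[] starts = new int[4]; // start offsets of the occurrences found so far
        int count = 0;
        for (int i = 0; i < array.length;) {
            int index = indexOf(toBeReplaced, array, i);
            if (index == -1) {
                break; // no hit from i onward, so no later offset can match either
            }
            if (count == starts.length) {
                starts = Arrays.copyOf(starts, count * 2);
            }
            starts[count++] = index;
            i = index + toBeReplaced.length;
        }
        if (count == 0) {
            return array;
        }
        char[] result = new char[array.length + count * (replacement.length - toBeReplaced.length)];
        int in = 0;  // read position in array
        int out = 0; // write position in result
        for (int k = 0; k < count; k++) {
            int unchanged = starts[k] - in;
            System.arraycopy(array, in, result, out, unchanged);
            out += unchanged;
            System.arraycopy(replacement, 0, result, out, replacement.length);
            out += replacement.length;
            in = starts[k] + toBeReplaced.length;
        }
        System.arraycopy(array, in, result, out, array.length - in);
        return result;
    }
}
```

Apart from the offsets array and the result, this version allocates nothing, which is the property the review wanted to preserve.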

@robstryker force-pushed the CharOperation_replace_perf branch 2 times, most recently from 2b0b3d2 to c721a10 on December 10, 2024 15:38
Signed-off-by: Rob Stryker <stryker@redhat.com>
@robstryker force-pushed the CharOperation_replace_perf branch from 3fe43e5 to 7dc30ce on December 10, 2024 21:57
@jukzi merged commit c2a99e6 into eclipse-jdt:master on Dec 11, 2024
10 checks passed
4 participants