Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip Stable and unique Lambda and LambdaForm class names #19763

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ThanHenderson
Copy link
Contributor

This WIP patch accompanies ibmruntimes/openj9-openjdk-jdk11#791 and:

  1. enables LambdaForm SCC caching
  2. adds a findInSCC procedure for looking up Lambda and LambdaForm classes
  3. reduces the number of class byte comparisons for Lambdas when creating ROM classes
  4. removes unnecessary special handling for Lambdas due to the addition of a stable class name

Only currently tested for Java versions < 21 due to the index number removal on Lambdas in 21.

Signed-off-by: Nathan Henderson nathan.henderson@ibm.com

@ThanHenderson
Copy link
Contributor Author

@babsingh @hangshao0 Could you guys take a look?

My main point of concern here is "uniqueness" of the uniqueIDs. The Lambda and LambdaForm class names are stable with these uniqueIDs included, and the probability of collisions is low but not impossible. Therefore, during ROM class creation we still do byte comparisons for the SCC cached class and the created class. If we were comfortable with the uniqueness, we could just check the SCC for a class with the same name and avoid the class comparisons altogether.

I am also using SH_CacheMap::findNextROMClass for the early lookup via findInSCC, maybe there is a better API to use here. I chose this one because it was the first API I could get reliably working, while also making it easy to check if there were other classes with the same names in the SCC.

Another minor issue is that I've created the findInSCC procedure in MethodHandleNatives but am also using it during Lambda creation, which isn't ideal because Lambdas aren't part of the MethodHandle component. We'd ideally just place the procedure somewhere more appropriate if we decide to move forward with this work.

I am open to any suggestions.

@hangshao0
Copy link
Contributor

hangshao0 commented Jun 28, 2024

My main point of concern here is "uniqueness" of the uniqueIDs. The Lambda and LambdaForm class names are stable with these uniqueIDs included, and the probability of collisions is low but not impossible.

I am also concerned about potential problems from collisions. Considering the large number of applications running on our JVM, if the wrong class is being used, it will result in unexpected behaviours. It could be very hard for our service and VM engineers to debug and trace back the cause to the collision.

I see the uniqueID is an int. Is there a reason to restrict that to 4 bytes int ? I see you are relying on String.hashCode() for the hashcode, which has much higher collision possibility that algorithms like SHA-256.

@hangshao0
Copy link
Contributor

You may also consider taking advantage of J9_ROM_ADDRESS_SUFFIX to increase the uniqueness. It does not have to be all zeros.

@ThanHenderson
Copy link
Contributor Author

I see the uniqueID is an int. Is there a reason to restrict that to 4 bytes int ? I see you are relying on String.hashCode() for the hashcode, which has much higher collision possibility that algorithms like SHA-256.

@hangshao0 There is no reason for this restriction. I was just using what I had there. With something like SHA-256, are you comfortable with the collision probability such that we can ignore collisions in practice? If so, do you have any suggestions for what to take the SHA-256 of?

@hangshao0
Copy link
Contributor

I guess the unqinueness of the class comes from the strings that you used to calculate the hash code ? Before discussing about switching to things like SHA-256, I take a look at #19371 (comment).

Maybe we can concatenating those strings and store it within the romClass. If we have UTF8* getUniqueID(J9ROMClass *LFClass) to return the concatenated string, we can do string comparison in case of collision.

Another possibility that I can think of is https://github.com/eclipse-openj9/openj9/blob/master/jcl/src/openj9.sharedclasses/share/classes/com/ibm/oti/shared/SharedClassTokenHelperImpl.java. The 2 public APIs there associate the class with a string token. After the class is stored with a string token, findSharedClass() will only return the class if the same same class name and token are provided.

@ThanHenderson
Copy link
Contributor Author

Another possibility that I can think of is https://github.com/eclipse-openj9/openj9/blob/master/jcl/src/openj9.sharedclasses/share/classes/com/ibm/oti/shared/SharedClassTokenHelperImpl.java. The 2 public APIs there associate the class with a string token. After the class is stored with a string token, findSharedClass() will only return the class if the same same class name and token are provided.

Ok, if this were the case then what I think what we would do is just check in the Java code with API you mentioned, continue to ignore LambdaForm classes in ROMClassBuilder's SCStoreTransaction, then just add them to the cache after in the Java code.

And the token I guess could be the stringified SHA hash of the concatenated string of the input parameters.

@fengxue-IS
Copy link
Contributor

fengxue-IS commented Jul 25, 2024

Another possibility that I can think of is https://github.com/eclipse-openj9/openj9/blob/master/jcl/src/openj9.sharedclasses/share/classes/com/ibm/oti/shared/SharedClassTokenHelperImpl.java. The 2 public APIs there associate the class with a string token. After the class is stored with a string token, findSharedClass() will only return the class if the same same class name and token are provided.

Ok, if this were the case then what I think what we would do is just check in the Java code with API you mentioned, continue to ignore LambdaForm classes in ROMClassBuilder's SCStoreTransaction, then just add them to the cache after in the Java code.

And the token I guess could be the stringified SHA hash of the concatenated string of the input parameters.

Is there any reason preventing from simply doing a string concat of string tokens as the unique id (which use suffix of the lambda classname)? any hashing of the tokens cannot guarantee the uniqueness. As the target is to only do a name/id check and remove the need for class byte comparison, then having a slightly long name comparison to prevent collision is reasonable.

lambdaClassName = targetClass.getName().replace('.', '/') + "$$Lambda$" + uniqueID;
where uniqueID = targetClass.getName() + invokedType.toString() + ...

@ThanHenderson
Copy link
Contributor Author

We can do the string concatenation and avoid hashing.

But we still have two options here:
One where we keep the LambdaForm class name the same as it was previously, don't cache them during ROM class building, and cache them from the Java code with these APIs https://github.com/eclipse-openj9/openj9/blob/master/jcl/src/openj9.sharedclasses/share/classes/com/ibm/oti/shared/SharedClassTokenHelperImpl.java

Or just edit this current patch to use the concatenated string rather than the hash.

@hangshao0
Copy link
Contributor

Or just edit this current patch to use the concatenated string rather than the hash.

This is an easier solution.

@ThanHenderson
Copy link
Contributor Author

ThanHenderson commented Jul 26, 2024

Is there any reason preventing from simply doing a string concat of string tokens as the unique id

This is fine for the LambdaForms, except for the stringified LambdaForm parameter, but for Lambdas we need to be careful here because stringifying all the tokens may contain characters that are invalid in class names and are rejected (e.g. any of the ...Type parameters may contain []).

For parameter strings that may contain those characters, what are your proposals? I was thinking we could still take the hashCode() of each of those parameters strings and just append it to the string.

edit: Actually, we can just string replace the invalid characters with _

@fengxue-IS
Copy link
Contributor

Actually, we can just string replace the invalid characters with _

This seem to be the simplest solution, any hashcode we use could result in a collision which will complicate the cache/lookup logic.

@ThanHenderson
Copy link
Contributor Author

In the extensions patch, I've replaced all occurrences of hashCode with the raw strings concatenated with the special characters replaced with _. (for whatever reason, replaceAll was causing build failures, so currently there are just a bunch of replace calls chained.)

If we are happy on a high level with using the stringified version rather than the hash based version, what about the use of SH_CacheMap::findNextROMClass for the early lookup in findInSCC. @hangshao0 is there a better API that you think should be used here? When I was trying to get the other APIs to work (the hooks for example) I was having a very difficult time; this was just the one that I got working.

@hangshao0
Copy link
Contributor

If we are happy on a high level with using the stringified version rather than the hash based version, what about the use of SH_CacheMap::findNextROMClass for the early lookup in findInSCC. @hangshao0 is there a better API that you think should be used here?

You can add a new API like j9shr_jcl_findOrphanROMClassByUniqueID() which is a wrapper of SH_CacheMap::findNextROMClass() and put this new API into the function table of SCAbstractAPI.

@ThanHenderson ThanHenderson force-pushed the stable-lambda branch 6 times, most recently from b391468 to 8771e92 Compare August 1, 2024 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants