-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix JNI wrapper memory leak #2178
base: main
Are you sure you want to change the base?
Conversation
to Java GC. Use the Cleaner API requiring Java 9+ to avoid copying tensor data Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Gong Su <gong_su@hotmail.com>
src/Runtime/jni/jniwrapper.c
Outdated
(*env)->GetDirectBufferAddress(env, java_data), jni_data != NULL, | ||
jecpt_cls, "jni_data=%p", jni_data); | ||
|
||
free(jni_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into this during testing of my older finalize()
-based PR, but jni_data
points to the aligned allocation, not to the start of the allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are saying but I'm a bit surprised why this would happen. Basically, when you call NewDirectByteBuffer
, you give it the pointer to the memory block. Now when you call GetDirectBufferAddress
, it's going to give you a different (aligned) pointer? I mean I gave it the pointer, now I want it back, why would it change it behind my back?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK maybe you mean onnx-mlir might give you an aligned pointer that's not the start of the allocation? OK in that case we need fiddle a bit like the offset thing you did. Itchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I looked at the conversion on your finalize PR and you said:
I thought about doing that but I wanted to avoid a limitation with java - all of the integral based arguments are signed. With an offset, which can be signed, I can keep the semantics more or less the same. With a Cleaner based approach I can create another DirectByteBuffer holding the actual allocation variable which the cleaner can use to clean up - no offset calculation required.
So I think instead of passing in the offset, we could just pass in the aligned pointer as jlong, right? As you said later, it doesn't really matter if it's signed or not. 64-bit is 64-bit. It only matters how you interpret it. Doing that would save us from having to keep another flag and doing the math.
I tried this in our build container but I got errors trying to build onnx-mlir
I'm guessing it's because in our builder image we only install Java 1.8 since that's what we ultimately target running the models on. I can move that to Java 11. However, will that cause any issues then with the compiled model .jar files being compatible to run on Java 8? |
I think you should keep your 1.8 and use finalize if that's what you are targeting. The finalize code is actually very simple. You can replace OMTensor.java with this. |
I'd greatly prefer to not have to force source code changes into our build that's not in the community. It creates tech debt and becomes a maintenance nightmare in the long term. If that's the only way to do have the code "support" both, then I'd greatly lean towards either only supporting |
I thought you just want to do a quick test on 1.8 yourself. That's certainly not the only way to support both. We can have cmake pick the right version depending on which Java version you are using. As I mentioned in another thread, finalize does not work on Java 11 (compile gives warning, runtime has no effect) so we can't really have only finalize. |
Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Gong Su <gong_su@hotmail.com>
@gongsu832 do you need help from MS folks for the Window failure; or maybe just re-update to the latest and see if this now works? |
@gongsu832 @cjvolzka is there a resolution for this PR? |
A fix was already merged to address this issue in a way that works with Java 1.8. I believe this PR is to replace that with something that aligns better with later Java 17 standards. However, one of our users (and as far as I know they are the only/primary user of the Java interface) will still need Java 1.8 support for at least another year, if not two. Given the long time before this could merge, I'd say it can be closed, but I leave that up to @gongsu832 |
I'd like to keep it open just so we don't forget and also I can rebase the patch to the latest code. So I converted it draft. |
@gongsu832 Still desirable? If so can we reactualizes it? Or should we close? |
This PR serves as a reminder that when Java 1.8 goes out of service we should revisit the issue and converge on a single and simpler direct byte buffer cleaner. So I'd keep it open unless there is some problem with doing that. |
Fix JNI wrapper memory leak due to NewDirectByteBuffer not subjecting to Java GC. Use the Cleaner API requiring Java 9+ to avoid copying tensor data