
Vision camera v3 #1121

Merged: jtklein merged 92 commits into main from vision-camera-3 on Mar 26, 2024

Conversation

jtklein (Collaborator) commented on Feb 9, 2024

Now that support for tap-to-focus on Android has landed in react-native-vision-camera, this is another attempt to switch from vision-camera v2 to v3.

@jtklein jtklein marked this pull request as ready for review March 13, 2024 09:27
@jtklein jtklein marked this pull request as draft March 13, 2024 13:43
@jtklein jtklein marked this pull request as ready for review March 14, 2024 13:07
@jtklein jtklein requested a review from kueda March 14, 2024 18:18
jtklein (Collaborator, Author) commented on Mar 14, 2024

@kueda I have tested this on debug and release builds, and from what I can tell it works as it should and is now finally a full replacement for vision-camera v2 for all the features we used. I think the code is mostly fine and I would merge it in. However, I have made so many attempts with this library that I mainly need a fresh pair of eyes, so it would be fab if you could test the camera once again with these changes and review the code.
On iOS the experience should stay the same; from what I have tested, the average time it takes to process a frame is the same as with vision-camera v2.
On Android, this PR adds around 400–500 ms to the average time it takes to process one frame. This might be a blocker, or we could merge for iOS and keep iterating on Android. There are some tools available for v3 that are not available for v2 that we could use to speed up processing. For example, I tried replacing the resizing code in our vision-plugin with https://github.com/mrousavy/vision-camera-resize-plugin, and that did bring the average time back down.
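For context, here is a minimal sketch of how the resize plugin could slot into a v3 frame processor. This is not the code in this PR: `classifyFrame` is a hypothetical stand-in for our vision-plugin call, and the target size and pixel format are placeholder values based on the resize plugin's documented options.

```ts
// Sketch only: assumes vision-camera v3's useFrameProcessor hook and the
// useResizePlugin hook documented by vision-camera-resize-plugin.
import { useFrameProcessor } from "react-native-vision-camera";
import { useResizePlugin } from "vision-camera-resize-plugin";

// Hypothetical stand-in for our vision-plugin's classifier call.
declare function classifyFrame(pixels: unknown): { taxonId: number; score: number }[];

function useClassifierFrameProcessor() {
  const { resize } = useResizePlugin();

  return useFrameProcessor(
    (frame) => {
      "worklet";
      // Downscale natively before running the model; the target size and
      // pixel format here are placeholders, not production values.
      const resized = resize(frame, {
        scale: { width: 299, height: 299 },
        pixelFormat: "rgb",
        dataType: "uint8",
      });
      const predictions = classifyFrame(resized);
      // ...hand predictions back to JS, e.g. via a worklets-core shared value.
    },
    [resize]
  );
}
```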

kueda (Member) commented on Mar 14, 2024

I'll give it a test this afternoon, @jtklein. Any idea where the slowdown in Android is coming from? Is it a problem with vision-camera itself, or perhaps with our frame processor?

kueda (Member) left a comment

Mostly this seems to work fine, with one blocking and one non-blocking issue.

Blocking: a prediction in the ARCamera seems to linger on screen for at least 5 seconds after moving away from that subject, even when you use the debug tool to change numStoredResults to 1. So it feels like the default behavior here has changed, and that numStoredResults is not having an effect.

Non-blocking: on Android (Pixel 8, Android 14), if I walk around for a few minutes with the ARCamera open identifying some plants, I get this crash fairly consistently: [screenshot: Screenshot_20240314-134113]. I don't think this needs to block for MVP since it's only on Android, but it is fairly annoying for an Android user like me.

jtklein (Collaborator, Author) commented on Mar 25, 2024

@kueda there was indeed a bug with numStoredResults. I have updated the branch: if you set numStoredResults to 1, you should now always see the last frame's result. So if you move from one object to the next, the result should change pretty quickly (depending on the speed of frame processing, though).
I have also added the age of the last result to the screen in debug mode. For example, if you have 4 results stored and you move from a species with a high prediction score to one with a lower score, you will see the age of the result ticking up until the new species becomes the best result. When the second species' score is lower, it will still take quite some time for the result to change, depending on the phone's speed.
A prediction in the ARCamera might still linger on screen for longer than 5 seconds after moving away from that subject. I chatted with Alex a bit about that, and one mitigation I have now implemented is a linear weighting of the frames' results, so older frames' results are weighted less than newer ones (see the sketch below). I feel this improves the lingering when you move from one subject to the next, although it does not help when moving from a subject to no subject.
However, the long lingering of a prediction should no longer happen if you set numStoredResults to 1.
Also, changing numStoredResults to a lower number should now be possible while keeping the ARCamera open.

All in all, I think I have addressed the blockers fully; let me know what you think.
I did not look at the non-blocking issue.
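Roughly, the linear weighting idea is something like this. This is a simplified sketch, not the actual implementation in this PR; `Prediction` and `combineStoredResults` are made-up names for illustration:

```ts
// Simplified sketch of the linear-weighting idea, not the actual implementation.
type Prediction = { taxonId: number; score: number };

// storedResults holds the per-frame predictions of the last numStoredResults
// frames, oldest first.
function combineStoredResults(storedResults: Prediction[][]): Prediction[] {
  const combined = new Map<number, number>();
  const n = storedResults.length;
  storedResults.forEach((framePredictions, i) => {
    // Linear weight: the oldest frame counts 1/n, the newest counts n/n = 1.
    const weight = (i + 1) / n;
    framePredictions.forEach(({ taxonId, score }) => {
      combined.set(taxonId, (combined.get(taxonId) ?? 0) + weight * score);
    });
  });
  // Highest combined score wins.
  return [...combined.entries()]
    .map(([taxonId, score]) => ({ taxonId, score }))
    .sort((a, b) => b.score - a.score);
}
```

With numStoredResults set to 1, the window only contains the latest frame, which is why the result then changes immediately.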

kueda (Member) left a comment

Good job finding and patching that missing method in react-native-worklets-core! I can confirm that "num stored results" is working, and the age of results thing is useful (though it might benefit from debug styling so everyone knows it's just a debugging feature; doesn't need to block).

I'm still experiencing the Android crash after ~5+ minutes of continuous usage, so that might be worth an issue after you merge.

@jtklein jtklein merged commit 0e0a656 into main Mar 26, 2024
12 of 14 checks passed
@jtklein jtklein deleted the vision-camera-3 branch March 29, 2024 17:35
@jtklein jtklein restored the vision-camera-3 branch March 29, 2024 17:35
@jtklein jtklein deleted the vision-camera-3 branch June 18, 2024 11:46