[Hack Week] A generic solution for resolving the ANRs caused by accessing local database #20937

jarvislin · 2024-06-04T15:28:49Z

This is a Hack Week project. This PR also fixes #20850 to demonstrate how the helper can be used to update the current codebase.

Why

A number of Android ANR errors are reported on Sentry, and some of them are caused by local database access. The likely cause is that some code accesses the local database on the main thread without switching to a background thread, which can lead to ANRs in certain situations.

To address this issue, I have created a generic helper that can be used to easily handle thread switching. This helper can be used by anyone who encounters a similar situation in the future.

P.S. This helper is not limited to database access; it can also be used for any work that needs to run on an IO thread and then switch back to the main thread.
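The thread-switching pattern described above can be sketched roughly as follows. This is only an illustration, not the merged `AsyncTaskHandler`: the names `TaskCallback`, `onTaskFinished`, `loadSketch`, and the `callbackDispatcher` parameter are assumptions made for this example, and it needs the `kotlinx.coroutines` library on the classpath. A single named thread stands in for the Android main thread so the sketch can run on a plain JVM; on Android the callback dispatcher would typically be `Dispatchers.Main`.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicReference

// Hypothetical callback interface; the real AsyncTaskCallback may differ.
fun interface TaskCallback<T> {
    fun onTaskFinished(result: T)
}

// Sketch only: runs backgroundTask on Dispatchers.IO, then delivers the
// result on callbackDispatcher (assumed to be Dispatchers.Main on Android).
fun <T> loadSketch(
    callbackDispatcher: CoroutineDispatcher,
    backgroundTask: () -> T,
    callback: TaskCallback<T>
): Job = CoroutineScope(Dispatchers.IO).launch {
    val result = backgroundTask()
    withContext(callbackDispatcher) { callback.onTaskFinished(result) }
}

fun main() {
    // A single named thread stands in for the Android main thread.
    val executor = Executors.newSingleThreadExecutor { r -> Thread(r, "fake-main") }
    val fakeMain = executor.asCoroutineDispatcher()
    val seen = AtomicReference("")

    val job = loadSketch(fakeMain, { "followed" }) { result ->
        seen.set("$result delivered on ${Thread.currentThread().name}")
    }
    runBlocking { job.join() }

    println(seen.get())
    fakeMain.close() // shuts down the stand-in thread so the JVM can exit
}
```

Because the callback is a single-method interface, Java callers could pass a lambda or an anonymous class in the same way.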

Use with Caution

This helper looks convenient, but the goal is not to use it extensively. Good code should have a well-designed architecture and proper implementations that do not require this helper. It is meant as a quick fix for incorrect implementations in existing code. In the long run, we should aim for a complete refactor.

Note

While working on #20851, I found that changing the original blocking operation to use thread switching introduced a race condition that caused UI errors. The culprit was a piece of code running on another thread, which means the original code needs further refactoring to avoid the issue. It is crucial to check carefully for potential race conditions when dealing with similar problems.

To Test:

Sign in to the JP app.
Go to the Reader tab.
Tap a post.
Tap a tag in that post.
Tap the Subscribe/Unsubscribe button on the tag header.
It should work as usual.
Done, thank you!

Post	Tag Header

Regression Notes

Potential unintended areas of impact
- reader
What I did to test those areas of impact (or what existing automated tests I relied on)
- manual
What automated tests I added (or what prevented me from doing so)
- n/a

PR Submission Checklist:

I have completed the Regression Notes.
I have considered adding accessibility improvements for my changes.
I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

Testing Checklist (strike-out the not-applying and unnecessary ones):

skipped


dangermattic · 2024-06-04T15:31:25Z

	1 Message
📖	This PR is still a Draft: some checks will be skipped.

Generated by 🚫 Danger

sonarcloud · 2024-06-04T15:36:00Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

wpmobilebot · 2024-06-04T15:52:52Z

📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.

	App Name	WordPress
	Flavor	Jalapeno
	Build Type	Debug
	Version	pr20937-032d503
	Commit	`032d503`
	Direct Download	`wordpress-prototype-build-pr20937-032d503.apk`

Note: Google Login is not supported on these builds.

wpmobilebot · 2024-06-04T15:55:21Z

📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.

	App Name	Jetpack
	Flavor	Jalapeno
	Build Type	Debug
	Version	pr20937-032d503
	Commit	`032d503`
	Direct Download	`jetpack-prototype-build-pr20937-032d503.apk`

Note: Google Login is not supported on these builds.

codecov · 2024-06-04T16:08:55Z

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 40.93%. Comparing base (56f0686) to head (032d503).
Report is 2 commits behind head on trunk.

Files	Patch %	Lines
...org/wordpress/android/datasets/AsyncTaskHandler.kt	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            trunk   #20937      +/-   ##
==========================================
- Coverage   40.93%   40.93%   -0.01%     
==========================================
  Files        1518     1519       +1     
  Lines       69550    69554       +4     
  Branches    11473    11473              
==========================================
  Hits        28473    28473              
- Misses      38491    38495       +4     
  Partials     2586     2586


antonis

Thank you for your work on this @jarvislin 🙇
I've tested the specific use case and it works as expected for me 🎉
The introduced AsyncTaskHandler helper makes a lot of sense and adds structure to the code, especially when calling from Java 🥇

thomashorta

I see this PR has already been merged, but looking at it I felt it could be worth a post-merge Review to maybe rethink if this makes sense to be kept as-is.

I understand this PR is more of a quick solution to some of the ANR problems we have, but I have concerns about its rationale being a bit too simplistic and potentially introducing other issues that degrade user experience and are harder to track (non-crashing/behavioral bugs).

Besides the technical concerns I pointed out below in the code comments, I feel that each place in the codebase that currently calls DB/network code from the main thread needs to be investigated individually, and the long-term solution for most of them is likely a refactor to work correctly with async and concurrent calls.

That said it might make sense to still have some quick fix like this for some stuff, but we need to be aware of the risks.

thomashorta · 2024-06-05T16:25:33Z

WordPress/src/main/java/org/wordpress/android/datasets/AsyncTaskHandler.kt

+     */
+    @JvmStatic
+    fun <T> load(backgroundTask: () -> T, callback: AsyncTaskCallback<T>) {
+        CoroutineScope(Dispatchers.IO).launch {


I have a few concerns about the usage of Coroutines here.

CoroutineScope

The CoroutineScope is being created ad hoc instead of using an existing Scope or Context/Job, which is usually attached to some lifecycle-aware component. This is essentially the same as using the GlobalScope, which should be avoided.

This means that the Job launched here is completely standalone, bypassing one of the biggest advantages introduced by coroutines: structured concurrency.

In practice, this means that even if the caller of the coroutine dies, the callback lambda will still be called (since the Scope is never canceled) which can cause issues if said callback references something that was destroyed.

TBH I'm not sure what the best solution would be here, but using an existing Scope that's close to the component that launches this Job would be preferred. All in all, thinking about this helper class, I'd say receiving the CoroutineScope via argument would make more sense so the caller can pass an appropriate Scope.
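The scope-as-argument idea can be sketched as below. This is a hypothetical variant, not code from this PR: `loadIn` and `ownerScope` are names invented for the example, and the callback is invoked on the IO thread only to keep the sketch runnable on a plain JVM (on Android it would be wrapped in `withContext(Dispatchers.Main)`). The two latches make the ordering deterministic: the owner scope is cancelled while the background task is still running, and the callback is then never delivered.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical variant of the helper: the caller supplies the scope,
// so cancelling that scope also cancels the pending callback.
fun <T> loadIn(
    scope: CoroutineScope,
    backgroundTask: () -> T,
    callback: (T) -> Unit
): Job = scope.launch(Dispatchers.IO) {
    val result = backgroundTask()
    // A blocking task cannot be interrupted by cancellation,
    // so check explicitly before delivering the result.
    ensureActive()
    callback(result)
}

fun main() {
    // Stands in for a lifecycle-bound scope owned by the caller.
    val ownerScope = CoroutineScope(SupervisorJob())
    val delivered = AtomicBoolean(false)
    val started = CountDownLatch(1)
    val release = CountDownLatch(1)

    val job = loadIn(ownerScope, {
        started.countDown()  // signal that the task is running
        release.await()      // block until the owner has been cancelled
        "result"
    }) { delivered.set(true) }

    started.await()
    ownerScope.cancel()      // the owner "dies" while the task is in flight
    release.countDown()
    runBlocking { job.join() }

    println("callback delivered: ${delivered.get()}") // prints "callback delivered: false"
}
```

With an ad hoc `CoroutineScope(Dispatchers.IO)` created inside the helper, nothing outside the helper holds a reference to cancel, so the callback in this scenario would still run.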

Hardcoded Dispatchers.IO

This is another thing that should be avoided, not only because it's hard to test, but also because, based on the name of this class (async TASK), it looks like this is supposed to be a generic "offload-to-background" helper.

This means that passing a backgroundTask that runs computational work could be expected, but that kind of work should run on Dispatchers.Default, while Dispatchers.IO should be used only for IO/waiting tasks. More info here.

The solution for this should probably be to receive the CoroutineDispatcher as an argument since the caller would be the one who knows what kind of task is being run.
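One way this could look, as a sketch under assumptions: `loadOn` is a hypothetical name, the dispatcher parameter defaults to `Dispatchers.IO` so database call sites stay short, and the callback runs in the caller's scope context rather than on a hardcoded dispatcher. None of this is taken from the PR's code.

```kotlin
import kotlinx.coroutines.*

// Hypothetical signature: the dispatcher defaults to IO but can be
// overridden by the caller, which also makes it replaceable in tests.
fun <T> loadOn(
    scope: CoroutineScope,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    backgroundTask: () -> T,
    callback: (T) -> Unit
): Job = scope.launch {
    // Only the background task hops to the chosen dispatcher.
    val result = withContext(dispatcher) { backgroundTask() }
    // The callback runs in the scope's own context.
    callback(result)
}

fun main() {
    runBlocking {
        // CPU-bound work goes to Dispatchers.Default rather than Dispatchers.IO.
        loadOn(this, Dispatchers.Default, { (1..1_000).sum() }) { total ->
            println("sum = $total") // prints "sum = 500500"
        }.join()

        // IO-style work relies on the default dispatcher argument.
        loadOn(this, backgroundTask = { "row from db" }) { row ->
            println(row) // prints "row from db"
        }.join()
    }
}
```

A default value keeps the common database case as short as it is today while leaving room for other kinds of work, which may be a middle ground between the two positions in this thread.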

@thomashorta Thank you for pointing out the potential issues in the current implementation.

I'd say receiving the CoroutineScope via argument would make more sense so the caller can pass an appropriate Scope.

Yeah this is also the way to handle the lifecycle stuff properly, will update it later.

As for the hardcoded Dispatchers.IO, I'm not sure I should receive the CoroutineDispatcher as an argument. The initial idea was to address issues related to accessing the database, so opening it up to all dispatchers seems a bit over-engineered to me.

Therefore, I propose we keep this part as is and handle it through renaming for now. If a future need arises, we can then add the dispatcher argument. What do you think?

P.S. I think this helper is mostly relevant to legacy code, especially Java. Code written in Kotlin should be able to handle these requirements without the helper. So the amount of code that uses this helper may be smaller than we expect (and will shrink further after some refactors).

thomashorta · 2024-06-05T16:29:11Z

WordPress/src/main/java/org/wordpress/android/ui/reader/adapters/ReaderPostAdapter.java

-        final boolean isAskingToFollow = !ReaderTagTable.isFollowedTagName(currentTag.getTagSlug());
-
-        final String slugForTracking = currentTag.getTagSlug();
+        AsyncTaskHandler.load(


I understand the usage here was more of an example of how to use this new AsyncTaskHandler but in this case (and I guess many other cases that are currently using DBs/network in the main thread) I feel the legacy code is in a bad state and the real fix should be a complete refactor.

For instance: this is a RecyclerView Adapter and it's dealing with data loading, which is clearly a bad practice, so a more coherent solution would be refactoring how the data is updated in the Adapter, probably moving the data loading to the appropriate ViewModel (which would have access to a proper CoroutineScope and ways of doing async work) and propagating the list state correctly to the Activity/Fragment containing the Adapter.

My approach is to first address the root cause with minimal changes and then look for opportunities to refactor. Refactoring involves more extensive changes, and dealing with legacy code introduces even more uncertainty. Therefore, I avoid making major changes right away unless I'm very confident or have plenty of time. It's a bit like a philosophical trade-off, but I agree that these related files need a complete refactor. In the long run, I believe the final outcome will be consistent.

jarvislin · 2024-06-06T09:13:49Z

@thomashorta Thank you so much for reviewing this PR. I plan to create a follow-up PR to handle the issues in the current implementation. I have also just updated the Use with Caution section.

Jarvis Lin added 2 commits June 4, 2024 11:42

Create an alternative of AsyncTask for handling thread-switching requ…

83c6a66

…irements

Wrap the database-accessing code with AsyncTaskHandler

3eb24e4

Add a comment

032d503

jarvislin added [Type] ANR Application Not Responding Reader labels Jun 4, 2024

jarvislin added this to the 25.1 milestone Jun 4, 2024

jarvislin marked this pull request as ready for review June 4, 2024 15:47

jarvislin requested a review from antonis June 4, 2024 15:53

antonis approved these changes Jun 5, 2024


antonis merged commit a24e5d1 into trunk Jun 5, 2024
23 checks passed

antonis deleted the hackweek/database-anr branch June 5, 2024 10:47

thomashorta reviewed Jun 5, 2024


jarvislin mentioned this pull request Jun 6, 2024

Fix a potential issue related to lifecycle #20943

Merged
