Spotify OOM crash #40886

Closed
4 of 6 tasks
ShivanKaul opened this issue Sep 5, 2024 · 10 comments · Fixed by brave/brave-core#25877
Labels
crash OS/Desktop priority/P2 A bad problem. We might uplift this to the next planned release. QA/No release-notes/exclude webcompat/not-shields-related Sites are breaking because of something other than Shields.

Comments

@ShivanKaul
Collaborator

Description

See the following reports:

  1. Out of memory for Spotify and Instagram reels #36356
  2. Spotify web player crashes every 10 minutes #40311
  3. https://community.brave.com/t/spotify-keeps-crashing-in-brave/532078/101

@atuchin-m suggested that this might be happening because of several requestIdleCallback calls triggered by the reCaptcha JS on the Spotify site: #36356 (comment)

Doesn't look related to Shields or Playlist from reports (disabling either doesn't help).

Steps to reproduce

  1. Open the Web version of Spotify
  2. Play music
  3. Go to a different tab and let Spotify run

Actual result

The tab complains of high memory usage, and eventually crashes.

Expected result

Tab should not crash.

Reproduces how often

Easily reproduced

Brave version (brave://version info)

All

Channel information

  • release (stable)
  • beta
  • nightly

Reproducibility

  • with Brave Shields disabled
  • with Brave Rewards disabled
  • in the latest version of Chrome

Miscellaneous information

No response

@ShivanKaul ShivanKaul added priority/P2 A bad problem. We might uplift this to the next planned release. webcompat/not-shields-related Sites are breaking because of something other than Shields. OS/Desktop labels Sep 5, 2024
@kjozwiak
Member

kjozwiak commented Sep 17, 2024

So the above is still happening. I was listening to some music while working and it crashed after about 2 hours. Spotify was running in a background tab, and when I switched to it to pause the music, the WebView crashed. Using the following on Win 11 x64:

Brave | 1.72.12 Chromium: 129.0.6668.42 (Official Build) nightly (64-bit)
-- | --
Revision | 96e71bd286af7b7f34dd2fc749810eaf231c55a5
OS | Windows 11 Version 23H2 (Build 22631.4169)

Crashes:

  • b6fd1200-9ce4-970c-0000-000000000000
  • 08911700-9ce4-970c-0000-000000000000

Screenshot 2024-09-17 164819

[ 00 ] RaiseException
[ 01 ] partition_alloc::internal::OnNoMemoryInternal(unsigned __int64) ( oom.cc:41 )
[ 02 ] partition_alloc::TerminateBecauseOutOfMemory(unsigned __int64) ( oom.cc:64 )
[ 03 ] partition_alloc::internal::OnNoMemory(unsigned __int64) ( oom.cc:74 )
[ 04 ] partition_alloc::internal::PartitionExcessiveAllocationSize(unsigned __int64) ( partition_oom.cc:19 )
[ 05 ] partition_alloc::internal::`anonymous namespace'::PartitionDirectMap(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64,unsigned __int64) ( partition_bucket.cc:275 )
[ 06 ] partition_alloc::internal::PartitionBucket::AllocNewSlotSpan(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64) ( partition_bucket.cc:641 )
[ 07 ] partition_alloc::internal::PartitionBucket::SlowPathAlloc(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64,unsigned __int64,partition_alloc::internal::SlotSpanMetadata * *,bool *) ( partition_bucket.cc:1363 )
[ 08 ] partition_alloc::PartitionRoot::AllocFromBucket(partition_alloc::internal::PartitionBucket *,unsigned __int64,unsigned __int64,unsigned __int64 *,unsigned __int64 *,bool *) ( partition_root.h:1282 )
[ 09 ] partition_alloc::PartitionRoot::AllocInternalNoHooks(unsigned __int64,unsigned __int64) ( partition_root.h:2158 )
[ 10 ] allocator_shim::internal::PartitionMalloc(unsigned __int64,void *) ( allocator_shim_default_dispatch_to_partition_alloc.cc:204 )
[ 11 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocFn(unsigned __int64,void *) ( dispatcher_internal.h:129 )
[ 12 ] ShimMalloc(unsigned __int64,void *) ( shim_alloc_functions.h:112 )
[ 13 ] malloc(unsigned __int64) ( allocator_shim_override_ucrt_symbols_win.h:86 )
[ 14 ] _malloc_base(unsigned __int64) ( internal.cc:98 )
[ 15 ] operator new(unsigned __int64) ( new_scalar.cpp:36 )
[ 16 ] url::Origin::GetURL() ( origin.cc:159 )
[ 17 ] content_settings::`anonymous namespace'::GetOriginOrURL(blink::WebFrame const *) ( brave_content_settings_agent_impl.cc:58 )
[ 18 ] RtlUnwind
[ 19 ] RtlUnwind
[ 20 ] RtlUnwind
[ 21 ] blink::ScriptedIdleTaskController::ScheduleCallback(int,unsigned int) ( scripted_idle_task_controller.cc:123 )
[ 22 ] 0xaaaaaaaaaaaaaaaa
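
To read a trace like the one above, it can help to skip the allocator and OOM frames and look at the first caller that actually requested memory. The following is a minimal sketch under stated assumptions: the line format is inferred from this trace only, and `ALLOCATOR_PREFIXES`, `parse_frames`, and `first_non_allocator_frame` are hypothetical helpers, not part of any Brave or Chromium tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Excerpt copied from the trace above (not the full trace).
SAMPLE = """\
[ 00 ] RaiseException
[ 01 ] partition_alloc::internal::OnNoMemoryInternal(unsigned __int64) ( oom.cc:41 )
[ 13 ] malloc(unsigned __int64) ( allocator_shim_override_ucrt_symbols_win.h:86 )
[ 15 ] operator new(unsigned __int64) ( new_scalar.cpp:36 )
[ 16 ] url::Origin::GetURL() ( origin.cc:159 )
[ 17 ] content_settings::`anonymous namespace'::GetOriginOrURL(blink::WebFrame const *) ( brave_content_settings_agent_impl.cc:58 )
[ 21 ] blink::ScriptedIdleTaskController::ScheduleCallback(int,unsigned int) ( scripted_idle_task_controller.cc:123 )
"""

# "[ NN ] symbol ( file:line )", where the "( file:line )" part is optional.
FRAME_RE = re.compile(r"^\[\s*(\d+)\s*\]\s+(.*?)(?:\s+\(\s*([^()\s]+):(\d+)\s*\))?$")

# Assumption: prefixes that mark allocator/OOM plumbing in this particular trace.
ALLOCATOR_PREFIXES = (
    "RaiseException", "partition_alloc::", "allocator_shim::",
    "base::allocator::", "ShimMalloc", "malloc", "_malloc_base", "operator new",
)


class Frame(NamedTuple):
    index: int
    symbol: str
    file: Optional[str]
    line: Optional[int]


def parse_frames(trace: str) -> List[Frame]:
    """Parse every line that looks like a stack frame; ignore the rest."""
    frames = []
    for raw in trace.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            idx, symbol, file, line = match.groups()
            frames.append(Frame(int(idx), symbol, file, int(line) if line else None))
    return frames


def first_non_allocator_frame(frames: List[Frame]) -> Optional[Frame]:
    """Return the innermost frame that is not allocator/OOM plumbing."""
    for frame in frames:
        if not frame.symbol.startswith(ALLOCATOR_PREFIXES):
            return frame
    return None
```

On the excerpt, this points at the `url::Origin::GetURL()` frame, which is reached from `GetOriginOrURL` and, further out, from `ScriptedIdleTaskController::ScheduleCallback`.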

@kjozwiak kjozwiak added the crash label Sep 17, 2024
@atuchin-m
Contributor

We identified a memory hog in Chromium and upstreamed a fix. However, the primary issue is in the site JS, so fixing the C++ hog doesn't help enough (though it probably extended the tab's lifetime a little).
The issue is hard to debug and can't be reproduced on a local build because of DRM protection on Spotify.

The real issue is ReCaptcha. Why?

  1. An idle Spotify tab registers a lot of idle callbacks. It starts with a few, but after 10-20 minutes there are more than 1000. That is why the renderer crashes. All of the requestIdleCallback calls are ReCaptcha-related. (screenshot 1 here)

  2. Here is the list of origins with ScriptedIdleTaskController crashes during the last week. All of them use ReCaptcha.

image

@atuchin-m
Contributor

atuchin-m commented Sep 18, 2024

I'm trying to make a simplified example to reproduce this issue.
The steps:

  1. Start a local HTTPS server.
  2. Create index.html with the following content. <api_key> should match the test domain (e.g. open.spotify.com):
<!doctype html>
<html lang="en">
    <head>
        <script src="https://www.google.com/recaptcha/enterprise.js?render=<api_key>" async="" defer=""></script>
    </head>
    <body>
    </body>
</html>
  3. Redirect the test domain to 127.0.0.1 via the OS hosts file.
  4. Launch the browser with a clean profile (to bypass HSTS) and visit https://<test-domain>/index.html. Ignore SSL warnings.
  5. Start recording a JS performance trace via devtools, then leave the page in the background for 2 minutes.

Actual result: the devtools trace shows a bunch of idle callbacks in a row.
image

Expected result (from Chrome): only a few idle callbacks in a row.
image
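
The first two steps above could be scripted roughly as follows. This is only a sketch using Python's standard library, not the setup actually used in this thread: `render_index` and `build_https_server` are hypothetical helper names, the certificate and key paths are placeholders for a self-signed pair you generate yourself, and `<api_key>` still has to be filled in by hand.

```python
import http.server
import ssl

# Same markup as the index.html in the steps above, with the key left as a parameter.
INDEX_TEMPLATE = """<!doctype html>
<html lang="en">
    <head>
        <script src="https://www.google.com/recaptcha/enterprise.js?render={api_key}" async="" defer=""></script>
    </head>
    <body>
    </body>
</html>
"""


def render_index(api_key: str) -> str:
    """Return the test page with the given site key substituted in."""
    return INDEX_TEMPLATE.format(api_key=api_key)


def build_https_server(cert_file: str, key_file: str,
                       host: str = "127.0.0.1", port: int = 443) -> http.server.HTTPServer:
    """Create an HTTPS server that serves files from the current directory.

    cert_file/key_file are assumed to be a self-signed certificate and its
    private key. Binding to port 443 usually needs elevated privileges.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    httpd = http.server.HTTPServer((host, port), http.server.SimpleHTTPRequestHandler)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    return httpd


# Example usage (paths are placeholders):
#   with open("index.html", "w") as f:
#       f.write(render_index("<api_key>"))
#   build_https_server("cert.pem", "key.pem").serve_forever()
```

The hosts-file redirect and the clean browser profile still have to be set up manually as described in steps 3 and 4.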

@atuchin-m
Contributor

Here is the screencast of the issue (Brave):
https://github.com/user-attachments/assets/a00d4bb1-52d6-463f-af27-6f8c54b0ceeb

@atuchin-m
Contributor

--disable-features=BraveRoundTimeStamps resolves the issue.
The reason is that the feature reduces the timer resolution. That breaks something in the script logic: it starts to reschedule the callback again and again, eating all the memory.

The feature was implemented here: brave/brave-core#15309
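
To make the effect of reduced timer resolution concrete, here is a toy model. It is not the actual ReCaptcha logic (which is not shown in this thread) and not Brave's implementation: the 1 ms resolution, the 0.1 ms gap between callbacks, and the "re-queue until the clock visibly advances" rule are all assumptions chosen purely for illustration.

```python
def round_down(t_us: int, resolution_us: int) -> int:
    """Round a timestamp (in microseconds) down to a multiple of resolution_us."""
    return (t_us // resolution_us) * resolution_us


def reschedules_until_progress(resolution_us: int, step_us: int = 100) -> int:
    """Count how often a hypothetical script re-queues its callback.

    Assumed script rule: on each callback, read the clock; if the observed
    time has not advanced since the last reading, re-queue and try again.
    step_us is the real time that passes between two callbacks.
    """
    real_time_us = 0
    last_seen = round_down(real_time_us, resolution_us)
    reschedules = 0
    while True:
        real_time_us += step_us                      # real time always moves on
        seen = round_down(real_time_us, resolution_us)
        if seen > last_seen:                         # script finally observes progress
            return reschedules
        reschedules += 1                             # no visible progress: re-queue


precise = reschedules_until_progress(resolution_us=1)      # clock at full precision
coarse = reschedules_until_progress(resolution_us=1000)    # clock rounded to 1 ms
```

In this model the precise clock needs no re-queues while the rounded clock needs nine before any progress is observed. If each re-queue also retains some memory, that kind of multiplication is one plausible way a background tab could slowly accumulate callbacks.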

@atuchin-m
Contributor

The steps to verify:

Option 1.

  1. Open https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php in tab 1
  2. Switch to another tab; tab 1 should stay inactive during all remaining steps
  3. Wait 30 seconds.
  4. Measure the tab's memory using the built-in task manager
  5. Wait 10 minutes
  6. Measure the memory again
    Expected result: <= 100 MB memory usage
    Actual result: > 100 MB memory usage, and the usage slowly increases.

Option 2

  1. Play music on spotify.com
  2. Switch to another tab, putting the Spotify tab in the background
  3. Wait 20 minutes.
  4. Measure the tab's memory using the built-in task manager

Expected result: <= 300 MB memory usage
Actual result: > 500 MB memory usage, and the usage slowly increases.
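
The pass/fail rule in both options can be written down as a small check. This is a sketch only: the readings are typed in by hand from the task manager, `MemoryCheck` is a hypothetical helper, and the only values taken from the steps above are the limits (100 MB for Option 1, 300 MB for Option 2).

```python
from dataclasses import dataclass


@dataclass
class MemoryCheck:
    first_mb: float         # reading taken after the initial wait
    last_mb: float          # reading taken at the end of the run
    limit_mb: float         # 100 for Option 1, 300 for Option 2
    minutes_between: float  # time between the two readings

    @property
    def growth_mb_per_min(self) -> float:
        """Average growth between the two readings."""
        return (self.last_mb - self.first_mb) / self.minutes_between

    @property
    def passed(self) -> bool:
        """The final reading must stay within the expected limit."""
        return self.last_mb <= self.limit_mb


# Made-up readings for an Option 1 run: 80 MB after 30 s, 130 MB ten minutes later.
check = MemoryCheck(first_mb=80, last_mb=130, limit_mb=100, minutes_between=10)
```

For Option 2 there is only a single reading, so `first_mb` can be set to the same value as `last_mb` if only the limit matters.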

@atuchin-m
Contributor

I'm going to disable the feature in the code too: brave/brave-core#25877
The follow-up issue for resolving the conflict and re-enabling the feature: #41472

cc @ShivanKaul @arthuredelstein

@kjozwiak
Member

kjozwiak commented Oct 8, 2024

As a stopgap, we pushed brave/brave-variations#1215 via Griffin, which resolves the above issue. Both Nightly and Beta were set to 100% and Release was set to 5%. If everything looks stable over the next few days, we'll increase Release to ~25% or higher.

As per #40886 (comment), we'll also push this change via b-c so a Griffin study isn't needed.

@atuchin-m
Contributor

QA/No because disabling the feature was tested here: brave/brave-variations#1215
