Better stacktrace data for non-file routes behind login wall #8656

cstavitsky · 2023-07-25T22:19:12Z

Submitting on behalf of a customer (see linked Jira ticket for customer-specific details):

Problem Statement

In some cases of the same issue, stack traces have rich context, but in other cases, there is no context. The pages in question involve HAML

Example A (already behaves how we want)

URL: http://some-client.myexamplesite.com/admin:

When we visit the actual URL and look at the page source, we can see that line 289 is present and matches the source context shown in Sentry:

Great -- this is what we want.

Example B (want different behavior)

URL: http://another-client.myexamplesite.com/admin:

When we visit the actual URL and look at the page source, we see that line 407, where the error occurred, does not exist:

The desired behavior is to have similar detailed source context even for errors behind login walls.

Why this happens

I and colleague @Prithvirajkumar spoke internally to @kamilogorek on Slack, who mentioned that the problem with the missing stack trace is that the captured error is pointing to a route, in the case of the example it’s /admin.

The problem is that the captured error is pointing to a route, in this case it's /admin.
There’s no such artifact, as it’s not a file, so we reach out to scrape the page.
We do so successfully for pages that have no login wall (if we view the page source in our browser without logging in, we see the line where the error was triggered)
Whereas, for events with no context, it’s getting a “login” page served, effectively not being able to resolve it (if we visit the page source in our browser without logging in, the line number where the error occurred does not exist, indicating that it took place behind a login wall)

Solution Brainstorm

Our customer writes:

Because our haml sections are legacy code they're also more prone to errors, so being able to pinpoint the code is pretty important for us. Unfortunately the majority of our important errors happen while users are logged in.
Because Bugsnag does provide context for these same errors, it looks like there's room for improvement. They must be capturing that context as the event happens and sending it along with the initial payload. I'm going to look at doing the same and capture/send it as metadata, but that won't show up as a stack trace. Can I make a feature request for parity with Bugsnag here?

I've confirmed that Bugsnag sends this helpful data with the event. Here's a screenshot for reference

The customer also provided this screenshot from Bugsnag:

The ideal situation would be to have stack trace information shown even for errors occurring behind login-walled routes, such as /admin in the example.

This could require capturing and including stacktrace data at the time of the crash rather than scraping the page and encountering the login wall.

Product Area

Issues

┆Issue is synchronized with this Jira Improvement by Unito

The text was updated successfully, but these errors were encountered:

getsantry · 2023-07-25T22:24:56Z

Routing to @getsentry/product-owners-issues for triage ⏲️

tavisatjane · 2023-07-26T20:01:57Z

I've developed a work around that leverages the error-stack-parser and stack-generator packages to create a stack trace at error time. I then attach it to event.extras in beforeSend

Lms24 · 2023-07-27T08:01:37Z

I'm going to write down some thoughts here after reading the issue and doing some research. Feel free to correct whatever is wrong but to me the following things are important:

I'm assuming it's not possible to match the code with a source map artifact because it's directly embedded in the html file.
To at least get the original JS code, we try to scrape the file during symbolication which we fail to do for the pages behind a log in wall. I don't think there's much we can do here.
The request is to collect context lines in the Browser SDK instead.
- We do this for node with the ContextLines integration but not for browser because we generally can't access the JS files within the browser.
- However, we can read the browser html and extract context lines from it, in case the error happens in html-embedded JS
- For reference, bugsnag is doing this
- If we do this, we would probably do it in the form of a new integration
- I'm concerned about the bundle size increase though so we need to discuss if this should be a default integration or an opt-in one.

AbhiPrasad · 2023-07-27T14:23:42Z

I'd do this with an opt-in integration, since not all applications have this setup. For those who do, they can pay the extra bundle size cost.

Lms24 · 2023-07-31T09:49:54Z

@tavisatjane I opened #8670 to collect the source context in browser. Feel free to check out the PR. We'll release this soon in version 7.61.0. You'll need to add it as an integration form @sentry/integrations to your Sentry.init setup:

import { ContextLines } from '@sentry/integrations';

Sentry.init({
  // rest of your config
  integrations: [new ContextLines(), /* other integrations */]
});

Lms24 · 2023-07-31T13:08:19Z

Reopening this because we discovered a lot of limitations and problems with this intergation. We'll sync how to proceed here but to unblock our next release, I had to revert the PR. More context: #8680

ndmanvar · 2023-07-31T17:16:23Z

@smeubank

Lms24 · 2023-08-01T14:07:13Z

I'll leave some notes in this reply around possible approaches towards getting reliable context lines and their limitations.

TL;DR: There's no reliable way to capture the raw HTML that was served to the browser and sometimes, that's not even desireable. This also applies to how Bugsnag are currently obtaining context lines and they're facing the same limitations.

Option 1 - Accessing HTML via browser APIs

Browsers provide certain APIs with which we can access processed versions of the page's HTML. For example, document.documentElement.(inner|outer)HTML or the new XMLSerializer().serializeToString(document).

What's important to understand with these APIs is that they will not return raw HTML that was served to the browser but a processed version that can significantly differ from the raw source. This means that the processed HTML's line numbers are very likely not in line with the stack frames' line numbers.

Simple example:
This raw html

<!DOCTYPE html>

<html>
  
  <body>
  
    <button id="inline-error-btn" onclick="throw new Error('Error with context lines')">Click me</button>
  </body>
</html>

is returned by document.documentElement.outerHTML (split by \n) as

[
"<html><head></head><body>",
"  ",
"    <button id=\"inline-error-btn\" onclick=\"throw new Error('Error with context lines')\">Click me</button>"
"  ",
"",
"</body></html>",
]

If this button was clicked in the browser, the stack frame's line number is 7. However, the index of the processed html lines would be 2 (i.e. line number 3). We can account for some of these modified lines by shifting the beginning of the returned array back a few elements. However, as soon as users or frameworks add arbitrary line breaks, for instance between <DOCTYPE> and <html> we can no longer automatically account for it because browsers omit these line breaks when returning processed HTML.

Browser differences:

Firefox/Gecko reports errors as in the example with lineno 1 in a separate script to the HTML. So there's no way to actually obtain the code from the HTML.
Webkit/Safari sometimes seems to throw a general "Script error" instead of the actual error message. We filter these errors out by default, as they proved to be very noisy in the past.

Option 2: Accessing HTML by re-fetching the page

While event processors shouldn't be async, we could also try to fetch the HTML page again with window.fetch and access the original page from there. This is basically what symbolicator is doing in the Sentry backend. However, while we might have more success than symbolicator in obtaining the HTML, there are still no guarantees that we can always obtain valid HTML:

For instance, the server could respond with a different version of HTML than at the initial page load
Authentication might still be limiting or changing the server's response depending on the authentication mechanism. Perhaps, CORS rules are configured in a way that the fetch request would be blocked.
Users or frameworks could make modifications to the HTML (i.e. add script blocks that throw errors, insert elements/nodes) that have effects on the stack frame <-> html line correlation. My initial testing has shown that inserted script tags are not influencing the reported line numbers but there are so many ways to do this and to modify the DOM that it's hard to imagine that this always stays consistent.

Decision

For the time being, our approach will be to rely on the browser's processed HTML and keep things simple and best-effort. We can revisit this; perhaps add option 2 and use option 1 as a fallback if the request fails. I just want to point out that there's not going to be a perfect solution to this and the limitations mentioned in this post and in #8670 will still apply. Furthermore, as already mentioned, bugsnag is also accessing processed HTML, so they're confronted with the same limitations.

cstavitsky added the Sync: Jira apply to auto-create a Jira shadow ticket label Jul 25, 2023

getsantry bot added the Waiting for: Product Owner label Jul 26, 2023

roggenkemper removed the Waiting for: Product Owner label Jul 26, 2023

roggenkemper transferred this issue from getsentry/sentry Jul 26, 2023

Lms24 self-assigned this Jul 31, 2023

Lms24 mentioned this issue Jul 31, 2023

feat(browser): Add ContextLines integration for html-embedded JS stack frames #8670

Merged

Lms24 added Type: Feature Package: browser Issues related to the Sentry Browser SDK labels Jul 31, 2023

Lms24 mentioned this issue Jul 31, 2023

feat(js): Add documentation for new ContextLines integration getsentry/sentry-docs#7564

Merged

Lms24 closed this as completed in #8670 Jul 31, 2023

Lms24 mentioned this issue Jul 31, 2023

Revert "feat(browser): Add ContextLines integration for html-embedded JS stack frames (#8670)" #8680

Merged

Lms24 reopened this Jul 31, 2023

Lms24 mentioned this issue Aug 1, 2023

feat(integrations): Add ContextLines integration for html-embedded JS stack frames #8699

Merged

Lms24 closed this as completed in #8699 Aug 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better stacktrace data for non-file routes behind login wall #8656

Better stacktrace data for non-file routes behind login wall #8656

cstavitsky commented Jul 25, 2023 •

edited by sync-by-unito bot

Loading

getsantry bot commented Jul 25, 2023

tavisatjane commented Jul 26, 2023

Lms24 commented Jul 27, 2023 •

edited

Loading

AbhiPrasad commented Jul 27, 2023

Lms24 commented Jul 31, 2023 •

edited

Loading

Lms24 commented Jul 31, 2023 •

edited

Loading

ndmanvar commented Jul 31, 2023

Lms24 commented Aug 1, 2023 •

edited

Loading

Better stacktrace data for non-file routes behind login wall #8656

Better stacktrace data for non-file routes behind login wall #8656

Comments

cstavitsky commented Jul 25, 2023 • edited by sync-by-unito bot Loading

Problem Statement

Example A (already behaves how we want)

Example B (want different behavior)

Why this happens

Solution Brainstorm

Product Area

getsantry bot commented Jul 25, 2023

tavisatjane commented Jul 26, 2023

Lms24 commented Jul 27, 2023 • edited Loading

AbhiPrasad commented Jul 27, 2023

Lms24 commented Jul 31, 2023 • edited Loading

Lms24 commented Jul 31, 2023 • edited Loading

ndmanvar commented Jul 31, 2023

Lms24 commented Aug 1, 2023 • edited Loading

Option 1 - Accessing HTML via browser APIs

Browser differences:

Option 2: Accessing HTML by re-fetching the page

Decision

cstavitsky commented Jul 25, 2023 •

edited by sync-by-unito bot

Loading

Lms24 commented Jul 27, 2023 •

edited

Loading

Lms24 commented Jul 31, 2023 •

edited

Loading

Lms24 commented Jul 31, 2023 •

edited

Loading

Lms24 commented Aug 1, 2023 •

edited

Loading