Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better stacktrace data for non-file routes behind login wall #8656

Closed
cstavitsky opened this issue Jul 25, 2023 · 8 comments · Fixed by #8670 or #8699
Closed

Better stacktrace data for non-file routes behind login wall #8656

cstavitsky opened this issue Jul 25, 2023 · 8 comments · Fixed by #8670 or #8699
Assignees
Labels
Package: browser Issues related to the Sentry Browser SDK Sync: Jira apply to auto-create a Jira shadow ticket

Comments

@cstavitsky
Copy link

cstavitsky commented Jul 25, 2023

Submitting on behalf of a customer (see linked Jira ticket for customer-specific details):

Problem Statement

In some cases of the same issue, stack traces have rich context, but in other cases, there is no context. The pages in question involve HAML

Example A (already behaves how we want)

URL: http://some-client.myexamplesite.com/admin:

Screenshot 2023-07-25 at 11 12 11 AM

When we visit the actual URL and look at the page source, we can see that line 289 is present and matches the source context shown in Sentry:

Screenshot 2023-07-26 at 9 16 37 AM

Great -- this is what we want.

Example B (want different behavior)

URL: http://another-client.myexamplesite.com/admin:

Screenshot 2023-07-25 at 11 11 32 AM

When we visit the actual URL and look at the page source, we see that line 407, where the error occurred, does not exist:

Screenshot 2023-07-26 at 9 16 48 AM

The desired behavior is to have similar detailed source context even for errors behind login walls.

Why this happens

I and colleague @Prithvirajkumar spoke internally to @kamilogorek on Slack, who mentioned that the problem with the missing stack trace is that the captured error is pointing to a route, in the case of the example it’s /admin.

  • The problem is that the captured error is pointing to a route, in this case it's /admin.
  • There’s no such artifact, as it’s not a file, so we reach out to scrape the page.
  • We do so successfully for pages that have no login wall (if we view the page source in our browser without logging in, we see the line where the error was triggered)
  • Whereas, for events with no context, it’s getting a “login” page served, effectively not being able to resolve it (if we visit the page source in our browser without logging in, the line number where the error occurred does not exist, indicating that it took place behind a login wall)

Solution Brainstorm

Our customer writes:

Because our haml sections are legacy code they're also more prone to errors, so being able to pinpoint the code is pretty important for us. Unfortunately the majority of our important errors happen while users are logged in.
Because Bugsnag does provide context for these same errors, it looks like there's room for improvement. They must be capturing that context as the event happens and sending it along with the initial payload. I'm going to look at doing the same and capture/send it as metadata, but that won't show up as a stack trace. Can I make a feature request for parity with Bugsnag here?

I've confirmed that Bugsnag sends this helpful data with the event. Here's a screenshot for reference

Screenshot 2023-07-24 at 11 14 32 AM

The customer also provided this screenshot from Bugsnag:

Screenshot 2023-07-25 at 3 39 09 PM

The ideal situation would be to have stack trace information shown even for errors occurring behind login-walled routes, such as /admin in the example.

This could require capturing and including stacktrace data at the time of the crash rather than scraping the page and encountering the login wall.

Product Area

Issues

┆Issue is synchronized with this Jira Improvement by Unito

@cstavitsky cstavitsky added the Sync: Jira apply to auto-create a Jira shadow ticket label Jul 25, 2023
@getsantry
Copy link

getsantry bot commented Jul 25, 2023

Routing to @getsentry/product-owners-issues for triage ⏲️

@tavisatjane
Copy link

I've developed a work around that leverages the error-stack-parser and stack-generator packages to create a stack trace at error time. I then attach it to event.extras in beforeSend
Screenshot 2023-07-26 at 12 52 59 PM

@roggenkemper roggenkemper transferred this issue from getsentry/sentry Jul 26, 2023
@Lms24
Copy link
Member

Lms24 commented Jul 27, 2023

I'm going to write down some thoughts here after reading the issue and doing some research. Feel free to correct whatever is wrong but to me the following things are important:

  • I'm assuming it's not possible to match the code with a source map artifact because it's directly embedded in the html file.
  • To at least get the original JS code, we try to scrape the file during symbolication which we fail to do for the pages behind a log in wall. I don't think there's much we can do here.
  • The request is to collect context lines in the Browser SDK instead.
    • We do this for node with the ContextLines integration but not for browser because we generally can't access the JS files within the browser.
    • However, we can read the browser html and extract context lines from it, in case the error happens in html-embedded JS
    • For reference, bugsnag is doing this
    • If we do this, we would probably do it in the form of a new integration
    • I'm concerned about the bundle size increase though so we need to discuss if this should be a default integration or an opt-in one.

@AbhiPrasad
Copy link
Member

I'd do this with an opt-in integration, since not all applications have this setup. For those who do, they can pay the extra bundle size cost.

@Lms24
Copy link
Member

Lms24 commented Jul 31, 2023

@tavisatjane I opened #8670 to collect the source context in browser. Feel free to check out the PR. We'll release this soon in version 7.61.0. You'll need to add it as an integration form @sentry/integrations to your Sentry.init setup:

import { ContextLines } from '@sentry/integrations';

Sentry.init({
  // rest of your config
  integrations: [new ContextLines(), /* other integrations */]
});

@Lms24
Copy link
Member

Lms24 commented Jul 31, 2023

Reopening this because we discovered a lot of limitations and problems with this intergation. We'll sync how to proceed here but to unblock our next release, I had to revert the PR. More context: #8680

@Lms24 Lms24 reopened this Jul 31, 2023
@ndmanvar
Copy link
Contributor

@smeubank

@Lms24
Copy link
Member

Lms24 commented Aug 1, 2023

I'll leave some notes in this reply around possible approaches towards getting reliable context lines and their limitations.

TL;DR: There's no reliable way to capture the raw HTML that was served to the browser and sometimes, that's not even desireable. This also applies to how Bugsnag are currently obtaining context lines and they're facing the same limitations.

Option 1 - Accessing HTML via browser APIs

Browsers provide certain APIs with which we can access processed versions of the page's HTML. For example, document.documentElement.(inner|outer)HTML or the new XMLSerializer().serializeToString(document).

What's important to understand with these APIs is that they will not return raw HTML that was served to the browser but a processed version that can significantly differ from the raw source. This means that the processed HTML's line numbers are very likely not in line with the stack frames' line numbers.

Simple example:
This raw html

<!DOCTYPE html>

<html>
  
  <body>
  
    <button id="inline-error-btn" onclick="throw new Error('Error with context lines')">Click me</button>
  </body>
</html>

is returned by document.documentElement.outerHTML (split by \n) as

[
"<html><head></head><body>",
"  ",
"    <button id=\"inline-error-btn\" onclick=\"throw new Error('Error with context lines')\">Click me</button>"
"  ",
"",
"</body></html>",
]

If this button was clicked in the browser, the stack frame's line number is 7. However, the index of the processed html lines would be 2 (i.e. line number 3). We can account for some of these modified lines by shifting the beginning of the returned array back a few elements. However, as soon as users or frameworks add arbitrary line breaks, for instance between <DOCTYPE> and <html> we can no longer automatically account for it because browsers omit these line breaks when returning processed HTML.

Browser differences:

  • Firefox/Gecko reports errors as in the example with lineno 1 in a separate script to the HTML. So there's no way to actually obtain the code from the HTML.
  • Webkit/Safari sometimes seems to throw a general "Script error" instead of the actual error message. We filter these errors out by default, as they proved to be very noisy in the past.

Option 2: Accessing HTML by re-fetching the page

While event processors shouldn't be async, we could also try to fetch the HTML page again with window.fetch and access the original page from there. This is basically what symbolicator is doing in the Sentry backend. However, while we might have more success than symbolicator in obtaining the HTML, there are still no guarantees that we can always obtain valid HTML:

  • For instance, the server could respond with a different version of HTML than at the initial page load
  • Authentication might still be limiting or changing the server's response depending on the authentication mechanism. Perhaps, CORS rules are configured in a way that the fetch request would be blocked.
  • Users or frameworks could make modifications to the HTML (i.e. add script blocks that throw errors, insert elements/nodes) that have effects on the stack frame <-> html line correlation. My initial testing has shown that inserted script tags are not influencing the reported line numbers but there are so many ways to do this and to modify the DOM that it's hard to imagine that this always stays consistent.

Decision

For the time being, our approach will be to rely on the browser's processed HTML and keep things simple and best-effort. We can revisit this; perhaps add option 2 and use option 1 as a fallback if the request fails. I just want to point out that there's not going to be a perfect solution to this and the limitations mentioned in this post and in #8670 will still apply. Furthermore, as already mentioned, bugsnag is also accessing processed HTML, so they're confronted with the same limitations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Package: browser Issues related to the Sentry Browser SDK Sync: Jira apply to auto-create a Jira shadow ticket
Projects
Archived in project
6 participants