GSoC: Distributed error reporting #12489

Open
wants to merge 84 commits into
base: develop

Conversation

akolson
Member

@akolson akolson commented Jul 25, 2024

Summary

This PR implements the distributed error reporting feature as part of the Google Summer of Code (GSoC) 2024 program. See the epic for details.

References

Closes #12214

Reviewer guidance

All tests associated with the implementation must pass.


Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

thesujai and others added 30 commits June 6, 2024 01:27
Distributed error reporting: Setting up the database that stores all the errors
…ask2

Distributed error reporting: Create model to store captured Errors
…ask3

Distributed error reporting: Middleware to catch runtime exception in backend
…ask4

Distributed error reporting: Endpoint /api/errorreports/report to store frontend error
@akolson akolson changed the title Distributed error reporting GSoC: Distributed error reporting Jul 29, 2024
akolson and others added 10 commits August 10, 2024 02:17
remove installation_type and release_version; move request_time_to_error to context; remove sensitive info from the requests info; only use traceback and error_msg to fingerprint an error_report
DER: Move request_time_to_error to context and remove sensitive info from the requests info
Member

@bjester bjester left a comment

Main blockers: we should add more logic to remove parameters like passwords from request data, and we should configure request timeouts on the report requests.

},
};

Resource.client = jest.fn();
Member

Ideally, whenever you mock something in Python or JS, you want to ensure that the original implementation can be restored after the test completes. That keeps tests from interfering with each other through leftover mocks.

Since this directly replaces Resource.client, there is no way to restore it. It would be better to use jest.spyOn or jest.replaceProperty here, and to do so in beforeEach. Then, instead of clearAllMocks in afterEach (which only clears the mock state), I would suggest restoreAllMocks, which restores any mocks to their original implementations (assuming the mock was created with an approach that supports restoring).
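
The same restore-after-test principle on the Python side can be sketched with `unittest.mock`. This is only an illustration: the `Resource` class below is a hypothetical stand-in, not the real frontend resource, and the test name is made up.

```python
import unittest
from unittest import mock


class Resource:
    """Hypothetical stand-in for an object whose client gets mocked."""

    @staticmethod
    def client(url):
        return "real response for " + url


class ReportTestCase(unittest.TestCase):
    def setUp(self):
        # patch.object remembers the original attribute, unlike a direct
        # assignment such as `Resource.client = mock.Mock()`.
        patcher = mock.patch.object(Resource, "client")
        self.mock_client = patcher.start()
        # addCleanup runs even if the test fails, so the original
        # implementation is always put back.
        self.addCleanup(patcher.stop)

    def test_client_is_mocked(self):
        Resource.client("/api/errorreports/report")
        self.mock_client.assert_called_once_with("/api/errorreports/report")
```

Once the test case finishes, `Resource.client` is the original function again, so later tests see unmocked behavior.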

height: window.screen.height,
available_width: window.screen.availWidth,
available_height: window.screen.availHeight,
},
Member

There was discussion about using the screen size breakpoints instead of the actual width and height. Was that done? It doesn't look like it. The reason is privacy: specific sizes can be used to identify users, which reduces the anonymity of the data.
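
To illustrate the bucketing idea, here is a minimal sketch in Python (the real change would live in the frontend code). The breakpoint thresholds are hypothetical placeholders, not the project's actual breakpoints, and `width_to_breakpoint` is an invented name.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in CSS pixels) for each breakpoint level.
# These values are assumptions for illustration only.
BREAKPOINT_MIN_WIDTHS = (0, 480, 840, 1280, 1600)


def width_to_breakpoint(width_px: int) -> int:
    """Map an exact width to a coarse breakpoint level (0-based)."""
    # Clamp negatives to 0 so the result is always a valid level.
    return bisect_right(BREAKPOINT_MIN_WIDTHS, max(width_px, 0)) - 1
```

Reporting only the level means that many distinct screen sizes collapse into the same value, which makes the field much less useful for fingerprinting a device.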

request_headers.pop("Cookie", None)

request_get = dict(request.GET)
request_get.pop("token", None)
Member

In addition, probably for POST, we should ensure passwords are not sent?
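
One possible shape for this, as a sketch only: filter parameters by key name before they are stored in the report context. The list of sensitive fragments and the helper names (`is_sensitive`, `scrub_params`) are assumptions for illustration, not existing project code.

```python
from typing import Any, Mapping

# Hypothetical list of key fragments treated as sensitive.
SENSITIVE_FRAGMENTS = ("password", "token", "secret")


def is_sensitive(key: str) -> bool:
    """Return True if the parameter name looks like it carries a secret."""
    lowered = key.lower()
    return any(fragment in lowered for fragment in SENSITIVE_FRAGMENTS)


def scrub_params(params: Mapping[str, Any]) -> dict:
    """Return a copy of the parameters with sensitive keys dropped."""
    return {key: value for key, value in params.items() if not is_sensitive(key)}
```

The same helper could be applied to both `dict(request.GET)` and `dict(request.POST)`, so that the two code paths cannot drift apart. Substring matching is deliberately broad: it also drops keys such as `new_password`.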

error_report.context = context

error_report.save()
logger.error("ErrorReports: Database updated.")
Member

This feels more like info-type logging?

ping_once(started, server=server)
pingback_id = ping_once(started, server=server)
if pingback_id:
    ping_error_reports.enqueue(args=(server, pingback_id))
Member

I see this creates two different pathways that hinge on the pingback_id. In utils.py, there is already logic that depends on if "id" in data:, which is the same condition as here. It seems like this would fit alongside that existing logic.

Comment on lines 81 to 95
Vue.config.errorHandler = function (err, vm) {
  logging.error(`Unexpected Error: ${err}`);
  const error = new VueErrorReport(err, vm);
  ErrorReportResource.report(error);
};

window.addEventListener('error', e => {
  logging.error(`Unexpected Error: ${e.error}`);
  const error = new JavascriptErrorReport(e);
  ErrorReportResource.report(error);
});

window.addEventListener('unhandledrejection', event => {
  event.preventDefault();
  logging.error(`Unhandled Rejection: ${event.reason}`);
Member

I know the unhandledrejection listener will prevent the default logging of the error. With regard to that and the other logging statements, I'm concerned about whether these suppress log information that developers would need, such as a stack trace. If logging.error outputs a stack trace, it may not be the same trace as the error's own.

join_url(server, "/api/v1/errors/report/"),
data=errors_json,
headers={"Content-Type": "application/json"},
)
Member

Since this uses raw Python requests, let's ensure it has explicit timeouts configured, ideally with separate timeouts for connection vs. request.
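
A minimal sketch of what that could look like: `requests` accepts a `(connect, read)` tuple for `timeout`. The timeout values, the `post_error_reports` name, and the use of `urljoin` in place of the project's `join_url` helper are all assumptions for illustration. The function takes a `requests.Session` (or any object with a compatible `post` method) so it can be exercised without a network.

```python
import json
from urllib.parse import urljoin

# Hypothetical defaults, in seconds; the right values depend on deployment.
CONNECT_TIMEOUT_SECONDS = 5
READ_TIMEOUT_SECONDS = 30


def post_error_reports(
    session,
    server,
    errors,
    connect_timeout=CONNECT_TIMEOUT_SECONDS,
    read_timeout=READ_TIMEOUT_SECONDS,
):
    """POST error reports with separate connect and read timeouts.

    `session` is expected to be a requests.Session or compatible object.
    """
    return session.post(
        urljoin(server, "/api/v1/errors/report/"),
        data=json.dumps(errors),
        headers={"Content-Type": "application/json"},
        # requests reads a 2-tuple as (connect timeout, read timeout).
        timeout=(connect_timeout, read_timeout),
    )
```

Without an explicit `timeout`, `requests` can wait indefinitely on an unresponsive server, which would stall the background task that sends the reports.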

Labels
DEV: backend (Python, databases, networking, filesystem...)
DEV: frontend
gsoc (A GSoC project task)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distributed error reporting
5 participants