Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Network Error Logging #2421

Merged
merged 69 commits into from
Nov 6, 2023
Merged

Implement Network Error Logging #2421

merged 69 commits into from
Nov 6, 2023

Conversation

oioki
Copy link
Member

@oioki oioki commented Aug 21, 2023

The first part of the NEL (Network Error Logging) implementation.

  • Adds a new Nel EventType
  • Adds a new Nel protocol with minimal validation (i.e. only specific fields are captured)
  • Adds a new /api/:project_id/nel/ endpoint for NEL reports ingestion
    • simple content type validation
    • splits payload into separate envelopes (a single HTTP NEL request could contain several independent reports)
  • user.ip_address field is set to the IP address of the request (real user's IP address)
  • An event is enriched with browser information derived from the request's User-Agent header

Related PRs:
getsentry/sentry#55135
getsentry/sentry-kafka-schemas#177

@oioki oioki requested review from olksdr and mdtro August 21, 2023 14:09
@jan-auer jan-auer changed the title Hackweek: NEL hackweek: NEL Aug 25, 2023
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do nel reports respect rate limits for errors now? Can we test that? See https://github.com/getsentry/relay/pull/2421/files#r1360800171.

Also, this convo is still open, but feel free to ignore if you think it's not worth it.

py/CHANGELOG.md Outdated
@@ -3,6 +3,7 @@
## Unreleased

- Add `scraping_attempts` field to the event schema. ([#2575](https://github.com/getsentry/relay/pull/2575))
- Added support for NEL (Network Error Logging) reports ingestion. ([#2421](https://github.com/getsentry/relay/pull/2421))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
- Added support for NEL (Network Error Logging) reports ingestion. ([#2421](https://github.com/getsentry/relay/pull/2421))
- Add context for NEL (Network Error Logging) reports to the event schema. ([#2421](https://github.com/getsentry/relay/pull/2421))

Comment on lines 53 to 55
/// NOTE: This is the structure used inside the Event (serialization is based on Annotated
/// infrastructure). We also use a version of this structure to deserialize from raw JSON
/// via serde.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice catch, removed .

where
C: Fn(usize) -> bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that the condition for returning None is different for the two functions., this makes the helper function less helpful than I thought. Does the condition have to be different though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the condition are different and must stay this way

@olksdr
Copy link
Contributor

olksdr commented Oct 17, 2023

LGTM. Do nel reports respect rate limits for errors now? Can we test that? See https://github.com/getsentry/relay/pull/2421/files#r1360800171.

Also, this convo is still open, but feel free to ignore if you think it's not worth it.

@jjbayer yeah. I will still take time and look into adding a test.

Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good stuff 🚀

@@ -27,6 +28,7 @@ pub use self::metrics_rate_limits::*;
pub use self::multipart::*;
#[cfg(feature = "processing")]
pub use self::native::*;
pub use self::nel::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just export the items we need?

@@ -19,6 +20,7 @@ pub use cloud_resource::*;
pub use device::*;
pub use gpu::*;
pub use monitor::*;
pub use nel::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here wrt star exports.

Copy link
Member

@jan-auer jan-auer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks great and is in shippable state.

See a number of smaller comments attached, and feel free to merge without another review from me.

relay-event-schema/src/protocol/contexts/nel.rs Outdated Show resolved Hide resolved
/// If request failed, the type of its network error. If request succeeded, "ok".
pub error_type: Annotated<String>,
/// Server IP where the requests was sent to.
#[metastructure(pii = "true")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should we rather mark that as pii = "maybe"? If that is the target server IP, it is still sensitive information but does not identify users.
  • Could we use the IpAddr type here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// The time between the start of the resource fetch and when it was completed or aborted.
pub elapsed_time: Annotated<u64>,
/// If request failed, the phase of its network error. If request succeeded, "application".
pub phase: Annotated<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we introduce an enumeration for the known phases, with an Other(String) variant as fallback? According to the specification, the known phases are dns, connection, and application.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// Server IP where the requests was sent to.
#[metastructure(pii = "true")]
pub server_ip: Annotated<String>,
/// The time between the start of the resource fetch and when it was completed or aborted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please document, which unit this time is specified in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

//!
//! NEL is a browser feature that allows reporting of failed network requests from the client side.
//! W3C Editor's Draft: <https://w3c.github.io/network-error-logging/>
//! MDN: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider linking to the NEL context type for these docs, that way they don't have to be maintained in two places when we add more information to them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 16 to 17
#[error("unexpected format")]
InvalidNel,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the InvalidNel error variant is not instantiated anywhere, and the only other variant covers deserialization. Could we simplify this by calling Annotated::from_json_bytes(value).map_err(ProcessingError::InvalidJson)? directly in event_from_nel_item like the other deserializing functions do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -444,6 +444,9 @@ pub struct Request {
/// HTTP request method.
pub method: Annotated<String>,

/// HTTP protocol.
pub protocol: Annotated<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we introduce an enum for the well-known protocols? Particularly, it would be nice to normalize the casing of the protocols, so that both https and HTTPS can be sent in, but they are always stored in the same notation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find the list of the protocols to be used here. The doc is not really clear on this.
I can postpone this to later PRs

return Ok(StatusCode::UNSUPPORTED_MEDIA_TYPE.into_response());
}

let items: Vec<&RawValue> =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, allocating a vector here should be fine. However, since there's currently no limit placed on the number of NEL reports in an envelope - neither in the endpoint nor where it gets sent into validation (common.rs) -- let's please be sure we have enough visibility into this endpoint to add more safety measures in the future.

If this turns out to be a problem, we can easily make this a streaming deserializer.

use relay_protocol::Annotated;

/// Enriches the event with new values using the provided [`NetworkReportRaw`].
pub fn enrich_nel_event(event: &mut Event, nel: Annotated<NetworkReportRaw>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this function be moved into relay-normalize?

Either way works, as there's currently no clear criteria for where normalization is called and where the code is located. Particularly, this is not part of the current "store normalizer" but rather called during deserialization like for security reports.

However, in an effort to reduce the amount of schema-specific business logic that resides in relay-server, a better place for these sorts of functions would be the normalize crate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can move it to normalize. It was initially there, but then was moved back to the server utils.

}

// Set the timestamp on the event when it actually occurred.
let now: DateTime<Utc> = Utc::now();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the receive timestamp from the envelope instead of Utc::now()? Relay might have delayed processing the item for various reasons, but the receive timestamp is guaranteed to be as accurate as possible.

@oioki oioki merged commit de5e0fa into master Nov 6, 2023
20 checks passed
@oioki oioki deleted the hackweek/nel branch November 6, 2023 12:02
oioki added a commit to getsentry/sentry that referenced this pull request Nov 9, 2023
Sentry backend part of the NEL ([Network Error
Logging](https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging))
implementation.

* Added a new `nel` event type and interface
* Special logic for metadata and culprit generation

To be deployed after merging Relay part
(getsentry/relay#2421) and releasing
`sentry-relay` python package.

---------

Co-authored-by: mdtro <matthew.trostel@sentry.io>
Co-authored-by: Joris Bayer <joris.bayer@sentry.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants