-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Network Error Logging #2421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Do nel reports respect rate limits for errors now? Can we test that? See https://github.com/getsentry/relay/pull/2421/files#r1360800171.
Also, this convo is still open, but feel free to ignore if you think it's not worth it.
py/CHANGELOG.md
Outdated
@@ -3,6 +3,7 @@ | |||
## Unreleased | |||
|
|||
- Add `scraping_attempts` field to the event schema. ([#2575](https://github.com/getsentry/relay/pull/2575)) | |||
- Added support for NEL (Network Error Logging) reports ingestion. ([#2421](https://github.com/getsentry/relay/pull/2421)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
- Added support for NEL (Network Error Logging) reports ingestion. ([#2421](https://github.com/getsentry/relay/pull/2421)) | |
- Add context for NEL (Network Error Logging) reports to the event schema. ([#2421](https://github.com/getsentry/relay/pull/2421)) |
/// NOTE: This is the structure used inside the Event (serialization is based on Annotated | ||
/// infrastructure). We also use a version of this structure to deserialize from raw JSON | ||
/// via serde. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice catch, removed .
where | ||
C: Fn(usize) -> bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed that the condition for returning None
is different for the two functions., this makes the helper function less helpful than I thought. Does the condition have to be different though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the condition are different and must stay this way
@jjbayer yeah. I will still take time and look into adding a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good stuff 🚀
relay-server/src/utils/mod.rs
Outdated
@@ -27,6 +28,7 @@ pub use self::metrics_rate_limits::*; | |||
pub use self::multipart::*; | |||
#[cfg(feature = "processing")] | |||
pub use self::native::*; | |||
pub use self::nel::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just export the items we need?
@@ -19,6 +20,7 @@ pub use cloud_resource::*; | |||
pub use device::*; | |||
pub use gpu::*; | |||
pub use monitor::*; | |||
pub use nel::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here wrt star exports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks great and is in shippable state.
See a number of smaller comments attached, and feel free to merge without another review from me.
/// If request failed, the type of its network error. If request succeeded, "ok". | ||
pub error_type: Annotated<String>, | ||
/// Server IP where the requests was sent to. | ||
#[metastructure(pii = "true")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Should we rather mark that as
pii = "maybe"
? If that is the target server IP, it is still sensitive information but does not identify users. - Could we use the
IpAddr
type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/// The time between the start of the resource fetch and when it was completed or aborted. | ||
pub elapsed_time: Annotated<u64>, | ||
/// If request failed, the phase of its network error. If request succeeded, "application". | ||
pub phase: Annotated<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we introduce an enumeration for the known phases, with an Other(String)
variant as fallback? According to the specification, the known phases are dns
, connection
, and application
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/// Server IP where the requests was sent to. | ||
#[metastructure(pii = "true")] | ||
pub server_ip: Annotated<String>, | ||
/// The time between the start of the resource fetch and when it was completed or aborted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please document, which unit this time is specified in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
//! | ||
//! NEL is a browser feature that allows reporting of failed network requests from the client side. | ||
//! W3C Editor's Draft: <https://w3c.github.io/network-error-logging/> | ||
//! MDN: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider linking to the NEL context type for these docs, that way they don't have to be maintained in two places when we add more information to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
#[error("unexpected format")] | ||
InvalidNel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the InvalidNel
error variant is not instantiated anywhere, and the only other variant covers deserialization. Could we simplify this by calling Annotated::from_json_bytes(value).map_err(ProcessingError::InvalidJson)?
directly in event_from_nel_item
like the other deserializing functions do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -444,6 +444,9 @@ pub struct Request { | |||
/// HTTP request method. | |||
pub method: Annotated<String>, | |||
|
|||
/// HTTP protocol. | |||
pub protocol: Annotated<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we introduce an enum for the well-known protocols? Particularly, it would be nice to normalize the casing of the protocols, so that both https
and HTTPS
can be sent in, but they are always stored in the same notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find the list of the protocols to be used here. The doc is not really clear on this.
I can postpone this to later PRs
return Ok(StatusCode::UNSUPPORTED_MEDIA_TYPE.into_response()); | ||
} | ||
|
||
let items: Vec<&RawValue> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, allocating a vector here should be fine. However, since there's currently no limit placed on the number of NEL reports in an envelope - neither in the endpoint nor where it gets sent into validation (common.rs
) -- let's please be sure we have enough visibility into this endpoint to add more safety measures in the future.
If this turns out to be a problem, we can easily make this a streaming deserializer.
relay-server/src/utils/nel.rs
Outdated
use relay_protocol::Annotated; | ||
|
||
/// Enriches the event with new values using the provided [`NetworkReportRaw`]. | ||
pub fn enrich_nel_event(event: &mut Event, nel: Annotated<NetworkReportRaw>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this function be moved into relay-normalize
?
Either way works, as there's currently no clear criteria for where normalization is called and where the code is located. Particularly, this is not part of the current "store normalizer" but rather called during deserialization like for security reports.
However, in an effort to reduce the amount of schema-specific business logic that resides in relay-server
, a better place for these sorts of functions would be the normalize crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can move it to normalize. It was initially there, but then was moved back to the server utils.
relay-server/src/utils/nel.rs
Outdated
} | ||
|
||
// Set the timestamp on the event when it actually occurred. | ||
let now: DateTime<Utc> = Utc::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the receive timestamp from the envelope instead of Utc::now()
? Relay might have delayed processing the item for various reasons, but the receive timestamp is guaranteed to be as accurate as possible.
Co-authored-by: Jan Michael Auer <mail@jauer.org>
Sentry backend part of the NEL ([Network Error Logging](https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging)) implementation. * Added a new `nel` event type and interface * Special logic for metadata and culprit generation To be deployed after merging Relay part (getsentry/relay#2421) and releasing `sentry-relay` python package. --------- Co-authored-by: mdtro <matthew.trostel@sentry.io> Co-authored-by: Joris Bayer <joris.bayer@sentry.io>
The first part of the NEL (Network Error Logging) implementation.
Nel
EventTypeNel
protocol with minimal validation (i.e. only specific fields are captured)/api/:project_id/nel/
endpoint for NEL reports ingestionuser.ip_address
field is set to the IP address of the request (real user's IP address)User-Agent
headerRelated PRs:
getsentry/sentry#55135
getsentry/sentry-kafka-schemas#177