-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x-pack/filebeat/input/http_endpoint: make input GA #39410
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
} | ||
|
||
// Ready signals that the batch has been fully consumed. Only | ||
// after the batch is marked as "ready" can the lumberjack batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lumberjack batch
@@ -124,7 +128,8 @@ func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
} | |||
} | |||
|
|||
if err = h.publishEvent(obj, headers); err != nil { | |||
acker.Add() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With end-to-end acknowledgements the goal is to be able to provide a delivery guarantee that the event(s) was written to the destination output. So I think this needs to relay the feedback to the caller that the event was written to the output.
I envision this meaning that a request blocks until the input receives all of the ACKs that it expects. This leads to a few follow-up behavioral questions:
- Introducing blocking behavior could be a viewed as a breaking change. Perhaps the waiting behavior should be opt-in and have a timeout like via ?wait_for_completion_timeout=30.
- Should there be any "circuit breakers" and how should the API respond when it overloaded (e.g. receiving more requests or more events than it can process in a timely manner)? We want to avoid a DoS situation if we have some high volume clients and an output that is slow or down. (This one is relevant even without E2E ack'ing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but it means that we are keeping connections open while some (unrelated) internal process may block (I think that this is partially your point in 2.).
I think that this is a dangerous thing to do in the general case, but I also think that it would be reasonable to do as an opt-in that the user can choose in the configuration. This has the happy consequence that this would non-breaking. Note that I had not read 1. before I wrote this, so these are independent discoveries (honest). Looking at 1., I think that having a timeout would be sensible. So, suggest configuring based on the presence of a timeout; if present, then the request will block until the ACK or the timeout (and add a timeout failing metric), if absent do not block. This intentionally does not add a block without timeout (wait forever) since it does not make any sense in the real world.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me.
@@ -391,17 +402,21 @@ func newInputMetrics(id string) *inputMetrics { | |||
apiErrors: monitoring.NewUint(reg, "api_errors_total"), | |||
batchesReceived: monitoring.NewUint(reg, "batches_received_total"), | |||
batchesPublished: monitoring.NewUint(reg, "batches_published_total"), | |||
batchesACKedTotal: monitoring.NewUint(reg, "batches_acked_total"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to add the new metrics to the table in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that the lumberjack input does not have any docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment under Author's Notes at #32175 (comment). Whether or not to formally advertise it is something I still debate. Feedback welcomed. It is only used and exposed as part of the barracuda integration at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes perfect sense.
This pull request is now in conflicts. Could you fix it? 🙏
|
This pull request is now in conflicts. Could you fix it? 🙏
|
ef8bd51
to
6ec5edd
Compare
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
var k string | ||
for k = range q { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That confused me a little bit, the for
loop is there to easily extract the key from the map q
, is that correct?
Could you add a little comment to make it clear why using a for loop on a map of length 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have thought this was idiomatic, but I will add a comment.
Rationale for inclusion of break over iteration over all one keys is https://godbolt.org/z/z95q8K77f.
@@ -64,18 +69,40 @@ type handler struct { | |||
} | |||
|
|||
func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
txID := h.nextTxID() | |||
h.log.Debugw("request", "url", r.URL, "tx_id", txID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about debug logs in hot paths like that. If there is a high throughput of incoming requests and a user turns on debug logging, this can flood the logs.
It's not a blocker, I just wanted to raise my concern and let you decide what is best for this input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this.
This pull request is now in conflicts. Could you fix it? 🙏
|
The endpoint will enforce end-to-end ACK when a URL query parameter | ||
`wait_for_completion_timeout` with a duration is provided. For example | ||
`http://localhost:8080/?wait_for_completion_timeout=1m` will wait up | ||
to 1min for the event to be published to the cluster and then return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to 1min for the event to be published to the cluster and then return | |
to 1 minute for the event to be published to the cluster and then return |
`http://localhost:8080/?wait_for_completion_timeout=1m` will wait up | ||
to 1min for the event to be published to the cluster and then return | ||
the user-defined response message. In the case that the publication | ||
does not happen within the timeout duration, the HTTP response will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not happen within the timeout duration, the HTTP response will | |
does not complete within the timeout duration, the HTTP response will |
This pull request is now in conflicts. Could you fix it? 🙏
|
The filename coming from integrations may include a * which is intended to be replaced with the data stream ID. The code in place does not do this, so add it.
…ames This is a partial backport of elastic#39410 to fix the missing replacement of "*" with the input ID.
…ames This is a partial backport of elastic#39410 to fix the missing replacement of "*" with the input ID.
Proposed commit message
See title.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
Author's Checklist
The timeout ACK behaviour is tested manually using
elastic-package
using the the version here injected into the stack.The first instance without a timeout returned immediately, the second took ~5s and the last ~2s. These are consistent with expected behaviour.
How to test this PR locally
Related issues
Use cases
Screenshots
Logs