-
Question on a variant of "throttle": we need to spread a number of events out over time, but we do not want to lose any events. Why? We have many devices at the very edge of reasonable connectivity - a few Mbps, at best - and we must share that capacity with our other services. If Vector sends giant blocks of data (the end result of aggregation processes), it clogs the network for a few minutes while the transfer completes, leading to failures in other services. These are "pulses" of data lasting only a minute or so every hour or two. The goal is to flatten the large event/bandwidth spikes into a low, long-duration stream of transmitted events: stretch out the transmission, but lose no events.

As I understand it, the "throttle" transform loses events, which is not desirable. Would adding this functionality simply be a modification to "throttle" so that events that do not fit in a "bucket" are moved to the next "bucket" instead of being dropped? Eventually a maximum number of buffered events would need to be specified so the buffer does not become an infinitely-growing queue, but it seems a non-lossy version of the throttle function could be useful. Or can "throttle" do this already? The documentation is not particularly clear.

We are using kafka, which has no "request.concurrency". The receiver will accept messages as fast as they are sent - backpressure does not exist from the remote side until we are using 100% of the bandwidth, at which point the damage is already being done. An artificial maximum needs to be applied within Vector. I thought maybe the sink was the right place for this, but after more consideration it seems a generic method would make more sense as an addition to "throttle", since that transform already does most of what is desired, except that it discards messages instead of delaying them.
The challenge with doing this in "throttle" is that batching on the sink may short-circuit the concept, and perhaps the sink is the only place it can be done while still getting the benefit of compression from larger batches (the larger the batch, the better the compression). We could consider writing a patch for this, either extending "throttle" or modifying the kafka sink, depending on which direction makes sense to pursue.
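To make the "move overflow to the next bucket" idea concrete, here is a minimal sketch of a non-lossy throttle: events beyond the per-window budget are queued for later windows, and drops happen only when a bounded buffer overflows. `PacingThrottle`, `offer`, and `drain_window` are hypothetical names for illustration, not Vector APIs.

```python
from collections import deque

class PacingThrottle:
    """Non-lossy throttle sketch: overflow is delayed, not dropped.
    Events are only lost when the bounded queue itself is full."""

    def __init__(self, threshold, max_queued):
        self.threshold = threshold    # events released per window
        self.max_queued = max_queued  # cap so the queue cannot grow forever
        self.queue = deque()
        self.dropped = 0

    def offer(self, event):
        # Buffer the incoming event; drop only on queue overflow.
        if len(self.queue) >= self.max_queued:
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def drain_window(self):
        # Called once per window: release up to `threshold` queued events.
        out = []
        while self.queue and len(out) < self.threshold:
            out.append(self.queue.popleft())
        return out

# A burst of 10 events with a budget of 3 per window is spread across
# four windows instead of 7 events being discarded.
t = PacingThrottle(threshold=3, max_queued=100)
for i in range(10):
    t.offer(i)

windows = []
while t.queue:
    windows.append(t.drain_window())
# windows now holds batches of sizes 3, 3, 3, 1 and t.dropped == 0
```

The `max_queued` cap is the "maximum number of buffered events" mentioned above: once it is reached, the component would either drop (as here) or propagate backpressure upstream.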
-
Hi @johnhtodd, I took a quick look at the implementation and, as it stands, it can only discard events. It should be easy to add another output for the discarded events. Your use case, though, doesn't sound as simple as consuming the dropped events; maybe you would like to chain throttle transforms to emulate multiple buckets. But let me know if I misunderstood what you need here.
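For reference, chaining might look like the sketch below, using the throttle transform's documented `threshold` and `window_secs` options: a wide-window stage caps the long-term rate while a narrow-window stage caps burst size. Component names and numbers are placeholders, and note that each stage still drops events over its threshold, so chaining reshapes the flow but does not by itself make the pipeline lossless.

```toml
# Hypothetical chained throttles; names and values are illustrative only.
[transforms.smooth_long]
type = "throttle"
inputs = ["aggregated"]   # assumed name of the upstream aggregation
threshold = 60000         # at most 60k events per hour overall
window_secs = 3600

[transforms.smooth_short]
type = "throttle"
inputs = ["smooth_long"]
threshold = 100           # at most 100 events per second
window_secs = 1
```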
-
I see a similar question here, but no answer that seems to be workable: #19896
-
For that matter, a bit of a meta-question which I'm sure has been answered before: I see request.rate_limit_num as an option in many other sinks - why is it not universally applicable?
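For sinks that do expose the standard `request.*` options (the kafka sink does not, since it delegates delivery to librdkafka), a rate limit looks roughly like the sketch below; the component names and endpoint are placeholders, not a recommended configuration.

```toml
# Illustrative only: an http sink limited to 100 requests per second.
[sinks.out]
type = "http"
inputs = ["smooth"]                            # placeholder upstream component
uri = "https://collector.example.com/ingest"   # placeholder endpoint
encoding.codec = "json"
request.rate_limit_num = 100           # max requests per window
request.rate_limit_duration_secs = 1   # window length in seconds
```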
-
@pront Thanks for the answers. So it looks like, for my particular needs (kafka and probably protobuf), some work would have to be done in the sink code for both of those to incorporate an artificial smoothing/rate-limiting mechanism that allows for buffering and (eventually) backpressure without losing messages. I'll throw this on the code-to-be-written pyre.