-
Question on a variant of "throttle": we need to spread a number of events out over time, but we do not want to lose any events. Why? We have many devices at the very edge of reasonable connectivity - a few Mbps, at best - and we must share that capacity with our other services. If Vector sends giant blocks of data (the end result of aggregation processes), it clogs the network for a few minutes while the transfer completes, leading to failures in other services. These are "pulses" of data lasting only a minute or so every hour or two. The goal is to flatten the large event/bandwidth spikes into a low, long-duration stream of transmitted events: stretch out the transmission, but lose no events.

As I understand it, the "throttle" transform loses events, which is not desirable. Would adding this functionality simply be a modification to "throttle" so that events that do not fit in a "bucket" are moved to the next "bucket" instead of being dropped? Eventually a maximum number of buffered events would need to be specified so the buffer does not become an infinitely-growing queue, but it seems a non-lossy version of the throttle function could be useful. Or can "throttle" do this already? The documentation is not particularly clear.

We are using kafka, which has no "request.concurrency". The receiver will accept messages as fast as they are sent - backpressure does not exist from the remote side until we are using 100% of the bandwidth, at which point the damage is already being done. An artificial maximum needs to be applied within Vector. I thought maybe the sink was the right place for this, but after more consideration it seems a generic method would make more sense as an addition to "throttle", since that transform already does most of what is desired, except that it discards messages instead of delaying them.
The challenge with doing this in "throttle" is that batching on the sink may short-circuit the concept, and perhaps the sink is the only place it can be done while still getting the benefit of compression from larger batches (the larger the batch, the better the compression). We could consider writing a patch for this, either extending "throttle" or modifying the kafka sink, depending on which direction makes sense to pursue.
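To make the "move overflow to the next bucket" idea concrete, here is a minimal sketch of a non-lossy throttle: events beyond the per-window budget are queued for later windows, and drops happen only when a bounded buffer overflows. `PacingThrottle`, `offer`, and `drain_window` are hypothetical names for illustration, not Vector APIs.

```python
from collections import deque

class PacingThrottle:
    """Non-lossy throttle sketch: overflow is delayed, not dropped.
    Events are only lost when the bounded queue itself is full."""

    def __init__(self, threshold, max_queued):
        self.threshold = threshold    # events released per window
        self.max_queued = max_queued  # cap so the queue cannot grow forever
        self.queue = deque()
        self.dropped = 0

    def offer(self, event):
        # Buffer the incoming event; drop only on queue overflow.
        if len(self.queue) >= self.max_queued:
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def drain_window(self):
        # Called once per window: release up to `threshold` queued events.
        out = []
        while self.queue and len(out) < self.threshold:
            out.append(self.queue.popleft())
        return out

# A burst of 10 events with a budget of 3 per window is spread across
# four windows instead of 7 events being discarded.
t = PacingThrottle(threshold=3, max_queued=100)
for i in range(10):
    t.offer(i)

windows = []
while t.queue:
    windows.append(t.drain_window())
# windows now holds batches of sizes 3, 3, 3, 1 and t.dropped == 0
```

The `max_queued` cap is the "maximum number of buffered events" mentioned above: once it is reached, the component would either drop (as here) or propagate backpressure upstream.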
-
Hi @johnhtodd, I took a quick look at the implementation and, as it stands, it can only discard events. It should be easy to add another output for the discarded events. Your use case, though, doesn't sound as simple as consuming the dropped events; maybe you would like to chain throttle transforms to emulate multiple buckets. But let me know if I misunderstood what you need here.
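For reference, chaining might look like the sketch below, using the throttle transform's documented `threshold` and `window_secs` options: a wide-window stage caps the long-term rate while a narrow-window stage caps burst size. Component names and numbers are placeholders, and note that each stage still drops events over its threshold, so chaining reshapes the flow but does not by itself make the pipeline lossless.

```toml
# Hypothetical chained throttles; names and values are illustrative only.
[transforms.smooth_long]
type = "throttle"
inputs = ["aggregated"]   # assumed name of the upstream aggregation
threshold = 60000         # at most 60k events per hour overall
window_secs = 3600

[transforms.smooth_short]
type = "throttle"
inputs = ["smooth_long"]
threshold = 100           # at most 100 events per second
window_secs = 1
```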
-
I see a similar question here, but no answer that seems to be workable: #19896
-
For that matter, a bit of a meta-question which I'm sure has been answered before: I see request.rate_limit_num as an option in many other sinks - why is it not universally applicable?
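For sinks that do expose the standard `request.*` options (the kafka sink does not, since it delegates delivery to librdkafka), a rate limit looks roughly like the sketch below; the component names and endpoint are placeholders, not a recommended configuration.

```toml
# Illustrative only: an http sink limited to 100 requests per second.
[sinks.out]
type = "http"
inputs = ["smooth"]                            # placeholder upstream component
uri = "https://collector.example.com/ingest"   # placeholder endpoint
encoding.codec = "json"
request.rate_limit_num = 100           # max requests per window
request.rate_limit_duration_secs = 1   # window length in seconds
```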
-
@pront Thanks for the answers. So it looks like, for my particular needs (kafka and probably protobuf), some work would have to be done in the sink code for both of those to incorporate an artificial smoothing/rate-limiting mechanism that allows for buffering and (eventually) backpressure without losing messages. I'll throw this on the code-to-be-written pyre.