Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

rschef · 2024-06-06T12:25:41Z

Problem

When check_rate is called for the first time, it's expected that ms_to_next_bucket would be qual to scale_ms, but that's not the case at the moment due to:

Utils.stamp_key/2 is currently creating a variable bucket key for the same ID and scale_ms. The bucket key changes every scale_ms milliseconds.
Redis/ETS backends are not expiring the bucket key after scale_ms, instead they rely that a new key is provided (1)

This bug allows check_rate to be called 2x in sequence even if the bucket limit is 1 attempt per hour. Both calls could receive :allow if timed when the bucket is about to expire (see how to reproduce the bug below).

How to reproduce the bug

# Install hammer

Application.put_env(
  :hammer,
  :backend,
  {Hammer.Backend.ETS,
   [
     ets_table_name: :hammer_backend_ets_buckets,
     expiry_ms: :timer.hours(24),
     cleanup_interval_ms: :timer.hours(24)
   ]}
)

Mix.install([:hammer])

# Reproduce the bug

bucket_key = "bucket_key:#{:rand.uniform()}"
scale_ms = :timer.seconds(20)
max_attempts = 1

ms_about_to_expire = 100

wait_for_timing_fn = fn ->
  {:ok, {0, 1, ms_to_next_bucket, nil, nil}} =
    Hammer.inspect_bucket(bucket_key, scale_ms, max_attempts)
 
  if ms_to_next_bucket >= ms_about_to_expire do
    :timer.sleep(ms_to_next_bucket - ms_about_to_expire)
  end

  :ready
end

:ready = wait_for_timing_fn.()

{:allow, 1} = Hammer.check_rate(bucket_key, scale_ms, max_attempts)

:timer.sleep(ms_about_to_expire)

{:allow, 1} = Hammer.check_rate(bucket_key, scale_ms, max_attempts)

Proposed solution

This PR fixes this issue by:

Returning a fixed key on Utils.stamp_key/2 for the same bucket ID and scale_ms
Requiring the Redis/ETS backends to expire the bucket after scale_ms. I will open a PR on hammer-backend-redis if this PR is accepted: Expire bucket after scale_ms nash-io/hammer-backend-redis#1

ruslandoga · 2024-11-16T07:28:23Z

👋 @rschef

The proposed approach would flood the message queue for hot keys. The problem you are describing is an inherent limitation of the fixed window counter. There are alternative fixes that don't require sending messages. The easiest is probably what's called a bursty counter which can be implemented with the current API by making two checks, one for the original interval with the total limit, and one for a shorter interval with a fraction of the limit. That would smooth out the number of requests allowed.

Please see #103 for more information.

rschef · 2024-11-19T14:17:56Z

Hi @ruslandoga,

you are right for the ETS implementation and I'm fine with an alternative solution for it.

I'm more interested about the Redis implementation and there we can use the EXPIRE command to expire the keys.

ruslandoga · 2024-11-19T14:32:36Z

Would you be able to write the commands sent to Redis with your approach?

ExHammer/hammer-backend-redis#56 contains (probably) the next version of the Redis backend, so it would be best if you could adapt that version.

My understanding right now is that you would do something like this:

  @doc false
  def hit(name, prefix, key, scale, limit, increment, timeout) do
    now = System.system_time(:millisecond)
    full_key = "#{prefix}:#{key}"

    commands = [
      ["INCRBY", full_key, increment],
      ["EXPIRE", full_key, div(scale, 1000), "NX"]
    ]

    [count, _] =
      Redix.pipeline!(name, commands, timeout: timeout)

    if count <= limit do
      {:allow, count}
    else
      # TODO
      retry_after = -1 # is there a way to find out when the active EXPIRE will run?
      {:deny, retry_after}
    end
  end

Is that right?

Fix ms_to_next_bucket

e7afaaa

ruslandoga mentioned this pull request Nov 14, 2024

Inconsistent Bucket Duration in Hammer for Periodic Rate Limits #103

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

rschef commented Jun 6, 2024 •

edited

Loading

ruslandoga commented Nov 16, 2024 •

edited

Loading

rschef commented Nov 19, 2024

ruslandoga commented Nov 19, 2024 •

edited

Loading

Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

Are you sure you want to change the base?

Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

Conversation

rschef commented Jun 6, 2024 • edited Loading

Problem

How to reproduce the bug

Proposed solution

ruslandoga commented Nov 16, 2024 • edited Loading

rschef commented Nov 19, 2024

ruslandoga commented Nov 19, 2024 • edited Loading

rschef commented Jun 6, 2024 •

edited

Loading

ruslandoga commented Nov 16, 2024 •

edited

Loading

ruslandoga commented Nov 19, 2024 •

edited

Loading