Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug that allows check_rate to succeed 2x in sequence even if the bucket limit is 1 attempt per hour #99

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rschef
Copy link

@rschef rschef commented Jun 6, 2024

Problem

When check_rate is called for the first time, it's expected that ms_to_next_bucket would be qual to scale_ms, but that's not the case at the moment due to:

  1. Utils.stamp_key/2 is currently creating a variable bucket key for the same ID and scale_ms. The bucket key changes every scale_ms milliseconds.
  2. Redis/ETS backends are not expiring the bucket key after scale_ms, instead they rely that a new key is provided (1)

This bug allows check_rate to be called 2x in sequence even if the bucket limit is 1 attempt per hour. Both calls could receive :allow if timed when the bucket is about to expire (see how to reproduce the bug below).

How to reproduce the bug

# Install hammer

Application.put_env(
  :hammer,
  :backend,
  {Hammer.Backend.ETS,
   [
     ets_table_name: :hammer_backend_ets_buckets,
     expiry_ms: :timer.hours(24),
     cleanup_interval_ms: :timer.hours(24)
   ]}
)

Mix.install([:hammer])

# Reproduce the bug

bucket_key = "bucket_key:#{:rand.uniform()}"
scale_ms = :timer.seconds(20)
max_attempts = 1

ms_about_to_expire = 100

wait_for_timing_fn = fn ->
  {:ok, {0, 1, ms_to_next_bucket, nil, nil}} =
    Hammer.inspect_bucket(bucket_key, scale_ms, max_attempts)
 
  if ms_to_next_bucket >= ms_about_to_expire do
    :timer.sleep(ms_to_next_bucket - ms_about_to_expire)
  end

  :ready
end

:ready = wait_for_timing_fn.()

{:allow, 1} = Hammer.check_rate(bucket_key, scale_ms, max_attempts)

:timer.sleep(ms_about_to_expire)

{:allow, 1} = Hammer.check_rate(bucket_key, scale_ms, max_attempts)

Proposed solution

This PR fixes this issue by:

  1. Returning a fixed key on Utils.stamp_key/2 for the same bucket ID and scale_ms
  2. Requiring the Redis/ETS backends to expire the bucket after scale_ms. I will open a PR on hammer-backend-redis if this PR is accepted: Expire bucket after scale_ms nash-io/hammer-backend-redis#1

@ruslandoga
Copy link
Contributor

ruslandoga commented Nov 16, 2024

👋 @rschef

The proposed approach would flood the message queue for hot keys. The problem you are describing is an inherent limitation of the fixed window counter. There are alternative fixes that don't require sending messages. The easiest is probably what's called a bursty counter which can be implemented with the current API by making two checks, one for the original interval with the total limit, and one for a shorter interval with a fraction of the limit. That would smooth out the number of requests allowed.

Please see #103 for more information.

@rschef
Copy link
Author

rschef commented Nov 19, 2024

Hi @ruslandoga,

you are right for the ETS implementation and I'm fine with an alternative solution for it.

I'm more interested about the Redis implementation and there we can use the EXPIRE command to expire the keys.

@ruslandoga
Copy link
Contributor

ruslandoga commented Nov 19, 2024

Would you be able to write the commands sent to Redis with your approach?

ExHammer/hammer-backend-redis#56 contains (probably) the next version of the Redis backend, so it would be best if you could adapt that version.

My understanding right now is that you would do something like this:

  @doc false
  def hit(name, prefix, key, scale, limit, increment, timeout) do
    now = System.system_time(:millisecond)
    full_key = "#{prefix}:#{key}"

    commands = [
      ["INCRBY", full_key, increment],
      ["EXPIRE", full_key, div(scale, 1000), "NX"]
    ]

    [count, _] =
      Redix.pipeline!(name, commands, timeout: timeout)

    if count <= limit do
      {:allow, count}
    else
      # TODO
      retry_after = -1 # is there a way to find out when the active EXPIRE will run?
      {:deny, retry_after}
    end
  end

Is that right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants