Add fault-tolerance for cache system errors #577

Open
wants to merge 10 commits into
base: main
123 changes: 123 additions & 0 deletions README.md
@@ -37,6 +37,12 @@ See the [Backing & Hacking blog post](https://www.kickstarter.com/backing-and-ha
- [Customizing responses](#customizing-responses)
- [RateLimit headers for well-behaved clients](#ratelimit-headers-for-well-behaved-clients)
- [Logging & Instrumentation](#logging--instrumentation)
- [Fault Tolerance & Error Handling](#fault-tolerance--error-handling)
- [Built-in error handling](#built-in-error-handling)
- [Expose Rails cache errors to Rack::Attack](#expose-rails-cache-errors-to-rackattack)
- [Configure cache timeout](#configure-cache-timeout)
- [Failure cooldown](#failure-cooldown)
- [Custom error handling](#custom-error-handling)
- [Testing](#testing)
- [How it works](#how-it-works)
- [About Tracks](#about-tracks)
@@ -395,6 +401,123 @@ ActiveSupport::Notifications.subscribe(/rack_attack/) do |name, start, finish, r
end
```

## Fault Tolerance & Error Handling

Rack::Attack has a mission-critical dependency on your [cache store](#cache-store-configuration).
If the cache system experiences an outage, it may cause severe latency within Rack::Attack
and lead to an overall application outage.

This section explains how to configure your application and handle errors in order to mitigate issues.

### Built-in error handling

By default, Rack::Attack "does the right thing" when errors occur:

- If the error is a Redis or Dalli cache error, Rack::Attack rescues the error and allows the request.
- Otherwise, Rack::Attack re-raises the error. The request will fail.

All errors will trigger a failure cooldown (see below), regardless of whether they are allowed or raised.
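
The default behavior described above can be sketched in plain Ruby. This is a simplified illustration, not Rack::Attack's actual implementation: `FakeCacheError`, `ALLOWED`, `default_error_result`, and `guarded_check` are hypothetical names, and `FakeCacheError` stands in for the real Redis and Dalli error classes.

```ruby
# Hypothetical stand-in for a Redis or Dalli cache error.
class FakeCacheError < StandardError; end

# Hypothetical allow list; the real defaults are the Redis and Dalli error classes.
ALLOWED = [FakeCacheError].freeze

# Returns :allow for allow-listed errors and re-raises everything else.
def default_error_result(error)
  return :allow if ALLOWED.any? { |klass| error.is_a?(klass) }

  raise error
end

# Runs a rate-limit check and applies the default error policy if it fails.
def guarded_check
  yield
rescue StandardError => error
  default_error_result(error)
end
```

With this sketch, a block that raises `FakeCacheError` results in `:allow`, while a block that raises any other `StandardError` propagates the error to the caller.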

### Expose Rails cache errors to Rack::Attack

If you use Rack::Attack with the Rails cache, be aware that the Rails cache **suppresses**
cache errors by default, so Rack::Attack cannot handle them as described above.
This can be dangerous: if your cache is timing out due to high request volume,
for example, Rack::Attack will keep sending requests to the cache and worsen the problem.

When using Rails cache with `:redis_cache_store`, you'll need to expose errors to Rack::Attack
with a custom error handler as follows:

```ruby
# in your Rails config
config.cache_store = :redis_cache_store,
                     { # ...
                       error_handler: -> (method:, returning:, exception:) do
                         raise exception if Rack::Attack.calling?
                       end
                     }
```

Rails `:mem_cache_store` and `:dalli_store` suppress all Dalli errors. The recommended
workaround is to set a [Rack::Attack-specific cache configuration](#cache-store-configuration).

### Configure cache timeout

In your application config, it is recommended to set your cache timeout to 0.1 seconds or lower.
Please refer to the [Rails Guide](https://guides.rubyonrails.org/caching_with_rails.html).

```ruby
# Set 100 millisecond timeout on Redis
config.cache_store = :redis_cache_store,
                     { # ...
                       connect_timeout: 0.1,
                       read_timeout: 0.1,
                       write_timeout: 0.1
                     }
```

To use different timeout values specific to Rack::Attack, you may set a
[Rack::Attack-specific cache configuration](#cache-store-configuration).

### Failure cooldown

When any error occurs, Rack::Attack is disabled for a "cooldown" period, which is 60 seconds by default.
This prevents a cache outage from adding timeout latency to every request that passes through Rack::Attack.
All errors trigger the failure cooldown, regardless of whether they are allowed or raised.
You can configure the cooldown period as follows:

```ruby
# in initializers/rack_attack.rb

# Disable Rack::Attack for 5 minutes if any cache failure occurs
Rack::Attack.failure_cooldown = 300

# Do not use failure cooldown
Rack::Attack.failure_cooldown = nil
```
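
To make the cooldown semantics concrete, here is a minimal standalone sketch of the time check. `CooldownTracker` is a hypothetical class used only for illustration, and the explicit `now` parameter is an assumption added so the logic can be exercised without waiting.

```ruby
# Hypothetical model of a failure cooldown window.
class CooldownTracker
  attr_accessor :failure_cooldown

  def initialize(failure_cooldown = 60)
    @failure_cooldown = failure_cooldown
    @last_failure_at = nil
  end

  # Record the time of the most recent failure.
  def failed!(now = Time.now)
    @last_failure_at = now
  end

  # True while the cooldown window after the last failure is still open.
  # A nil cooldown disables the feature entirely.
  def failure_cooldown?(now = Time.now)
    return false unless @last_failure_at && failure_cooldown

    now < @last_failure_at + failure_cooldown
  end
end
```

In this sketch, a failure recorded at time `t` keeps `failure_cooldown?` true until `t + failure_cooldown` and false from that instant on.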

### Custom error handling

For most use cases, it is not necessary to re-configure Rack::Attack's default error handling.
However, there are several ways you may do so.

First, you may specify the list of errors to allow as an array of Class and/or String values.

```ruby
# in initializers/rack_attack.rb
Rack::Attack.allowed_errors += [MyErrorClass, 'MyOtherErrorClass']
```
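
The following sketch shows one way mixed Class and String entries can be matched against a raised error. `allowed?` and `MyChildError` are hypothetical names for illustration. A String entry is compared against the names of the error's ancestors, which presumably lets the list name classes from gems that may not be loaded.

```ruby
class MyErrorClass < StandardError; end
class MyOtherErrorClass < StandardError; end
class MyChildError < MyOtherErrorClass; end # hypothetical subclass

# Returns true if the error matches any entry in the allow list.
# Class entries match via is_a?; String entries match by ancestor class name.
def allowed?(error, allowed_errors)
  allowed_errors.any? do |entry|
    case entry
    when String then error.class.ancestors.any? { |ancestor| ancestor.name == entry }
    else error.is_a?(entry)
    end
  end
end

ALLOW_LIST = [MyErrorClass, 'MyOtherErrorClass'].freeze
```

Both forms match subclasses, so `MyChildError` is allowed here through the String entry `'MyOtherErrorClass'`.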

Alternatively, you may define a custom error handler as a Proc. The error handler will receive all errors,
regardless of whether they are on the allow list. Your handler should return either `:allow`, `:block`,
or `:throttle`, or else re-raise the error; other returned values will allow the request.

```ruby
# Set a custom error handler which blocks allowed errors
# and raises all others
Rack::Attack.error_handler = -> (error) do
  if Rack::Attack.allow_error?(error)
    Rails.logger.warn("Blocking error: #{error}")
    :block
  else
    raise(error)
  end
end
```

Lastly, you can set the error handler to a Symbol as a shortcut:

```ruby
# Handle all errors with block response
Rack::Attack.error_handler = :block

# Handle all errors with throttle response
Rack::Attack.error_handler = :throttle

# Handle all errors by allowing the request
Rack::Attack.error_handler = :allow
```
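
As a rough illustration of how a Proc or Symbol handler can be reduced to a single outcome, consider the sketch below. `resolve_handler` is a hypothetical helper, and it simplifies matters by always passing only the error to a Proc handler.

```ruby
# Resolve a handler (Proc or Symbol) to :allow, :block, or :throttle.
# A Proc handler may raise, in which case the error propagates to the caller.
def resolve_handler(handler, error)
  result = handler.is_a?(Proc) ? handler.call(error) : handler

  # Anything other than :block or :throttle falls back to :allow.
  %i[block throttle].include?(result) ? result : :allow
end
```

Under this sketch, `:block` and `:throttle` pass through unchanged, and every other value (including `nil`) is treated as `:allow`.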

## Testing

A note on developing and testing apps using Rack::Attack - if you are using throttling in particular, you will
138 changes: 120 additions & 18 deletions lib/rack/attack.rb
@@ -30,8 +30,18 @@ class IncompatibleStoreError < Error; end
autoload :Fail2Ban, 'rack/attack/fail2ban'
autoload :Allow2Ban, 'rack/attack/allow2ban'

THREAD_CALLING_KEY = 'rack.attack.calling'
DEFAULT_FAILURE_COOLDOWN = 60
DEFAULT_ALLOWED_ERRORS = %w[Dalli::DalliError Redis::BaseError].freeze

class << self
attr_accessor :enabled, :notifier, :throttle_discriminator_normalizer
attr_accessor :enabled,
:notifier,
:throttle_discriminator_normalizer,
:error_handler,
:allowed_errors,
:failure_cooldown

attr_reader :configuration

def instrument(request)
@@ -57,6 +67,39 @@ def reset!
cache.reset!
end

def failed!
@last_failure_at = Time.now
end

def failure_cooldown?
return false unless @last_failure_at && failure_cooldown
Time.now < @last_failure_at + failure_cooldown
end

def allow_error?(error)
allowed_errors&.any? do |ignored_error|
case ignored_error
when String then error.class.ancestors.any? {|a| a.name == ignored_error }
else error.is_a?(ignored_error)
end
end
end

def calling?
!!thread_store[THREAD_CALLING_KEY]
end

def with_calling
thread_store[THREAD_CALLING_KEY] = true
yield
ensure
thread_store[THREAD_CALLING_KEY] = nil
end

def thread_store
defined?(RequestStore) ? RequestStore.store : Thread.current
end

extend Forwardable
def_delegators(
:@configuration,
@@ -84,7 +127,11 @@ def reset!
)
end

# Set defaults
# Set class defaults
self.failure_cooldown = DEFAULT_FAILURE_COOLDOWN
self.allowed_errors = DEFAULT_ALLOWED_ERRORS.dup

# Set instance defaults
@enabled = true
@notifier = ActiveSupport::Notifications if defined?(ActiveSupport::Notifications)
@throttle_discriminator_normalizer = lambda do |discriminator|
@@ -100,32 +147,87 @@ def initialize(app)
end

def call(env)
return @app.call(env) if !self.class.enabled || env["rack.attack.called"]
return @app.call(env) if !self.class.enabled || env["rack.attack.called"] || self.class.failure_cooldown?

env["rack.attack.called"] = true
env['rack.attack.called'] = true
env['PATH_INFO'] = PathNormalizer.normalize_path(env['PATH_INFO'])
request = Rack::Attack::Request.new(env)
result = :allow

self.class.with_calling do
result = get_result(request)
rescue StandardError => error
return do_error_response(error, request, env)
end

do_response(result, request, env)
end

private

def get_result(request)
if configuration.safelisted?(request)
@app.call(env)
:allow
elsif configuration.blocklisted?(request)
# Deprecated: Keeping blocklisted_response for backwards compatibility
if configuration.blocklisted_response
configuration.blocklisted_response.call(env)
else
configuration.blocklisted_responder.call(request)
end
:block
elsif configuration.throttled?(request)
# Deprecated: Keeping throttled_response for backwards compatibility
if configuration.throttled_response
configuration.throttled_response.call(env)
else
configuration.throttled_responder.call(request)
end
:throttle
else
configuration.tracked?(request)
@app.call(env)
:allow
end
end

def do_response(result, request, env)
case result
when :block then do_block_response(request, env)
when :throttle then do_throttle_response(request, env)
else @app.call(env)
end
end

def do_block_response(request, env)
# Deprecated: Keeping blocklisted_response for backwards compatibility
if configuration.blocklisted_response
configuration.blocklisted_response.call(env)
else
configuration.blocklisted_responder.call(request)
end
end

def do_throttle_response(request, env)
# Deprecated: Keeping throttled_response for backwards compatibility
if configuration.throttled_response
configuration.throttled_response.call(env)
else
configuration.throttled_responder.call(request)
end
end

def do_error_response(error, request, env)
self.class.failed!
result = error_result(error, request, env)
result ? do_response(result, request, env) : raise(error)
end

def error_result(error, request, env)
handler = self.class.error_handler
if handler
error_handler_result(handler, error, request, env)
elsif self.class.allow_error?(error)
:allow
end
end

def error_handler_result(handler, error, request, env)
result = handler

if handler.is_a?(Proc)
args = [error, request, env].first(handler.arity)
result = handler.call(*args) # may raise error
end

%i[block throttle].include?(result) ? result : :allow
end
end
end
30 changes: 8 additions & 22 deletions lib/rack/attack/store_proxy/dalli_proxy.rb
@@ -24,34 +24,26 @@ def initialize(client)
end

def read(key)
rescuing do
with do |client|
client.get(key)
end
with do |client|
client.get(key)
end
end

def write(key, value, options = {})
rescuing do
with do |client|
client.set(key, value, options.fetch(:expires_in, 0), raw: true)
end
with do |client|
client.set(key, value, options.fetch(:expires_in, 0), raw: true)
end
end

def increment(key, amount, options = {})
rescuing do
with do |client|
client.incr(key, amount, options.fetch(:expires_in, 0), amount)
end
with do |client|
client.incr(key, amount, options.fetch(:expires_in, 0), amount)
end
end

def delete(key)
rescuing do
with do |client|
client.delete(key)
end
with do |client|
client.delete(key)
end
end

@@ -66,12 +58,6 @@ def with
end
end
end

def rescuing
yield
rescue Dalli::DalliError
nil
end
end
end
end