
feat: concurrent connections limiter #751

Merged: 8 commits merged on Feb 27, 2024


@nlwstein (Contributor) commented Feb 9, 2024

Summary of changes

Asana Ticket: 🍎 Reject API requests when there are too many concurrent requests

This provides a new rate limiter that is mainly useful for clamping down on too many active streaming connections. It can also track and limit long-running "static" requests (for example, trying to load all of the predictions at once). It is configurable per user in the admin panel (-1 to disable); the larger of the per-user value and the base config value is used.
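
A minimal Python sketch of the limit-selection rule described above. The function names are hypothetical, and the handling of -1 is an assumption: here a per-user value of -1 is read as "no concurrency limit for this user", and a missing per-user value (`None`) falls back to the base config.

```python
from typing import Optional

# ASSUMPTION: -1 in the per-user setting means "limiting disabled for this user".
DISABLED = -1


def effective_limit(user_limit: Optional[int], base_limit: int) -> Optional[int]:
    """Return the concurrency limit to enforce, or None for unlimited."""
    if user_limit == DISABLED:
        return None
    if user_limit is None:
        # No per-user override configured: use the base config.
        return base_limit
    # Per the PR description, the larger of the two values wins.
    return max(user_limit, base_limit)


def allow_request(active: int, user_limit: Optional[int], base_limit: int) -> bool:
    """Admit a new request only while the active count is below the limit."""
    limit = effective_limit(user_limit, base_limit)
    return limit is None or active < limit
```

Because of the `max`, a per-user value can only raise a user's allowance above the base config, never lower it.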

The static concurrency locks are released in real time, whereas the streaming ones depend on the hibernate loop cycle, so releasing those can lag by less than a minute. The limits I have added to the config file are guidance for testing, not necessarily values for production.
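
The difference between the two release paths can be modelled with per-lock expiry times. This is a sketch only: an in-memory dict stands in for memcached (so there is no cross-node atomicity), and the class name, method names, and TTL are hypothetical. A static request releases its lock explicitly; a streaming request only refreshes its lock once per loop cycle, so a dropped stream keeps counting until its lock expires.

```python
import time
import uuid
from typing import Callable, Dict


class ConcurrencyLocks:
    """Hypothetical in-memory stand-in for the shared lock store."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.locks: Dict[str, float] = {}  # lock id -> expiry time

    def _prune(self) -> None:
        # Drop locks whose expiry time has passed.
        now = self.clock()
        self.locks = {k: exp for k, exp in self.locks.items() if exp > now}

    def active(self) -> int:
        self._prune()
        return len(self.locks)

    def acquire(self) -> str:
        lock_id = uuid.uuid4().hex
        self.locks[lock_id] = self.clock() + self.ttl
        return lock_id

    def refresh(self, lock_id: str) -> None:
        # Streaming path: called once per loop cycle while the stream is alive.
        self._prune()
        if lock_id in self.locks:
            self.locks[lock_id] = self.clock() + self.ttl

    def release(self, lock_id: str) -> None:
        # Static path: called as soon as the response has been sent.
        self.locks.pop(lock_id, None)
```

With this shape, the worst-case lag for a stream that disappears without cleanup is roughly one TTL after its last refresh, which is the kind of delay described above for streaming locks.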

Opinion: I am not sure memcached is the ideal tool for this, but I used it because I didn't want to introduce new infrastructure if possible, and it was already in use for a similar job. Swapping out the storage mechanism in the future should be relatively easy.

The whole rate limiting system seems like a good use case for Redis (or perhaps something newer that's even better), primarily because:

  1. It can filter and upsert nested data more efficiently, or even operate on keys by prefix.
  2. It has a concept of locks and transactions.

@paulswartz (Member)

Can we deploy this to one of the dev-* environments?

github-actions bot commented Feb 9, 2024

Coverage of commit 70db24d

Summary coverage rate:
  lines......: 88.2% (4236 of 4800 lines)
  functions..: 70.9% (2269 of 3201 functions)
  branches...: no data found

Files changed coverage rate:
                                                                        |Lines       |Functions  |Branches    
  Filename                                                              |Rate     Num|Rate    Num|Rate     Num
  ============================================================================================================
  apps/api_accounts/lib/api_accounts/key.ex                             |75.0%      4|70.0%    10|    -      0
  apps/api_web/lib/api_web.ex                                           |85.7%     14|83.3%     6|    -      0
  apps/api_web/lib/api_web/api_controller_helpers.ex                    |96.2%     53|93.8%    16|    -      0
  apps/api_web/lib/api_web/event_stream.ex                              |82.5%     57|84.6%    13|    -      0
  apps/api_web/lib/api_web/plugs/rate_limiter_concurrent.ex             |15.0%     20|33.3%     6|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache.ex                     | 0.0%      6| 0.0%     3|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache/supervisor.ex          |43.8%     16|40.0%     5|    -      0
  apps/api_web/lib/api_web/rate_limiter/rate_limiter_concurrent.ex      |35.7%     56|37.5%    16|    -      0
  apps/api_web/lib/api_web/user.ex                                      |83.3%      6| 100%     5|    -      0

github-actions bot commented Feb 9, 2024

Coverage of commit f8aabfd

Summary coverage rate:
  lines......: 88.2% (4236 of 4800 lines)
  functions..: 70.9% (2269 of 3201 functions)
  branches...: no data found

Files changed coverage rate:
                                                                        |Lines       |Functions  |Branches    
  Filename                                                              |Rate     Num|Rate    Num|Rate     Num
  ============================================================================================================
  apps/api_accounts/lib/api_accounts/key.ex                             |75.0%      4|70.0%    10|    -      0
  apps/api_web/lib/api_web.ex                                           |85.7%     14|83.3%     6|    -      0
  apps/api_web/lib/api_web/api_controller_helpers.ex                    |96.2%     53|93.8%    16|    -      0
  apps/api_web/lib/api_web/event_stream.ex                              |84.2%     57|84.6%    13|    -      0
  apps/api_web/lib/api_web/plugs/rate_limiter_concurrent.ex             |15.0%     20|33.3%     6|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache.ex                     | 0.0%      6| 0.0%     3|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache/supervisor.ex          |43.8%     16|40.0%     5|    -      0
  apps/api_web/lib/api_web/rate_limiter/rate_limiter_concurrent.ex      |35.7%     56|37.5%    16|    -      0
  apps/api_web/lib/api_web/user.ex                                      |83.3%      6| 100%     5|    -      0

@paulswartz (Member)

@nlwstein hmm, something looks weird about that log. for the first two requests, "concurrent" + "limit" == "remaining", but for the last request, "concurrent" + "limit" != "remaining". That makes me think we either didn't count a request, or one expired from the lock without being actually disconnected.

@nlwstein (Contributor, Author)

> @nlwstein hmm, something looks weird about that log. for the first two requests, "concurrent" + "limit" == "remaining", but for the last request, "concurrent" + "limit" != "remaining". That makes me think we either didn't count a request, or one expired from the lock without being actually disconnected.

@paulswartz I don't think that concurrent field is coming from my code. I'm not familiar with it and suspect it comes from another package.

My log statement begins with the module name:

        Logger.error(
          "ApiWeb.Plugs.RateLimiterConcurrent event=request_statistics api_user=#{conn.assigns.api_user.id} at_limit=#{at_limit?} remaining=#{remaining - 1} limit=#{limit} event_stream=#{event_stream?}"
        )
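
Because the message is a series of `key=value` pairs, its fields are easy to extract when reviewing logs, for example to count how often `at_limit=true` appears. A minimal Python sketch, assuming each log message is available as one line of text; the function names are hypothetical and not part of this PR.

```python
import re
from typing import Dict, Iterable

# key=value tokens; values in this message contain no whitespace.
_PAIR = re.compile(r"(\w+)=(\S+)")
_MODULE = "ApiWeb.Plugs.RateLimiterConcurrent"


def parse_request_statistics(line: str) -> Dict[str, str]:
    """Return the key=value fields of a limiter log line, or {} for other lines."""
    if _MODULE not in line or "event=request_statistics" not in line:
        return {}
    return dict(_PAIR.findall(line))


def count_at_limit(lines: Iterable[str]) -> int:
    """Count limiter log lines where the user was already at the limit."""
    return sum(
        1 for line in lines if parse_request_statistics(line).get("at_limit") == "true"
    )
```

Note that the logged `remaining` field is interpolated as `remaining - 1`, so parsed values should be read with that offset in mind.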

@paulswartz (Member)

That's right, the concurrent log is coming from the existing concurrency tracker. But since there's only one node, it should always agree with the new concurrency tracker and it doesn't look like that's the case.

@paulswartz (Member) left a comment

putting in a temporary review so that it drops off the reminder list. open issue is that the existing concurrency logger does not always agree with the new concurrency limiter, and we're not sure why.

Coverage of commit dceca73

Summary coverage rate:
  lines......: 88.2% (4234 of 4800 lines)
  functions..: 70.9% (2268 of 3200 functions)
  branches...: no data found

Files changed coverage rate:
                                                                        |Lines       |Functions  |Branches    
  Filename                                                              |Rate     Num|Rate    Num|Rate     Num
  ============================================================================================================
  apps/api_accounts/lib/api_accounts/key.ex                             |75.0%      4|70.0%    10|    -      0
  apps/api_web/lib/api_web.ex                                           |85.7%     14|83.3%     6|    -      0
  apps/api_web/lib/api_web/api_controller_helpers.ex                    |96.2%     53|93.8%    16|    -      0
  apps/api_web/lib/api_web/event_stream.ex                              |82.5%     57|84.6%    13|    -      0
  apps/api_web/lib/api_web/plugs/rate_limiter_concurrent.ex             |15.0%     20|33.3%     6|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache.ex                     | 0.0%      6| 0.0%     3|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache/supervisor.ex          |43.8%     16|40.0%     5|    -      0
  apps/api_web/lib/api_web/rate_limiter/rate_limiter_concurrent.ex      |35.7%     56|37.5%    16|    -      0
  apps/api_web/lib/api_web/user.ex                                      |83.3%      6| 100%     5|    -      0

@nlwstein (Contributor, Author)

> putting in a temporary review so that it drops off the reminder list. open issue is that the existing concurrency logger does not always agree with the new concurrency limiter, and we're not sure why.

I tried to reproduce this locally and couldn't make it happen. The counters stayed in sync, even showing the same 'ghost' requests after I put my laptop to sleep for the night 😂

My guess is that a bunch of requests were opened and closed at nearly the same time, so when that log fired there was a race between the request tracker updating its logging metadata and this feature's cleanup filter, which runs right before that particular log message is emitted. This feature also relies on a (configurable) heartbeat, so a process may have failed a heartbeat check, reducing this feature's count of monitored processes before the request tracker's callback fired.
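
The suspected race can be illustrated with a toy model of the two counters. This is a hypothetical sketch, not the PR's code: the limiter-side count drops any process whose last heartbeat is older than a timeout (the cleanup that runs right before the log line), while the tracker-side count only changes when its own callback fires. Between those two moments the counts disagree.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class TwoCounterModel:
    """Hypothetical model of the existing tracker vs. the new limiter."""

    heartbeat_timeout: float
    last_heartbeat: Dict[str, float] = field(default_factory=dict)  # new limiter
    tracked: Set[str] = field(default_factory=set)  # existing tracker

    def open(self, pid: str, now: float) -> None:
        self.last_heartbeat[pid] = now
        self.tracked.add(pid)

    def heartbeat(self, pid: str, now: float) -> None:
        if pid in self.last_heartbeat:
            self.last_heartbeat[pid] = now

    def tracker_callback(self, pid: str) -> None:
        # The existing tracker only updates when its disconnect callback fires.
        self.tracked.discard(pid)
        self.last_heartbeat.pop(pid, None)

    def counts(self, now: float) -> Tuple[int, int]:
        # Cleanup filter: prune stale heartbeats right before logging.
        self.last_heartbeat = {
            pid: seen
            for pid, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        }
        return len(self.tracked), len(self.last_heartbeat)
```

For example, if two processes open at t=0, only one sends a heartbeat at t=25, and the log fires at t=40 with a 30-second timeout, `counts(40)` returns `(2, 1)`; once the tracker's callback fires for the stale process, both counts agree at 1.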

I also redeployed to dev-blue and it didn't seem to happen again.

Maybe once we deploy this, we should write a Splunk query to determine how frequently this happens? Ideally, it's just the heartbeat mechanism doing its thing 🙏

@paulswartz (Member)

I see it happening a lot in Splunk?

@bklebe removed their request for review February 21, 2024 19:55

Coverage of commit 66796ad

Summary coverage rate:
  lines......: 88.2% (4234 of 4800 lines)
  functions..: 70.9% (2268 of 3200 functions)
  branches...: no data found

Files changed coverage rate:
                                                                        |Lines       |Functions  |Branches    
  Filename                                                              |Rate     Num|Rate    Num|Rate     Num
  ============================================================================================================
  apps/api_accounts/lib/api_accounts/key.ex                             |75.0%      4|70.0%    10|    -      0
  apps/api_web/lib/api_web.ex                                           |85.7%     14|83.3%     6|    -      0
  apps/api_web/lib/api_web/api_controller_helpers.ex                    |96.2%     53|93.8%    16|    -      0
  apps/api_web/lib/api_web/event_stream.ex                              |82.5%     57|84.6%    13|    -      0
  apps/api_web/lib/api_web/plugs/rate_limiter_concurrent.ex             |15.0%     20|33.3%     6|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache.ex                     | 0.0%      6| 0.0%     3|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache/supervisor.ex          |43.8%     16|40.0%     5|    -      0
  apps/api_web/lib/api_web/rate_limiter/rate_limiter_concurrent.ex      |35.7%     56|37.5%    16|    -      0
  apps/api_web/lib/api_web/user.ex                                      |83.3%      6| 100%     5|    -      0

@paulswartz (Member) left a comment

empty review pending more investigation

@paulswartz (Member) left a comment

Had a chat with Firas in Slack. Summary:

  • merge this as is
  • continue to investigate the discrepancy between the original counter and the distributed counter
  • if we're unable to make progress on the accuracy of the distributed counter by 2024-03-31, we'll roll it back and discuss other approaches

@nlwstein merged commit fedc026 into master Feb 27, 2024
6 checks passed
@nlwstein deleted the nlws-new-connections-limiter branch February 27, 2024 15:58

Coverage of commit 446928f

Summary coverage rate:
  lines......: 88.2% (4235 of 4800 lines)
  functions..: 70.9% (2268 of 3200 functions)
  branches...: no data found

Files changed coverage rate:
                                                                        |Lines       |Functions  |Branches    
  Filename                                                              |Rate     Num|Rate    Num|Rate     Num
  ============================================================================================================
  apps/api_accounts/lib/api_accounts/key.ex                             |75.0%      4|70.0%    10|    -      0
  apps/api_web/lib/api_web.ex                                           |85.7%     14|83.3%     6|    -      0
  apps/api_web/lib/api_web/api_controller_helpers.ex                    |96.2%     53|93.8%    16|    -      0
  apps/api_web/lib/api_web/event_stream.ex                              |84.2%     57|84.6%    13|    -      0
  apps/api_web/lib/api_web/plugs/rate_limiter_concurrent.ex             |15.0%     20|33.3%     6|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache.ex                     | 0.0%      6| 0.0%     3|    -      0
  apps/api_web/lib/api_web/rate_limiter/memcache/supervisor.ex          |43.8%     16|40.0%     5|    -      0
  apps/api_web/lib/api_web/rate_limiter/rate_limiter_concurrent.ex      |35.7%     56|37.5%    16|    -      0
  apps/api_web/lib/api_web/user.ex                                      |83.3%      6| 100%     5|    -      0

meagharty added a commit that referenced this pull request Mar 1, 2024
meagharty added a commit that referenced this pull request Mar 1, 2024