Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(admin-auth): Add TTL cache for admin roles #4358

Merged
merged 7 commits into from
Jun 26, 2023

Conversation

davidtsuk
Copy link
Contributor

Currently, we set up a user's roles every time we make a request which is inefficient as it requires reading from a file and making multiple API requests to google. In order to improve performance, I've added a TTL cache with a 10 min TTL. Since it's unlikely for roles and groups to change, we should consider increasing this TTL in the future.

@codecov
Copy link

codecov bot commented Jun 16, 2023

Codecov Report

Patch coverage: 54.01% and project coverage change: -0.33 ⚠️

Comparison is base (7b480a2) 91.05% compared to head (b070fe2) 90.72%.

❗ Current head b070fe2 differs from pull request most recent head f78fa66. Consider uploading reports for the commit f78fa66 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4358      +/-   ##
==========================================
- Coverage   91.05%   90.72%   -0.33%     
==========================================
  Files         793      797       +4     
  Lines       38886    39254     +368     
  Branches      226      245      +19     
==========================================
+ Hits        35408    35614     +206     
- Misses       3448     3608     +160     
- Partials       30       32       +2     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
snuba/admin/auth_roles.py 94.20% <ø> (ø)
snuba/admin/static/api_client.tsx 2.13% <0.00%> (-0.35%) ⬇️
snuba/cli/dlq_consumer.py 0.00% <0.00%> (ø)
snuba/migrations/group_loader.py 95.00% <ø> (ø)
snuba/migrations/groups.py 97.82% <ø> (ø)
...allocation_policies/bytes_scanned_window_policy.py 96.05% <ø> (-0.20%) ⬇️
snuba/query/snql/discover_entity_selection.py 98.11% <ø> (ø)
snuba/settings/settings_self_hosted.py 0.00% <0.00%> (ø)
snuba/settings/settings_test.py 100.00% <ø> (ø)
... and 23 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@davidtsuk davidtsuk marked this pull request as ready for review June 16, 2023 19:25
@davidtsuk davidtsuk requested a review from a team as a code owner June 16, 2023 19:25
Copy link
Member

@volokluev volokluev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's caching the reading of the file (which shouldn't be that slow). Do we have evidence that this is the thing that needs to be cached? Is it caching the API calls somewhere that I don't see?

@@ -68,7 +72,14 @@ def get_iam_roles_from_file(user: AdminUser) -> Sequence[str]:
def _set_roles(user: AdminUser) -> AdminUser:
# todo: depending on provider convert user email
# to subset of DEFAULT_ROLES based on IAM roles
iam_roles = get_iam_roles_from_file(user)
key = f"roles-{user.email}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're caching the file reading here. Is reading from the local file really taking that much time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function name is a bit misleading here, but this function reads the file and also makes a call to check whether a member is in a group and that's where the API is being called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function should be renamed get_iam_roles_for_user.

@davidtsuk davidtsuk requested a review from volokluev June 16, 2023 21:22
@@ -68,7 +72,14 @@ def get_iam_roles_from_file(user: AdminUser) -> Sequence[str]:
def _set_roles(user: AdminUser) -> AdminUser:
# todo: depending on provider convert user email
# to subset of DEFAULT_ROLES based on IAM roles
iam_roles = get_iam_roles_from_file(user)
key = f"roles-{user.email}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function should be renamed get_iam_roles_for_user.

@@ -359,6 +364,31 @@ def test_get_iam_roles(caplog: Any) -> None:
tool_role,
]

iam_file.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very large and should be split up into multiple tests.

You've also tested the happy path but the unhappy path is not tested. Example: what if the redis connection fails, does that mean nobody can access admin?

Also, is _set_roles a private module function? If so, can this be tested with the public API? If not, then maybe the API is not properly thought out.

Comment on lines 76 to 82
iam_roles_str = redis_client.get(key)
iam_roles = rapidjson.loads(iam_roles_str) if iam_roles_str else None
if not iam_roles:
iam_roles = get_iam_roles_from_file(user)
redis_client.set(key, rapidjson.dumps(iam_roles))
redis_client.expire(key, settings.ADMIN_ROLES_REDIS_TTL)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation tightly couples authentication to caching. If your cache fails to update, you'll fail the request. this should be done in such a way that:

  1. The distinction between cache and auth is clear
  2. If the caching fails the app still works

iam_roles = rapidjson.loads(iam_roles_str) if iam_roles_str else None
if not iam_roles:
iam_roles = get_iam_roles_from_file(user)
redis_client.set(key, rapidjson.dumps(iam_roles))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight performance nit: set already has an expiration parameter

@davidtsuk davidtsuk requested a review from volokluev June 19, 2023 14:25
@davidtsuk
Copy link
Contributor Author

Added additional tests and handling for redis failure.

@@ -137,6 +138,9 @@ class RedisClientKey(Enum):
RedisClientKey.OPTIMIZE: _initialize_specialized_redis_cluster(
settings.REDIS_CLUSTERS["optimize"]
),
RedisClientKey.ADMIN_AUTH: _initialize_specialized_redis_cluster(
settings.REDIS_CLUSTERS["admin_auth"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a PR to add admin_auth to the prod settings as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, I'll make a PR right now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will need to go out before this can be merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stuff is super inconvenient. I wonder if we can make it so that the prod_settings file can override individual keys of the default settings file, and does not have to redefine the entire variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stuff is super inconvenient. I wonder if we can make it so that the prod_settings file can override individual keys of the default settings file, and does not have to redefine the entire variable

@untitaker I made a draft PR to do this here.

@davidtsuk davidtsuk requested a review from evanh June 26, 2023 15:18
@davidtsuk
Copy link
Contributor Author

@evanh ops pr has been merged

@davidtsuk davidtsuk merged commit a367a37 into master Jun 26, 2023
20 checks passed
@davidtsuk davidtsuk deleted the david/feat/admin-roles-cache branch June 26, 2023 15:59
@sentry-io
Copy link

sentry-io bot commented Jun 26, 2023

Suspect Issues

This pull request has been deployed and Sentry has observed the following issues:

  • ‼️ KeyError: 'admin_auth' 'admin_auth' KeyError snuba/redis.py <module> F... View Issue

Have questions? Reach out to us in the #proj-github-pr-comments channel.

Did you find this useful? React with a 👍 or 👎

@getsentry-bot
Copy link
Contributor

PR reverted: f65f5ad

getsentry-bot added a commit that referenced this pull request Jun 26, 2023
This reverts commit a367a37.

Co-authored-by: lynnagara <1779792+lynnagara@users.noreply.github.com>
@rahul-kumar-saini rahul-kumar-saini restored the david/feat/admin-roles-cache branch June 26, 2023 18:48
@davidtsuk davidtsuk deleted the david/feat/admin-roles-cache branch June 26, 2023 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants