Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filters/auth: use sync.Map for tokeninfo cache #3267

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AlexanderYastrebov
Copy link
Member

  • use sync.Map for tokeninfo cache to avoid synchronizing all callers on a single mutex
  • evict stale entries periodically instead of least recently used
  • store token expiration time instead of creation time
                                                  │      master      │                HEAD                 │
                                                  │      sec/op      │   sec/op     vs base                │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8               275.5n ±  6%   170.1n ± 4%  -38.26% (p=0.000 n=10)
TokeninfoCache/tokens=2,cacheSize=2,p=0-8               492.9n ± 21%   176.8n ± 2%  -64.12% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=0-8           455.9n ±  7%   165.5n ± 1%  -63.70% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8       593.4n ±  4%   179.8n ± 4%  -69.71% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           2571424.0n ±  0%   149.7n ± 3%  -99.99% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        2579227.5n ±  0%   139.3n ± 1%  -99.99% (p=0.000 n=10)
geomean                                                 7.903µ         162.9n       -97.94%

                                                  │   master   │                  HEAD                   │
                                                  │    B/op    │    B/op      vs base                    │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   368.0 ± 1%    350.0 ± 0%     -4.89% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           27.00 ± 0%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        27.00 ± 7%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
geomean                                             149.0         345.0        +131.62%
¹ all samples are equal

                                                  │    master    │              HEAD              │
                                                  │  allocs/op   │ allocs/op   vs base            │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
geomean                                                        ²   3.000       ?
¹ all samples are equal
² summaries must be >0 to compute geomean

@AlexanderYastrebov AlexanderYastrebov added the major moderate risk, for example new API, small filter changes that have no risk like refactoring or logs label Oct 9, 2024
@AlexanderYastrebov AlexanderYastrebov force-pushed the filters/auth/tokeninfocache-syncmap branch from 67cd561 to 813bf2a Compare October 9, 2024 11:01
info: info,
href: c.history.PushFront(token),
func (c *tokeninfoCache) evictLoop() {
ticker := time.NewTicker(time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense this time.Minute configurable at the next PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure which other value would be good.

mu sync.Mutex
cache map[string]*entry
// least recently used token at the end
history *list.List
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you would like to keep tokeninfo filter lock-free, that is why you are thinking about not keeping history any more.

However, there is a data structure which acts like a list and has lock-free implementations (at least in theory). https://en.wikipedia.org/wiki/Skip_list

Do you think it makes sense to try to obtain this data structure to have lock-free history access and not evict random items?

Copy link
Member Author

@AlexanderYastrebov AlexanderYastrebov Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used to evict oldest entries when cache grows over the limit.
This change simplifies this by simply removing random items if number of cached items is over the size limit.
It also adds a metric for monitoring.

In production setup one should have cache size set large enough such that entries are never evicted due to overflow. This way there is no need to complicate eviction algorithm, keep access history or use clever datastructures.


// Evict random entries until the cache size is within limits
if c.count.Load() > int64(c.size) {
c.cache.Range(func(key, value any) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to confirm, the random entries here comes from the random sort of the map, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the iteration order is no defined for maps and hence for sync.Map as it uses map internally.

@@ -111,7 +118,7 @@ func TestTokeninfoCache(t *testing.T) {

assert.Equal(t, int32(1), authRequests, "expected no request to auth sever")
assert.Equal(t, token, info["uid"])
assert.Equal(t, float64(595), info["expires_in"], "expected TokenTTLSeconds - truncate(delay)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still quite not obvious why 595 changed to 594?

Is it related to this part of code?
https://github.com/zalando/skipper/pull/3267/files#diff-a2721f4aa66d24557c036d686d4e3344636871b2de93fb75266b8da83d511891L79

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, previous version adjusted cached expires_in based on the time elapsed since entry was cached.
Since the number must be integer it truncated the elapsed time:

new_expires_in = expires_in - truncateToSeconds(now-cachedAt)

this version stores expiration date instead of cached date and calculates expires_in from it:

new_expires_in = truncateToSeconds(infoExpiresAt-now)

This test moves clock by 5.7 seconds so previous version resulted in:

new_expires_in = expires_in - truncateToSeconds(now-cachedAt) = 600 - truncateToSeconds(cachedAt+5.7-cachedAt) =  truncateToSeconds(5.7) = 595

and this version results in:

new_expires_in = truncateToSeconds(infoExpiresAt-now) = truncateToSeconds(cachedAt+600-(cachedAt+5.7)) = truncateToSeconds(594.3) = 594

@AlexanderYastrebov AlexanderYastrebov force-pushed the filters/auth/tokeninfocache-syncmap branch from 813bf2a to 818dd4d Compare October 10, 2024 11:22
* use sync.Map for tokeninfo cache to avoid synchronizing all callers
  on a single mutex
* evict stale entries periodically instead of least recently used
* store token expiration time instead of creation time

```
                                                  │      master      │                HEAD                 │
                                                  │      sec/op      │   sec/op     vs base                │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8               275.5n ±  6%   170.1n ± 4%  -38.26% (p=0.000 n=10)
TokeninfoCache/tokens=2,cacheSize=2,p=0-8               492.9n ± 21%   176.8n ± 2%  -64.12% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=0-8           455.9n ±  7%   165.5n ± 1%  -63.70% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8       593.4n ±  4%   179.8n ± 4%  -69.71% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           2571424.0n ±  0%   149.7n ± 3%  -99.99% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        2579227.5n ±  0%   139.3n ± 1%  -99.99% (p=0.000 n=10)
geomean                                                 7.903µ         162.9n       -97.94%

                                                  │   master   │                  HEAD                   │
                                                  │    B/op    │    B/op      vs base                    │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   368.0 ± 1%    350.0 ± 0%     -4.89% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           27.00 ± 0%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        27.00 ± 7%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
geomean                                             149.0         345.0        +131.62%
¹ all samples are equal

                                                  │    master    │              HEAD              │
                                                  │  allocs/op   │ allocs/op   vs base            │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
geomean                                                        ²   3.000       ?
¹ all samples are equal
² summaries must be >0 to compute geomean
```

Signed-off-by: Alexander Yastrebov <alexander.yastrebov@zalando.de>
@AlexanderYastrebov AlexanderYastrebov force-pushed the filters/auth/tokeninfocache-syncmap branch from 818dd4d to c429f5e Compare January 3, 2025 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major moderate risk, for example new API, small filter changes that have no risk like refactoring or logs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants