LoadBalancerCacheManager supports refresh cache, not just expire cache #1215

jizhuozhi · 2023-03-14T12:11:39Z

Is your feature request related to a problem? Please describe.
Spring Cloud LoadBalancer now only supports expired caching, that is, after a period of time, the cache item will become invalid, and when the same content is requested again, it will re-submit a request to the registry blockly to obtain the service instance. In smooth service there will be spikes.

In addition to causing spikes, this also poses a challenge to high availability. When the cache expires, all cache items are evicted. At this time, there are no related service instances in memory. If the registry goes down, the entire None of the client instances can submit a request to the service provider, and the service is completely unavailable. In a refresh cache, this will have no effect. If the registry goes down, we just don't get the latest data, and slightly older data is better than no data (in fact, the service instance changes very frequently low, the impact of old data is negligible).

Describe the solution you'd like
Based on this, I think it is necessary to provide refresh cache instead of just expire cache

A feasible solution is to support configuring the refresh interval and use DiscoveryClient to grab service instances, and at the same time support configuring the expiration time to avoid using expired data for a long time (exposing the problem)

setCaffeine(Caffeine.newBuilder()
    .initialCapacity(properties.getCapacity())
    .refreshAfterWrite(properties.getTtl().dividedBy(2))
    .expireAfterWrite(properties.getTtl())
    .softValues());
setCacheLoader((key) -> discoveryClient.getInstances((String) key));

xinlunanxinlunan · 2023-07-12T09:40:38Z

Hello, our calling relationship A->B, B has two instances (B1,B2). Set the ttl to 30s, and continue to send requests to A. If B2 is offline, A's cache is not updated. As A result, an error occurs when A calls B2. In this case, traffic must be disabled for the cache of A to be updated. I think this is a very big bug that needs to be updated or not.

jizhuozhi · 2023-07-12T09:49:45Z

Hello, @xinlunanxinlunan.

First of all, this is not a bug, what you need is a health check before invoking remote service instance.

Secondly, refreshing means that if there is a new list, we will replace it, but it is irresponsible to clear it if the new list cannot be obtained.

In your scenario, if 50% of the instances are unavailable, then use the history list when the new service list cannot be obtained, and the availability rate will only drop to 50%, but if all the instances are directly expired, then the service cannot be obtained, the availability rate is 0%.

Finally, this Issue is not for discussing when to update, you'd better create a new issue, I will not continue to discuss this topic in this.

sobelek · 2023-09-27T14:48:40Z

@jizhuozhi @OlgaMaciaszek
Hey. I have just stumbled upon this issue. K8s API got overloaded and started returning 429. In the same time cache expired and got cleared so for some time we were left with no instances for loadbalancer (Even tho there were healthy instances).

Is there any known walk-around for this?

I was thinking about using HealthCheck loadbalancer with:

spring.cloud.loadbalancer.health-check.refetch-instances-interval: true
spring.cloud.loadbalancer.health-check.refetch-instances-interval: 30s
spring.cloud.loadbalancer.health-check.repeat-health-check: false
spring.cloud.loadbalancer.health-check.update-results-list: false

This seems like that should behave just like 30s cache but it would not clear cache after refetch-interval. WDYT?

jizhuozhi · 2023-10-04T15:47:30Z

Hello, @sobelek.

What your need in this scene is not scheduling HealthCheck api, but as same as the core of this Proposal

If the registry goes down, the entire None of the client instances can submit a request to the service provider, and the service is completely unavailable. In a refresh cache, this will have no effect.

And scheduling HealthCheck api could be implemented via composing Caching and HealthCheck suppliers.

sobelek · 2023-10-04T16:36:53Z

Hey.
I know I would prefer refreshWrite cache to be available but as it is not I am looking into other possible solution to achieve the same result

jizhuozhi · 2023-10-04T16:51:07Z

Hello, @sobelek . Thanks for your reply.

We may need to align a boundary issue. Generally speaking, health checks check the health status of an instance when given one from instance list, but it has no responsibility to change the instance list. If the logic of refetching instances is incorporated into health checks, the boundary between service discovery and health checks will be broken, and the responsibilities of health checks will no longer be single.

Of course, I tried to abstract your solution. In fact, what we need is still an event source that triggers an event regularly to notify the loadbalancer that the instance list needs to be refreshed, so our solution is the same.

sobelek · 2023-10-04T16:57:53Z

Fully agree with a boundary between discovery and loadbalancing. Thanks for clarifying this.

Event driven discovery or refreshing cache would both be great and would resolve to original issue👍

jizhuozhi · 2023-10-04T17:10:56Z

Hello, @sobelek.

As an aside, and in my experience, usually only application layer protocols that support multiplexing (such as Apache Dubbo) will perform health checks on the client side, because only a simple ping-pong request is required. But for HTTP/1 (Spring MVC or Webflux), because each request is a new TCP connection, when the number of instances is very large, the health check may issue a large number of on-the-fly requests and leave lots of CLOSE_WAIT connections after the request ends. As a result, normal requests cannot be processed. The same is true for the server side.

sobelek · 2023-10-09T10:10:14Z

@jizhuozhi

Yea, true that. We are actually using k8s so healthchecking is done by k8s out of the box.

Back to original issue. Do you have any idea if you guys are going to work on refreshAfter cache for spring loadbalancer?

jizhuozhi · 2023-10-09T10:48:20Z

Hello, @sobelek . There is no out box solution for all CacheManager, but Caffeine provided refreshAfterWrite out of box. So you could custom a new CacheManager based on CaffeineBasedLoadBalancerCacheManager with your service discovery client.

sobelek · 2023-10-10T10:33:46Z

Thanks, I am trying to do this. Would you like me to also make a PR here when finished?

jizhuozhi · 2023-10-10T10:36:21Z

Thanks, I am trying to do this. Would you like me to also make a PR here when finished?

Of course, welcome. This Issue has been assigned to @OlgaMaciaszek. Do you have any good suggestions @OlgaMaciaszek ?

OlgaMaciaszek · 2023-10-31T14:49:17Z

Thanks, @jizhuozhi, makes sense; however, I'd like to see a separate property for the refresh interval (can be the value you've proposed as default, but needs to be customisable).
@sobelek as indicated in the docs, the HealthCheckServiceInstanceListSupplier should not be used together with the CachingServiceInstanceListSupplier as it has it's own caching mechanism. Please feel free to provide a PR for this feature.

OlgaMaciaszek · 2023-11-03T12:18:17Z

@sobelek Please confirm if you will be submitting this PR.

sobelek · 2023-11-03T17:23:21Z

@OlgaMaciaszek
I would love to. Don't have much time currently but will try to do it in coming weeks.
I am not sure yet about how to set a loader for cache but will do PoC so we can discuss it

OlgaMaciaszek · 2023-11-06T09:24:28Z

Ok, please do and ping me here.

spring-cloud-issues added the waiting-for-triage label Mar 14, 2023

OlgaMaciaszek self-assigned this Mar 16, 2023

OlgaMaciaszek added in progress and removed waiting-for-triage labels Mar 16, 2023

OlgaMaciaszek added enhancement and removed in progress labels Oct 31, 2023

OlgaMaciaszek assigned sobelek and unassigned OlgaMaciaszek Oct 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LoadBalancerCacheManager supports refresh cache, not just expire cache #1215

LoadBalancerCacheManager supports refresh cache, not just expire cache #1215

jizhuozhi commented Mar 14, 2023 •

edited

Loading

xinlunanxinlunan commented Jul 12, 2023

jizhuozhi commented Jul 12, 2023 •

edited

Loading

sobelek commented Sep 27, 2023

jizhuozhi commented Oct 4, 2023

sobelek commented Oct 4, 2023

jizhuozhi commented Oct 4, 2023 •

edited

Loading

sobelek commented Oct 4, 2023

jizhuozhi commented Oct 4, 2023 •

edited

Loading

sobelek commented Oct 9, 2023

jizhuozhi commented Oct 9, 2023

sobelek commented Oct 10, 2023

jizhuozhi commented Oct 10, 2023

OlgaMaciaszek commented Oct 31, 2023

OlgaMaciaszek commented Nov 3, 2023

sobelek commented Nov 3, 2023

OlgaMaciaszek commented Nov 6, 2023

LoadBalancerCacheManager supports refresh cache, not just expire cache #1215

LoadBalancerCacheManager supports refresh cache, not just expire cache #1215

Comments

jizhuozhi commented Mar 14, 2023 • edited Loading

xinlunanxinlunan commented Jul 12, 2023

jizhuozhi commented Jul 12, 2023 • edited Loading

sobelek commented Sep 27, 2023

jizhuozhi commented Oct 4, 2023

sobelek commented Oct 4, 2023

jizhuozhi commented Oct 4, 2023 • edited Loading

sobelek commented Oct 4, 2023

jizhuozhi commented Oct 4, 2023 • edited Loading

sobelek commented Oct 9, 2023

jizhuozhi commented Oct 9, 2023

sobelek commented Oct 10, 2023

jizhuozhi commented Oct 10, 2023

OlgaMaciaszek commented Oct 31, 2023

OlgaMaciaszek commented Nov 3, 2023

sobelek commented Nov 3, 2023

OlgaMaciaszek commented Nov 6, 2023

jizhuozhi commented Mar 14, 2023 •

edited

Loading

jizhuozhi commented Jul 12, 2023 •

edited

Loading

jizhuozhi commented Oct 4, 2023 •

edited

Loading

jizhuozhi commented Oct 4, 2023 •

edited

Loading