Treat resourceVersion as opaque string #467
Conversation
The Kubernetes API explicitly disallows parsing the ResourceVersion as an integer: https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions
-	if current, ok := g.gatewayClasses[class.class.Name]; ok {
-		if utils.ResourceVersionGreater(current.class.ResourceVersion, class.class.ResourceVersion) {
+	if cachedClass, ok := g.gatewayClasses[class.class.Name]; ok {
+		if cachedClass.class.ResourceVersion == class.class.ResourceVersion {
So part of the problem with this is that I believe we can actually corrupt our cache if we're simultaneously reconciling resources and only doing equality checks (i.e. an old resource is going to potentially override a newer resource). Since a lot of our internal state storage has changed fairly substantially, I'm not entirely sure if this is still an issue, but it clearly was early on -- which is why we explicitly made the decision to ignore the warning against resource version parsing (it has been a monotonically increasing global 64-bit integer based on etcd's revision field since day one, IIRC).
I'm fine if we can remove the reliance on that, but just be aware that we may get:
- (in the best case) additional log spam due to failures in whatever status `Update` calls we're doing
- (in the bad case) potential, hopefully temporary, cache corruption that may or may not affect what we're syncing to Consul
- (in the worst case) due to the above, potentially wedging a resource in a reconciliation loop due to affecting upstream dependencies that are stale in cache (i.e. we write in an old gateway somehow, but routes are reconciling that require a gateway status update for such a thing as the number of routes bound to it; the update fails, so the route gets re-reconciled, but never completes until we purge/fix the stale cache entry)
All of this said, while this fix seems fairly straightforward, it can potentially have some pretty large consequences, so I would want to thoroughly vet it.
@andrewstucki if I understand this logic correctly, the caller of `K8sGatewayClasses.Upsert()` wants to mutate if and only if the cached version of the class they have is up-to-date, right? Instead of doing the resource version comparison manually here, why not pass the resource version through the client to the server as a precondition and allow the server to reject the call if it's out-of-date? Store in `g.gatewayClasses[class.class.Name]` if successful, and ignore the conflict error if you want your client not to re-try with an updated version from the cache.
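A minimal sketch of that flow, assuming a controller-runtime client and the Gateway API v1alpha2 types (the helper name and shape here are illustrative, not the existing code):

```go
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	gwv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

// upsertStatus delegates the version check to the API server: because `cached`
// still carries the resourceVersion we last observed, the server rejects the
// write with a 409 Conflict if anything newer has been committed since we read it.
func upsertStatus(ctx context.Context, c client.Client, cached *gwv1alpha2.GatewayClass) error {
	desired := cached.DeepCopy()
	// ... mutate desired.Status here ...

	err := c.Status().Update(ctx, desired)
	if apierrors.IsConflict(err) {
		// Our cached copy was stale; drop the write rather than retry, since
		// the newer version will arrive via the watch and trigger another pass.
		return nil
	}
	return err
}
```

Only on a nil error would you then store the object into `g.gatewayClasses[class.class.Name]`, which keeps the in-memory cache coherent with what the server actually accepted.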
Letting the server do the `resourceVersion` comparison for you would free you from having to carefully manage when you do and do not add to your client-side cache, meaning your options 2 & 3 would not occur. Although, one more question - are you using the controller-runtime cache-backed clients, and, if so, what is the additional utility of the in-memory cache in question here?
I think the concern is less about where we do the comparing and more that the meaning of "out of date" with this PR changes from "a lower incrementing resourceVersion integer than the one I currently have" to "a resource version that exactly matches the one I currently have".
The current state deals with a watch trigger series for versions 1 -> 3 -> 2 by rejecting the last version because `2 < 3`, ending up holding resourceVersion 3.
The proposed state would accept the last version because `2 != 3`, ending up holding resourceVersion 2.
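Purely illustrative, the two checks behave like this for that sequence (using a stand-in for `utils.ResourceVersionGreater`):

```go
package main

import (
	"fmt"
	"strconv"
)

// rvGreater stands in for utils.ResourceVersionGreater: the integer
// comparison the current code relies on.
func rvGreater(a, b string) bool {
	ai, _ := strconv.ParseInt(a, 10, 64)
	bi, _ := strconv.ParseInt(b, 10, 64)
	return ai > bi
}

func main() {
	cached, incoming := "3", "2" // the watch delivered versions 1 -> 3 -> 2

	// Current behavior: the stale event is rejected, the cache keeps "3".
	fmt.Println("old check rejects stale event:", rvGreater(cached, incoming)) // true

	// Proposed behavior: "2" != "3", so the stale event is treated as new
	// and the cache ends up holding "2".
	fmt.Println("new check accepts stale event:", cached != incoming) // true
}
```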
I'm sorry, I don't think I follow (yet!).
In the current implementation, the reconciler has a cached version of an object, with some `resourceVersion=a`. The intent with the current `Upsert()` semantic is that this reconciler's client call should only go through if the cache has not yet seen a more-recent version `resourceVersion=b` of the object.
If, instead of doing the check in-memory, the reconciler unconditionally made the client call to the server and passed that `resourceVersion=a` in the precondition on the `UpdateStatus()`, the server would ensure that there had not been any committed changes to the object past `a` - so the invariant remains that the `UpdateStatus()` succeeds if and only if the client's cached version (`a`) is the latest. In fact, the invariant becomes stronger, since you are guaranteed not only that `a` is newer than any revision you have happened to observe and store in your in-memory cache, but that it is the newest revision available in the database. Storing the object only on a successful call would allow your cache to stay coherent.
With the current implementation, if you want to make sure your upsert is not clobbering newer data on the server (perhaps from other actors), you still need to pass the precondition to the server to guard against the following case:
- cache observes object with `resourceVersion=a`
- reconciler uses cached state to issue an upsert call
- in-memory `resourceVersion` comparison is successful, as no new data exists in the cache
- server updates the object to `resourceVersion=b`
- reconciler issues a call to the server to UPDATE the resource
- cache observes object with `resourceVersion=b`
The reason I mentioned the `controller-runtime` cache-backed client, as well, is that you have a full view of the objects your reconciler is operating over in an in-memory cache as part of the `controller-runtime` infrastructure. Therefore, a simple pattern to ensure correctness is to pass the `resourceVersion` on objects you see in a `reconcile.Request` to the server as a precondition. If the server call fails with a conflict, the cached view that your reconciler used was out-of-date. However, you'll see the up-to-date version at some point in the future, at which point the reconciler's client call will succeed and you've made forward progress.
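As a sketch of that pattern, assuming the standard controller-runtime reconciler shape (the type and field names here are illustrative):

```go
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	gwv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

type GatewayClassReconciler struct {
	client.Client
}

func (r *GatewayClassReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The cache-backed client serves this read from the informer cache, so the
	// object carries whatever resourceVersion the cache has observed so far.
	var gc gwv1alpha2.GatewayClass
	if err := r.Get(ctx, req.NamespacedName, &gc); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// ... compute the desired status on gc ...

	// The update carries gc's resourceVersion as a precondition; the server
	// rejects it with a conflict if the cached view was out-of-date.
	if err := r.Status().Update(ctx, &gc); err != nil {
		if apierrors.IsConflict(err) {
			// Requeue; once the cache catches up with the newer version,
			// the next reconcile will succeed and we make forward progress.
			return ctrl.Result{Requeue: true}, nil
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```

Returning the conflict error instead of requeueing explicitly also works, since controller-runtime requeues failed reconciles with backoff.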
@stevekuznetsov The cache/out-of-k8s storage is required for shipping SDS and syncing all of our data to Consul -- having a working set of how things need to be and actually enforcing that when users change things externally (a simple example being enforcing intentions updates when a user adds or removes an additional source or destination for a service that we're routing to). RE your comment in your issue about the in-memory store, the original design of this external storage layer was to be pluggable with future implementations, including durable storage backends, in part to pave the way for working with both on-k8s and off-k8s execution modes -- some of that is no longer true though.
RE the current use of `resourceVersion`: I'm thinking that part of our use of `resourceVersion` could be ameliorated by actually doing the external storage sync after delegating to Kubernetes for the version check during our status updates; if the update succeeds, then as you mentioned, we have made forward progress since we were using a fresh entry and we can go ahead and do the rest of the sync. This does make tracking whether or not we were able to sync state to our external sources as part of the user-reflected `Status` more difficult though, as that now happens after the initial remote update and would require a second pass through the reconciler to actually update the remote sync status.
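Something like the following ordering is what I have in mind (a rough sketch; `syncToConsul` and the surrounding shape are hypothetical, not the actual sync code):

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gwv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

// syncToConsul is a placeholder for the external-state sync; the real logic
// lives elsewhere in the project.
func syncToConsul(ctx context.Context, gw *gwv1alpha2.Gateway) error { return nil }

// reconcileGateway sketches the ordering: status update (with the server-side
// version check) first, external sync only if that succeeds.
func reconcileGateway(ctx context.Context, c client.Client, gw *gwv1alpha2.Gateway) error {
	desired := gw.DeepCopy()
	// ... compute desired.Status ...

	// Delegate the version check to Kubernetes: a 409 Conflict here means our
	// copy was stale, and we bail out without touching external state.
	if err := c.Status().Update(ctx, desired); err != nil {
		return err
	}

	// Only after the status update succeeds (i.e. we were working from a fresh
	// entry) do we sync state to Consul. Reflecting the outcome of this sync
	// back into Status would then require a second pass through the reconciler.
	return syncToConsul(ctx, desired)
}
```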
Just to set expectations: given the amount of work likely involved in changing this resourceVersion check and ensuring correctness with both Kubernetes usage and our external state syncing code, and given our team's current workload/priorities, I'm not sure that we'll be able to change this behavior in short order. I would like to fix it, but it could take some time.
Absolutely - if you have appetite for it, I am more than happy to take on the work. By no means did I mean to show up and ask you to do more - I am sure you are very busy! If that sounds like a good idea to you, I'd love to take more time to learn the code and perhaps sync with you to understand the problem space a little better. I'm sure there's a solution here that does not require breaking that particular API convention.
> I'm sure there's a solution here that does not require breaking that particular API convention.
Totally agree that this is tractable and thanks for your desire to contribute! If you do wind up tackling this, let me know if you have any questions or need some feedback.
@stevekuznetsov if you do have an appetite for pursuing that larger change, I will close this PR in favor of that. Let me know!
Absolutely - I can work on that.
Changes proposed in this PR:
The Kubernetes API explicitly disallows parsing the metadata.resourceVersion field on an object as an integer.
With this change, we rely on the watcher to only trigger for newer versions than we've previously seen. Instead of parsing to an integer, we do simple equality checks.
How I've tested this PR:
🤖 tests
How I expect reviewers to test this PR:
Checklist: