annotate all pods under our domain, not just those in the STS #120

tekicode · 2023-09-18T16:22:42Z

annotatePods() only affects pods which are running in RouterIngestor mode: Pods which declare a hashring file AND are also a member of that hashring. It works by iterating through every expected pod in the STS. This has a couple of problems:

If the size of the STS is larger than the number of running pods (failed pods) there will be failures to update those pods. This is a nuisance error.
If we're operating in a split setup (Router separately from Ingestor) the controller does not know to annotate the Router pods, it only annotates the discovered STS pods. This means the Routers are left to reload their CM on their fallback timeout, instead of immediately.
Only StatefulSets are affected, if using Routers as Deployments instead of STS, the controller will not annotate them with updates.

This change will discover any pods with the config.StatefulSetLabel value default: controller.receive.thanos.io=thanos-receive-controller LABEL and add a controller.receive.thanos.io/lastControllerUpdate annotation to each pod when the --annotate-pods-on-change flag is used and a new CM is generated.

It is not necessary to specify the controller.receive.thanos.io/hashring label when using this feature on pods which are not part of the hashring.

tekicode · 2023-09-18T16:57:07Z

The failures in the checks/Generate look unrelated jsonnet updates?

tekicode · 2023-09-19T17:51:37Z

@matej-g Could you review? Thank you!

saswatamcode · 2023-09-20T05:45:28Z

cc: @philipgough

squat · 2023-09-22T09:50:12Z

@tekicode if we go down this road, then we must change the configuration flag name from --statefulset-label [0] to something different, e.g. --label because it is misleading to users to suggest that we only use this to select statefulset pods. In this case, I would suggest we continue to accept --statefulset-label flag and deprecate it in favor of a new one that takes preference over it. This will allow existing users of the thanos-receive-controller to upgrade gracefully.

Separately, I am realizing now that the description of the --annotate-pods flag is also misleading. It indicates that we will annotate the pod with the hash of the latest configuration, but that is not true: we annotate the pods with a timestamp.

[0] https://github.com/observatorium/thanos-receive-controller/blob/master/main.go#L81

This addresses the comment in: #120 (comment). Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

philipgough · 2023-09-22T13:39:07Z

@squat aside from agreeing with all your points I actually think the --annotate-pods flag is risky. It uses the kube client to do an update but it doesn't handle failures very well. Take an example where you might have a large hashring, you go to update a Pod with the annotation and you get a conflict. It just moves on and now you can have different replicas with different understandings of how the hashring looks causing issues with replication.

tekicode · 2023-09-22T23:02:31Z

@squat aside from agreeing with all your points I actually think the --annotate-pods flag is risky. It uses the kube client to do an update but it doesn't handle failures very well. Take an example where you might have a large hashring, you go to update a Pod with the annotation and you get a conflict. It just moves on and now you can have different replicas with different understandings of how the hashring looks causing issues with replication.

I'm not sure how to address this comment. Without using annotate-pods, kubelet will update the configmap mounted on each container at most once every minute. It's not particularly at the top of the minute, just one minute timeout. The reason for the annotate-pods is to speed that up. So, if it fails, the fallback would be the default behavior (which is not ideal, I'll admit)

I already see annotate-pods failures with the current code; so my changes here do not address the root behavior, only make it more useful.

tekicode · 2023-09-22T23:12:38Z

@tekicode if we go down this road, then we must change the configuration flag name from --statefulset-label [0] to something different, e.g. --label because it is misleading to users to suggest that we only use this to select statefulset pods. In this case, I would suggest we continue to accept --statefulset-label flag and deprecate it in favor of a new one that takes preference over it. This will allow existing users of the thanos-receive-controller to upgrade gracefully.

Separately, I am realizing now that the description of the --annotate-pods flag is also misleading. It indicates that we will annotate the pod with the hash of the latest configuration, but that is not true: we annotate the pods with a timestamp.

[0] https://github.com/observatorium/thanos-receive-controller/blob/master/main.go#L81

Hi thanks for the review.

Slightly concerned that by changing the flag as you suggest, we would indicate to users that the controller is capable of allowing non statefulset pods into the hashring; which it is not and would not be recommended.

The intention of my change here is for those of us using a split setup, with Statefulsets as endpoint targets, and Deployments working as autoscaling routers. These deployments are never in the endpoints list.

Either way, it sounds like something that is more solved with documentation; would we rather docs say "Do not use --label to select Deployments for the hashring, as --label is only meant for statefulset selection" or "--annotate-pods will annotate ANY pod with this label, not just statefulset pods"

IMO, I believe users would be more confused by trying to use --label and believing it would add Deployments to the endpoints list.

Naming things is hard, it's more important we see the controller do the functionality than picking a name. Let me know how we can move forward!

philipgough · 2023-09-23T08:32:41Z

I'm not sure how to address this comment. Without using annotate-pods, kubelet will update the configmap mounted on each container at most once every minute. It's not particularly at the top of the minute, just one minute timeout. The reason for the annotate-pods is to speed that up. So, if it fails, the fallback would be the default behavior (which is not ideal, I'll admit)

I already see annotate-pods failures with the current code; so my changes here do not address the root behavior, only make it more useful.

You are not sure how to address it in what way? I'm aware of the reason for the flag to exist in the first place. I was merely commenting within a conversation where we were discussing issues with the existing approach.

From a technical perspective, it can likely be addressed in some way by using an event driven approach and re-enqueuing Pods on failure to update, but that would make the controller significantly more complicated too.

squat · 2023-09-23T12:33:09Z

re-enqueuing Pods on failure to update

That would mean using controller-runtime tooling like the queue and caches etc. It would for sure make the controller more robust, turning it into a fully-fledged operator pretty much. Not a bad thing! It's just a trade-off: more work and complexity vs robustness and functionality.

squat · 2023-09-23T12:37:16Z

Another possible approach would be to eschew pod annotation entirely and instead instruct users to install third party tools like https://github.com/wave-k8s/wave or https://github.com/stakater/Reloader of they want "instantaneous" config updates.

tekicode · 2023-09-23T17:27:39Z

@squat @philipgough I unfortunately don't have the time to invest in making the controller more robust in the sync of the hashring file. Can we table that for another PR or issue?

@philipgough Addressing --statefulset-label vs --label I am going through right now to deprecate statefulset-label. What should the verbiage be for the use of the new flag?

The label Pods must have to be watched by the controller

The label StatefulSets must have to be watched by the controller

The label Workloads must have to be watched by the controller.

Each of these has a different implication for the end user in how the controller can function and I see issues with the language of all three.

Please let me know if this is an impasse. We've found use of this controller in our pre-production environments and it's Pretty Good(tm) just a few things here and there to make it better.

squat · 2023-09-23T18:19:28Z

@tekicode i agree with doing the Bigger Refactor in a different PR :)
as for verbiage, I like The label workloads must have to be watched by the controller. (with a lowercase w).

squat · 2023-09-23T18:22:26Z

Also, I fixed the generated files in #121 so that CI passes. Please rebase your PR on main :)

tekicode · 2023-10-03T03:48:24Z

@tekicode i agree with doing the Bigger Refactor in a different PR :) as for verbiage, I like The label workloads must have to be watched by the controller. (with a lowercase w).

Thank you for the review @squat Addressed this feedback with 519d41c

Please let me know what you think!

tekicode · 2023-10-03T18:02:38Z

Changes in 45f615c are due to a limit on statements allowed in main() according to the lint rules.

tekicode · 2023-10-18T23:08:18Z

@squat @philipgough any chance either of you could review? ty!

tekicode · 2023-10-24T19:50:55Z

@saswatamcode available to review? ty!

philipgough · 2023-10-24T20:03:27Z

Sorry, I will take a look at this one asap

philipgough · 2023-10-25T14:28:48Z

main.go

-					annotations = make(map[string]string)
-				}
+	// Select pods that have a controllerLabel matching ours.
+	podList, err := c.klient.CoreV1().Pods(c.options.namespace).List(ctx,


is this accurate? what if the controller is watching and acting on multiple StatefulSets? Won't it annotate Pods for each of those workloads even in the absence of a diff of their hashring which may be confusing?

I don't believe the controller differentiates when it performs hashring updates as to which pods it touches. The sync code previously updated all pods on a hashring change, regardless of which hashring.

philipgough · 2023-10-25T14:29:23Z

main.go

+		})
+	if err != nil {
+		level.Error(c.logger).Log("msg", "failed to list pods belonging to controller", "err", err)
+	} else {


do we need this else block? assume we can just return if we fail to list Pods?

addressed with be74ca3

philipgough · 2023-10-25T14:30:25Z

main.go


-				annotations["annotationTimestamp"] = fmt.Sprintf("%d", time.Now().Unix())
-				pod.SetAnnotations(annotations)
+			annotations[annotationKey] = fmt.Sprintf("%d", time.Now().Unix())


instead of iterating over the Pods it might make more sense to calculate the timestamp in advance and set all Pods to the same value as when the hashring changed?

addressed with 0861153

philipgough · 2023-10-25T14:30:48Z

main.go

 			}
-		}
-	}
+		} // for range podList


can you remove this commented code?

Done in 0861153 Thank you so much for the review!

tekicode · 2023-11-04T09:56:22Z

@philipgough Good to merge?

philipgough

lgtm

squat added a commit that referenced this pull request Sep 22, 2023

examples: regenerate to fix CI

defcbe9

This addresses the comment in: #120 (comment). Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

squat mentioned this pull request Sep 22, 2023

examples: regenerate to fix CI #121

Merged

tekicode added 2 commits September 23, 2023 15:12

PR120 rebase to master

a730e85

pr feedback, deprecate --statefulset-label

519d41c

tekicode force-pushed the annotate-all-pods-on-change branch from 629a760 to 519d41c Compare October 3, 2023 03:45

tekicode force-pushed the annotate-all-pods-on-change branch from d59dff9 to af45562 Compare October 3, 2023 18:00

refactor flags into parseFlags() due to lint limit

45f615c

tekicode force-pushed the annotate-all-pods-on-change branch from af45562 to 45f615c Compare October 3, 2023 18:18

make lint and github actions lint do not align

49b9a92

philipgough reviewed Oct 25, 2023

View reviewed changes

tekicode added 2 commits October 25, 2023 18:12

collapse if else and reduce indentation

be74ca3

remove comments, set updateTime once for all pods

0861153

philipgough approved these changes Nov 6, 2023

View reviewed changes

philipgough merged commit c57219e into observatorium:main Nov 6, 2023
4 checks passed

tekicode deleted the annotate-all-pods-on-change branch November 8, 2023 00:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annotate all pods under our domain, not just those in the STS #120

annotate all pods under our domain, not just those in the STS #120

tekicode commented Sep 18, 2023 •

edited

Loading

tekicode commented Sep 18, 2023

tekicode commented Sep 19, 2023

saswatamcode commented Sep 20, 2023

squat commented Sep 22, 2023

philipgough commented Sep 22, 2023

tekicode commented Sep 22, 2023

tekicode commented Sep 22, 2023

philipgough commented Sep 23, 2023

squat commented Sep 23, 2023

squat commented Sep 23, 2023

tekicode commented Sep 23, 2023

squat commented Sep 23, 2023

squat commented Sep 23, 2023

tekicode commented Oct 3, 2023

tekicode commented Oct 3, 2023 •

edited

Loading

tekicode commented Oct 18, 2023

tekicode commented Oct 24, 2023

philipgough commented Oct 24, 2023

philipgough Oct 25, 2023

tekicode Oct 26, 2023

philipgough Oct 25, 2023

tekicode Oct 26, 2023

philipgough Oct 25, 2023

tekicode Oct 26, 2023

philipgough Oct 25, 2023

tekicode Oct 26, 2023

tekicode commented Nov 4, 2023

philipgough left a comment

annotate all pods under our domain, not just those in the STS #120

annotate all pods under our domain, not just those in the STS #120

Conversation

tekicode commented Sep 18, 2023 • edited Loading

tekicode commented Sep 18, 2023

tekicode commented Sep 19, 2023

saswatamcode commented Sep 20, 2023

squat commented Sep 22, 2023

philipgough commented Sep 22, 2023

tekicode commented Sep 22, 2023

tekicode commented Sep 22, 2023

philipgough commented Sep 23, 2023

squat commented Sep 23, 2023

squat commented Sep 23, 2023

tekicode commented Sep 23, 2023

squat commented Sep 23, 2023

squat commented Sep 23, 2023

tekicode commented Oct 3, 2023

tekicode commented Oct 3, 2023 • edited Loading

tekicode commented Oct 18, 2023

tekicode commented Oct 24, 2023

philipgough commented Oct 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tekicode commented Nov 4, 2023

philipgough left a comment

Choose a reason for hiding this comment

tekicode commented Sep 18, 2023 •

edited

Loading

tekicode commented Oct 3, 2023 •

edited

Loading