Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add ignore drift definition #313

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

oliverbaehler
Copy link
Contributor

@oliverbaehler oliverbaehler commented Jun 12, 2024

I'm not yet 100% sure if that how it should be implement. What i understood:

Signed-off-by: Oliver Bähler <oliverbaehler@hotmail.com>
@gianlucam76
Copy link
Member

The field is good. It is just I am not sure yet how to implement this.

Currently this is how Sveltos detects a drift:

  1. Resources are deployed to managed cluster(s)
  2. Once resource is deployed, drift-detection-manager in each managed cluster evaluates the hash and starts a watcher
  3. When watcher is notified of a change (for a resource Sveltos is tracking for configuration drift), drift-detection-manager re-evaluates the hash. If hash is different (or resource has been deleted) a configuration drift is reported to the managed cluster

This is the method that evaluates the hash

I guess knowing the path we can adjust above method to exclude it when evaluating hash

@oliverbaehler
Copy link
Contributor Author

@gianlucam76 or we have to not use hashes probably? I was wondering anyway: why did you opt in for creating a dedicated drift detaction manager? I just know from experience deploying additional workload from operators is a bit hacky at times, especially in corp. environment where you might need a dedicated proxy registry and such stuff.

@gianlucam76
Copy link
Member

gianlucam76 commented Jun 13, 2024

Thank you.

I developed the drift-detection-manager cause I wanted to:

  1. immediately detect a drift
  2. I don't like the idea of having the management cluster pull and check for drift on a timer based

Point 2 is actually what makes Sveltos perform better than many other solutions when number of application scales and number of managed cluster scales.

The manager starts watchers and reacts only to change. So if nothing happens, drift detection manager (and the management cluster) does nothing.

If you have any other alternative I am happy to hear it. Maybe we can have more than one approach so to cover more use cases.

@oliverbaehler
Copy link
Contributor Author

I see your point. It's just that sveltos has a lot of repositories and I am wondering how you manage all of these at once.

I would have probably just created a new controller for each cluster having drift-detection enabled. New controller in the sense of spawning a new controller in the runtime of the addon-controller, instead of creating a dedicated deployment. Obv. the current way might be more scalable, can't really tell yet.

@gianlucam76
Copy link
Member

Thanks @oliverbaehler. I see your point. That was an option I considered but took the other approach because I did not want to overload the addon-controller. Also I wanted to isolate faults to smaller domain. Right now, if drift has a bug, only drift is affected but addon-controller keeps doing is job for instance.

And I also considered the clusterAPI approach (one big repo for all) instead of N. But I got used to N repos while at Tigera and (I am biased of course as there is no right or wrong) I feel N repos allows me to work and deliver at a much higher speed.

With respect to managing all of them, I took sometime to design it right. But now I have a combination of go generator (which makes sure addon-controller deploys current drift-detection-manager) and scripts (at release level). So as developer I don't have to worry that there are N different repos.

@oliverbaehler
Copy link
Contributor Author

@gianlucam76 Thanks, btw. i am not trying to argue over your structure etc. I see that your domain architecture makes total sense. Just as a new startert it's a bit difficult to know where to start, but that's something different.

However I have created a dedicated issue regarding the workload configuration projectsveltos/addon-controller#594, since it's not really related to the PR proposed here.

Regarding the actual drift implementation: Do we also have to change the way how the hash is calculated on the addon-controller? The other function should be easy to change to drop the specified paths

@gianlucam76
Copy link
Member

I got that @oliverbaehler and I don't mind talking about it at all. It actually helps me re-validate it or spot issues :-)
And i agree unfortunately there is too many things to look at when starting. Please keep asking and providing feedbacks. I really appreciate and welcome those.

Thanks for filing this

If you have time to change the drift detection logic to onboard ignore field paths, we can merge that pr. I am currently moving to v1beta1 and implementing a conversion webhook so that we can simply change the type of clusterSelector (string in v1alpha1 and the type you introduced in v1beta1).

In the addon controller,when ignore paths are set, if the object already exists in the managed cluster we should not override it. While this might be easy for yaml, I am not sure yet how we can have the desired effect with helm.

@gianlucam76
Copy link
Member

@kprav33n

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants