-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Cluster Scaling and StatefulSet Management #270
base: feat/new-status-check-pt4
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes in this pull request involve modifications to several files, primarily focusing on enhancing the Dockerfile and improving the functionality of the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (7)
cmd/app/commandline.go (1)
51-51
: Enhance flag description and add domain validationWhile the flag is correctly defined, consider:
- Expanding the description to explain how it affects etcd client endpoint construction
- Adding validation for the domain format to prevent configuration issues
- pflag.String("cluster-domain", "cluster.local", "The cluster domain configured in kube-dns") + pflag.String("cluster-domain", "cluster.local", "The cluster domain configured in kube-dns. Used for constructing etcd client endpoints (e.g., service.namespace.svc.<cluster-domain>)")Also, consider adding domain validation in the ParseCmdLine function:
// Add after viper.GetString("cluster-domain") domain := viper.GetString("cluster-domain") if !isDomainValid(domain) { exitUsage(fmt.Errorf("invalid cluster domain format: %s", domain)) } // Helper function func isDomainValid(domain string) bool { // RFC 1123 subdomain regex pattern := `^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` matched, _ := regexp.MatchString(pattern, domain) return matched }internal/controller/observables.go (2)
137-141
: Address the TODO: Implement proper replica determination logic indesiredReplicas
.The function
desiredReplicas
contains a TODO comment indicating the need for a robust logic to determine the correct number of replicas based on the cluster state. Implementing this logic is important for accurate scaling behavior.Would you like assistance in developing the replica determination algorithm?
177-181
: Implement thestatefulSetPodSpecCorrect
function.The
statefulSetPodSpecCorrect
function currently always returnstrue
and includes a TODO comment. Implementing this function is crucial to ensure that the StatefulSet's pod specification matches the desired configuration, facilitating proper reconciliation.Would you like help in implementing this function to compare the desired and existing StatefulSet specifications?
internal/controller/etcdcluster_controller.go (4)
149-149
: Remove outdated TODO comment forscaleUpFromZero
method.The
scaleUpFromZero
function is marked with a TODO comment indicating it needs implementing, but it appears to be implemented later in the code (lines 737–749). Consider removing the outdated TODO comment to avoid confusion.- return ctrl.Result{}, r.scaleUpFromZero(ctx, state) // TODO: needs implementing + return ctrl.Result{}, r.scaleUpFromZero(ctx, state)
Line range hint
634-682
: Prevent potential panic due to premature channel closure in goroutines.The error channel
c
is closed withdefer close(c)
at the beginning of the function. If any goroutine attempts to send toc
after it has been closed, it will cause a panic.Move
close(c)
to after all goroutines have completed to ensure they do not send to a closed channel:func (r *EtcdClusterReconciler) ensureUnconditionalObjects(ctx context.Context, state *observables) error { - c := make(chan error) - defer close(c) ctx, cancel := context.WithCancel(ctx) defer cancel() + c := make(chan error) var wg sync.WaitGroup wg.Add(concurrentOperations) wrapWithMsg := func(err error, msg string) error { if err != nil { return fmt.Errorf(msg+": %w", err) } return nil } // Goroutine definitions... for i := 0; i < concurrentOperations; i++ { if err := <-c; err != nil { cancel() wg.Wait() + close(c) return err } } + wg.Wait() + close(c) return nil }
736-749
: Implement thepatchOrCreateObject
method.The
patchOrCreateObject
method is marked with a TODO and currently lacks a complete implementation. This function is critical for applying or creating Kubernetes objects.Would you like assistance in implementing
patchOrCreateObject
? Proper implementation should handle both patching existing objects and creating new ones if they do not exist.
149-149
: Remove redundant code comments and keep code clean.There are several comments like
// needs implementing
that may no longer be relevant. Keeping code comments up-to-date helps maintain code readability.Review and remove any outdated or unnecessary comments to keep the codebase clean.
🛑 Comments failed to post (8)
internal/controller/observables.go (4)
184-185:
⚠️ Potential issueAdd nil check for
statefulSet.Spec.Replicas
instatefulSetReady
.Since
o.statefulSet.Spec.Replicas
is a pointer, it may benil
. Add a nil check to avoid potential nil pointer dereferences when comparing replica counts.Apply this diff to add the nil check:
func (o *observables) statefulSetReady() bool { + if o.statefulSet.Spec.Replicas == nil { + return false + } return o.statefulSet.Status.ReadyReplicas == *o.statefulSet.Spec.Replicas }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.func (o *observables) statefulSetReady() bool { if o.statefulSet.Spec.Replicas == nil { return false } return o.statefulSet.Status.ReadyReplicas == *o.statefulSet.Spec.Replicas }
83-88:
⚠️ Potential issueFix off-by-one error in
memberListsAllEqual
function.The loop currently starts from index
0
, comparingmemberLists[0]
to itself, which is unnecessary. To correctly compare each member list to the first one, the loop should start fromi = 1
.Apply this diff to fix the off-by-one error:
func (o *observables) memberListsAllEqual() bool { // ... - for i := range memberLists { + for i := 1; i < len(memberLists); i++ { if !memberLists[0].Equals(memberLists[i]) { return false } } return true }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.for i := 1; i < len(memberLists); i++ { if !memberLists[0].Equals(memberLists[i]) { return false } } return true
104-117:
⚠️ Potential issueAdd checks to prevent index out of range errors in
pvcMaxIndex
.The code assumes that splitting the PVC name by
"-"
will result in a slice with at least one element. If the PVC name does not contain"-"
, accessingtokens[len(tokens)-1]
could cause an index out of range error. Add a check to ensurelen(tokens) > 0
before accessing the last element.Apply this diff to add the necessary checks:
func (o *observables) pvcMaxIndex() (max int) { max = -1 for i := range o.pvcs { tokens := strings.Split(o.pvcs[i].Name, "-") + if len(tokens) == 0 { + continue + } index, err := strconv.Atoi(tokens[len(tokens)-1]) if err != nil { continue } if index > max { max = index } } return max }Committable suggestion skipped: line range outside the PR's diff.
119-135:
⚠️ Potential issuePrevent index out of range errors in
endpointMaxIndex
.The function assumes that the endpoint string is formatted to allow safe access to
tokens[len(tokens)-2]
and subsequent indices. If the endpoint format is unexpected, this could result in index out of range errors. Add checks to ensure the slices have sufficient length before accessing their elements.Apply this diff to add the necessary checks:
func (o *observables) endpointMaxIndex() (max int) { for i := range o.endpoints { tokens := strings.Split(o.endpoints[i], ":") if len(tokens) < 2 { continue } tokens = strings.Split(tokens[len(tokens)-2], "-") + if len(tokens) == 0 { + continue + } index, err := strconv.Atoi(tokens[len(tokens)-1]) if err != nil { continue } if index > max { max = index } } return max }Committable suggestion skipped: line range outside the PR's diff.
internal/controller/etcdcluster_controller.go (4)
130-134:
⚠️ Potential issueImplement missing methods
state.statefulSetPodSpecCorrect()
andfactory.TemplateStatefulSet()
.The code references
state.statefulSetPodSpecCorrect()
andfactory.TemplateStatefulSet()
, which are marked with TODO comments indicating they need to be implemented. Without these implementations, the reconciliation logic may not function as intended.Please implement these methods to ensure that the stateful set pod specifications are correctly validated and templated.
729-733:
⚠️ Potential issueAddress incomplete
createClusterFromScratch
method containing a panic.The
createClusterFromScratch
method ends withpanic("not yet implemented")
, which will crash the controller if this code path is executed.Consider implementing this method or adding proper error handling to prevent unexpected panics. If the method is not ready for use, ensure that it's not called until fully implemented.
784-784:
⚠️ Potential issueAddress potential nil panic detected by static analysis.
Static analysis indicates a potential nil panic at line 784. Ensure that
memberListResp
is not nil before accessing itsMembers
field.Add a check after
MemberList
to verifymemberListResp
is not nil:memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err) } + if memberListResp == nil { + return fmt.Errorf("failed to get member list: response is nil") + }Alternatively, investigate why
memberListResp
could be nil even whenerr
is nil.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err) } if memberListResp == nil { return fmt.Errorf("failed to get member list: response is nil") }
🧰 Tools
🪛 GitHub Check: nilaway-lint
[failure] 784-784:
�[31merror: �[0mPotential nil panic detected. Observed nil flow from source to dereference point:
777-786:
⚠️ Potential issueAdd nil check for
clusterClient
to prevent nil pointer dereference.In the
promoteLearners
function,clusterClient
may benil
even iferr
isnil
after callingfactory.NewEtcdClientSet
. UsingclusterClient.MemberList(ctx)
without checking could cause a nil pointer dereference.Add a nil check for
clusterClient
before using it:clusterClient, _, err := factory.NewEtcdClientSet(ctx, state.instance, r.Client) if err != nil { return fmt.Errorf("failed to create etcd client: %w", err) } + if clusterClient == nil { + return fmt.Errorf("failed to create etcd client: client is nil") + } defer clusterClient.Close()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.clusterClient, _, err := factory.NewEtcdClientSet(ctx, state.instance, r.Client) if err != nil { return fmt.Errorf("failed to create etcd client: %w", err) } if clusterClient == nil { return fmt.Errorf("failed to create etcd client: client is nil") } defer clusterClient.Close() // Retrieve the current member list memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err)
🧰 Tools
🪛 GitHub Check: nilaway-lint
[failure] 784-784:
�[31merror: �[0mPotential nil panic detected. Observed nil flow from source to dereference point:
a5725af
to
0987b97
Compare
Summary
This PR implements key functions in the
EtcdClusterReconciler
for the etcd-operator controller. These functions add critical functionality for managing cluster scaling, cluster state ConfigMap updates, StatefulSet management, and learner promotion within an etcd cluster.Key Changes
Cluster Scaling from Zero:
scaleUpFromZero
to handle the case where the etcd cluster is scaled up from zero replicas.Cluster State ConfigMap Management:
createOrUpdateClusterStateConfigMap
function added to manage the cluster state ConfigMap, updating it based on the current state of the cluster.StatefulSet Creation and Update:
createOrUpdateStatefulSet
function added to manage StatefulSet creation and updates, ensuring consistency between the desired and actual StatefulSet configurations.Learner Promotion:
promoteLearners
to promote any learner members to full voting members in the etcd cluster.Implementation Details
Reconcile
to pass thestate
parameter, ensuring alignment with updated function definitions.Testing
Future Improvements
This PR significantly enhances the functionality of the etcd-operator by providing robust cluster scaling and management capabilities.
Summary by CodeRabbit
New Features
--cluster-domain
for specifying the cluster domain, defaulting to"cluster.local"
.Improvements
observables
struct.Bug Fixes
Chores