Inconsistent member.ready & readyReplicas #614

Open
OneCricketeer opened this issue Aug 24, 2024 · 2 comments

Comments

OneCricketeer commented Aug 24, 2024

Description

The post-install hook fails its Ready check. The status output below lists one pod under members.unready even though readyReplicas equals replicas, and the "unready" pod's log output looks very similar to that of the "ready" pods. Restarting the unready pod has not helped.

  • How is readyReplicas counted? Should it not equal len(members.ready)?
  • How can this be debugged further?
  members:
    ready:
      - app-zookeeper-3
      - app-zookeeper-2
      - app-zookeeper-0
      - app-zookeeper-1
    unready:
      - app-zookeeper-4
  readyReplicas: 5
  replicas: 5

Importance

must-have

Location

(Where is the piece of code, package, or document affected by this issue?)

Suggestions for an improvement

readyReplicas should respect len(members.ready)
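
To make the expected invariant concrete, here is a minimal Python sketch, assuming the status has already been loaded into a plain dict. The function name `status_discrepancies` is hypothetical and this is not the operator's actual logic; it only encodes the two relations the report expects to hold.

```python
from __future__ import annotations

from typing import Any


def status_discrepancies(status: dict[str, Any]) -> list[str]:
    """Return readable mismatches between the counters and the member lists."""
    members = status.get("members") or {}
    ready = members.get("ready") or []
    unready = members.get("unready") or []
    problems = []

    # Expectation from this report: readyReplicas == len(members.ready).
    if status.get("readyReplicas") != len(ready):
        problems.append(
            f"readyReplicas={status.get('readyReplicas')} but "
            f"len(members.ready)={len(ready)}"
        )

    # Assumed sanity check: every replica is listed as ready or unready.
    if status.get("replicas") != len(ready) + len(unready):
        problems.append(
            f"replicas={status.get('replicas')} but "
            f"{len(ready) + len(unready)} members are listed"
        )
    return problems


# Status values copied from the description above.
reported = {
    "members": {
        "ready": [
            "app-zookeeper-3",
            "app-zookeeper-2",
            "app-zookeeper-0",
            "app-zookeeper-1",
        ],
        "unready": ["app-zookeeper-4"],
    },
    "readyReplicas": 5,
    "replicas": 5,
}

for problem in status_discrepancies(reported):
    print(problem)  # prints: readyReplicas=5 but len(members.ready)=4
```

With the reported values only the first check fires, which is the inconsistency this issue is about.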

@OneCricketeer
Author

Unready pod logs

2024-08-24 02:40:09,063 [myid:5] - INFO  [NIOWorkerThread-5:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:38160
2024-08-24 02:40:09,069 [myid:5] - INFO  [NIOWorkerThread-4:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:38162
2024-08-24 02:40:09,568 [myid:5] - WARN  [NIOWorkerThread-3:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:10,378 [myid:5] - WARN  [NIOWorkerThread-6:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:11,256 [myid:5] - WARN  [NIOWorkerThread-9:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:12,902 [myid:5] - WARN  [NIOWorkerThread-7:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:13,711 [myid:5] - WARN  [NIOWorkerThread-10:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:14,589 [myid:5] - WARN  [NIOWorkerThread-8:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:14,592 [myid:5] - WARN  [NIOWorkerThread-11:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:15,640 [myid:5] - WARN  [NIOWorkerThread-12:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:16,235 [myid:5] - WARN  [NIOWorkerThread-13:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:17,046 [myid:5] - WARN  [NIOWorkerThread-16:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:17,923 [myid:5] - WARN  [NIOWorkerThread-15:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:18,636 [myid:5] - WARN  [NIOWorkerThread-14:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:19,063 [myid:5] - INFO  [NIOWorkerThread-17:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:59816
2024-08-24 02:40:19,065 [myid:5] - INFO  [NIOWorkerThread-18:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:59822
2024-08-24 02:40:19,568 [myid:5] - WARN  [NIOWorkerThread-19:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:20,378 [myid:5] - WARN  [NIOWorkerThread-21:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:21,256 [myid:5] - WARN  [NIOWorkerThread-20:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:22,902 [myid:5] - WARN  [NIOWorkerThread-22:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:23,712 [myid:5] - WARN  [NIOWorkerThread-25:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:24,590 [myid:5] - WARN  [NIOWorkerThread-24:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:26,235 [myid:5] - WARN  [NIOWorkerThread-27:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:27,046 [myid:5] - WARN  [NIOWorkerThread-23:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:27,923 [myid:5] - WARN  [NIOWorkerThread-26:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:28,973 [myid:5] - WARN  [NIOWorkerThread-28:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2024-08-24 02:40:29,063 [myid:5] - INFO  [NIOWorkerThread-30:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:36344
2024-08-24 02:40:29,066 [myid:5] - INFO  [NIOWorkerThread-29:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:36356
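
One way to check the "very similar log output" impression is to tally each pod's log by level and message and compare the tallies. The sketch below is an assumption-laden helper, not part of the operator or of ZooKeeper: the regular expression is written only against the line layout shown above, and `summarize` is a hypothetical name.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above:
# <timestamp> [myid:N] - LEVEL  [thread] - message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[myid:(?P<myid>\d*)\] - (?P<level>\w+)\s+"
    r"\[(?P<thread>[^\]]*)\] - (?P<message>.*)$"
)


def summarize(lines):
    """Count log lines per (level, message), ignoring the client port."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the expected layout
        # Mask a trailing ":<port>" so each ruok probe lands in one bucket.
        message = re.sub(r":\d+$", ":<port>", match["message"])
        counts[(match["level"], message)] += 1
    return counts


# Three lines copied from the unready pod's log above.
sample = [
    "2024-08-24 02:40:09,063 [myid:5] - INFO  [NIOWorkerThread-5:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:38160",
    "2024-08-24 02:40:09,568 [myid:5] - WARN  [NIOWorkerThread-3:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x0, likely client has closed socket",
    "2024-08-24 02:40:19,063 [myid:5] - INFO  [NIOWorkerThread-17:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:59816",
]

for (level, message), count in summarize(sample).most_common():
    print(count, level, message)
```

Running the same function over a ready pod's log for the same time window and comparing the two counters would show whether any message type appears only on the unready pod. Fetching the logs (for example with `kubectl logs`) is left outside the sketch.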

Error seen in the operator logs; otherwise the operator reports that it is connected

"error":"Error creating cluster metadata path /zookeeper-operator/app-zookeeper, Error creating parent zkNode: /zookeeper-operator: zk: node already exists","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:234"}

Error from post-install hook

Checking for ready ZK replicas
I0824 02:57:12.548963 [request.go:665] Waited for 1.163382298s due to client-side throttling, not priority and fairness, request: GET:https://192.168.192.1:443/apis/rbac.authorization.k8s.io/v1?timeout=32s
ZK replicas not ready

@OneCricketeer
Author

Container statuses for zookeeper-4 (both containers report ready: true)

status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2024-08-23T15:40:33Z'
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2024-08-23T15:41:09Z'
      status: 'True'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2024-08-23T15:41:09Z'
      status: 'True'
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: '2024-08-23T15:40:33Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: 'cri-o://0cb757458dbe524f5c75d7c63f37c2ba9baacf6503e579016ee1e7f37e419aa5'
      image: >-
        redacted
      imageID: >-
        redacted
      lastState: {}
      name: fluent-bit
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: '2024-08-23T15:40:39Z'
    - containerID: 'cri-o://1de7506155cfb27de7cad0244c4ead699fdff2f1d6ef1c258da747a35e6835c2'
      image: 'docker.io/redacted/zookeeper:3.5.7'
      imageID: >-
        docker.io/redacted/zookeeper@sha256:f032bd83682738f32757bf1f365ed9de8ee7aa41015a010083b5c3074e5f2659
      lastState: {}
      name: zookeeper
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: '2024-08-23T15:40:39Z'
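
For comparison with the operator's view, the pod-level readiness that Kubernetes itself reports can be read from the `Ready` condition and the per-container `ready` flags. The sketch below is a minimal Python illustration over a trimmed copy of the status above; `pod_ready_summary` is a hypothetical helper, and how the operator populates members.unready is not shown anywhere in this thread.

```python
def pod_ready_summary(pod_status):
    """Summarize readiness from a pod's .status dict."""
    conditions = {
        cond["type"]: cond["status"]
        for cond in pod_status.get("conditions", [])
    }
    containers = {
        cs["name"]: cs["ready"]
        for cs in pod_status.get("containerStatuses", [])
    }
    return {
        # Condition statuses are the strings "True", "False" or "Unknown".
        "pod_ready": conditions.get("Ready") == "True",
        "not_ready_containers": sorted(
            name for name, ready in containers.items() if not ready
        ),
    }


# Trimmed copy of the zookeeper-4 status shown above.
zookeeper_4_status = {
    "conditions": [
        {"type": "Initialized", "status": "True"},
        {"type": "Ready", "status": "True"},
        {"type": "ContainersReady", "status": "True"},
        {"type": "PodScheduled", "status": "True"},
    ],
    "containerStatuses": [
        {"name": "fluent-bit", "ready": True, "restartCount": 0},
        {"name": "zookeeper", "ready": True, "restartCount": 0},
    ],
}

print(pod_ready_summary(zookeeper_4_status))
# prints: {'pod_ready': True, 'not_ready_containers': []}
```

From the Kubernetes side the pod is Ready, so the entry under members.unready presumably comes from some other signal than the pod conditions.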
