sync stop working on k8s api errors, liveness check needed? #338

Open
fctb opened this issue Mar 23, 2024 · 2 comments

Comments


fctb commented Mar 23, 2024

When we get k8s API errors, the sync silently stops working, e.g.:
calling kubernetes: (410) Reason: Expired: The resourceVersion for the provided watch is too old.
Afterwards, at debug level you only see the message for secrets:
Performing watch-based sync on secret resources: {'label_selector': 'grafana_dashboard_v10=1', 'timeout_seconds': '300', '_request_timeout': '330'}
while the corresponding message for configmaps stops appearing:
Performing watch-based sync on configmap resources: {'label_selector': 'grafana_dashboard_v10=1', 'timeout_seconds': '300', '_request_timeout': '330'}
as do all other debug messages related to configmaps. We only have matching configmaps in this cluster.

It looks like the watch for configmaps is dead, although the process itself is still running.

Would it make sense to introduce a liveness check (dead man's switch style), so that the whole container gets restarted when this problem occurs?
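
For reference, here is a rough sketch of what a watch loop that recovers from such a 410 could look like with the kubernetes Python client. This is not the sidecar's actual code, only an illustration; the label selector and timeouts are taken from the container config below:

    from kubernetes import client, config, watch

    config.load_incluster_config()
    v1 = client.CoreV1Api()

    label_selector = "grafana_dashboard_v10=1"
    resource_version = None

    while True:
        w = watch.Watch()
        try:
            for event in w.stream(
                v1.list_config_map_for_all_namespaces,
                label_selector=label_selector,
                resource_version=resource_version,
                timeout_seconds=300,
                _request_timeout=330,
            ):
                # remember where we are so the next watch can resume from here
                resource_version = event["object"].metadata.resource_version
                # ... write/update the dashboard files here ...
        except client.exceptions.ApiException as e:
            if e.status == 410:
                # resourceVersion expired: forget it and start a fresh watch
                resource_version = None
                continue
            raise
        finally:
            w.stop()

If the 410 is not caught like this, the watch generator just dies and the loop for that resource type never logs anything again, which matches the behaviour described above.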

container yaml:

  - env:
    - name: REQ_TIMEOUT
      value: "60"
    - name: IGNORE_ALREADY_PROCESSED
      value: "true"
    - name: METHOD
      value: WATCH
    - name: LABEL
      value: grafana_dashboard_v10
    - name: LABEL_VALUE
      value: "1"
    - name: LOG_LEVEL
      value: debug
    - name: FOLDER
      value: /tmp/dashboards
    - name: RESOURCE
      value: both
    - name: NAMESPACE
      value: ALL
    - name: REQ_USERNAME
      valueFrom:
        secretKeyRef:
          key: admin-user
          name: grafana-admin-password
    - name: REQ_PASSWORD
      valueFrom:
        secretKeyRef:
          key: admin-password
          name: grafana-admin-password
    - name: REQ_URL
      value: http://localhost:3000/api/admin/provisioning/dashboards/reload
    - name: REQ_METHOD
      value: POST
    - name: WATCH_SERVER_TIMEOUT
      value: "300"
    - name: WATCH_CLIENT_TIMEOUT
      value: "330"
    image: quay.io/kiwigrid/k8s-sidecar:1.26.1
    imagePullPolicy: IfNotPresent
    name: grafana-sc-dashboard
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp/dashboards
      name: sc-dashboard-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-6sd9s
      readOnly: true

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

github-actions bot added the stale label Sep 28, 2024

lindhe commented Dec 13, 2024

I'm also running into issues where the process seems to stop responding, but neither the process nor the container gets killed. I don't know how to probe this process though… 😕
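
One rough idea for such a probe, purely as a sketch: it assumes the sidecar (or a small wrapper around it) were changed to touch a heartbeat file such as /tmp/healthy after every successful watch iteration, which k8s-sidecar does not do today; the file name and thresholds below are made up:

    livenessProbe:
      exec:
        command:
        - sh
        - -c
        # hypothetical dead man's switch: fail if the heartbeat file has not
        # been touched in the last 7 minutes (watch timeout plus some slack)
        - test -n "$(find /tmp/healthy -mmin -7)"
      initialDelaySeconds: 60
      periodSeconds: 60
      failureThreshold: 3

With something like this, a hung watch would stop refreshing the heartbeat and the kubelet would restart the container after a few failed probes.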

github-actions bot removed the stale label Dec 15, 2024