Describe the bug
Our team removed all resources for a project called platform-toy. The imageupdateautomation object and the associated gitrepository objects were removed both from the cluster and from the Git repository that Flux is connected to.
After removing all resources related to the project, we began receiving alerts that the imageupdateautomation object was not able to reconcile, even though the object no longer existed.
Further investigation revealed that, after the object was removed from the cluster, the metric:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"}
exposed by the image-automation-controller was still set to 1, and the metric:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"}
was still set to 0.
Our alerting is configured to fire when:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"}
is set to 1.
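To make the alert condition concrete, here is a minimal sketch that applies the same check to text scraped from the /metrics endpoint. The function names and the simplified parser are illustrative assumptions (it does not handle escaped quotes in label values), and this is not how our alerting is actually implemented.

```python
import re

# One labelled sample line: metric name, {labels}, value.
_SERIES = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_series(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SERIES.match(line)
        if match is None:
            continue  # skip unlabelled or malformed lines
        name, raw_labels, value = match.groups()
        yield name, dict(_LABEL.findall(raw_labels)), float(value)


def firing_not_ready(text):
    """Return (kind, namespace, name) for every object whose
    Ready condition series with status="False" has the value 1."""
    firing = []
    for name, labels, value in parse_series(text):
        if (
            name == "gotk_reconcile_condition"
            and labels.get("type") == "Ready"
            and labels.get("status") == "False"
            and value == 1
        ):
            firing.append(
                (labels.get("kind"), labels.get("namespace"), labels.get("name"))
            )
    return firing


# The four series reported in this issue.
SAMPLE = """\
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0
"""

print(firing_not_ready(SAMPLE))
```

Because the series keeps reporting 1 after the object is gone, a check of this shape keeps firing until the controller is restarted.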
It seems that after an imageupdateautomation object is removed from the cluster, the image-automation-controller does not correctly identify that the imageupdateautomation object has been deleted and does not correctly update its metrics.
Upon restarting the image-automation-controller, the platform-toy imageupdateautomation object is not seen by the controller anymore and the alerting stops since the metric is no longer advertised.
Steps to reproduce
Create the target namespace: platform-toy
Deploy an imageupdateautomation object with name platform-toy
Delete the imageupdateautomation object and namespace from the cluster
Check the "/metrics" endpoint of the image-automation-controller to see what metrics are being exposed
The image-automation-controller will expose the metric:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
Expected behavior
The image-automation-controller should expose the following metric:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 1
Alternatively, the image-automation-controller should remove the resource from its metrics endpoint entirely.
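The two acceptable outcomes can be illustrated with a toy in-memory model of the gauge. This is only a sketch of the expected behavior: the class and method names are hypothetical and do not correspond to the controller's actual code.

```python
# Status label values seen on the gotk_reconcile_condition series in this issue.
STATUSES = ("True", "False", "Unknown", "Deleted")


class ConditionGauge:
    """Toy stand-in for the per-object condition gauge (hypothetical)."""

    def __init__(self):
        # (kind, namespace, name, type, status) -> 0 or 1
        self.samples = {}

    def record(self, kind, namespace, name, cond_type, status):
        """Set the series for `status` to 1 and every other status to 0."""
        for candidate in STATUSES:
            key = (kind, namespace, name, cond_type, candidate)
            self.samples[key] = 1 if candidate == status else 0

    def forget(self, kind, namespace, name):
        """Drop every series belonging to the given object."""
        stale = [k for k in self.samples if k[:3] == (kind, namespace, name)]
        for key in stale:
            del self.samples[key]


obj = ("ImageUpdateAutomation", "platform-toy", "platform-toy")

# Option 1: on deletion, flip the gauge to status="Deleted".
gauge = ConditionGauge()
gauge.record(*obj, "Ready", "False")
gauge.record(*obj, "Ready", "Deleted")
deleted_value = gauge.samples[(*obj, "Ready", "Deleted")]
false_value = gauge.samples[(*obj, "Ready", "False")]

# Option 2: on deletion, stop exposing the object's series altogether.
gauge.forget(*obj)
remaining = len(gauge.samples)
```

Either outcome would stop an alert on status="False" from firing for an object that no longer exists.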
Screenshots and recordings
The following is the console output before the imageupdateautomation object is removed from the cluster:
$ kubectl get imageupdateautomation -n platform-toy
NAME LAST RUN
platform-toy
The metrics exposed by the image-automation-controller are the following:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0
The following console output is after the resources have been deleted:
$ kubectl get imageupdateautomation -n platform-toy
No resources found in platform-toy namespace.
The metrics exposed by the image-automation-controller are the following after deleting the resource:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0
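The mismatch between the two outputs above can also be detected mechanically by comparing the objects named in the metrics with the objects that still exist in the cluster. The sketch below assumes the list of live objects is gathered separately (for example from kubectl output); the function names are illustrative, and the label parsing is simplified.

```python
import re

_LABEL = re.compile(r'(\w+)="([^"]*)"')


def series_objects(metrics_text):
    """Return the (kind, namespace, name) triples that appear in
    gotk_reconcile_condition series."""
    objects = set()
    for line in metrics_text.splitlines():
        if not line.startswith("gotk_reconcile_condition{"):
            continue
        labels = dict(_LABEL.findall(line))
        objects.add(
            (labels.get("kind"), labels.get("namespace"), labels.get("name"))
        )
    return objects


def orphaned(metrics_text, live_objects):
    """Return objects that still have series but are not in live_objects."""
    return series_objects(metrics_text) - set(live_objects)


# Metrics as reported in this issue after the object was deleted.
AFTER_DELETE = """\
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0
"""

# kubectl reported no resources, so the set of live objects is empty.
print(orphaned(AFTER_DELETE, live_objects=[]))
```

With an empty set of live objects, the platform-toy ImageUpdateAutomation is reported as orphaned, which matches what we observed until the controller was restarted.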
OS / Distro
VMware Photon OS/Linux
Flux version
flux version 0.38.2
Flux check
N/A
Git provider
gitlab
Container Registry provider
artifactory
Additional context
No response
Code of Conduct
I agree to follow this project's Code of Conduct