Better support for monitoring TaskRuns and PipelineRuns
How
A new abstraction that allows you to customize how metrics are exported. This way you can control cardinality without compromising our ability to get highly specific metrics.
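To make the idea concrete, here is a purely hypothetical sketch of what such a knob could look like (none of these keys exist in Tekton today; the names are invented for illustration):

```yaml
# HYPOTHETICAL sketch only: these keys are invented for this issue.
# Keep the safe default aggregation level...
metrics.taskrun.level: "task"
# ...but let operators opt specific, known-bounded labels into the
# exported metrics, e.g. the value of the "environment" parameter.
metrics.taskrun.extra-tags: "params.environment"
```

The point is that the operator, who knows which labels are bounded, decides the cardinality trade-off, instead of it being hardcoded.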
Use case
Currently, Tekton Pipelines' metrics can be configured by changing config-observability. However, the options we have are very limited. You can set metrics.taskrun.level: task, but then you can't break the metrics down any further than the Task. And metrics.taskrun.level: taskrun is not recommended, since it can lead to unbounded cardinality.
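For reference, this is roughly what that configuration looks like today (a minimal sketch; key names are from Tekton's config-observability documentation, so double-check them against your installed release):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: tekton-pipelines
data:
  # Aggregate TaskRun metrics per Task (bounded cardinality, but no
  # way to break results down further)...
  metrics.taskrun.level: "task"
  # ...or per TaskRun, which is not recommended because cardinality
  # grows without bound as runs accumulate.
  # metrics.taskrun.level: "taskrun"
```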
Just to be more specific, let me share a few example use cases:
Environment monitoring
Let's say you have a task that receives environment as a parameter (prod, staging, qa, etc.). You might want to analyze: is there an environment where this task runs slower? How much slower? Is the qa environment being used at all?
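For instance, such a TaskRun might look like this (a minimal sketch; the task name and parameter are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: integration-tests-
spec:
  taskRef:
    name: integration-tests   # hypothetical task, for illustration
  params:
    - name: environment
      value: "qa"             # bounded set: prod, staging, qa, ...
```

With metrics.taskrun.level: task, the environment value never reaches the exported metrics, so none of these questions can be answered from them today.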
Task optimization
You might have just merged a test fix and want to know: how much did that fix improve the duration of the integration tests? Did it improve the error rate of the task when run with similar parameters?
Anomaly Detection
Is the CI/CD platform executing tasks normally? Are there too many tasks coming from a single repository? Are all tasks failing?