Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clusteragent/autoscaling] Add local fallback fields to DatadogPodAutoscalerInternal #30776

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
2346774
Add annotations and local recommendations to pod autoscaler internal
jennchenn Nov 1, 2024
bf45051
Extend scaling values with local recommendations
jennchenn Nov 4, 2024
c54a91e
Track previous horizontal recommendations
jennchenn Nov 4, 2024
0ca33ba
Fix linter errors
jennchenn Nov 5, 2024
2b491ac
fixup! Track previous horizontal recommendations
jennchenn Nov 5, 2024
1957a34
Define bool to determine when local fallback active
jennchenn Nov 5, 2024
f1305ff
Check target source prior to updating DPAI from existing status
jennchenn Nov 5, 2024
3e72df5
Define annotation structure and parse directly
jennchenn Nov 6, 2024
70ed026
Use scalingValues to represent active scaling values
jennchenn Nov 6, 2024
3afc059
Update function description comments
jennchenn Dec 4, 2024
5757b59
Parse annotations to new custom recommender config type
jennchenn Dec 5, 2024
92263a1
Rename type CustomRecommenderConfiguration to RecommenderConfiguration
jennchenn Dec 5, 2024
fffced9
Merge remote-tracking branch 'origin/main' into jenn/CASCL-102_adapt-…
jennchenn Dec 5, 2024
b69bcfb
Add recommender configuration type comment
jennchenn Dec 5, 2024
8424eb9
fixup! Merge remote-tracking branch 'origin/main' into jenn/CASCL-102…
jennchenn Dec 9, 2024
5b04388
Merge remote-tracking branch 'origin/main' into jenn/CASCL-102_adapt-…
jennchenn Dec 9, 2024
767066e
Add custom recommender configuration to fake DPAI for tests
jennchenn Dec 9, 2024
5dd8695
Set custom recommender configuration once
jennchenn Dec 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 40 additions & 0 deletions pkg/clusteragent/autoscaling/workload/model/metadata.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
// Unless explicitly stated otherwise all files in this repository are licensed
// under the Apache License Version 2.0.
// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2024-present Datadog, Inc.

//go:build kubeapiserver

package model

import "encoding/json"

// exported for testing purposes
const (
AnnotationsURLKey = "autoscaling.datadoghq.com/url"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you already have a JSON for settings, it can simplified as a single JSON object:

autoscaling.datadoghq.com/custom-recommender: {"url": <string>, "settings": {...} }

AnnotationsFallbackURLKey = "autoscaling.datadoghq.com/fallback-url"
AnnotationsSettingsKey = "autoscaling.datadoghq.com/settings"
)

// Annotations represents the relevant annotations on a DatadogPodAutoscaler object
type Annotations struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the fact that the info comes from annotations is important, the info itself is important. I would see it as a RecommenderConfiguration which happens to be from annotations in DPA parsing only.

That could directly be a field in DPAInt customRecommenderConfiguration *recommenderConfiguration (in which case the file may not be necessary)

Endpoint string
FallbackEndpoint string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to support another endpoint for fallback (for now).

Settings map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Settings map[string]string
Settings map[string]any

I think we can keep the types (int, string, etc.) so that the recommender receives typed info. This is what the JSON/Proto allows: https://github.com/DataDog/agent-payload/blob/master/jsonschema/WorkloadRecommendationsRequest.json#L51-L74

}

// ParseAnnotations extracts the relevant autoscaling annotations from a kubernetes annotation map
func ParseAnnotations(annotations map[string]string) Annotations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the function needs to be exported. It should be call within the package and filled directly by the DatadogPodAutoscalerInternal operations (UpdateFromPodAutoscaler, so that we guarantee consistency of the object).

annotation := Annotations{
Endpoint: annotations[AnnotationsURLKey],
FallbackEndpoint: annotations[AnnotationsFallbackURLKey],
}

settings := map[string]string{}
err := json.Unmarshal([]byte(annotations[AnnotationsSettingsKey]), &settings)
if err == nil && len(settings) > 0 {
annotation.Settings = settings
}

return annotation
}
70 changes: 70 additions & 0 deletions pkg/clusteragent/autoscaling/workload/model/metadata_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
// Unless explicitly stated otherwise all files in this repository are licensed
// under the Apache License Version 2.0.
// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2024-present Datadog, Inc.

//go:build kubeapiserver && test

package model

import (
"testing"

"github.com/stretchr/testify/assert"
)

func TestParseAnnotation(t *testing.T) {
tests := []struct {
name string
annotations map[string]string
expected Annotations
}{
{
name: "Empty annotations",
annotations: map[string]string{},
expected: Annotations{},
},
{
name: "URL annotation",
annotations: map[string]string{
AnnotationsURLKey: "localhost:8080/test",
},
expected: Annotations{
Endpoint: "localhost:8080/test",
},
},
{
name: "Fallback annotation",
annotations: map[string]string{
AnnotationsURLKey: "localhost:8080/test",
AnnotationsFallbackURLKey: "localhost:8080/fallback",
},
expected: Annotations{
Endpoint: "localhost:8080/test",
FallbackEndpoint: "localhost:8080/fallback",
},
},
{
name: "Settings annotation",
annotations: map[string]string{
AnnotationsURLKey: "localhost:8080/test",
AnnotationsFallbackURLKey: "localhost:8080/fallback",
AnnotationsSettingsKey: `{"key": "value"}`,
},
expected: Annotations{
Endpoint: "localhost:8080/test",
FallbackEndpoint: "localhost:8080/fallback",
Settings: map[string]string{
"key": "value",
},
},
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
parsedAnnotation := ParseAnnotations(tt.annotations)
assert.Equal(t, tt.expected, parsedAnnotation)
})
}
}
56 changes: 51 additions & 5 deletions pkg/clusteragent/autoscaling/workload/model/pod_autoscaler.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,9 @@ type PodAutoscalerInternal struct {
// name is the name of the PodAutoscaler
name string

// annotations are the annotations of the PodAutoscaler
annotations Annotations

// creationTimestamp is the time when the kubernetes object was created
// creationTimestamp is stored in .DatadogPodAutoscaler.CreationTimestamp
creationTimestamp time.Time
Expand All @@ -53,9 +56,15 @@ type PodAutoscalerInternal struct {
// (only if owner == remote)
settingsTimestamp time.Time

// scalingValues represents the current target scaling values (retrieved from RC)
// scalingValues represents the active scaling values that should be used
scalingValues ScalingValues

// mainScalingValues represents the scaling values retrieved from the product
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// mainScalingValues represents the scaling values retrieved from the product
// mainScalingValues represents the scaling values retrieved from the main recommender (product, optionally a custom endpoint).

mainScalingValues ScalingValues

// fallbackScalingValues represents the scaling values retrieved from the fallback
fallbackScalingValues ScalingValues

// horizontalLastActions is the last horizontal action successfully taken
horizontalLastActions []datadoghq.DatadogPodAutoscalerHorizontalAction

Expand Down Expand Up @@ -100,8 +109,9 @@ type PodAutoscalerInternal struct {
// NewPodAutoscalerInternal creates a new PodAutoscalerInternal from a Kubernetes CR
func NewPodAutoscalerInternal(podAutoscaler *datadoghq.DatadogPodAutoscaler) PodAutoscalerInternal {
pai := PodAutoscalerInternal{
namespace: podAutoscaler.Namespace,
name: podAutoscaler.Name,
namespace: podAutoscaler.Namespace,
name: podAutoscaler.Name,
annotations: ParseAnnotations(podAutoscaler.Annotations),
}
pai.UpdateFromPodAutoscaler(podAutoscaler)
pai.UpdateFromStatus(&podAutoscaler.Status)
Expand All @@ -128,6 +138,7 @@ func NewPodAutoscalerFromSettings(ns, name string, podAutoscalerSpec *datadoghq.
func (p *PodAutoscalerInternal) UpdateFromPodAutoscaler(podAutoscaler *datadoghq.DatadogPodAutoscaler) {
p.creationTimestamp = podAutoscaler.CreationTimestamp.Time
p.generation = podAutoscaler.Generation
p.annotations = ParseAnnotations(podAutoscaler.Annotations)
p.spec = podAutoscaler.Spec.DeepCopy()
// Reset the target GVK as it might have changed
// Resolving the target GVK is done in the controller sync to ensure proper sync and error handling
Expand All @@ -148,16 +159,36 @@ func (p *PodAutoscalerInternal) UpdateFromSettings(podAutoscalerSpec *datadoghq.
p.horizontalEventsRetention = getHorizontalEventsRetention(podAutoscalerSpec.Policy, longestScalingRulePeriodAllowed)
}

// UpdateFromValues updates the PodAutoscalerInternal from a new scaling values
// UpdateFromValues updates the PodAutoscalerInternal scaling values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would need to refine this API so that it's not complete freedom from controller 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this validation (ensuring that scalingValues is always either mainScalingValues or fallbackScalingValues) be done in one of the follow-up PRs to make use of fallbackScalingValues/mainScalingValues? There are no changes to the controller logic yet so scalingValues is the only field being used at the moment

func (p *PodAutoscalerInternal) UpdateFromValues(scalingValues ScalingValues) {
p.scalingValues = scalingValues
}

// UpdateFromMainValues updates the PodAutoscalerInternal from new main scaling values
func (p *PodAutoscalerInternal) UpdateFromMainValues(mainScalingValues ScalingValues) {
p.mainScalingValues = mainScalingValues
}

// UpdateFromLocalValues updates the PodAutoscalerInternal from new local scaling values
func (p *PodAutoscalerInternal) UpdateFromLocalValues(fallbackScalingValues ScalingValues) {
p.fallbackScalingValues = fallbackScalingValues
}

// RemoveValues clears autoscaling values data from the PodAutoscalerInternal as we stopped autoscaling
func (p *PodAutoscalerInternal) RemoveValues() {
p.scalingValues = ScalingValues{}
}

// RemoveMainValues clears main autoscaling values data from the PodAutoscalerInternal as we stopped autoscaling
func (p *PodAutoscalerInternal) RemoveMainValues() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any difference with UpdateFromMainValues(ScalingValues{})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, no difference, mostly added to stay consistent with the RemoveValues() function above but can remove these two

p.mainScalingValues = ScalingValues{}
}

// RemoveLocalValues clears local autoscaling values data from the PodAutoscalerInternal as we stopped autoscaling
func (p *PodAutoscalerInternal) RemoveLocalValues() {
p.fallbackScalingValues = ScalingValues{}
}

// UpdateFromHorizontalAction updates the PodAutoscalerInternal from a new horizontal action
func (p *PodAutoscalerInternal) UpdateFromHorizontalAction(action *datadoghq.DatadogPodAutoscalerHorizontalAction, err error) {
if err != nil {
Expand Down Expand Up @@ -295,6 +326,11 @@ func (p *PodAutoscalerInternal) Name() string {
return p.name
}

// Annotations returns the annotations on the PodAutoscaler
func (p *PodAutoscalerInternal) Annotations() Annotations {
return p.annotations
}

// ID returns the functional identifier of the PodAutoscaler
func (p *PodAutoscalerInternal) ID() string {
return p.namespace + "/" + p.name
Expand All @@ -320,11 +356,21 @@ func (p *PodAutoscalerInternal) CreationTimestamp() time.Time {
return p.creationTimestamp
}

// ScalingValues returns the scaling values of the PodAutoscaler
// ScalingValues returns a pointer to the active scaling values of the PodAutoscaler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a pointer

func (p *PodAutoscalerInternal) ScalingValues() ScalingValues {
return p.scalingValues
}

// MainScalingValues returns the main scaling values of the PodAutoscaler
func (p *PodAutoscalerInternal) MainScalingValues() ScalingValues {
return p.mainScalingValues
}

// FallbackScalingValues returns the fallback scaling values of the PodAutoscaler
func (p *PodAutoscalerInternal) FallbackScalingValues() ScalingValues {
return p.fallbackScalingValues
}

// HorizontalLastActions returns the last horizontal actions taken
func (p *PodAutoscalerInternal) HorizontalLastActions() []datadoghq.DatadogPodAutoscalerHorizontalAction {
return p.horizontalLastActions
Expand Down
62 changes: 0 additions & 62 deletions pkg/clusteragent/autoscaling/workload/model/rc_schema.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,6 @@
package model

import (
"time"

"k8s.io/apimachinery/pkg/api/resource"

kubeAutoscaling "github.com/DataDog/agent-payload/v5/autoscaling/kubernetes"
datadoghq "github.com/DataDog/datadog-operator/apis/datadoghq/v1alpha1"
)
Expand Down Expand Up @@ -41,61 +37,3 @@ type AutoscalingSettings struct {
// Spec is the full spec of the PodAutoscaler
Spec *datadoghq.DatadogPodAutoscalerSpec `json:"spec"`
}

// ScalingValues represents the scaling values (horizontal and vertical) for a target
type ScalingValues struct {
// HorizontalError refers to an error encountered by Datadog while computing the horizontal scaling values
HorizontalError error
Horizontal *HorizontalScalingValues

// VerticalError refers to an error encountered by Datadog while computing the vertical scaling values
VerticalError error
Vertical *VerticalScalingValues

// Error refers to a general error encountered by Datadog while computing the scaling values
Error error
}

// HorizontalScalingValues holds the horizontal scaling values for a target
type HorizontalScalingValues struct {
// Source is the source of the value
Source datadoghq.DatadogPodAutoscalerValueSource

// Timestamp is the time at which the data was generated
Timestamp time.Time

// Replicas is the desired number of replicas for the target
Replicas int32
}

// VerticalScalingValues holds the vertical scaling values for a target
type VerticalScalingValues struct {
// Source is the source of the value
Source datadoghq.DatadogPodAutoscalerValueSource

// Timestamp is the time at which the data was generated
Timestamp time.Time

// ResourcesHash is the hash of containerResources
ResourcesHash string

// ContainerResources holds the resources for a container
ContainerResources []datadoghq.DatadogPodAutoscalerContainerResources
}

// SumCPUMemoryRequests sums the CPU and memory requests of all containers
func (v *VerticalScalingValues) SumCPUMemoryRequests() (cpu, memory resource.Quantity) {
for _, container := range v.ContainerResources {
cpuReq := container.Requests.Cpu()
if cpuReq != nil {
cpu.Add(*cpuReq)
}

memoryReq := container.Requests.Memory()
if memoryReq != nil {
memory.Add(*memoryReq)
}
}

return
}
74 changes: 74 additions & 0 deletions pkg/clusteragent/autoscaling/workload/model/recommendations.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
// Unless explicitly stated otherwise all files in this repository are licensed
// under the Apache License Version 2.0.
// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2024-present Datadog, Inc.

//go:build kubeapiserver

package model

import (
"time"

"k8s.io/apimachinery/pkg/api/resource"

datadoghq "github.com/DataDog/datadog-operator/apis/datadoghq/v1alpha1"
)

// ScalingValues represents the scaling values (horizontal and vertical) for a target
type ScalingValues struct {
// HorizontalError refers to an error encountered by Datadog while computing the horizontal scaling values
HorizontalError error
Horizontal *HorizontalScalingValues

// VerticalError refers to an error encountered by Datadog while computing the vertical scaling values
VerticalError error
Vertical *VerticalScalingValues

// Error refers to a general error encountered by Datadog while computing the scaling values
Error error
}

// HorizontalScalingValues holds the horizontal scaling values for a target
type HorizontalScalingValues struct {
// Source is the source of the value
Source datadoghq.DatadogPodAutoscalerValueSource

// Timestamp is the time at which the data was generated
Timestamp time.Time

// Replicas is the desired number of replicas for the target
Replicas int32
}

// VerticalScalingValues holds the vertical scaling values for a target
type VerticalScalingValues struct {
// Source is the source of the value
Source datadoghq.DatadogPodAutoscalerValueSource

// Timestamp is the time at which the data was generated
Timestamp time.Time

// ResourcesHash is the hash of containerResources
ResourcesHash string

// ContainerResources holds the resources for a container
ContainerResources []datadoghq.DatadogPodAutoscalerContainerResources
}

// SumCPUMemoryRequests sums the CPU and memory requests of all containers
func (v *VerticalScalingValues) SumCPUMemoryRequests() (cpu, memory resource.Quantity) {
for _, container := range v.ContainerResources {
cpuReq := container.Requests.Cpu()
if cpuReq != nil {
cpu.Add(*cpuReq)
}

memoryReq := container.Requests.Memory()
if memoryReq != nil {
memory.Add(*memoryReq)
}
}

return
}
Loading