29 Aug 01:33

Fixes

Reduced memory footprint.
Fix an issue blocking settings page because of core count limit.
Fix an issue with Cloud costs page not leveraging cloud account mappings.
Fix an issue with higher than normal numbers in GCP cloud costs.
Fix an issue with aggregating by label on right-sizing.
Fix an issue with request-sizing sort by field causing an api error.
Fix an issue with scheduled reports not sending.
Fix an issue with collections adding cloud cost with custom labels.
Fix an issue with external costs not getting ingested properly.
Fix an issue with asset budgets getting an api error using service workload type.
Fix an issue with budgets page resetting weekly send 1-7 from 0-6.
Fix an issue in budgets collection selector not showing selected value.
Fix an issue causing prometheus query error in local storage queries.
Fix an issue creating cloud cost reports via helm chart.
Fix an issue with trial status disappearing on upgrade of EKS Optimized.
Fix an issue with slow queries on cluster status api.
Fix an issue with invalid provider name in the cluster list and cluster detail.
Fix error visible in aggregator logs “append row Failure: acquiring max concurrency semaphore: context canceled” resulting in hung api responses.
Fix an issue with allocation top line not matching the allocation summary API.
Fix an issue with assets not matching the allocation summary API.
Fix an issue with adjustments when no cloud integration or custom pricing is enabled.
Fix an issue with auth loops when OIDC values are set.
Fix an issue with Azure costs being lower in Kubecost than in Azure.

Helm Fixes

#3595 Fix Okta redirect loop.
#3600 Reduce memory usage.

Security Updates

#3618 Bump kubecost-modeling to v0.1.15 to fix CVE-2024-7592
#3609 Bump kubecost-modeling to v0.1.14 to fix CVE-2024-4603, CVE-2024-5535, CVE-2024-6923, CVE-2024-37891
#3594 Bump cluster-controller to 0.16.8 to fix CVE-2024-41110
#3603 Bump Grafana to fix CVE-2024-21490, CVE-2024-24557

Known issues:

prom/prometheus v2.53.1 in our helm chart has a known critical CVE-2024-41110 that has not been resolved upstream. Once this is resolved we will patch again.
redhat/ubi9 has a known high CVE-2024-6345 that has not been resolved upstream. We are working on alternative resolutions, as ubi9 has left this open for some time, and will patch once we have a verified and tested solution.

30 Jul 17:48

cliffcolvin

v2.3.4

d366e27

V2.3.4

Fixes

Fix Ingestion progress bar issues where the green bar doesn’t complete.
Fix an issue with missing windows keeping ingestion from completing.
Fix an issue with efficiency numbers not matching when you drill down from cluster to namespace.
Fix an issue with cluster totals not matching the sum of the pages.
Fix an issue with abandoned workloads when a pod has 0 utilization.
Fix an issue with request sizing when a pod has 0 utilization.
Fix an issue with the cluster right-sizing page when some clusters are missing configuration.
Fix an issue with cloud cost scheduled reports not working appropriately.
Fix an issue with asset budget reports showing the wrong aggregation on form.

Known issues:

If Grafana links are disabled, enable them by adding the following helm value:

kubecostAggregator:
  extraEnv:
  - name: GRAFANA_ENABLED
    value: "true"

2.3.5: Adding a custom label in collections for cloud costs fails
2.3.5: External Costs is broken for Datadog
2.3.5: Double counting discounts in certain situations
2.4: In budgets adding an assets tag causes alerts to fail
2.4: Allocation container detail modal (last level popup) is slow/fails in large environments
PV cost discrepancy in multi cluster environments (no cluster filter)
Request-sizing cannot sort by average or max usage

18 Jul 01:10

cliffcolvin

v2.3.3

badf0e0

V2.3.3

Front-End

Fixes an issue where budgets were not validating properly
Fixes an issue where Actions configs do not delete properly
Fixes an issue where, when creating a request right-sizing action, the autocomplete menu offered options outside the primary cluster
Fixes an issue where Cloud Budget Alerts would not work in Alerts page
Fixes an issue where the refresh button on KC actions page would not respond
Fixes an issue where changing the cost metric on the assets page didn’t result in Total Cost being updated
Fixes an issue where the clusters page would show an incorrect total cost for a cluster that did not match the cluster inspect page
Fixes an issue where you could not delete an NS Turndown Action
Fixes an issue where the unclaimed volumes page listed volumes that are bound, causing a huge discrepancy compared to v1.108.1
Fixes an issue where creating Request Sizing actions via Helm values results in incorrect UI display
Fixes an issue where sorting the Allocations table crashes the page
Fixes an issue where collections were missing the filter by container option
Fixes an issue where estimated savings were greater than Total Cost on Cluster Page
Fixes an issue where drilling down when aggregated by Deployment breaks the Efficiency Reports page
Adds an orchestrator to bug report

Backend

Fixes an issue where QueryAssetCTE would cause a panic
Significantly improves Aggregator Tuning and Testing
Fixes an ingestion bug which could lead to gaps in data ingested, even though the diagnostics said all files had been ingested. This is typically caused when the Aggregator is stopped/killed. Partial or failed ingestion attempts will now automatically retry
Fixes an issue where transactions were not handled properly
Fixes "NewBillingParseSchema: failed to find Date field" error
Fixes cloudAccountMapping
Fixes an issue where OIDC or SAML redirect on certain configurations would 404 after login
Fixes an issue where a PDF attached to a scheduled report is corrupt
Fixes an issue where custom labels would not work as expected (department, owner, etc)
Fixes an issue where scheduled reports were not being sent
Fix an issue with EKS Optimized gating from backend
Fixes an issue where derivation steps would run out of order
Fixes an issue with filters in scheduled cloud cost reports not being parsed correctly

Helm Changes

#3532 Add API configuration endpoints to nginx config
#3536 Bump Cluster Controller to 0.16.5

03 Jul 00:35

teevans

v2.3.2

3809f94

V2.3.2

Frontend

Fix an issue where the graph on-hover tooltips in the Cloud Costs page were showing windows in local time, instead of UTC. This sometimes caused the tooltip and x-axis to disagree on date.
Fix an issue where the Automatic Request Sizing UI in Savings Actions was passing an invalid filter when attempting to filter workloads by Deployment.
Added a tooltip to the Clusters List page explaining Workload Efficiency, and changed the column name from “Efficiency” to “Workload Efficiency”.
Fix an issue where the Automatic Request Sizing UI in Savings Actions would sometimes produce malformed timestamps in requests to schedule recurring sizing actions.
Fix a UI issue where any errors that occurred during the creation of a cluster turndown schedule were not relayed to the user.
Fix a UI issue where creating an Alert with a window other than 1d-7d would cause it to show up in the Alerts table as having a window of “7d”.
Note that the UI does not have a way to accurately represent completely arbitrary windows at this time. Setting windows of e.g. “1h” via the Helm chart will produce similarly broken displays in the UI, although the alert will function correctly.

Backend

Fix an issue where Allocation reports with a step size of “hour” could not be saved.
Add a DB info log, printed after DB instance creation.
Improve stability and memory consumption on high scale environments by switching off Jemalloc.
Fix an issue where idle compute resources would sometimes not account properly in reconciled allocation data.
Bump image to resolve CVE-2024-35255

Helm chart

Add namespace label to aggregator serviceMonitor.
Add option for aggregator env var for DB_TRIM_MEMORY_ON_CLOSE.

Cluster turndown

Send HTTP responses in header rather than body for better compliance with HTTP standards.

Cluster controller

Update Cluster turndown reference version to v0.16.3

The following CVEs have been resolved:

25 Jun 01:56

jessegoodier

v2.3.1

17f5a49

v2.3.1

What's Changed

Notice:
In Kubecost 2.3, Helm no longer reads kubecostAggregator.env. This was done to prevent unexpected behavior and to simplify the aggregator's configuration. New key pairs are available to change these settings, though most will not need to change anything.

Very few users made changes to Aggregator environment variables. For those that did, a new message will be displayed during upgrades to 2.3.1.

For more information, see: https://docs.kubecost.com/install-and-configure/install/multi-cluster/federated-etl/aggregator#aggregator-optimizations.

The most important change is the addition of .Values.kubecostAggregator.etlDailyStoreDurationDays, which determines how many days of history the Aggregator builds from the federated-store. The default is 91 days.
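
For reference, a minimal values sketch that sets this key explicitly. The key path and the default of 91 come from the notice above; the surrounding file layout is an assumption, and an override is only needed if you want a different amount of history.

```yaml
# values.yaml (sketch): only needed when overriding the default
kubecostAggregator:
  # Days of daily history the Aggregator builds from the federated-store.
  # 91 is the default stated in this notice.
  etlDailyStoreDurationDays: 91
```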

Remove duplicate labels that caused ArgoCD to fail on upgrade #3500
Add notice for values change #3509

20 Jun 17:31

cliffcolvin

v2.3.0

8264dc5

V2.3.0

Overview:

Version 2.3 is a 'production' release focused on bug fixes and stability, as well as an upgrade to the version of the underlying database we use to store Kubecost data.

⚠️ Memory Consumption Disclaimer ⚠️

Enjoy a whopping 10X improvement to data ingestion speeds, making it faster and easier to see your cost and operational metrics in Kubecost, especially as a new user.

New Data Ingestion

Due to changes needed to improve stability, upgrading from versions prior to 2.3 will cause Kubecost to build a new DB. This can take some time in large environments; most installations will finish overnight. Data is built starting from today and moving back in time, and a green progress bar indicates progress. Contact Kubecost support for more specific guidance.

It is possible to create a parallel installation to your current Kubecost primary instance, though this is not required. See the doc below on how to do this.

https://docs.kubecost.com/install-and-configure/advanced-configuration/parallel

The ingestion speed improvement comes at the expense of increased memory requirements. Our recommendation is to temporarily remove memory limits (if you use them) for the first few days of running the application. Once a new baseline is established, feel free to re-implement your memory limits. If you have any questions or concerns about this, please contact your dedicated Kubecost success manager or reach out to us at team@kubecost.com.
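
As a rough illustration of the recommendation above, the sketch below keeps resource requests but omits the memory limit. It assumes the Aggregator container's resources are configured under kubecostAggregator.resources; that key path and the request sizes are assumptions, so confirm them against the values.yaml of your chart version.

```yaml
# values.yaml (sketch): key path and sizes are assumptions, not taken from these notes
kubecostAggregator:
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
    # No limits.memory is set for the first few days after upgrading.
    # Add a limit back once a new memory baseline has been observed.
```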

Major:

Kubernetes Efficiency View - Easily break down the efficiency of your clusters and workloads over time.
- A link to this view has been added to the Cluster Efficiency card at the top of the Overview page.
Enterprise Integration (Postgres) - Add the ability for enterprise customers to export Kubecost data to Postgres for use with BI tools.
Custom SMTP Server integration - Add the ability to integrate with custom SMTP servers for alerts, budgets, etc instead of using Kubecost’s default SMTP solution.
Anomaly Detection Enhancements
- Change how anomalies are detected and make the output more actionable. Anomalies are now detected on a rolling lookback window.
- Add a user defined threshold and minimum cost filter for detecting anomalies.
- Add anomaly detection for allocations as well as cloud costs.
- When navigating to the allocations or cloud costs page by clicking an anomaly, the anomalous entry will now be highlighted, and the lookback window marked.
All Business Tier users granted access to full Enterprise features during the transition period.

Minor:

Cross-provider CUR access - IRSA access for cloud cost integrations in Kubecost with multiple providers.
Add the ability to end a free trial with the /expireProductTrial endpoint, allowing users to see what Free tier features are available even if they have begun a free trial.
- Added a button to the Settings page which can be used to call this endpoint.
Free trials will now automatically begin when the limit of 250 monitored cores is exceeded, rather than upon install. Trials can still be started manually via the settings page.
Additional diagnostic information is available in bug reports, and is used to more accurately indicate the state of data ingestion on startup. The Helm Chart version is now also visible from the settings page.
Updated the workload field label for Budgets.
Add support for specifying label values in Assets filters.
Reduced frequency of calls to a diagnostics endpoint.

Ingestion Fixes

A large focus has been placed on fixing data ingestion issues that we have seen in live environments during this release cycle. Below is a list of the focus areas:

Large performance increase in ingestion and derivation of allocations, assets, cloud cost, network and containerstats data.
Fix an issue with the promotion of “write data” to “read data” where a race condition would sometimes cause a breakage.
Fix the error ‘internal list scan offset is out of range’ during updates to the database.
Added many new diagnostics data points for assistance in troubleshooting and to help ensure a healthy flow of data during the initial phases of ingesting especially large datasets.
Add the ability to automatically grow the refresh interval on initial load of large datasets. This fixes an issue where on large datasets the ingestion process would be halted so the next ingestion process could begin. This feature will grow the refresh interval in the beginning to allow ingestion to get to a complete state before promoting and moving to the next ingestion cycle.
Add the ability to get a first cut of the data faster when a full reingest is being processed, enabling the nearest time-series data to be viewed while the historical data is still being ingested.
Fix the order of ingestion to make sure daily data, and the latest data is ingested first.

Fixes:

Fix Cloud Cost and Asset budgets so that they work appropriately after Kubecost 2.0.
Fix Summary Allocation windowing inconsistencies between different accumulation options.
Fix the HTTP 500 error in Cluster Sizing when some nodes don’t have a valid asset type.
Fix an issue with savings api for clusters that contain Fargate nodes (nodes without a node type).
Fix the http 500 error in Assets Topline API when aggregating by label.
Fix an error filtering with the “contains”/”contains prefix”/”contains suffix” operators on custom labels.
Fix an issue with multiple reports with the same profile being created on pod restart when using v1 filters in the config map.
Fix an issue with Business tier licenses not being appropriately recognized post v2.0.
Fix an issue with /debug/orchestrator and /providerOptimization endpoints not being accessible when core count exceeds free tier and no valid license or trial.
Fix collections to use filters from teams/rbac configuration.
Fix an issue with duplicate budgets being created when creating a new budget and a budget with the same criteria already exists.
Fix an issue causing negative idle when multiple clusters share the same nodes.
Fix an issue where cloud cost and external cost processes could be initialized in cost-model even when a separate container is running for them, causing messy logs and unneeded processing in the cost-model container.
Fix an issue with SMTP connection causing a panic.
Fix issues with Idle calculation with service/label aggregation.
Fix an issue with SAML configuration when query filter is empty and saml filter is not empty.
Fix an issue with request sizing missing valid parameters in query validation.
Fix an issue when aggregating by predefined label aliases (deployment, daemonset, etc).
Fix idle sharing of CPU, GPU, and RAM for Allocations API.
Fix aggregate by label when separating idle.
Fix an issue with drastic differences between assets visual representation in 2.x versus 1.108. This was due to seemingly duplicative data for certain time periods. Added cluster id to ingestion to aid in the de-duplication of this time-series data.
Fix an error in Assets View API when the end of the window is empty causing a Boundary Error.
Fix many noisy logs to be logged at the appropriate level or removed for ease of understanding state and troubleshooting.
Fix an issue when saving a scheduled report where the next run was sometimes not appropriately set.
Fix an error when enabling .Values.saml.enabled=true and .Values.readonly=true.
Fix an error where network insights would not be visible even when configuration was set to enable.
Fix an issue with Address Network cost reconciliation for Azure provider with an edge case for virtual machine scale sets.
Fix an issue where unallocated__idle was being returned in /savings/requestSizingv2.
Fix cluster sizing recommendation failures when nil objects are detected.
Fix Assets API to accurately align topline and table data.
Fix Allocations to appropriately display idle for unallocated workloads.
Fix large inflated node prices before reconciliation occurs.
Updates Cloud Cost ingestion for GCP to fall back to resource.global_name when resource.name is null for determining ProviderID. This is particularly relevant for Cloud SQL, Cloud Storage, and Cloud Logging, which very often have null resource.name values, resulting in unallocated ProviderID values.
Fix an issue where some time series charts used the end of a time period for their x-axis instead of the start.
Fix an issue where the UI would attempt to show hourly data for External Costs on small, recent time windows. Hourly data is not collected for External Costs at this time.
Fix an issue where idle costs were not represented in the Namespaces table of the cluster inspect page.
Fix an issue where exporting the values.yaml entries for reports would sometimes format filters incorrectly.
Fix an issue on the cluster-inspect page in which the cost for a namespace did not account for the shareTenancyCost configuration.
Fix for sorting by cluster name on clusters Page.
Fix UI issues with the Automatic Request Rightsizing Action, where no loading indicator was shown while the action was registered, and failures were not indicated to the user.
Fix an issue in which filters were not properly set when drilling into an anomaly when using multi-aggregation.
Fix an issue where Business Tier users were being blocked from usage when exceeding 250 cores monitored. Business Tier is now effectively Enterprise.
Fix an issue where table header sort icon (up/down arrow) sometimes appeared to the left of the text instead of to the right.
Fix an issue where the “Add Cloud Provider” button would be absent from the Settings page unless at least one provider was already set up.
...

18 Jun 23:50

cliffcolvin

v2.3.0-rc.10

cd33337

V2.3.0-rc.10 Pre-release

Fix 500 error on allocations calls for invalid column.
Fix to not block business tier licenses on core count in the front-end.

18 Jun 22:31

cliffcolvin

v2.2.6-rc.1

009acf2

V2.2.6-rc.1 Pre-release

Fixes

Fixed issues that caused “white screens” with Kubecost free/trial installations

17 Jun 23:36

cliffcolvin

v2.3.0-rc.7

0fa3538

V2.3.0-rc.7 Pre-release

Modify orchestrator/debug endpoint to allow progress to be given without access to write database, to prevent race condition/panic.

17 Jun 23:35

cliffcolvin

v2.3.0-rc.6

130a054

V2.3.0-rc.6 Pre-release

Disable debug and orchestrator endpoints for testing race condition while copying database.

Releases: kubecost/cost-analyzer-helm-chart

v2.3.5
