Performance improvements for kustomize build
#5084
Comments
We will happily accept performance improvements. Since you've already opened some PRs, we can discuss specific changes on those PRs. /triage accepted
Apologies for the delay - we will try to have people review your PRs soon!
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
# sigs.k8s.io/kustomize/kustomize/v5/commands/build.test
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    8523677542 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    8.798s
```

*Currently*, this benchmark requires 3000 seconds to run on my machine. To run it on master today, you need to add `-timeout=30m` to the `go test` command. The dataset size was chosen because I believe it represents a real workload for which we could get the runtime below 10 seconds.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
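For readers who want a feel for what such a benchmark looks like, here is a minimal sketch in the spirit of the change (not the code from the linked commit): it generates a tiny kustomization tree on an in-memory filesystem and builds it with krusty. The real benchmark generates a far larger tree exercising many more Kustomize features.

```go
package build_test

import (
	"fmt"
	"testing"

	"sigs.k8s.io/kustomize/api/krusty"
	"sigs.k8s.io/kustomize/kyaml/filesys"
)

// BenchmarkBuild builds a generated kustomization once per iteration.
func BenchmarkBuild(b *testing.B) {
	fSys := filesys.MakeFsInMemory()
	resources := ""
	for i := 0; i < 50; i++ {
		name := fmt.Sprintf("cm-%d.yaml", i)
		manifest := fmt.Sprintf(
			"apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: cm-%d\n", i)
		if err := fSys.WriteFile("/app/"+name, []byte(manifest)); err != nil {
			b.Fatal(err)
		}
		resources += "- " + name + "\n"
	}
	kustomization := "namePrefix: bench-\nresources:\n" + resources
	if err := fSys.WriteFile("/app/kustomization.yaml", []byte(kustomization)); err != nil {
		b.Fatal(err)
	}

	opts := krusty.MakeDefaultOptions()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := krusty.MakeKustomizer(opts).Run(fSys, "/app"); err != nil {
			b.Fatal(err)
		}
	}
}
```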
Another thought here would be to join the lookups done by Resource.CurId(). It calls Resource.GetGvk (which calls RNode.GetApiVersion and RNode.GetKind), Resource.GetName, and Resource.GetNamespace. That means there are four independent traversals at the top level (apiVersion, kind, metadata.name, metadata.namespace). This flow could be optimized for performance so that a single traversal finds all four fields.
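To make the idea concrete, here is a sketch of a single-pass extraction of those four fields. It operates on a plain decoded map rather than the actual kyaml RNode type, so it only illustrates the shape of the optimization, not how it would be wired into Resource.CurId():

```go
// idFields holds the four values that make up a resource's current id.
type idFields struct {
	apiVersion, kind, name, namespace string
}

// extractIDFields walks the top-level map of a decoded manifest once,
// collecting apiVersion, kind, metadata.name and metadata.namespace,
// instead of doing four independent lookups.
func extractIDFields(doc map[string]interface{}) idFields {
	var f idFields
	for key, val := range doc {
		switch key {
		case "apiVersion":
			f.apiVersion, _ = val.(string)
		case "kind":
			f.kind, _ = val.(string)
		case "metadata":
			if md, ok := val.(map[string]interface{}); ok {
				f.name, _ = md["name"].(string)
				f.namespace, _ = md["namespace"].(string)
			}
		}
	}
	return f
}
```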
I took a little time to investigate this option, and the improvement was minor at best - not really worth the effort. I've started to investigate what might be possible with caching CurId() in the Resource. Given that caching isn't already in place, I'm a little worried about complexities I may find with cache invalidation. I had hoped to lean on the code path that updates the list of previous ids, but there appear to be gaps (for example, api/krusty/component_test.go's TestComponent/multiple-components fails - I don't yet know if that indicates a flaw in the existing code where the previous id list should be updated but is not, or if that indicates a flaw in my hope that any change in CurId() should be associated with an update to the list of previous ids). I will continue investigating.
Initial results for caching the Resource.CurId() return value are very promising. I hooked cache invalidation into Resource.setPreviousId() and resWrangler.appendReplaceOrMerge()'s case 1 for replace and merge, and that appears to cover the unit test cases. Note that there are a small number of unit tests that cannot run cleanly on my system, so I may have gaps there.

@natasha41575 (and others), before I consider moving forward with this change, do you know of any reasons why caching Resource.CurId() could be problematic? I feel like there may be hidden pitfalls here. Is this sort of caching in line with kustomize coding patterns? In addition to resWrangler.appendReplaceOrMerge, are there other spots that might adjust a resource in a way that could alter its ResId but do not issue a call to Resource.setPreviousId()? Any other aspect I might be missing?

I did some testing using the benchmark from #5425. However, I didn't want to wait forever, so I adjusted the second-level resource count from 100 down to 20.
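For anyone following along, this is roughly the caching pattern under discussion, with hypothetical names; the real change would live on resource.Resource, cache a resid.ResId, and be invalidated from the code paths mentioned above (Resource.setPreviousId and resWrangler.appendReplaceOrMerge).

```go
// cachedResource sketches id caching with explicit invalidation.
// The names and the string id are illustrative stand-ins, not kustomize code.
type cachedResource struct {
	idValid   bool
	id        string        // stands in for resid.ResId
	computeID func() string // stands in for the existing CurId() traversals
}

// CurId returns the cached id, computing it on first use.
func (r *cachedResource) CurId() string {
	if !r.idValid {
		r.id = r.computeID()
		r.idValid = true
	}
	return r.id
}

// invalidateID must be called from every code path that can change the
// resource's group/version/kind, name, or namespace.
func (r *cachedResource) invalidateID() {
	r.idValid = false
}
```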
I went ahead and created #5481 for this effort. I still have some concerns about what other code paths might need to issue the call to invalidate the ID caches, but after discussion in #5422 (comment) I figured it was worth sharing the work. I don't know of any other spots in the code that would need the additions, so there's not much benefit in me keeping the PR private.
So far, I see no performance changes against v5.3.0 (multiple invocations scenario).
@shapirus, I'm not surprised #5481 left the PSM performance (#5422) as-is. The optimization in #5481 was aimed at #5425, and in that context it yields dramatic improvements. If I can carve out some time in the next few weeks, I'll take a close look at #5422 and update there with any useful information I can find.
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    48385043792 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    48.701s
```

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
$ make run-benchmarks
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    48035946042 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    48.357s
```

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
We have a configrepo which produces about 4000 Kubernetes resources, and expect three times that number in September. `kustomize build` currently takes around 45 seconds on a developer laptop. We run `kustomize build` against master and a branch in CI to find the impact of a change, and flux also runs `kustomize build`. In CI this takes about 1.5 minutes, so in total we see 7.5 minutes of compute time to roll out one change. Since this is just under 1.4 MB of YAML, I believe it should be possible to do this work much faster.

I have made four PRs with performance improvements. I have made the PRs as small as possible, but I believe some changes might be nicer with larger refactorings to make the new code more resilient to changes (the `resWrangler` `id` map, PR #5081; a rough sketch of that idea appears after the summary below). #5081 and #5082 might also need more tests; please provide feedback in each PR with suggested improvements, as I don't know the kustomize codebase.

Here's a summary of the proposed changes:
18.94s to 4.79s - will do this another way

pprof before changes: (profile output not included). After the changes, the profile is mostly YAML parsing and GC.
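To illustrate the kind of refactoring referred to above for PR #5081 (an id-keyed index inside the resource list so id lookups stop scanning the whole slice), here is a minimal, hypothetical sketch; names and types are illustrative stand-ins, not the actual resWrangler implementation.

```go
// indexedList sketches the id-map idea: maintain a map from a resource's
// current id to its position in the slice, so id lookups are O(1) instead of
// a linear scan over all resources.
type indexedList struct {
	items []string       // stands in for []*resource.Resource
	byID  map[string]int // current id -> index into items
}

func (l *indexedList) append(id, item string) {
	if l.byID == nil {
		l.byID = map[string]int{}
	}
	l.byID[id] = len(l.items)
	l.items = append(l.items, item)
}

// getByID replaces a linear search; the map must be kept up to date whenever
// an item's id changes or items are removed or reordered.
func (l *indexedList) getByID(id string) (string, bool) {
	i, ok := l.byID[id]
	if !ok {
		return "", false
	}
	return l.items[i], true
}
```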
Why is this needed?
Faster linting of PRs, quicker reconcile in flux
Can you accomplish the motivating task without this feature, and if so, how?
Splitting into smaller repos might help, but it will not allow us to analyze the whole service mesh graph and interactions between services/configurations.
What other solutions have you considered?
N/A
Anything else we should know?
No response
Feature ownership