Performance improvements for kustomize build
#5084
Comments
We will happily accept performance improvements. Since you've already opened some PRs, we can discuss specific changes on those PRs. /triage accepted
Apologies for the delay - we will try to have people review your PRs soon!
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
# sigs.k8s.io/kustomize/kustomize/v5/commands/build.test
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    8523677542 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    8.798s
```

*Currently*, this benchmark requires 3000 seconds to run on my machine. To run it on master today, you need to add `-timeout=30m` to the `go test` command. The dataset size was chosen because I believe it represents a real workload for which we could get the runtime below 10 seconds.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
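For readers who want a feel for what such a benchmark looks like, here is a minimal sketch in the spirit of the change (not the code from the linked commit): it generates a tiny kustomization tree on an in-memory filesystem and builds it with krusty. The real benchmark generates a far larger tree exercising many more Kustomize features.

```go
package build_test

import (
	"fmt"
	"testing"

	"sigs.k8s.io/kustomize/api/krusty"
	"sigs.k8s.io/kustomize/kyaml/filesys"
)

// BenchmarkBuild builds a generated kustomization once per iteration.
func BenchmarkBuild(b *testing.B) {
	fSys := filesys.MakeFsInMemory()
	resources := ""
	for i := 0; i < 50; i++ {
		name := fmt.Sprintf("cm-%d.yaml", i)
		manifest := fmt.Sprintf(
			"apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: cm-%d\n", i)
		if err := fSys.WriteFile("/app/"+name, []byte(manifest)); err != nil {
			b.Fatal(err)
		}
		resources += "- " + name + "\n"
	}
	kustomization := "namePrefix: bench-\nresources:\n" + resources
	if err := fSys.WriteFile("/app/kustomization.yaml", []byte(kustomization)); err != nil {
		b.Fatal(err)
	}

	opts := krusty.MakeDefaultOptions()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := krusty.MakeKustomizer(opts).Run(fSys, "/app"); err != nil {
			b.Fatal(err)
		}
	}
}
```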
Another thought here would be to join the lookups done by Resource.CurId(). It calls Resource.GetGvk (which calls RNode.GetApiVersion and RNode.GetKind), Resource.GetName, and Resource.GetNamespace. That means there are four independent traversals at the top level (apiVersion, kind, metadata.name, metadata.namespace). This flow could be optimized for performance so that a single traversal finds all four fields.
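To make the idea concrete, here is a sketch of a single-pass extraction of those four fields. It operates on a plain decoded map rather than the actual kyaml RNode type, so it only illustrates the shape of the optimization, not how it would be wired into Resource.CurId():

```go
// idFields holds the four values that make up a resource's current id.
type idFields struct {
	apiVersion, kind, name, namespace string
}

// extractIDFields walks the top-level map of a decoded manifest once,
// collecting apiVersion, kind, metadata.name and metadata.namespace,
// instead of doing four independent lookups.
func extractIDFields(doc map[string]interface{}) idFields {
	var f idFields
	for key, val := range doc {
		switch key {
		case "apiVersion":
			f.apiVersion, _ = val.(string)
		case "kind":
			f.kind, _ = val.(string)
		case "metadata":
			if md, ok := val.(map[string]interface{}); ok {
				f.name, _ = md["name"].(string)
				f.namespace, _ = md["namespace"].(string)
			}
		}
	}
	return f
}
```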
I took a little time to investigate this option, and the improvement was minor at best - not really worth the effort. I've started to investigate what might be possible with caching CurId() in the Resource. Given that caching isn't already in place, I'm a little worried about complexities I may find with cache invalidation. I had hoped to lean on the code path that updates the list of previous ids, but there appear to be gaps (for example, api/krusty/component_test.go's TestComponent/multiple-components fails - I don't yet know if that indicates a flaw in the existing code where the previous id list should be updated but is not, or if that indicates a flaw in my hope that any change in CurId() should be associated with an update to the list of previous ids). I will continue investigating.
Initial results for caching the Resource.CurId() return value are very promising. I hooked cache invalidation into Resource.setPreviousId() and resWrangler.appendReplaceOrMerge()'s case 1 for replace and merge, and that appears to cover the unit test cases. Note that there are a small number of unit tests that cannot run cleanly on my system, so I may have gaps there.

@natasha41575 (and others), before I consider moving forward with this change, do you know of any reasons why caching Resource.CurId() could be problematic? I feel like there may be hidden pitfalls here. Is this sort of caching in line with kustomize coding patterns? In addition to resWrangler.appendReplaceOrMerge, are there other spots that might adjust a resource in a way that could alter its ResId but do not issue a call to Resource.setPreviousId()? Any other aspect I might be missing?

I did some testing using the benchmark from #5425. However, I didn't want to wait forever, so I adjusted the second-level resource count from 100 down to 20.
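For anyone following along, this is roughly the caching pattern under discussion, with hypothetical names; the real change would live on resource.Resource, cache a resid.ResId, and be invalidated from the code paths mentioned above (Resource.setPreviousId and resWrangler.appendReplaceOrMerge).

```go
// cachedResource sketches id caching with explicit invalidation.
// The names and the string id are illustrative stand-ins, not kustomize code.
type cachedResource struct {
	idValid   bool
	id        string        // stands in for resid.ResId
	computeID func() string // stands in for the existing CurId() traversals
}

// CurId returns the cached id, computing it on first use.
func (r *cachedResource) CurId() string {
	if !r.idValid {
		r.id = r.computeID()
		r.idValid = true
	}
	return r.id
}

// invalidateID must be called from every code path that can change the
// resource's group/version/kind, name, or namespace.
func (r *cachedResource) invalidateID() {
	r.idValid = false
}
```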
I went ahead and created #5481 for this effort. I still have some concerns about what other code paths might need to issue the call to invalidate the ID caches, but after discussion in #5422 (comment) I figured it was worth sharing the work. I don't know of any other spots in the code that would need the additions, so there's not much benefit in me keeping the PR private.
So far, I see no performance changes against v5.3.0 (multiple invocations scenario).
@shapirus, I'm not surprised #5481 left the PSM performance (#5422) as-is. The optimization in #5481 was aimed at #5425, and in that context it yields dramatic improvements. If I can carve out some time in the next few weeks, I'll take a close look at #5422 and update there with any useful information I can find.
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    48385043792 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    48.701s
```

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize. It aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

```
$ make run-benchmarks
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8    1    48035946042 ns/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build    48.357s
```

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO: real-life profiles would be better, but creating one based on a benchmark should not hurt: https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

```
go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build
go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize
```

Compare PGO and non-PGO builds:

```
./kust-pgo build -o /dev/null testdata/     21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/   22.76s user 1.98s system 174% cpu 14.170 total
```
We have a configrepo which produces about 4000 Kubernetes resources, and expect three times that number in September. `kustomize build` currently takes around 45 seconds on a developer laptop. We run `kustomize build` against master and a branch in CI to find the impact of a change, and flux also runs `kustomize build`. In CI this takes about 1.5 minutes, so in total we see 7.5 minutes of compute time to roll out one change. Since this is just under 1.4 MB of YAML, I believe it should be possible to do this work much faster.

I have made four PRs with performance improvements. I have made the PRs as small as possible, but I believe some changes might be nicer with larger refactorings to make the new code more resilient to changes (the `resWrangler` `id` map, PR #5081; a rough sketch of that idea appears after the summary below). #5081 and #5082 might also need more tests; please provide feedback in each PR with suggested improvements, as I don't know the kustomize codebase.

Here's a summary of the proposed changes:
18.94s to 4.79s - will do this another way

pprof before changes: (profile output not included). After the changes, the profile is mostly YAML parsing and GC.
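To illustrate the kind of refactoring referred to above for PR #5081 (an id-keyed index inside the resource list so id lookups stop scanning the whole slice), here is a minimal, hypothetical sketch; names and types are illustrative stand-ins, not the actual resWrangler implementation.

```go
// indexedList sketches the id-map idea: maintain a map from a resource's
// current id to its position in the slice, so id lookups are O(1) instead of
// a linear scan over all resources.
type indexedList struct {
	items []string       // stands in for []*resource.Resource
	byID  map[string]int // current id -> index into items
}

func (l *indexedList) append(id, item string) {
	if l.byID == nil {
		l.byID = map[string]int{}
	}
	l.byID[id] = len(l.items)
	l.items = append(l.items, item)
}

// getByID replaces a linear search; the map must be kept up to date whenever
// an item's id changes or items are removed or reordered.
func (l *indexedList) getByID(id string) (string, bool) {
	i, ok := l.byID[id]
	if !ok {
		return "", false
	}
	return l.items[i], true
}
```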
Why is this needed?
Faster linting of PRs, quicker reconcile in flux
Can you accomplish the motivating task without this feature, and if so, how?
Splitting into smaller repos might help, but it will not allow us to analyze the whole service mesh graph and interactions between services/configurations.
What other solutions have you considered?
N/A
Anything else we should know?
No response
Feature ownership