Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve chunk throughput with elide array bound checks #43

Merged
merged 2 commits into from
Oct 1, 2023

Conversation

wjiec
Copy link
Contributor

@wjiec wjiec commented Sep 28, 2023

Gain about ~3% speedup by inlining the slide method and reducing one assignment statement.

goos: linux
goarch: amd64
pkg: github.com/restic/chunker
cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
                    │   old.txt    │               new.txt               │
                    │    sec/op    │    sec/op     vs base               │
ChunkerWithSHA256-8    158.7m ± 0%    154.3m ± 0%  -2.79% (p=0.000 n=10)
Chunker-8              65.72m ± 1%    61.49m ± 0%  -6.44% (p=0.000 n=10)
NewChunker-8           136.4µ ± 9%    131.3µ ± 4%       ~ (p=0.063 n=10)
PolDivMod-8            2.877n ± 0%    2.876n ± 0%       ~ (p=0.490 n=10)
PolDiv-8               2.890n ± 0%    2.891n ± 0%       ~ (p=0.952 n=10)
PolMod-8               2.894n ± 1%    2.894n ± 0%       ~ (p=0.809 n=10)
PolDeg-8              0.3148n ± 0%   0.3144n ± 0%       ~ (p=0.075 n=10)
RandomPolynomial-8     2.017m ± 3%    2.054m ± 4%       ~ (p=0.684 n=10)
PolIrreducible-8       1.146m ± 0%    1.146m ± 0%       ~ (p=0.971 n=10)
geomean                6.634µ         6.550µ       -1.28%

                    │   old.txt    │               new.txt               │
                    │     B/s      │     B/s       vs base               │
ChunkerWithSHA256-8   201.7Mi ± 0%   207.4Mi ± 0%  +2.87% (p=0.000 n=10)
Chunker-8             486.9Mi ± 1%   520.4Mi ± 0%  +6.88% (p=0.000 n=10)
geomean               313.4Mi        328.6Mi       +4.86%

@MichaelEischer
Copy link
Member

The change to use

		out := win[wpos%windowSize]
		win[wpos%windowSize] = b

yields the following speedup for me:

goos: linux
goarch: amd64
pkg: github.com/restic/chunker
cpu: AMD Ryzen 5 5600X 6-Core Processor
name                  old time/op   new time/op   delta
ChunkerWithSHA256-12   56.0ms ± 0%   54.5ms ± 0%  -2.60%  (p=0.000 n=8+9)
Chunker-12             41.5ms ± 0%   40.0ms ± 0%  -3.58%  (p=0.000 n=9+9)

name                  old speed     new speed     delta
ChunkerWithSHA256-12  599MB/s ± 0%  615MB/s ± 0%  +2.67%  (p=0.000 n=8+9)
Chunker-12            809MB/s ± 0%  840MB/s ± 0%  +3.71%  (p=0.000 n=9+9)

The other changes have no measurable effect. Regarding readability I'd prefer to keep the slide() method. Thus, please limit the changes in the PR to the just mentioned code snippet and update the corresponding comment to read // limit wpos to elide array bound checks

@wjiec
Copy link
Contributor Author

wjiec commented Sep 29, 2023

done

@wjiec wjiec changed the title improve chunk throughput with inlining and reducing statements improve chunk throughput with elide array bound checks Sep 30, 2023
Copy link
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot for discovering the performance improvement. And sorry for completely dissecting your PR.

I'm not sure when I'll get around to take a look at the polynomials PR.

@MichaelEischer MichaelEischer merged commit ac4c622 into restic:master Oct 1, 2023
8 checks passed
@wjiec
Copy link
Contributor Author

wjiec commented Oct 1, 2023

Cool. It's all about making the program better. 😁

@wjiec wjiec deleted the perf/chunker branch October 1, 2023 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants