Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do some tactical inlining across lexer and parser. #4307

Merged
merged 2 commits into from
Sep 16, 2024

Conversation

chandlerc
Copy link
Contributor

These are based on looking at our compilation benchmark and looking at function bodies that seem surprising to not get inlined.

Note that this will have a bit more impact on x86 where function call overhead (especially due to pushing and popping registers) is a bit higher than Arm.

For a recent AMD server, this makes parsing around 15% faster, and full "check" phase 5% faster.

Benchmark results:

name                                               old cpu/op   new cpu/op   delta
BM_CompileAPIFileDenseDecls<Phase::Lex>/256        40.2µs ± 2%  37.8µs ± 1%   -5.89%  (p=0.000 n=19+17)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024        190µs ± 2%   181µs ± 2%   -4.93%  (p=0.000 n=19+18)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096        779µs ± 1%   745µs ± 2%   -4.29%  (p=0.000 n=19+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384      3.44ms ± 1%  3.32ms ± 3%   -3.32%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      14.6ms ± 2%  14.3ms ± 3%   -2.46%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     66.7ms ± 2%  65.0ms ± 4%   -2.52%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      85.7µs ± 2%  71.3µs ± 2%  -16.77%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      421µs ± 2%   352µs ± 2%  -16.38%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.71ms ± 2%  1.44ms ± 2%  -15.89%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.19ms ± 2%  6.10ms ± 2%  -15.24%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    29.8ms ± 2%  25.3ms ± 2%  -14.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    127ms ± 2%   109ms ± 2%  -14.28%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       785µs ± 1%   752µs ± 1%   -4.13%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.71ms ± 1%  1.62ms ± 1%   -5.17%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.28ms ± 1%  4.97ms ± 1%   -6.04%  (p=0.000 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    20.2ms ± 1%  19.0ms ± 2%   -5.98%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    83.8ms ± 1%  78.9ms ± 2%   -5.84%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    354ms ± 1%   335ms ± 1%   -5.41%  (p=0.000 n=19+20)

These are based on looking at our compilation benchmark and looking at
function bodies that seem surprising to not get inlined.

Note that this will have a bit more impact on x86 where function call
overhead (especially due to pushing and popping registers) is a bit
higher than Arm.

For a recent AMD server, this makes parsing around 15% faster, and full
"check" phase 5% faster.

Benchmark results:
```
name                                               old cpu/op   new cpu/op   delta
BM_CompileAPIFileDenseDecls<Phase::Lex>/256        40.2µs ± 2%  37.8µs ± 1%   -5.89%  (p=0.000 n=19+17)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024        190µs ± 2%   181µs ± 2%   -4.93%  (p=0.000 n=19+18)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096        779µs ± 1%   745µs ± 2%   -4.29%  (p=0.000 n=19+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384      3.44ms ± 1%  3.32ms ± 3%   -3.32%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      14.6ms ± 2%  14.3ms ± 3%   -2.46%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     66.7ms ± 2%  65.0ms ± 4%   -2.52%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      85.7µs ± 2%  71.3µs ± 2%  -16.77%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      421µs ± 2%   352µs ± 2%  -16.38%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.71ms ± 2%  1.44ms ± 2%  -15.89%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.19ms ± 2%  6.10ms ± 2%  -15.24%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    29.8ms ± 2%  25.3ms ± 2%  -14.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    127ms ± 2%   109ms ± 2%  -14.28%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       785µs ± 1%   752µs ± 1%   -4.13%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.71ms ± 1%  1.62ms ± 1%   -5.17%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.28ms ± 1%  4.97ms ± 1%   -6.04%  (p=0.000 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    20.2ms ± 1%  19.0ms ± 2%   -5.98%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    83.8ms ± 1%  78.9ms ± 2%   -5.84%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    354ms ± 1%   335ms ± 1%   -5.41%  (p=0.000 n=19+20)
```
has_leading_space_ = false;
return buffer_.AddToken(info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wasn't it sufficient to move the definition of this function to tokenized_buffer.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been, but it didn't seem worth having the function broken out given that all the code around the call to this was already directly manipulating the private members of the tokenized buffer. This is purely about what is nicer code wise, not a performance thing. The only performance thing was getting it out of the source file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have a method, if just to reduce the duplicated logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, switched back. I had found it more confusing to have the mixture of methods manipulating state and direct code manipulations, particularly below in the error recovery code, but can keep it if preferred.

toolchain/parse/context.h Show resolved Hide resolved
@josh11b josh11b added this pull request to the merge queue Sep 15, 2024
Merged via the queue into carbon-language:trunk with commit 06344ae Sep 16, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants