Do some tactical inlining across lexer and parser. #4307

chandlerc · 2024-09-12T19:31:28Z

These are based on looking at our compilation benchmark and looking at function bodies that seem surprising to not get inlined.

Note that this will have a bit more impact on x86 where function call overhead (especially due to pushing and popping registers) is a bit higher than Arm.

For a recent AMD server, this makes parsing around 15% faster, and full "check" phase 5% faster.

Benchmark results:

name                                               old cpu/op   new cpu/op   delta
BM_CompileAPIFileDenseDecls<Phase::Lex>/256        40.2µs ± 2%  37.8µs ± 1%   -5.89%  (p=0.000 n=19+17)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024        190µs ± 2%   181µs ± 2%   -4.93%  (p=0.000 n=19+18)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096        779µs ± 1%   745µs ± 2%   -4.29%  (p=0.000 n=19+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384      3.44ms ± 1%  3.32ms ± 3%   -3.32%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      14.6ms ± 2%  14.3ms ± 3%   -2.46%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     66.7ms ± 2%  65.0ms ± 4%   -2.52%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      85.7µs ± 2%  71.3µs ± 2%  -16.77%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      421µs ± 2%   352µs ± 2%  -16.38%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.71ms ± 2%  1.44ms ± 2%  -15.89%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.19ms ± 2%  6.10ms ± 2%  -15.24%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    29.8ms ± 2%  25.3ms ± 2%  -14.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    127ms ± 2%   109ms ± 2%  -14.28%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       785µs ± 1%   752µs ± 1%   -4.13%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.71ms ± 1%  1.62ms ± 1%   -5.17%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.28ms ± 1%  4.97ms ± 1%   -6.04%  (p=0.000 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    20.2ms ± 1%  19.0ms ± 2%   -5.98%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    83.8ms ± 1%  78.9ms ± 2%   -5.84%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    354ms ± 1%   335ms ± 1%   -5.41%  (p=0.000 n=19+20)

These are based on looking at our compilation benchmark and looking at function bodies that seem surprising to not get inlined. Note that this will have a bit more impact on x86 where function call overhead (especially due to pushing and popping registers) is a bit higher than Arm. For a recent AMD server, this makes parsing around 15% faster, and full "check" phase 5% faster. Benchmark results: ``` name old cpu/op new cpu/op delta BM_CompileAPIFileDenseDecls<Phase::Lex>/256 40.2µs ± 2% 37.8µs ± 1% -5.89% (p=0.000 n=19+17) BM_CompileAPIFileDenseDecls<Phase::Lex>/1024 190µs ± 2% 181µs ± 2% -4.93% (p=0.000 n=19+18) BM_CompileAPIFileDenseDecls<Phase::Lex>/4096 779µs ± 1% 745µs ± 2% -4.29% (p=0.000 n=19+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/16384 3.44ms ± 1% 3.32ms ± 3% -3.32% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Lex>/65536 14.6ms ± 2% 14.3ms ± 3% -2.46% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Lex>/262144 66.7ms ± 2% 65.0ms ± 4% -2.52% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/256 85.7µs ± 2% 71.3µs ± 2% -16.77% (p=0.000 n=20+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 421µs ± 2% 352µs ± 2% -16.38% (p=0.000 n=20+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.71ms ± 2% 1.44ms ± 2% -15.89% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.19ms ± 2% 6.10ms ± 2% -15.24% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 29.8ms ± 2% 25.3ms ± 2% -14.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 127ms ± 2% 109ms ± 2% -14.28% (p=0.000 n=20+20) BM_CompileAPIFileDenseDecls<Phase::Check>/256 785µs ± 1% 752µs ± 1% -4.13% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.71ms ± 1% 1.62ms ± 1% -5.17% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.28ms ± 1% 4.97ms ± 1% -6.04% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 20.2ms ± 1% 19.0ms ± 2% -5.98% (p=0.000 n=20+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 83.8ms ± 1% 78.9ms ± 2% -5.84% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 354ms ± 1% 335ms ± 1% -5.41% (p=0.000 n=19+20) ```

josh11b · 2024-09-12T19:50:43Z

toolchain/lex/lex.cpp

 has_leading_space_ = false;
- return buffer_.AddToken(info);


Why wasn't it sufficient to move the definition of this function to tokenized_buffer.h?

It would have been, but it didn't seem worth having the function broken out given that all the code around the call to this was already directly manipulating the private members of the tokenized buffer. This is purely about what is nicer code wise, not a performance thing. The only performance thing was getting it out of the source file.

I'd rather have a method, if just to reduce the duplicated logic.

Ok, switched back. I had found it more confusing to have the mixture of methods manipulating state and direct code manipulations, particularly below in the error recovery code, but can keep it if preferred.

toolchain/parse/context.h

github-actions bot requested a review from josh11b September 12, 2024 19:31

github-actions bot added the toolchain label Sep 12, 2024

chandlerc mentioned this pull request Sep 12, 2024

Move the constants in the precedence table into the header. #4308

Merged

josh11b reviewed Sep 12, 2024

View reviewed changes

address review comments

1b4c234

josh11b approved these changes Sep 15, 2024

View reviewed changes

josh11b added this pull request to the merge queue Sep 15, 2024

Merged via the queue into carbon-language:trunk with commit 06344ae Sep 16, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do some tactical inlining across lexer and parser. #4307

Do some tactical inlining across lexer and parser. #4307

chandlerc commented Sep 12, 2024

josh11b Sep 12, 2024

chandlerc Sep 12, 2024

josh11b Sep 13, 2024

chandlerc Sep 14, 2024

Do some tactical inlining across lexer and parser. #4307

Do some tactical inlining across lexer and parser. #4307

Conversation

chandlerc commented Sep 12, 2024

josh11b Sep 12, 2024

Choose a reason for hiding this comment

chandlerc Sep 12, 2024

Choose a reason for hiding this comment

josh11b Sep 13, 2024

Choose a reason for hiding this comment

chandlerc Sep 14, 2024

Choose a reason for hiding this comment