Add support for stitcher configuration #321

Merged: 7 commits, Nov 21, 2023

Conversation

@hendrikvanantwerpen (Collaborator) commented Oct 16, 2023

« #346 | #326 »

The static methods on ForwardPartialPathStitcher do not allow configuration of stitcher properties. This PR adds a StitcherConfig type to remedy that.
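
As a rough illustration of the idea (not the actual stack-graphs source), a config type with a builder-style setter might look like the sketch below. Only the names `StitcherConfig`, `default()`, and `with_detect_similar_paths` appear in this PR; the field layout, the getter, and the default value are assumptions made for the example.

```rust
/// Hypothetical stand-in for the `StitcherConfig` type this PR introduces.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StitcherConfig {
    // Assumed field; the real type may store this differently.
    detect_similar_paths: bool,
}

impl Default for StitcherConfig {
    fn default() -> Self {
        // Assumption for this sketch: detection is enabled by default.
        Self { detect_similar_paths: true }
    }
}

impl StitcherConfig {
    /// Assumed getter, added so callers can read the setting.
    pub fn detect_similar_paths(&self) -> bool {
        self.detect_similar_paths
    }

    /// Builder-style setter: consumes the config and returns the updated copy.
    pub fn with_detect_similar_paths(mut self, detect: bool) -> Self {
        self.detect_similar_paths = detect;
        self
    }
}
```

A caller would then write something like `StitcherConfig::default().with_detect_similar_paths(false)` and hand the value to whichever function builds the stitcher.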

@hendrikvanantwerpen hendrikvanantwerpen self-assigned this Oct 16, 2023
@hendrikvanantwerpen hendrikvanantwerpen linked an issue Oct 17, 2023 that may be closed by this pull request
@dcreager (Member) left a comment

I like the idea of introducing a config struct, on the presumption that we will need to add more configuration knobs in the future. Just a couple of comments on the details:

stack-graphs/src/stitching.rs (outdated, resolved)
stack-graphs/src/stitching.rs (outdated, resolved)
@@ -1471,6 +1495,7 @@ pub extern "C" fn sg_forward_partial_path_stitcher_from_partial_paths(
    partials: *mut sg_partial_path_arena,
    count: usize,
    initial_partial_paths: *const sg_partial_path,
    config: sg_stitcher_config,
Member

...but maybe counter-intuitively, here I think you should pass in the config by pointer! (We're not currently worrying about versioning our C ABIs, but if we were to do that, passing by pointer means that the ABI calling convention for this function would not change even if we were to add more fields to the config struct.)
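
A minimal sketch of the by-pointer variant suggested here. The struct name `sg_stitcher_config` and its `detect_similar_paths` field come from this PR; the function `sg_example_detects_similar_paths` and the choice to treat a null pointer as "use defaults" are hypothetical and only serve the illustration.

```rust
/// C-compatible config struct, mirroring the one named in this PR.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct sg_stitcher_config {
    pub detect_similar_paths: bool,
}

/// Hypothetical function, not part of the real C API.
///
/// # Safety
/// `config` must be null or point to a valid `sg_stitcher_config`.
pub unsafe extern "C" fn sg_example_detects_similar_paths(
    config: *const sg_stitcher_config,
) -> bool {
    // The argument is a single pointer, so the way it is passed stays the
    // same even if fields are later added to the struct.
    match unsafe { config.as_ref() } {
        Some(config) => config.detect_similar_paths,
        // Assumption of this sketch: null means "use the default" (enabled).
        None => true,
    }
}
```

Passing by pointer does not by itself make old callers compatible with a larger struct; a real versioning scheme would also need something like a size or version field, which is outside what this discussion covers.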

@dcreager
Member

To clarify, would this supersede sg_forward_partial_path_stitcher_set_similar_path_detection? Should we also move max_work_per_phase into the config struct?

@hendrikvanantwerpen
Collaborator Author

> To clarify, would this supersede sg_forward_partial_path_stitcher_set_similar_path_detection?

Yes, I think so.

> Should we also move max_work_per_phase into the config struct?

I've been wondering this too. They feel different to me. Similar path detection is required, perhaps not for correctness, but at least for feasibility (for certain rule sets). The max work is more of an operational thing, perhaps more like a cancellation flag. The similar path setting is something that should be part of a language configuration, I think, while the work per phase certainly should not. My idea was that the language configuration would just contain a stitcher configuration value, but perhaps that is not the right approach. It really depends on what other settings we envision. One I could see happening is whether the stitcher tracks statistics or not, but this would also be an operational thing, more like the max work than the similar path setting.

@dcreager
Member

> I've been wondering this too. They feel different to me.

I hear that, though for me, the "intent" of each knob is less relevant than the fact that there is a knob. So we should standardize on either one of the following, but not a mixture of both:

  1. All configuration knobs are set via separate C functions (as is currently done with set_similar_path_detection and set_max_work_per_phase)
  2. All configuration knobs are set in a single config struct, where a 0 value encodes some kind of reasonable default

The benefit of (1) is that we don't have to finagle a Go-style 0 value that is "a reasonable default" for each knob — the default is in place if you don't call the setter function. The benefit of (2) is that there is less of a proliferation of functions.
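
To make the trade-off in option (2) concrete, here is a sketch of what "a 0 value encodes a reasonable default" could mean in practice. Everything in it is hypothetical: the struct `sg_example_config`, the inverted `disable_similar_path_detection` flag, and the convention that `0` means "no work limit" are assumptions for illustration, not the project's API.

```rust
/// Hypothetical C-facing config for option (2); not the real API.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct sg_example_config {
    /// Inverted flag, so that the zero value (false) keeps detection enabled.
    pub disable_similar_path_detection: bool,
    /// Assumed convention: 0 means "no limit".
    pub max_work_per_phase: usize,
}

/// Decodes the work limit, mapping the zero value to "unlimited".
pub fn effective_max_work(config: &sg_example_config) -> usize {
    if config.max_work_per_phase == 0 {
        usize::MAX
    } else {
        config.max_work_per_phase
    }
}

/// Decodes the inverted flag back into the positive setting.
pub fn effective_detect_similar_paths(config: &sg_example_config) -> bool {
    !config.disable_similar_path_detection
}
```

The inverted boolean and the sentinel `0` are the kind of encoding that has to be worked out per knob under option (2); with setter functions as in option (1), the default simply stays in place when the setter is never called.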

> My idea was that the language configuration would just contain a stitcher configuration value

This is the part that would suggest a change from (1) to (2). But I think I'd still have a slight preference for (1). You could still have a stitcher config "section" in the language configuration types — i.e., as a separate embedded type, like we're doing with Precise up in the aleph Go wrapper code. You'd have to write the code (which you already have!) to pull out the relevant fields and call the C setter functions if needed. But that doesn't seem so bad. And it means that we can craft the language config type(s) to be exactly the right shape for configuration languages. (Because as you say, some of the stitcher config is operational!)

@hendrikvanantwerpen
Collaborator Author

> And it means that we can craft the language config type(s) to be exactly the right shape for configuration languages. (Because as you say, some of the stitcher config is operational!)

I think you're right. Trying to shoehorn these two use cases into one type doesn't work.

Re (1) and (2), I'll see how that works out for passing things into more high-level functions like the test runner. But perhaps the values from the language configuration will be all we need for that, and the others will always be chosen by the implementation.

@hendrikvanantwerpen hendrikvanantwerpen changed the base branch from main to leaner-similar-paths-detection November 13, 2023 11:07
@hendrikvanantwerpen hendrikvanantwerpen force-pushed the stitcher-config branch 3 times, most recently from 4921589 to a92b9b0 Compare November 13, 2023 11:16

Performance Summary

Comparing base d7bb4ad with head a92b9b0 on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before
--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.9^                                                                   #    
     |                                                @                  #    
     |                                                @                ::#    
     |                                              ::@                ::#    
     |                                             :: @               :::#:   
     |                          :                  :: @  :           ::::#:   
     |                   ::   ::::                 :: @  :           ::::#::: 
     |                   :    ::::       :        ::: @:::           ::::#::: 
     |                  @:    ::::   :  @:        ::: @:::           ::::#::: 
     |                  @: :@@::::  ::::@:        ::: @:::          @::::#::: 
     |                  @: :@ ::::  ::::@: ::   ::::: @:::     : :::@::::#::: 
     |                 @@: :@ :::: :::::@: :::: : ::: @:::::::@:::::@::::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.56
After
--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.029^                                                  #                     
     |                                                  #                   : 
     |                                                  #                   ::
     |                                                 :#                   ::
     |                                                 :#             :     ::
     |                                          :     ::#             :    :::
     |                                         @:     ::#           :::    :::
     |                       :                 @:     ::#          @:::   ::::
     |                 ::  ::::              ::@:  :::::#         :@::: ::::::
     |                 : @@::::     @:       : @::::: ::#         :@::::::::::
     |                @: @ ::::   ::@:      @: @:: :: ::#        ::@:::::@::::
     |                @: @ :::::::::@: ::: :@: @:: :: ::#::  @ : ::@:::::@::::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   59.23


Performance Summary

Comparing base d7bb4ad with head ad9c0da on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before
--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.9^                                                                   #    
     |                                                @                  #    
     |                                                @                ::#    
     |                                              ::@                ::#    
     |                                             :: @               :::#:   
     |                          :                  :: @  :           ::::#:   
     |                   ::   ::::                 :: @  :           ::::#::: 
     |                   :    ::::       :        ::: @:::           ::::#::: 
     |                  @:    ::::   :  @:        ::: @:::           ::::#::: 
     |                  @: :@@::::  ::::@:        ::: @:::          @::::#::: 
     |                  @: :@ ::::  ::::@: ::   ::::: @:::     : :::@::::#::: 
     |                 @@: :@ :::: :::::@: :::: : ::: @:::::::@:::::@::::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.56
After
--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.8^                                                                   #    
     |                                                @                  #:   
     |                                                @                  #:   
     |                           @                 :::@                ::#:   
     |                           @                 : :@  :            :::#:   
     |                          :@                 : :@  :            :::#:   
     |                         ::@                :: :@ :::          @:::#: : 
     |                   ::  ::::@       @        :: :@ :::          @:::#::: 
     |                  @: : : ::@   :  :@        :: :@::::          @:::#::: 
     |                  @: ::: ::@  :: ::@    :  ::: :@::::         :@:::#::: 
     |                  @: ::: ::@  :::::@ :: :  ::: :@::::::   : :::@:::#::: 
     |                 @@: ::: ::@  :::::@ : ::  ::: :@::::: : @:::::@:::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.50

@hendrikvanantwerpen hendrikvanantwerpen marked this pull request as ready for review November 13, 2023 14:01
@hendrikvanantwerpen hendrikvanantwerpen requested a review from a team as a code owner November 13, 2023 14:01
@hendrikvanantwerpen
Collaborator Author

The changes ended up like this:

  • The ForwardPartialPathStitcher retains methods like set_max_work_per_phase. If you are working with a stitcher directly, use these to configure it.
  • Methods that wrap a stitcher accept an additional StitcherConfig which is applied to the stitcher when it is created.
  • LanguageConfiguration has a new field has_similar_paths, which can be set to false for languages where similar path detection is not necessary.
  • As an additional cleanup, special_files was removed from LanguageConfiguration::from_sources, since it was not actually required to build the initial configuration. It can easily be configured on the resulting value. This makes the responsibility of from_sources clearer and makes it less sensitive to additional fields of LanguageConfiguration that may not always be explicitly required.
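
The division of responsibilities described in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the crate's code: `ExampleStitcher` stands in for `ForwardPartialPathStitcher`, the structs carry only the fields relevant here, and `stitcher_config()` and `new_configured_stitcher()` are hypothetical helpers.

```rust
/// Hypothetical stand-in for the config type added in this PR.
#[derive(Clone, Copy, Debug)]
pub struct StitcherConfig {
    pub detect_similar_paths: bool,
}

/// Hypothetical stand-in for `ForwardPartialPathStitcher`.
#[derive(Debug)]
pub struct ExampleStitcher {
    pub detect_similar_paths: bool,
    pub max_work_per_phase: usize,
}

impl ExampleStitcher {
    pub fn new() -> Self {
        // Assumed defaults for this sketch.
        Self { detect_similar_paths: true, max_work_per_phase: usize::MAX }
    }

    /// Direct setters remain available when working with a stitcher directly.
    pub fn set_similar_path_detection(&mut self, detect: bool) {
        self.detect_similar_paths = detect;
    }

    pub fn set_max_work_per_phase(&mut self, max_work: usize) {
        self.max_work_per_phase = max_work;
    }
}

/// Hypothetical slice of `LanguageConfiguration`, reduced to the new field.
pub struct LanguageConfiguration {
    pub has_similar_paths: bool,
}

impl LanguageConfiguration {
    /// Hypothetical helper: derive the stitcher config from the language setting.
    pub fn stitcher_config(&self) -> StitcherConfig {
        StitcherConfig { detect_similar_paths: self.has_similar_paths }
    }
}

/// A wrapping function accepts a config and applies it when the stitcher is created.
pub fn new_configured_stitcher(config: StitcherConfig) -> ExampleStitcher {
    let mut stitcher = ExampleStitcher::new();
    stitcher.set_similar_path_detection(config.detect_similar_paths);
    stitcher
}
```

In this model the operational knob (`set_max_work_per_phase`) stays outside the config and is chosen by whoever drives the stitcher, which matches the distinction drawn earlier in the conversation.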

Base automatically changed from leaner-similar-paths-detection to main November 13, 2023 16:54

Performance Summary

Comparing base 3696992 with head e46feb3 on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before
--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                     :# 
     |                                                                     :# 
     |                                                                     :# 
     |                                                                     :#:
     |                                                                     :#:
     |                                                                     :#:
     |                                             :::@                 ::::#:
     |                                             :  @                 :: :#:
     |                                        ::@@@:  @                 :: :#:
     |                                        : @  :  @               :::: :#:
     |                                 @:   ::: @  :  @         :::  :@ :: :#:
     |              :  ::@           :@@::::::: @  :  @       ::@:::::@ :: :#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.09
After
--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.702^                                             ::::#                      
     |                                             :   #                      
     |                                             :   #                      
     |                                             :   #                     :
     |                                           :::   #                    ::
     |                                        ::@: :   #                    ::
     |                                        : @: :   #                ::::::
     |                                       :: @: :   #                :   ::
     |                                  @   ::: @: :   #         ::     :   ::
     |                                @:@   ::: @: :   #       @::: :::::   ::
     |              @  :::     :     :@:@:::::: @: :   #       @:::@::: :   ::
     |             @@::::: ::@:::  :::@:@:::::: @: :   #      :@:::@::: :   ::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   74.01


Performance Summary

Comparing base 3696992 with head 405a28b on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before
--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                     :# 
     |                                                                     :# 
     |                                                                     :# 
     |                                                                     :#:
     |                                                                     :#:
     |                                                                     :#:
     |                                             :::@                 ::::#:
     |                                             :  @                 :: :#:
     |                                        ::@@@:  @                 :: :#:
     |                                        : @  :  @               :::: :#:
     |                                 @:   ::: @  :  @         :::  :@ :: :#:
     |              :  ::@           :@@::::::: @  :  @       ::@:::::@ :: :#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.09
After
--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.406^                                                #                       
     |                                                #                     : 
     |                                             :::#                     : 
     |                                          ::::  #                     : 
     |                                          : ::  #                     : 
     |                                          : ::  #                 ::::: 
     |                                         :: ::  #                 :   : 
     |                                   @:   ::: ::  #          :     ::   : 
     |                    @            @:@:   ::: ::  #        :@:   @@::   : 
     |              ::   :@            @:@:  :::: ::  #       ::@: : @ ::   : 
     |              : ::::@     :      @:@::@:::: ::  #      :::@::: @ ::   : 
     |             @: ::::@ ::@:: @ : :@:@::@:::: ::  #  :   :::@::::@ ::   : 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   70.81


Performance Summary

Comparing base 5a6744b with head 294596a on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before
--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                  :##   
     |                                                                  :#    
     |                                                                  :#  : 
     |                                                                  :# :: 
     |                                                                  :# :: 
     |                                                                  :# :: 
     |                                           :::@                ::::# :: 
     |                                           :  @                :@ :# :: 
     |                                      ::@@@:  @              :::@ :# :: 
     |                                      : @  :  @              : :@ :# :: 
     |                                @   ::: @  :  @        ::@  :: :@ :# :: 
     |             :  :::     :     @@@:::::: @  :  @       :::@:::: :@ :# :: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   78.25
After
--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.702^                                             ::::#                      
     |                                             :   #                      
     |                                             :   #                      
     |                                             :   #                     :
     |                                           @@:   #                    ::
     |                                          :@ :   #                    ::
     |                                        :::@ :   #                @@@@::
     |                                       :: :@ :   #                @   ::
     |                                  @   ::: :@ :   #        :::     @   ::
     |                   @            @:@   ::: :@ :   #       ::::   ::@   ::
     |              :  ::@     :     :@:@:::::: :@ :   #       @:::@::: @   ::
     |             @:::::@ ::@:::    :@:@:::::: :@ :   #     ::@:::@::: @   ::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.95

@dcreager (Member) left a comment

Just one nit, otherwise 👍

@@ -36,5 +36,6 @@ harness = false # need to provide own main function to handle running tests
[dependencies]
anyhow = "1.0"
clap = { version = "4", features = ["derive"] }
stack-graphs = { version = "0.12", path = "../../stack-graphs" }
Member

Is this spurious? I don't see a change in the crate's Rust code that needs to refer to stack-graphs directly

Collaborator Author

Good catch!


impl Into<StitcherConfig> for sg_stitcher_config {
    fn into(self) -> StitcherConfig {
        StitcherConfig::default().with_detect_similar_paths(self.detect_similar_paths)
Member

Ah this is nice, it means we don't have to #[repr(C)] the Rust type
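
The point about `#[repr(C)]` can be illustrated with a self-contained sketch. The two type names and `with_detect_similar_paths` come from the diff above; the field layouts and the getter are assumptions. The sketch implements `From` rather than `Into` as the diff does, which is the more common Rust idiom and provides `.into()` automatically.

```rust
/// C-facing struct: needs a stable, C-compatible layout.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct sg_stitcher_config {
    pub detect_similar_paths: bool,
}

/// Rust-facing config (simplified stand-in): no `#[repr(C)]` needed, so its
/// layout and fields can change without affecting the C side.
#[derive(Clone, Copy, Debug, Default)]
pub struct StitcherConfig {
    detect_similar_paths: bool,
}

impl StitcherConfig {
    pub fn detect_similar_paths(&self) -> bool {
        self.detect_similar_paths
    }

    pub fn with_detect_similar_paths(mut self, detect: bool) -> Self {
        self.detect_similar_paths = detect;
        self
    }
}

/// The conversion is the only place where the two types meet.
impl From<sg_stitcher_config> for StitcherConfig {
    fn from(config: sg_stitcher_config) -> Self {
        StitcherConfig::default().with_detect_similar_paths(config.detect_similar_paths)
    }
}
```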

@hendrikvanantwerpen hendrikvanantwerpen merged commit e1b4d44 into main Nov 21, 2023
10 checks passed
@hendrikvanantwerpen hendrikvanantwerpen deleted the stitcher-config branch November 21, 2023 14:45
Successfully merging this pull request may close these issues.

Allow users to pass along stitcher configuration
2 participants