Add support for stitcher configuration #321

hendrikvanantwerpen · 2023-10-16T17:24:56Z

The static methods on ForwardPartialPathStitcher do not allow configuration of stitcher properties. This PR adds a StitcherConfig type to remedy that.

dcreager

I like the idea of introducing a config struct, on the presumption that we will need to add more configuration knobs in the future. Just a couple of comments on the details:

stack-graphs/src/stitching.rs

dcreager · 2023-10-17T13:14:58Z

stack-graphs/src/c.rs

@@ -1471,6 +1495,7 @@ pub extern "C" fn sg_forward_partial_path_stitcher_from_partial_paths(
    partials: *mut sg_partial_path_arena,
    count: usize,
    initial_partial_paths: *const sg_partial_path,
+    config: sg_stitcher_config,


...but maybe counter-intuitively, here I think you should pass in the config by pointer! (We're not currently worrying about versioning our C ABIs, but if we were to do that, passing by pointer means that the ABI calling convention for this function would not change even if we were to add more fields to the config struct.)
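As a rough illustration of the by-pointer suggestion, here is a minimal Rust sketch. Only the `sg_stitcher_config` name and its `detect_similar_paths` field come from this PR; the `Stitcher` stand-in, `sg_stitcher_new`, `sg_stitcher_free`, and the null-means-defaults convention are assumptions made for the example, not the library's actual API.

```rust
/// C-facing config struct. The field name follows the conversion quoted
/// later in this PR; everything else here is a hypothetical stand-in.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct sg_stitcher_config {
    pub detect_similar_paths: bool,
}

/// Hypothetical stand-in for the real stitcher handle.
pub struct Stitcher {
    pub detect_similar_paths: bool,
}

/// Creates a stitcher from a config passed by pointer. The argument is one
/// pointer wide regardless of how many fields the struct has.
/// Assumption for this sketch: a null pointer selects the defaults.
/// (A real export would also carry a no_mangle attribute.)
///
/// # Safety
/// `config` must be null or point to a valid `sg_stitcher_config`.
pub unsafe extern "C" fn sg_stitcher_new(config: *const sg_stitcher_config) -> *mut Stitcher {
    let config = if config.is_null() {
        sg_stitcher_config::default()
    } else {
        // Copy the struct out of the caller's memory.
        unsafe { *config }
    };
    Box::into_raw(Box::new(Stitcher {
        detect_similar_paths: config.detect_similar_paths,
    }))
}

/// Releases a stitcher created by `sg_stitcher_new`.
///
/// # Safety
/// `stitcher` must be null or a pointer returned by `sg_stitcher_new`.
pub unsafe extern "C" fn sg_stitcher_free(stitcher: *mut Stitcher) {
    if !stitcher.is_null() {
        drop(unsafe { Box::from_raw(stitcher) });
    }
}
```

With this shape the function signature stays the same when fields are added. The struct layout is still part of the contract, so callers and library would need to agree on it by some other means.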

dcreager · 2023-10-17T13:20:33Z

To clarify, would this supersede sg_forward_partial_path_stitcher_set_similar_path_detection? Should we also move max_work_per_phase into the config struct?

hendrikvanantwerpen · 2023-10-17T13:27:05Z

> To clarify, would this supersede sg_forward_partial_path_stitcher_set_similar_path_detection?

Yes, I think so.

> Should we also move max_work_per_phase into the config struct?

I've been wondering this too. They feel different to me. Similar path detection is required, perhaps not for correctness, but at least for feasibility (for certain rule sets). The max work is more of an operational thing, more like a cancellation flag even? The similar path setting is something that should be part of a language configuration, I think, while the work per phase certainly should not. My idea was that the language configuration would just contain a stitcher configuration value, but perhaps that is not the right approach. It really depends on what other settings we envision. One I could see happening is whether the stitcher tracks statistics or not, but that would also be an operational thing, more like the max work than the similar path setting.

dcreager · 2023-10-17T13:42:12Z

> I've been wondering this too. They feel different to me.

I hear that, though for me, the "intent" of each knob is less relevant than the fact that there is a knob. So we should standardize on either one of the following, but not a mixture of both:

1. All configuration knobs are set via separate C functions (as is currently done with set_similar_path_detection and set_max_work_per_phase)
2. All configuration knobs are set in a single config struct, where a 0 value encodes some kind of reasonable default

The benefit of (1) is that we don't have to finagle a Go-style 0 value that is "a reasonable default" for each knob — the default is in place if you don't call the setter function. The benefit of (2) is that there is less of a proliferation of functions.

> My idea was that the language configuration would just contain a stitcher configuration value

This is the part that would suggest a change from (1) to (2). But I think I'd still have a slight preference for (1). You could still have a stitcher config "section" in the language configuration types — i.e., as a separate embedded type, like we're doing with Precise up in the aleph Go wrapper code. You'd have to write the code (which you already have!) to pull out the relevant fields and call the C setter functions if needed. But that doesn't seem so bad. And it means that we can craft the language config type(s) to be exactly the right shape for configuration languages. (Because as you say, some of the stitcher config is operational!)
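To make the trade-off between options (1) and (2) concrete, here is a small Rust sketch of both styles side by side. All types, the `Config` field names, the default work budget, and the assumption that similar path detection is on by default are hypothetical and chosen only for illustration.

```rust
/// Hypothetical default work budget; not a value taken from the library.
const DEFAULT_MAX_WORK_PER_PHASE: usize = 1_000;

/// Hypothetical stand-in for the stitcher.
#[derive(Debug)]
pub struct Stitcher {
    pub detect_similar_paths: bool,
    pub max_work_per_phase: usize,
}

/// Style (2): one config struct whose all-zero value must mean "defaults".
/// Assuming detection is on by default, the flag has to be phrased
/// negatively so that `false` (0) keeps it on, and 0 work has to be
/// reinterpreted as "use the default budget".
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct Config {
    pub disable_similar_path_detection: bool,
    pub max_work_per_phase: usize,
}

impl Stitcher {
    /// Style (1): the constructor installs the defaults, and callers only
    /// invoke the setters they care about.
    pub fn new() -> Self {
        Stitcher {
            detect_similar_paths: true,
            max_work_per_phase: DEFAULT_MAX_WORK_PER_PHASE,
        }
    }

    pub fn set_similar_path_detection(&mut self, on: bool) {
        self.detect_similar_paths = on;
    }

    pub fn set_max_work_per_phase(&mut self, max_work: usize) {
        self.max_work_per_phase = max_work;
    }

    /// Style (2): decode the zero values into the real defaults.
    pub fn with_config(config: Config) -> Self {
        Stitcher {
            detect_similar_paths: !config.disable_similar_path_detection,
            max_work_per_phase: if config.max_work_per_phase == 0 {
                DEFAULT_MAX_WORK_PER_PHASE
            } else {
                config.max_work_per_phase
            },
        }
    }
}
```

The decoding in `with_config` is the "finagling" mentioned above: every knob needs a zero value that maps to its default, whereas the setter style needs one function per knob.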

hendrikvanantwerpen · 2023-10-17T14:00:26Z

> And it means that we can craft the language config type(s) to be exactly the right shape for configuration languages. (Because as you say, some of the stitcher config is operational!)

I think you're right. Trying to shoehorn these two use cases into one type doesn't work.

Re (1) and (2), I'll see how that works out for passing things into more high-level functions like the test runner. But perhaps the values from the language configuration will be all we need for that, and the others will always be chosen by the implementation.

github-actions · 2023-11-13T11:23:47Z

Performance Summary

Comparing base d7bb4ad with head a92b9b0 on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.9^                                                                   #    
     |                                                @                  #    
     |                                                @                ::#    
     |                                              ::@                ::#    
     |                                             :: @               :::#:   
     |                          :                  :: @  :           ::::#:   
     |                   ::   ::::                 :: @  :           ::::#::: 
     |                   :    ::::       :        ::: @:::           ::::#::: 
     |                  @:    ::::   :  @:        ::: @:::           ::::#::: 
     |                  @: :@@::::  ::::@:        ::: @:::          @::::#::: 
     |                  @: :@ ::::  ::::@: ::   ::::: @:::     : :::@::::#::: 
     |                 @@: :@ :::: :::::@: :::: : ::: @:::::::@:::::@::::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.56

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.029^                                                  #                     
     |                                                  #                   : 
     |                                                  #                   ::
     |                                                 :#                   ::
     |                                                 :#             :     ::
     |                                          :     ::#             :    :::
     |                                         @:     ::#           :::    :::
     |                       :                 @:     ::#          @:::   ::::
     |                 ::  ::::              ::@:  :::::#         :@::: ::::::
     |                 : @@::::     @:       : @::::: ::#         :@::::::::::
     |                @: @ ::::   ::@:      @: @:: :: ::#        ::@:::::@::::
     |                @: @ :::::::::@: ::: :@: @:: :: ::#::  @ : ::@:::::@::::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   59.23

github-actions · 2023-11-13T13:58:21Z

Performance Summary

Comparing base d7bb4ad with head ad9c0da on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.9^                                                                   #    
     |                                                @                  #    
     |                                                @                ::#    
     |                                              ::@                ::#    
     |                                             :: @               :::#:   
     |                          :                  :: @  :           ::::#:   
     |                   ::   ::::                 :: @  :           ::::#::: 
     |                   :    ::::       :        ::: @:::           ::::#::: 
     |                  @:    ::::   :  @:        ::: @:::           ::::#::: 
     |                  @: :@@::::  ::::@:        ::: @:::          @::::#::: 
     |                  @: :@ ::::  ::::@: ::   ::::: @:::     : :::@::::#::: 
     |                 @@: :@ :::: :::::@: :::: : ::: @:::::::@:::::@::::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.56

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    MB
717.8^                                                                   #    
     |                                                @                  #:   
     |                                                @                  #:   
     |                           @                 :::@                ::#:   
     |                           @                 : :@  :            :::#:   
     |                          :@                 : :@  :            :::#:   
     |                         ::@                :: :@ :::          @:::#: : 
     |                   ::  ::::@       @        :: :@ :::          @:::#::: 
     |                  @: : : ::@   :  :@        :: :@::::          @:::#::: 
     |                  @: ::: ::@  :: ::@    :  ::: :@::::         :@:::#::: 
     |                  @: ::: ::@  :::::@ :: :  ::: :@::::::   : :::@:::#::: 
     |                 @@: ::: ::@  :::::@ : ::  ::: :@::::: : @:::::@:::#::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   52.50

hendrikvanantwerpen · 2023-11-13T14:08:45Z

The changes ended up like this:

- The ForwardPartialPathStitcher retains methods like set_max_work_per_phase. If you are working with a stitcher directly, use these to configure it.
- Methods that wrap a stitcher accept an additional StitcherConfig, which is applied to the stitcher when it is created.
- LanguageConfiguration has a new field has_similar_paths, which can be set to false for languages where similar path detection is not necessary.
- As an additional cleanup, special_files was removed from LanguageConfiguration::from_sources, since it was not actually required to build the initial configuration. It can easily be configured on the resulting value. This makes the responsibility of from_sources clearer and makes it less sensitive to additional fields of LanguageConfiguration that may not always be explicitly required.
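The resulting split can be sketched roughly as follows. The names `StitcherConfig`, `with_detect_similar_paths`, `set_max_work_per_phase`, `LanguageConfiguration`, and `has_similar_paths` come from this thread; the struct bodies, the `stitch_with_config` wrapper, the `stitcher_config` helper, the work budget of 128, and the default of `true` are assumptions for illustration and do not reproduce the crate's real definitions.

```rust
/// Sketch of the config value accepted by functions that wrap a stitcher.
#[derive(Clone, Copy, Debug)]
pub struct StitcherConfig {
    detect_similar_paths: bool,
}

impl Default for StitcherConfig {
    fn default() -> Self {
        // Assumption for this sketch: detection is on unless turned off.
        StitcherConfig { detect_similar_paths: true }
    }
}

impl StitcherConfig {
    pub fn detect_similar_paths(&self) -> bool {
        self.detect_similar_paths
    }

    pub fn with_detect_similar_paths(mut self, detect: bool) -> Self {
        self.detect_similar_paths = detect;
        self
    }
}

/// Hypothetical stand-in for ForwardPartialPathStitcher.
pub struct Stitcher {
    pub similar_path_detection: bool,
    pub max_work_per_phase: usize,
}

impl Stitcher {
    pub fn new() -> Self {
        Stitcher { similar_path_detection: true, max_work_per_phase: usize::MAX }
    }

    pub fn set_similar_path_detection(&mut self, on: bool) {
        self.similar_path_detection = on;
    }

    pub fn set_max_work_per_phase(&mut self, max_work: usize) {
        self.max_work_per_phase = max_work;
    }
}

/// Hypothetical wrapper: it creates the stitcher itself, applies the
/// caller's StitcherConfig, and picks the operational settings on its own.
pub fn stitch_with_config(config: StitcherConfig) -> Stitcher {
    let mut stitcher = Stitcher::new();
    stitcher.set_similar_path_detection(config.detect_similar_paths());
    stitcher.set_max_work_per_phase(128); // chosen by the wrapper, not the caller
    stitcher // a real wrapper would drive the stitcher to completion here
}

/// Sketch of the language-level setting described in the comment above.
pub struct LanguageConfiguration {
    pub has_similar_paths: bool,
}

impl LanguageConfiguration {
    /// Hypothetical helper that derives the stitcher config for a language.
    pub fn stitcher_config(&self) -> StitcherConfig {
        StitcherConfig::default().with_detect_similar_paths(self.has_similar_paths)
    }
}
```

Code that owns a stitcher directly would skip `StitcherConfig` and call the setters itself.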

github-actions · 2023-11-20T12:50:16Z

Performance Summary

Comparing base 3696992 with head e46feb3 on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                     :# 
     |                                                                     :# 
     |                                                                     :# 
     |                                                                     :#:
     |                                                                     :#:
     |                                                                     :#:
     |                                             :::@                 ::::#:
     |                                             :  @                 :: :#:
     |                                        ::@@@:  @                 :: :#:
     |                                        : @  :  @               :::: :#:
     |                                 @:   ::: @  :  @         :::  :@ :: :#:
     |              :  ::@           :@@::::::: @  :  @       ::@:::::@ :: :#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.09

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.702^                                             ::::#                      
     |                                             :   #                      
     |                                             :   #                      
     |                                             :   #                     :
     |                                           :::   #                    ::
     |                                        ::@: :   #                    ::
     |                                        : @: :   #                ::::::
     |                                       :: @: :   #                :   ::
     |                                  @   ::: @: :   #         ::     :   ::
     |                                @:@   ::: @: :   #       @::: :::::   ::
     |              @  :::     :     :@:@:::::: @: :   #       @:::@::: :   ::
     |             @@::::: ::@:::  :::@:@:::::: @: :   #      :@:::@::: :   ::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   74.01

github-actions · 2023-11-20T14:11:19Z

Performance Summary

Comparing base 3696992 with head 405a28b on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                     :# 
     |                                                                     :# 
     |                                                                     :# 
     |                                                                     :#:
     |                                                                     :#:
     |                                                                     :#:
     |                                             :::@                 ::::#:
     |                                             :  @                 :: :#:
     |                                        ::@@@:  @                 :: :#:
     |                                        : @  :  @               :::: :#:
     |                                 @:   ::: @  :  @         :::  :@ :: :#:
     |              :  ::@           :@@::::::: @  :  @       ::@:::::@ :: :#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.09

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.406^                                                #                       
     |                                                #                     : 
     |                                             :::#                     : 
     |                                          ::::  #                     : 
     |                                          : ::  #                     : 
     |                                          : ::  #                 ::::: 
     |                                         :: ::  #                 :   : 
     |                                   @:   ::: ::  #          :     ::   : 
     |                    @            @:@:   ::: ::  #        :@:   @@::   : 
     |              ::   :@            @:@:  :::: ::  #       ::@: : @ ::   : 
     |              : ::::@     :      @:@::@:::: ::  #      :::@::: @ ::   : 
     |             @: ::::@ ::@:: @ : :@:@::@:::: ::  #  :   :::@::::@ ::   : 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   70.81

github-actions · 2023-11-20T17:41:07Z

Performance Summary

Comparing base 5a6744b with head 294596a on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
3.276^                                                                  :##   
     |                                                                  :#    
     |                                                                  :#  : 
     |                                                                  :# :: 
     |                                                                  :# :: 
     |                                                                  :# :: 
     |                                           :::@                ::::# :: 
     |                                           :  @                :@ :# :: 
     |                                      ::@@@:  @              :::@ :# :: 
     |                                      : @  :  @              : :@ :# :: 
     |                                @   ::: @  :  @        ::@  :: :@ :# :: 
     |             :  :::     :     @@@:::::: @  :  @       :::@:::: :@ :# :: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   78.25

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
1.702^                                             ::::#                      
     |                                             :   #                      
     |                                             :   #                      
     |                                             :   #                     :
     |                                           @@:   #                    ::
     |                                          :@ :   #                    ::
     |                                        :::@ :   #                @@@@::
     |                                       :: :@ :   #                @   ::
     |                                  @   ::: :@ :   #        :::     @   ::
     |                   @            @:@   ::: :@ :   #       ::::   ::@   ::
     |              :  ::@     :     :@:@:::::: :@ :   #       @:::@::: @   ::
     |             @:::::@ ::@:::    :@:@:::::: :@ :   #     ::@:::@::: @   ::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   73.95

dcreager

Just one nit, otherwise 👍

dcreager · 2023-11-21T13:31:03Z

languages/tree-sitter-stack-graphs-java/Cargo.toml

@@ -36,5 +36,6 @@ harness = false # need to provide own main function to handle running tests
 [dependencies]
 anyhow = "1.0"
 clap = { version = "4", features = ["derive"] }
+stack-graphs = { version = "0.12", path = "../../stack-graphs" }


Is this spurious? I don't see a change in the crate's Rust code that needs to refer to stack-graphs directly

hendrikvanantwerpen · 2023-11-21

Good catch!

dcreager · 2023-11-21T13:33:19Z

stack-graphs/src/c.rs

+
+impl Into<StitcherConfig> for sg_stitcher_config {
+    fn into(self) -> StitcherConfig {
+        StitcherConfig::default().with_detect_similar_paths(self.detect_similar_paths)


Ah this is nice, it means we don't have to #[repr(C)] the Rust type
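A self-contained sketch of why the explicit conversion keeps `#[repr(C)]` off the Rust type. It uses `From` instead of the `Into` impl shown in the diff (the standard library derives `Into` from `From` automatically). The struct bodies, the default of `true`, and the Rust-only `label` field are assumptions added to show that the Rust type is free to hold data that is not FFI-safe.

```rust
/// C-facing mirror of the config; only this type needs a C layout.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct sg_stitcher_config {
    pub detect_similar_paths: bool,
}

/// Rust-side config with an ordinary Rust layout.
#[derive(Clone, Debug, PartialEq)]
pub struct StitcherConfig {
    pub detect_similar_paths: bool,
    /// Hypothetical Rust-only field; a String is not FFI-safe, which is
    /// fine because this struct never crosses the C boundary.
    pub label: Option<String>,
}

impl Default for StitcherConfig {
    fn default() -> Self {
        // Assumption for this sketch: detection is on by default.
        StitcherConfig { detect_similar_paths: true, label: None }
    }
}

impl StitcherConfig {
    pub fn with_detect_similar_paths(mut self, detect: bool) -> Self {
        self.detect_similar_paths = detect;
        self
    }
}

/// Field-by-field conversion, so the two layouts never have to match.
impl From<sg_stitcher_config> for StitcherConfig {
    fn from(config: sg_stitcher_config) -> Self {
        StitcherConfig::default().with_detect_similar_paths(config.detect_similar_paths)
    }
}
```

Because the conversion copies fields explicitly, no `transmute` is involved and the Rust struct can gain fields without touching the C one.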

hendrikvanantwerpen self-assigned this Oct 16, 2023

hendrikvanantwerpen linked an issue Oct 17, 2023 that may be closed by this pull request

Allow users to pass along stitcher configuration #322

Closed

dcreager reviewed Oct 17, 2023

hendrikvanantwerpen mentioned this pull request Oct 26, 2023

Collect stitching and database stats #326

Merged

hendrikvanantwerpen force-pushed the stitcher-config branch from 9069c45 to 064155e Compare November 13, 2023 11:03

hendrikvanantwerpen changed the base branch from main to leaner-similar-paths-detection November 13, 2023 11:07

hendrikvanantwerpen force-pushed the stitcher-config branch 3 times, most recently from 4921589 to a92b9b0 Compare November 13, 2023 11:16

hendrikvanantwerpen mentioned this pull request Nov 13, 2023

Reduce stored similar paths and checked cycles #346

Merged

hendrikvanantwerpen marked this pull request as ready for review November 13, 2023 14:01

hendrikvanantwerpen requested a review from a team as a code owner November 13, 2023 14:01

Base automatically changed from leaner-similar-paths-detection to main November 13, 2023 16:54

hendrikvanantwerpen force-pushed the stitcher-config branch from 1ffe3cf to e46feb3 Compare November 20, 2023 12:43

hendrikvanantwerpen requested a review from dcreager November 20, 2023 16:45

hendrikvanantwerpen added 6 commits November 20, 2023 18:34

Add support for stitcher configuration

87d312a

Use stitcher config only for functions that wrap a stitcher, not for …

2216fda

…stitcher itself

Add similar path setting to language configuration

098e03a

Update project init

966f613

Do not use transmute for stitcher config

3d32fbb

Change setting name and add comment explaining the consequences

294596a

hendrikvanantwerpen force-pushed the stitcher-config branch from 405a28b to 294596a Compare November 20, 2023 17:35

dcreager approved these changes Nov 21, 2023

Remove unnecessary dependency

63bf3a5

hendrikvanantwerpen merged commit e1b4d44 into main Nov 21, 2023
10 checks passed

hendrikvanantwerpen deleted the stitcher-config branch November 21, 2023 14:45
