Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git for-each-ref: is-base atom and base branches #1768

Closed
wants to merge 4 commits into from

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Aug 1, 2024

This change introduces a new 'git for-each-ref' atom, 'is-base', in a very similar way to the 'ahead-behind' atom. As detailed carefully in the first change, this is motivated by the need to detect the concept of a "base branch" in a repository with multiple long-lived branches.

This change is motivated by a third-party tool created to make this detection with the same optimization mechanism, but using a much slower technique due to the limitations of the Git CLI not presenting this information. The existing algorithm involves using git rev-list --first-parent -<N> in batches for the collection of considered references, comparing those lists, and increasing <N> as needed until finding a collision. This new use of 'git for-each-ref' will allow determining this mechanism within a single process and walking a minimal number of commits.

There are benefits to users both on client-side and server-side. In an internal monorepo, this base branch detection algorithm is used to determine a long-lived branch based on the HEAD commit, mapping to a group within the organizational structure of the repository, which determines a set of projects that the user will likely need to build; this leads to automatically selecting an initial sparse-checkout definition based on the build dependencies required. An upcoming feature in Azure Repos will use this algorithm to automatically create a pull request against the correct target branch, reducing user pain from needing to select a different branch after a large commit diff is rendered against the default branch. This atom unlocks that ability for Git hosting services that use Git in their backend.

Thanks, -Stolee

Updates in v2

  • I had forgotten to include a documentation change in v1. My attempt to create a succinct doc change in a follow-up hunk continued to be confusing. This version includes a more expanded version of the documentation blurb for the is-base token.

Updates in v3

  • Corrected some grammar in a commit message.
  • Fixed (and tested for) a bug where the source branch is equal to a candidate ref.
  • Added a test in t6500-for-each-ref.sh to cover some non-commit refs and some broken objects.
  • Motivated by the test in t6500, add a new patch that adds a ..._gently() method to reduce error noise for non-commit refs.

cc: gitster@pobox.com
cc: vdye@github.com

@derrickstolee derrickstolee self-assigned this Aug 1, 2024
Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 3ff144e:
commit-reach: add get_branch_base_for_tip
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 22b72ab:
for-each-ref: add 'is-base' token
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 3ff144e:
commit-reach: add get_branch_base_for_tip
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 6b1e6c4:
for-each-ref: add 'is-base' token
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 23f18f1:
commit-reach: add get_branch_base_for_tip
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 9ccfdd6:
for-each-ref: add 'is-base' token
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

gitgitgadget bot commented Aug 1, 2024

There are issues in commit 7631c3d:
p1500: add is-base performance tests
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

@derrickstolee
Copy link
Author

/submit

Copy link

gitgitgadget bot commented Aug 1, 2024

Submitted as pull.1768.git.1722550226.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1768/derrickstolee/target-ref-v1

To fetch this version to local tag pr-1768/derrickstolee/target-ref-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1768/derrickstolee/target-ref-v1

Copy link

gitgitgadget bot commented Aug 1, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Derrick Stolee (3):
>   commit-reach: add get_branch_base_for_tip
>   for-each-ref: add 'is-base' token
>   p1500: add is-base performance tests
>
>  commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>  commit-reach.h              |  17 ++++++
>  ref-filter.c                |  78 +++++++++++++++++++++++-
>  ref-filter.h                |  15 +++++
>  t/helper/test-reach.c       |   2 +
>  t/perf/p1500-graph-walks.sh |  31 ++++++++++
>  t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>  7 files changed, 354 insertions(+), 1 deletion(-)

I was expecting to see an documentation update to for-each-ref (and
probably branch and tag) so that what this new atom means.  Is it
that %(is-base:<commit>) interpolates to <commit> for a ref that is an
ancestor of <commit>, and interpolates to an empty string for a ref
that is not, or something?

Thanks.

Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/1/24 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >> Derrick Stolee (3):
>>    commit-reach: add get_branch_base_for_tip
>>    for-each-ref: add 'is-base' token
>>    p1500: add is-base performance tests
>>
>>   commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>>   commit-reach.h              |  17 ++++++
>>   ref-filter.c                |  78 +++++++++++++++++++++++-
>>   ref-filter.h                |  15 +++++
>>   t/helper/test-reach.c       |   2 +
>>   t/perf/p1500-graph-walks.sh |  31 ++++++++++
>>   t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>>   7 files changed, 354 insertions(+), 1 deletion(-)
> > I was expecting to see an documentation update to for-each-ref (and
> probably branch and tag) so that what this new atom means.  Is it
> that %(is-base:<commit>) interpolates to <commit> for a ref that is an
> ancestor of <commit>, and interpolates to an empty string for a ref
> that is not, or something?

You are absolutely right that I missed this crucial detail. I will
eventually send a v2 with this oversight corrected. For now, please
consider this documentation diff, and I look forward to other review
comments that I can use to improve this series before sending v2.

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..5154ba3e2a7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,16 @@ ahead-behind:<committish>::
 	commits ahead and behind, respectively, when comparing the output
 	ref to the `<committish>` specified in the format.

+is-base:<committish>::
+	In at most one row, `(<committish>)` will appear to indicate the ref
+	that minimizes the number of commits in the first-parent history of
+	`<committish>` and not in the first-parent history of the ref. Ties
+	are broken by favoring the earliest ref in the list. Note that this
+	token will not appear if the first-parent history of `<committish>`
+	does not intersect the first-parent histories of the filtered refs.
+	This can be used as a heuristic to guess which of the filtered refs
+	was used as the base of the branch that produced `<committish>`.
+
 describe[:options]::
 	A human-readable name, like linkgit:git-describe[1];
 	empty string for undescribable commits. The `describe` string may

Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> consider this documentation diff, and I look forward to other review
> comments that I can use to improve this series before sending v2.
>
> diff --git a/Documentation/git-for-each-ref.txt
> b/Documentation/git-for-each-ref.txt
> index c1dd12b93cf..5154ba3e2a7 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -264,6 +264,16 @@ ahead-behind:<committish>::
>  	commits ahead and behind, respectively, when comparing the output
>  	ref to the `<committish>` specified in the format.
>
> +is-base:<committish>::
> +	In at most one row, `(<committish>)` will appear to indicate the ref
> +	that minimizes the number of commits in the first-parent history of
> +	`<committish>` and not in the first-parent history of the ref. Ties
> +	are broken by favoring the earliest ref in the list. Note that this
> +	token will not appear if the first-parent history of `<committish>`
> +	does not intersect the first-parent histories of the filtered refs.
> +	This can be used as a heuristic to guess which of the filtered refs
> +	was used as the base of the branch that produced `<committish>`.
> +

OK.  Knowing what definition you used is crucial when reading the
implementation, as we cannot tell what you wanted to implement
without it ;-)

Thanks.

Copy link

gitgitgadget bot commented Aug 2, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

>> +is-base:<committish>::
>> +	In at most one row, `(<committish>)` will appear to indicate the ref
>> +	that minimizes the number of commits in the first-parent history of
>> +	`<committish>` and not in the first-parent history of the ref.

This was a bit too dense for me to grok.  So if I have a <commit>
that is at the tip of a branch B forked from 'master', and then
'master' advanced by a lot since the branch forked, the number this
is minimizing for 'master' is the commits on the branch B, but when
showing 'maint', then even though the branch B may have the tip of
'maint' as an ancestor, the number for 'maint' would be a lot more
than the number for 'master'.  If there were another branch C that
was forked from 'master' and shared some (or all) commits that are
near the tip of branch B, e.g.

    ---o---o---o---o---o---o---o---o---o 'master'
            \
             o---o---o---o 'C'
                      \    
                       o---o---o---o 'B'

then the number may be even smaller for branch 'C' than 'master'.

And for at most one ref, %(is-base:<commit>) becomes "(<commit>)";
for all other refs, it becomes an empty string.

OK.

> OK.  Knowing what definition you used is crucial when reading the
> implementation, as we cannot tell what you wanted to implement
> without it ;-)
>
> Thanks.

Copy link

gitgitgadget bot commented Aug 2, 2024

This patch series was integrated into seen via git@6021df1.

@gitgitgadget gitgitgadget bot added the seen label Aug 2, 2024
Copy link

gitgitgadget bot commented Aug 3, 2024

This patch series was integrated into seen via git@cff39cd.

Copy link

gitgitgadget bot commented Aug 5, 2024

This branch is now known as ds/for-each-ref-is-base.

Copy link

gitgitgadget bot commented Aug 5, 2024

This patch series was integrated into seen via git@6912159.

Copy link

gitgitgadget bot commented Aug 5, 2024

This patch series was integrated into seen via git@813637c.

Copy link

gitgitgadget bot commented Aug 5, 2024

There was a status update in the "New Topics" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Expecting a reroll.
source: <pull.1768.git.1722550226.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 6, 2024

This patch series was integrated into seen via git@2de25db.

Copy link

gitgitgadget bot commented Aug 8, 2024

This patch series was integrated into seen via git@f1df888.

Copy link

gitgitgadget bot commented Aug 8, 2024

This patch series was integrated into seen via git@259918f.

Copy link

gitgitgadget bot commented Aug 9, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Expecting a reroll.
source: <pull.1768.git.1722550226.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 9, 2024

This patch series was integrated into seen via git@21eb9b8.

The lookup_commit_reference_by_name() method uses lookup_commit_reference()
without an option to use lookup_commit_reference_gently(). Create a gentle
version of the method so it can be used in locations where non-commits may
be found but error messages should be silenced.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. The motivation of that change was about using a heuristic to
deteremine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
chang in 49abcd2 (for-each-ref: add ahead-behind format atom,
2023-03-20).

Introduce the 'is-base:<source>' atom, which will indicate that the
algorithm should be computed and the result of the algorithm is reported
using an indicator of the form '(<source>)'. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.

Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refpecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.

It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f97
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.

In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.

This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.

The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test that scenario. Do create a new root commit in an
effort to test the scenario of parallel first-parent histories.

Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:

Test
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref              0.12
1500.3: ahead-behind counts: git branch                    0.12
1500.4: ahead-behind counts: git tag                       0.12
1500.5: contains: git for-each-ref --merged                0.04
1500.6: contains: git branch --merged                      0.04
1500.7: contains: git tag --merged                         0.04
1500.8: is-base check: test-tool reach (refs)              0.03
1500.9: is-base check: test-tool reach (tags)              0.03
1500.10: is-base check: git for-each-ref                   0.03
1500.11: is-base check: git for-each-ref (disjoint-base)   0.07

Signed-off-by: Derrick Stolee <stolee@gmail.com>
@derrickstolee
Copy link
Author

/submit

Copy link

gitgitgadget bot commented Aug 14, 2024

Submitted as pull.1768.v3.git.1723631490.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1768/derrickstolee/target-ref-v3

To fetch this version to local tag pr-1768/derrickstolee/target-ref-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1768/derrickstolee/target-ref-v3

Copy link

gitgitgadget bot commented Aug 15, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Comments?
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 16, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Comments?
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into seen via git@58280ea.

Copy link

gitgitgadget bot commented Aug 19, 2024

This patch series was integrated into seen via git@996b87a.

Copy link

gitgitgadget bot commented Aug 19, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> There are benefits to users both on client-side and server-side. In an
> internal monorepo, this base branch detection algorithm is used to determine
> a long-lived branch based on the HEAD commit, mapping to a group within the
> organizational structure of the repository, which determines a set of
> projects that the user will likely need to build; this leads to
> automatically selecting an initial sparse-checkout definition based on the
> build dependencies required. An upcoming feature in Azure Repos will use
> this algorithm to automatically create a pull request against the correct
> target branch, reducing user pain from needing to select a different branch
> after a large commit diff is rendered against the default branch. This atom
> unlocks that ability for Git hosting services that use Git in their backend.

Thanks for an update.  This iteration looks good to me.

Copy link

gitgitgadget bot commented Aug 20, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'next'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 20, 2024

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/19/24 3:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >> There are benefits to users both on client-side and server-side. In an
>> internal monorepo, this base branch detection algorithm is used to determine
>> a long-lived branch based on the HEAD commit, mapping to a group within the
>> organizational structure of the repository, which determines a set of
>> projects that the user will likely need to build; this leads to
>> automatically selecting an initial sparse-checkout definition based on the
>> build dependencies required. An upcoming feature in Azure Repos will use
>> this algorithm to automatically create a pull request against the correct
>> target branch, reducing user pain from needing to select a different branch
>> after a large commit diff is rendered against the default branch. This atom
>> unlocks that ability for Git hosting services that use Git in their backend.
> > Thanks for an update.  This iteration looks good to me.

Thank you for your careful review.

-Stolee

Copy link

gitgitgadget bot commented Aug 20, 2024

This patch series was integrated into seen via git@918c769.

Copy link

gitgitgadget bot commented Aug 20, 2024

This patch series was integrated into next via git@dd5da48.

@gitgitgadget gitgitgadget bot added the next label Aug 20, 2024
Copy link

gitgitgadget bot commented Aug 20, 2024

This patch series was integrated into seen via git@d9c0d90.

Copy link

gitgitgadget bot commented Aug 21, 2024

This patch series was integrated into seen via git@c42f26f.

Copy link

gitgitgadget bot commented Aug 21, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'master'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 22, 2024

This patch series was integrated into seen via git@36102ce.

Copy link

gitgitgadget bot commented Aug 23, 2024

This patch series was integrated into seen via git@f271dbd.

Copy link

gitgitgadget bot commented Aug 23, 2024

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'master'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Aug 25, 2024

This patch series was integrated into seen via git@a8e7fad.

Copy link

gitgitgadget bot commented Aug 26, 2024

This patch series was integrated into seen via git@3222718.

Copy link

gitgitgadget bot commented Aug 26, 2024

This patch series was integrated into master via git@3222718.

Copy link

gitgitgadget bot commented Aug 26, 2024

This patch series was integrated into next via git@3222718.

@gitgitgadget gitgitgadget bot added the master label Aug 26, 2024
Copy link

gitgitgadget bot commented Aug 26, 2024

Closed via 3222718.

@gitgitgadget gitgitgadget bot closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant