diff-tree: fix crash when used with --remerge-diff #1771

Closed

@blanet blanet commented Aug 8, 2024

cc: Elijah Newren <newren@gmail.com>

blanet commented Aug 8, 2024

/preview

gitgitgadget bot commented Aug 8, 2024

Preview email sent as pull.1771.git.1723122774848.gitgitgadget@gmail.com

blanet commented Aug 8, 2024

/submit

gitgitgadget bot commented Aug 8, 2024

Submitted as pull.1771.git.1723123250958.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v1

To fetch this version to local tag pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v1

gitgitgadget bot commented Aug 8, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Aug 8, 2024 at 6:20 AM blanet via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Xing Xin <xingxin.xx@bytedance.com>
>
> When using "git-diff-tree" to get the tree diff for merge commits with
> the diff format set to `remerge`, a bug is triggered as shown below:
>
>     $ git diff-tree -r --remerge-diff 363337e6eb
>     363337e6eb812d0c0d785ed4261544f35559ff8b
>     BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

Wow, this bug has been around for 2.5 years, and then we both independently
notice and fix it within 3 weeks of each other:
https://github.com/git/git/commit/e5890667c7598e813edee0ac4e76d6e3cdd525ec

My patch is incomplete as it's missing a testcase, and you submitted
first, so let's stick with your fix, though.

> This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
> added in commit 7b90ab467a (log: clean unneeded objects during log
> --remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
> attempting to clean up temporary objects generated during the remerge
> process.
>
> After some further digging, I find that the remerge-related diff options
> were introduced in db757e8b8d (show, log: provide a --remerge-diff
> capability, 2022-02-02), which also affect the setup of `rev_info` for
> "git-diff-tree", but were not accounted for in the original
> implementation (inferred from the commit message).
>
> This commit fixes the bug by adding initialization logic for
> `remerge_objdir` in `builtin/diff-tree.c`, mirroring the logic in
> `builtin/log.c:cmd_log_walk_no_free`. A final cleanup for
> `remerge_objdir` is also included.

The commit message from my patch also included an explanation for why
diff-tree was the only caller that was missing the necessary logic
(see the last paragraph, which kind of references the one before it as
well).

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     diff-tree: fix crash when used with --remerge-diff
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1771%2Fblanet%2Fxx%2Ffix-diff-tree-crash-on-remerge-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1771
>
>  builtin/diff-tree.c     | 13 +++++++++++++
>  t/t4069-remerge-diff.sh | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
> index 0d3c611aac0..813be486dad 100644
> --- a/builtin/diff-tree.c
> +++ b/builtin/diff-tree.c
> @@ -9,6 +9,7 @@
>  #include "repository.h"
>  #include "revision.h"
>  #include "tree.h"
> +#include "tmp-objdir.h"

The includes other than this one are in alphabetical order; can you
move this a line before?

Also, as an aside, folks in this project often just put includes at
the end, but I think it's a bad practice.  Whenever someone needs to
backport fixes or merge separate patch topics into seen/next/etc. or
even merge not-yet-upstream topics with newer upstream versions, this
practice increases the odds of unnecessary conflicts.  And it makes it
harder for the next person who comes along to spot whether a header is
already included (and sometimes leaves us including headers twice).
While each case is a small amount of toil so we tend to overlook it,
it's totally unnecessary toil in many cases.  Putting includes in
alphabetical order (other than the one include required to be first,
git-compat-util.h or its documented stand-ins) can often remove this
unnecessary toil.  Anyway, thanks for letting me vent.  :-)

>  static struct rev_info log_tree_opt;
>
> @@ -166,6 +167,13 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
>
>         opt->diffopt.rotate_to_strict = 1;
>
> +       if (opt->remerge_diff) {
> +               opt->remerge_objdir = tmp_objdir_create("remerge-diff");
> +               if (!opt->remerge_objdir)
> +                       die(_("unable to create temporary object directory"));
> +               tmp_objdir_replace_primary_odb(opt->remerge_objdir, 1);
> +       }
> +
>         /*
>          * NOTE!  We expect "a..b" to expand to "^a b" but it is
>          * perfectly valid for revision range parser to yield "b ^a",
> @@ -230,5 +238,10 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
>                 diff_free(&opt->diffopt);
>         }
>
> +       if (opt->remerge_diff) {
> +               tmp_objdir_destroy(opt->remerge_objdir);
> +               opt->remerge_objdir = NULL;
> +       }
> +
>         return diff_result_code(&opt->diffopt);
>  }

Your fix exactly matches mine, other than the header include location.

> diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
> index 07323ebafe0..ca8f999caba 100755
> --- a/t/t4069-remerge-diff.sh
> +++ b/t/t4069-remerge-diff.sh
> @@ -110,6 +110,41 @@ test_expect_success 'can filter out additional headers with pickaxe' '
>         test_must_be_empty actual
>  '
>
> +test_expect_success 'remerge-diff also works for git-diff-tree' '
> +       # With a clean merge
> +       git diff-tree -r -p --remerge-diff --no-commit-id bc_resolution >actual &&
> +       test_must_be_empty actual &&
> +
> +       # With both a resolved conflict and an unrelated change
> +       cat <<-EOF >tmp &&
> +       diff --git a/numbers b/numbers
> +       remerge CONFLICT (content): Merge conflict in numbers
> +       index a1fb731..6875544 100644
> +       --- a/numbers
> +       +++ b/numbers
> +       @@ -1,13 +1,9 @@
> +        1
> +        2
> +       -<<<<<<< b0ed5cb (change_a)
> +       -three
> +       -=======
> +       -tres
> +       ->>>>>>> 6cd3f82 (change_b)
> +       +drei
> +        4
> +        5
> +        6
> +        7
> +       -eight
> +       +acht
> +        9
> +       EOF
> +       sed -e "s/[0-9a-f]\{7,\}/HASH/g" tmp >expect &&
> +       git diff-tree -r -p --remerge-diff --no-commit-id ab_resolution >tmp &&
> +       sed -e "s/[0-9a-f]\{7,\}/HASH/g" tmp >actual &&
> +       test_cmp expect actual
> +'
> +
>  test_expect_success 'setup non-content conflicts' '
>         git switch --orphan base &&

Test looks good too.

I'll be happy to add my Reviewed-by if you fix the header include order.

gitgitgadget bot commented Aug 8, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

> The commit message from my patch also included an explanation for why
> diff-tree was the only caller that was missing the necessary logic
> (see the last paragraph, which kind of references the one before it as
> well).

... which we may want to resurrect.

> Test looks good too.
>
> I'll be happy to add my Reviewed-by if you fix the header include order.

Thanks for a review.

@blanet blanet force-pushed the xx/fix-diff-tree-crash-on-remerge branch from f0b86fa to 02e63b0 on August 9, 2024 06:23

When using "git-diff-tree" to get the tree diff for merge commits with
the diff format set to `remerge`, a bug is triggered as shown below:

  $ git diff-tree -r --remerge-diff 363337e
  363337e
  BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab4 (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.

After some further digging, I find that the remerge-related diff options
were introduced in db757e8 (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).

Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:

  `builtin/am.c`: manually sets all flags; remerge_diff is not among them
  `sequencer.c`: manually sets all flags; remerge_diff is not among them

so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.

This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>

@blanet blanet force-pushed the xx/fix-diff-tree-crash-on-remerge branch from 02e63b0 to 57f0b12 on August 9, 2024 06:36

blanet commented Aug 9, 2024

/submit

gitgitgadget bot commented Aug 9, 2024

Submitted as pull.1771.v2.git.1723188292498.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v2

To fetch this version to local tag pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v2

gitgitgadget bot commented Aug 9, 2024

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-08-09 00:03:53, "Elijah Newren" <newren@gmail.com> wrote:
>On Thu, Aug 8, 2024 at 6:20 AM blanet via GitGitGadget
><gitgitgadget@gmail.com> wrote:
>>
>> From: Xing Xin <xingxin.xx@bytedance.com>
>>
>> When using "git-diff-tree" to get the tree diff for merge commits with
>> the diff format set to `remerge`, a bug is triggered as shown below:
>>
>>     $ git diff-tree -r --remerge-diff 363337e6eb
>>     363337e6eb812d0c0d785ed4261544f35559ff8b
>>     BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?
>
>Wow, this bug has been around for 2.5 years, and then we both independently
>notice and fix it within 3 weeks of each other:
>https://github.com/git/git/commit/e5890667c7598e813edee0ac4e76d6e3cdd525ec
>
>My patch is incomplete as it's missing a testcase, and you submitted
>first, so let's stick with your fix, though.

Wow, such an interesting coincidence! And thanks for your quick reply!

>> This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
>> added in commit 7b90ab467a (log: clean unneeded objects during log
>> --remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
>> attempting to clean up temporary objects generated during the remerge
>> process.
>>
>> After some further digging, I find that the remerge-related diff options
>> were introduced in db757e8b8d (show, log: provide a --remerge-diff
>> capability, 2022-02-02), which also affect the setup of `rev_info` for
>> "git-diff-tree", but were not accounted for in the original
>> implementation (inferred from the commit message).
>>
>> This commit fixes the bug by adding initialization logic for
>> `remerge_objdir` in `builtin/diff-tree.c`, mirroring the logic in
>> `builtin/log.c:cmd_log_walk_no_free`. A final cleanup for
>> `remerge_objdir` is also included.
>
>The commit message from my patch also included an explanation for why
>diff-tree was the only caller that was missing the necessary logic
>(see the last paragraph, which kind of references the one before it as
>well).

Your explanations better illustrate the impact of this bug; I've quoted them
in the new patch commit message.

[snip]

>> diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
>> index 0d3c611aac0..813be486dad 100644
>> --- a/builtin/diff-tree.c
>> +++ b/builtin/diff-tree.c
>> @@ -9,6 +9,7 @@
>>  #include "repository.h"
>>  #include "revision.h"
>>  #include "tree.h"
>> +#include "tmp-objdir.h"
>
>The includes other than this one are in alphabetical order; can you
>move this a line before?
>
>Also, as an aside, folks in this project often just put includes at
>the end, but I think it's a bad practice.  Whenever someone needs to
>backport fixes or merge separate patch topics into seen/next/etc. or
>even merge not-yet-upstream topics with newer upstream versions, this
>practice increases the odds of unnecessary conflicts.  And it makes it
>harder for the next person who comes along to spot whether a header is
>already included (and sometimes leaves us including headers twice).
>While each case is a small amount of toil so we tend to overlook it,
>it's totally unnecessary toil in many cases.  Putting includes in
>alphabetical order (other than the one include required to be first,
>git-compat-util.h or its documented stand-ins) can often remove this
>unnecessary toil.  Anyway, thanks for letting me vent.  :-)

Noted. I'll move the new include to the correct position. Thanks for the
guidance!

>>  static struct rev_info log_tree_opt;
>>
>> @@ -166,6 +167,13 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
>>
>>         opt->diffopt.rotate_to_strict = 1;
>>
>> +       if (opt->remerge_diff) {
>> +               opt->remerge_objdir = tmp_objdir_create("remerge-diff");
>> +               if (!opt->remerge_objdir)
>> +                       die(_("unable to create temporary object directory"));
>> +               tmp_objdir_replace_primary_odb(opt->remerge_objdir, 1);
>> +       }
>> +
>>         /*
>>          * NOTE!  We expect "a..b" to expand to "^a b" but it is
>>          * perfectly valid for revision range parser to yield "b ^a",
>> @@ -230,5 +238,10 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
>>                 diff_free(&opt->diffopt);
>>         }
>>
>> +       if (opt->remerge_diff) {
>> +               tmp_objdir_destroy(opt->remerge_objdir);
>> +               opt->remerge_objdir = NULL;
>> +       }
>> +
>>         return diff_result_code(&opt->diffopt);
>>  }
>
>Your fix exactly matches mine, other than the header include location.

High five! :-)

[snip]

Xing Xin

gitgitgadget bot commented Aug 9, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Elijah Newren, the author of the remerge diff feature, notes that other
> callers of `log-tree.c:log_tree_commit` (the only caller of
> `log-tree.c:do_remerge_diff`) also exist, but:
>
>   `builtin/am.c`: manually sets all flags; remerge_diff is not among them
>   `sequencer.c`: manually sets all flags; remerge_diff is not among them
>
> so `builtin/diff-tree.c` really is the only caller that was overlooked
> when remerge-diff functionality was added.

That is more than OK as a band-aid, and I'll take the patch as-is,
but I have to wonder if we do even better in a future follow-up
patch.

Any time do_remerge_diff() is entered, we know that either the end
user (from the command line) or the hard-coded caller (like
am/sequencer cited above) wants us to do the remerge-diff, which in
turn requires us to have the temporary object directory rotated into
the status of the primary object store.  And there is nothing in
that object directory rotation code that requires caller-specific
customization (it is the same "create remerge-diff directory as
tmp-objdir, rotate it into the alt object store chain as the
primary" regardless of the actual caller).

So wouldn't it work well if we

 (1) at the beginning of do_remerge_diff(), only once for a rev_info
     structure:
   (1-a) lazily do the "object directory rotation"
   (1-b) set up an atexit handler to clear the temporary object
         store
 (2) remove all the "ah, we need to prepare and tear down the
     temporary object store for _this_ operation" we have sprinkled
     in different code paths (including the one added by the fix we
     are looking at).

That way, we won't have to worry about adding future remerge_diff
users, including existing hard-coded callers.

Anyway, thanks for the fix.  It is very pleasing to see contributors
working well together.
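
The proposal above could be sketched roughly as follows. This is a standalone toy model, not Git code: `struct toy_rev_info`, `toy_ensure_objdir()`, `toy_remerge_diff()`, and `toy_cleanup_at_exit()` are hypothetical names, and a heap-allocated string stands in for the temporary object directory that the real patch obtains from `tmp_objdir_create()`. The toy also assumes a single temporary store per process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of rev_info that matters here. */
struct toy_rev_info {
	const char *remerge_objdir;	/* borrowed; NULL until the first remerge diff */
};

/* Owned by this module so the exit handler never touches caller memory. */
static char *toy_objdir;
static int toy_setup_count;	/* how many times setup actually ran */

/* (1-b) Exit handler: release the temporary store if it still exists. */
static void toy_cleanup_at_exit(void)
{
	free(toy_objdir);
	toy_objdir = NULL;
}

/* (1-a) Lazily create the temporary store, at most once per process. */
static void toy_ensure_objdir(struct toy_rev_info *revs)
{
	if (!toy_objdir) {
		toy_objdir = malloc(sizeof("remerge-diff"));
		if (!toy_objdir)
			abort();
		strcpy(toy_objdir, "remerge-diff");
		toy_setup_count++;
		atexit(toy_cleanup_at_exit);
	}
	revs->remerge_objdir = toy_objdir;
}

/* (2) Callers prepare nothing; the diff routine sets itself up on entry. */
static int toy_remerge_diff(struct toy_rev_info *revs)
{
	toy_ensure_objdir(revs);
	/* The condition the BUG() check in do_remerge_diff() guards. */
	assert(revs->remerge_objdir);
	return 0;
}
```

In this model a caller can invoke `toy_remerge_diff()` any number of times on a zero-initialized `struct toy_rev_info` without any setup of its own; only the first call creates the store.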

gitgitgadget bot commented Aug 9, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> but I have to wonder if we do even better in a future follow-up
> patch.

"if we do" -> "if we can do".

> So wouldn't it work well if we
>
>  (1) at the beginning of do_remerge_diff(), only once for a rev_info
>      structure:
>    (1-a) lazily do the "object directory rotation"
>    (1-b) set up an atexit handler to clear the temporary object
>          store

An atexit handler may not be enough, when a program wants to start
creating a real object after we did a remerge-diff but before
exiting. So we'd probably need to allow an explicit "ok, we are
done" clean-up call for such a program, too.

And the atexit handler can call the same clean-up function if the
program hasn't called it explicitly.  For logically read-only
operations like diff-tree, they do not have to worry about rotating
the real object store back to the primary status as soon as
possible.
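
One way to picture the refinement described here is a cleanup function that is safe to call more than once, so the same function can serve both as the explicit "ok, we are done" call and as the exit handler. The sketch below is a standalone toy under that assumption; `demo_remerge_begin()`, `demo_remerge_done()`, and `demo_write_real_object()` are hypothetical names, and a one-byte allocation stands in for the temporary object store.

```c
#include <assert.h>
#include <stdlib.h>

static void *demo_tmp_store;		/* non-NULL while the temporary store is primary */
static int demo_atexit_registered;
static int demo_teardown_count;		/* how many times teardown did real work */

/* Explicit "we are done" call; also used as the exit handler. Idempotent. */
static void demo_remerge_done(void)
{
	if (!demo_tmp_store)
		return;		/* already torn down, nothing to do */
	free(demo_tmp_store);
	demo_tmp_store = NULL;
	demo_teardown_count++;
}

/* Lazy setup; registers the exit handler only once per process. */
static void demo_remerge_begin(void)
{
	if (demo_tmp_store)
		return;
	demo_tmp_store = malloc(1);
	if (!demo_tmp_store)
		abort();
	if (!demo_atexit_registered) {
		atexit(demo_remerge_done);
		demo_atexit_registered = 1;
	}
}

/* A program that writes real objects calls demo_remerge_done() first. */
static int demo_write_real_object(void)
{
	/* Real writes must not land in the temporary store. */
	assert(!demo_tmp_store);
	return 1;
}
```

A read-only command in this model would call only `demo_remerge_begin()` and rely on the exit handler, while a command that later writes real objects would call `demo_remerge_done()` before `demo_write_real_object()`.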

gitgitgadget bot commented Aug 9, 2024

This patch series was integrated into seen via git@b52560d.

gitgitgadget bot commented Aug 9, 2024

This patch series was integrated into next via git@cabe67c.

gitgitgadget bot commented Aug 12, 2024

This branch is now known as xx/diff-tree-remerge-diff-fix.

gitgitgadget bot commented Aug 12, 2024

This patch series was integrated into seen via git@17311b7.

gitgitgadget bot commented Aug 13, 2024

There was a status update in the "New Topics" section about the branch xx/diff-tree-remerge-diff-fix on the Git mailing list:

"git rev-list ... | git diff-tree -p --remerge-diff --stdin" should
behave more or less like "git log -p --remerge-diff" but instead it
crashed, forgetting to prepare a temporary object store needed.

Will merge to 'master'.
source: <pull.1771.v2.git.1723188292498.gitgitgadget@gmail.com>

gitgitgadget bot commented Aug 15, 2024

There was a status update in the "Cooking" section about the branch xx/diff-tree-remerge-diff-fix on the Git mailing list:

"git rev-list ... | git diff-tree -p --remerge-diff --stdin" should
behave more or less like "git log -p --remerge-diff" but instead it
crashed, forgetting to prepare a temporary object store needed.

Will merge to 'master'.
source: <pull.1771.v2.git.1723188292498.gitgitgadget@gmail.com>

gitgitgadget bot commented Aug 16, 2024

There was a status update in the "Graduated to 'master'" section about the branch xx/diff-tree-remerge-diff-fix on the Git mailing list:

"git rev-list ... | git diff-tree -p --remerge-diff --stdin" should
behave more or less like "git log -p --remerge-diff" but instead it
crashed, forgetting to prepare a temporary object store needed.
source: <pull.1771.v2.git.1723188292498.gitgitgadget@gmail.com>

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into seen via git@0da7673.

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into master via git@0da7673.

gitgitgadget bot commented Aug 17, 2024

This patch series was integrated into next via git@0da7673.

@gitgitgadget gitgitgadget bot added the master label Aug 17, 2024
@gitgitgadget gitgitgadget bot closed this Aug 17, 2024
gitgitgadget bot commented Aug 17, 2024

Closed via 0da7673.
