Skip to content

Commit

Permalink
for-each-ref: add --count-matches option
Browse files Browse the repository at this point in the history
In order to count references of different types based on their initial
prefixes, there are two current approaches:

 1. Run 'git for-each-ref' on all refs and parse the output to count
    those that match each prefix.

 2. Run 'git for-each-ref "prefix/"' and pipe that output to 'wc -l' to
    count the number of output lines.

Each of these approaches is wasteful as it requires sending the list of
matching reference names across a pipe plus the cost of parsing that
output.

Instead, it would be helpful to have a Git command that counts the
number of refs matching a list of patterns.

This change adds a new mode, '--count-matches' to 'git for-each-ref' so
we can make use of the existing infrastructure around parsing refspecs
in the correct places. '--count' is already taken as a "maximum number"
of refs to output.

An alternative approach could be to make a brand-new builtin that is
focused on counting ref matches. This would involve duplicating a bit of
code around parsing refpecs, but would not be terribly difficult. The
actual overlap of implementation here with 'git for-each-ref' is small
enough that we could instead extract this elsewhere. My gut feeling is
that this behavior doesn't merit a new builtin.

The implementation is extremely simple: iterate through all references
and compare each ref to each refspec. On a match, increment a counter
value for that refspec.

In the end, output each refspec followed by the count.

If all given refspecs were prefixes, then it is tempting to instead use
a counting behavior that we can use in things like "how many OIDs start
with this short hex string?" Navigating to the start and end of the
range of refs starting with that prefix is possible in the packed-refs
file. However, the records do not have a constant size so we cannot
infer the number of references in that range using the current format.
(Perhaps, in the future we will have a ref storage system that allows
this kind of counting to be easy to do in O(log N) time.)

A new performance test is included to check the performance of iterating
through these references and counting them appropriately. This presents
a 3x improvement over the trivial piping through to 'wc -l', and that
assumes there is a single pattern to match instead of multiple. We can
see that testing three patterns sequentially adds to the total time, but
doing a single process with --count-match continues to be as fast. (It's
difficult to tell since it _also_ matches the sum of the three for this
example repo.)

Test                                                         this tree
----------------------------------------------------------------------------
1501.2: count refs/heads/: git for-each-ref | wc -l          0.01(0.00+0.01)
1501.3: count refs/heads/: git for-each-ref --count-match    0.00(0.00+0.00)
1501.4: count refs/tags/: git for-each-ref | wc -l           0.02(0.00+0.02)
1501.5: count refs/tags/: git for-each-ref --count-match     0.00(0.00+0.00)
1501.6: count refs/remotes: git for-each-ref | wc -l         0.15(0.08+0.07)
1501.7: count refs/remotes: git for-each-ref --count-match   0.04(0.01+0.02)
1501.8: count all patterns: git for-each-ref | wc -l         0.18(0.08+0.10)
1501.9: count all patterns: git for-each-ref --count-match   0.04(0.02+0.02)

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
  • Loading branch information
derrickstolee committed Jun 26, 2023
1 parent 4aa261d commit 9121e02
Show file tree
Hide file tree
Showing 6 changed files with 145 additions and 3 deletions.
5 changes: 5 additions & 0 deletions Documentation/git-for-each-ref.txt
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,11 @@ OPTIONS
`<pattern>`. This option makes it stop after showing
that many refs.

--count-matches::
Instead of listing references according to the given format, output
the number of references that match each pattern. Incompatible with
`--format`, `--sort`, and `--count`.

--sort=<key>::
A field name to sort on. Prefix `-` to sort in
descending order of the value. When unspecified,
Expand Down
26 changes: 23 additions & 3 deletions builtin/for-each-ref.c
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,17 @@ static void filter_and_output_refs(struct repository *r,
strbuf_release(&output);
}

static void count_and_output_patterns(struct ref_filter *filter)
{
uint32_t *counts = count_ref_patterns(filter);

for (int i = 0; filter->name_patterns && filter->name_patterns[i]; i++)
fprintf(stdout, "%s %"PRIu32"\n",
filter->name_patterns[i], counts[i]);

free(counts);
}

int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
{
struct ref_sorting *sorting;
Expand All @@ -60,6 +71,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
struct ref_format format = REF_FORMAT_INIT;
int from_stdin = 0;
struct strvec vec = STRVEC_INIT;
const char *initial_format = "%(objectname) %(objecttype)\t%(refname)";

struct option opts[] = {
OPT_BIT('s', "shell", &format.quote_style,
Expand All @@ -72,6 +84,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
OPT_BOOL(0, "omit-empty", &omit_empty,
N_("do not output a newline after empty formatted refs")),
OPT_BOOL(0, "count-matches", &count_matches,
N_("output number of references matching each pattern instead of any other output")),

OPT_GROUP(""),
OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
Expand All @@ -93,7 +107,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
memset(&array, 0, sizeof(array));
memset(&filter, 0, sizeof(filter));

format.format = "%(objectname) %(objecttype)\t%(refname)";
format.format = initial_format;

git_config(git_default_config, NULL);

Expand All @@ -102,6 +116,9 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
error("invalid --count argument: `%d'", maxcount);
usage_with_options(for_each_ref_usage, opts);
}
if (count_matches &&
(maxcount || format.format != initial_format || sorting_options.nr))
die("--count-matches incompatible with --count, --format, or --sort");
if (HAS_MULTI_BITS(format.quote_style)) {
error("more than one quoting style?");
usage_with_options(for_each_ref_usage, opts);
Expand Down Expand Up @@ -131,8 +148,11 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
}

filter.match_as_path = 1;
filter_and_output_refs(the_repository, &array, &filter, &format,
sorting, maxcount, omit_empty);
if (count_matches)
count_and_output_patterns(&filter);
else
filter_and_output_refs(the_repository, &array, &filter, &format,
sorting, maxcount, omit_empty);

ref_array_clear(&array);
free_commit_list(filter.with_commit);
Expand Down
47 changes: 47 additions & 0 deletions ref-filter.c
Original file line number Diff line number Diff line change
Expand Up @@ -2560,6 +2560,53 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
return ret;
}

struct filter_and_counts {
struct ref_filter *filter;
uint32_t *counts;
};

static int ref_filter_counter(const char *refname,
const struct object_id *oid,
int flag, void *cb_data)
{
struct filter_and_counts *fc = cb_data;
const char **pattern = fc->filter->name_patterns;
size_t namelen = strlen(refname);
unsigned flags = fc->filter->ignore_case ? WM_CASEFOLD : 0;

for (int i = 0; *pattern; i++, pattern++) {
const char *p = *pattern;
int plen = strlen(p);

if ((plen <= namelen) &&
!strncmp(refname, p, plen) &&
(refname[plen] == '\0' ||
refname[plen] == '/' ||
p[plen-1] == '/'))
fc->counts[i]++;
else if (!wildmatch(p, refname, flags))
fc->counts[i]++;
}
return 0;
}

uint32_t *count_ref_patterns(struct ref_filter *filter)
{
int size = 0;
struct filter_and_counts fc = {
.filter = filter,
};

while (filter->name_patterns[size])
size++;

CALLOC_ARRAY(fc.counts, size);

for_each_fullref_in_pattern(filter, ref_filter_counter, &fc);

return fc.counts;
}

static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
{
if (!(a->kind ^ b->kind))
Expand Down
7 changes: 7 additions & 0 deletions ref-filter.h
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,13 @@ char *get_head_description(void);
/* Set up translated strings in the output. */
void setup_ref_filter_porcelain_msg(void);

/*
* Iterate over all references, counting how many match each filter
* pattern. Returns an array of the counts with the ith count matching the
* ith filter->name_pattern entry.
*/
uint32_t *count_ref_patterns(struct ref_filter *filter);

/*
* Print a single ref, outside of any ref-filter. Note that the
* name must be a fully qualified refname.
Expand Down
35 changes: 35 additions & 0 deletions t/perf/p1501-ref-iteration.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
#!/bin/sh

test_description='Ref iteration performance tests'
. ./perf-lib.sh

test_perf_large_repo

# Optimize ref backend store
test_expect_success 'setup' '
git pack-refs
'

for pattern in "refs/heads/" "refs/tags/" "refs/remotes"
do
test_perf "count $pattern: git for-each-ref | wc -l" "
git for-each-ref $pattern | wc -l
"

test_perf "count $pattern: git for-each-ref --count-match" "
git for-each-ref --count-matches $pattern
"
done

test_perf "count all patterns: git for-each-ref | wc -l" "
git for-each-ref refs/heads/ | wc -l &&
git for-each-ref refs/tags/ | wc -l &&
git for-each-ref refs/remotes/ | wc -l
"

test_perf "count all patterns: git for-each-ref --count-match" "
git for-each-ref --count-matches \
refs/heads/ refs/tags/ refs/remotes/
"

test_done
28 changes: 28 additions & 0 deletions t/t6300-for-each-ref.sh
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,34 @@ for i in "--perl --shell" "-s --python" "--python --tcl" "--tcl --perl"; do
"
done

test_expect_success '--count-matches incompatible with some options' '
for opt in "--format=x" "--sort=refname" "--count=10"
do
test_must_fail git for-each-ref --count-matches $opt refs/heads/ 2>err &&
grep "count-matches incompatible" err || return 1
done
'

test_expect_success '--count-matches tallies the number matching each refspec' '
git init multi-refs &&
test_commit -C multi-refs A &&
git -C multi-refs branch A &&
git -C multi-refs branch pre/A &&
test_commit -C multi-refs --no-tag B &&
git -C multi-refs branch B &&
git -C multi-refs for-each-ref --count-matches \
refs/heads/ refs/heads/pre/ refs/tags/ "*A*" >actual &&
cat >expect <<-EOF &&
refs/heads/ 4
refs/heads/pre/ 1
refs/tags/ 1
*A* 3
EOF
test_cmp expect actual
'

test_expect_success 'setup for upstream:track[short]' '
test_commit two
'
Expand Down

0 comments on commit 9121e02

Please sign in to comment.