Merge branch 'tb/incremental-midx-part-1' into seen
Incremental updates of multi-pack index files.

* tb/incremental-midx-part-1:
  midx: implement support for writing incremental MIDX chains
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  midx: implement verification support for incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: add new fields for incremental MIDX chains
  Documentation: describe incremental MIDX format
gitster committed Jul 18, 2024
2 parents eda197a + 7a74e28 commit ef7bb39
Showing 24 changed files with 958 additions and 263 deletions.
11 changes: 10 additions & 1 deletion Documentation/git-multi-pack-index.txt
@@ -64,6 +64,12 @@ The file given at `<path>` is expected to be readable, and can contain
 duplicates. (If a given OID is given more than once, it is marked as
 preferred if at least one instance of it begins with the special `+`
 marker).
+
+--incremental::
+Write an incremental MIDX file containing only objects
+and packs not present in an existing MIDX layer.
+Migrates non-incremental MIDXs to incremental ones when
+necessary. Incompatible with `--bitmap`.
 --
 
 verify::
@@ -74,6 +80,8 @@ expire::
 have no objects referenced by the MIDX (with the exception of
 `.keep` packs and cruft packs). Rewrite the MIDX file afterward
 to remove all references to these pack-files.
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 repack::
 Create a new pack-file containing objects in small pack-files
@@ -95,7 +103,8 @@ repack::
 +
 If `repack.packKeptObjects` is `false`, then any pack-files with an
 associated `.keep` file will not be selected for the batch to repack.
-
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 EXAMPLES
 --------
100 changes: 100 additions & 0 deletions Documentation/technical/multi-pack-index.txt
@@ -61,6 +61,106 @@ Design Details
- The MIDX file format uses a chunk-based approach (similar to the
commit-graph file) that allows optional data to be added.

Incremental multi-pack indexes
------------------------------

As repositories grow in size, it becomes more expensive to write a
multi-pack index (MIDX) that includes all packfiles. To accommodate
this, the "incremental multi-pack indexes" feature allows for combining
a "chain" of multi-pack indexes.

Each individual component of the chain need only contain a small number
of packfiles. Appending to the chain does not invalidate earlier parts
of the chain, so repositories can control how much time is spent
updating the MIDX chain by determining the number of packs in each layer
of the MIDX chain.

=== Design state

At present, the incremental multi-pack indexes feature is missing two
important components:

- The ability to rewrite earlier portions of the MIDX chain (i.e., to
"compact" some collection of adjacent MIDX layers into a single
MIDX). At present the only supported way of shrinking a MIDX chain
is to rewrite the entire chain from scratch without the `--incremental`
flag.
+
There are no fundamental limitations that stand in the way of being able
to implement this feature. It is omitted from the initial implementation
in order to reduce the complexity, but will be added later.

- Support for reachability bitmaps. The classic single MIDX
implementation does support reachability bitmaps (see the section
titled "multi-pack-index reverse indexes" in
linkgit:gitformat-pack[5] for more details).
+
As above, there are no fundamental limitations that stand in the way of
extending the incremental MIDX format to support reachability bitmaps.
The design below specifically takes this into account, and support for
reachability bitmaps will be added in a future patch series. It is
omitted from this series for the same reason as above.
+
In brief, to support reachability bitmaps with the incremental MIDX
feature, the concept of the pseudo-pack order is extended across each
layer of the incremental MIDX chain to form a concatenated pseudo-pack
order. This concatenation takes place in the same order as the chain
itself (in other words, the concatenated pseudo-pack order for a chain
`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
`$H3`).
+
The layout will then be extended so that each layer of the incremental
MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
are offset by the number of objects in the previous layers of the chain.
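
The concatenation rule above can be sketched in C. This is a minimal model under stated assumptions, not Git's implementation: each layer's pseudo-pack order is taken to be an array whose entry `p` is the layer-local lexicographic position of the object at pseudo-pack position `p`, and the names `struct layer_order` and `concat_pseudo_pack_order()` are hypothetical. Every entry from a layer is shifted by the number of objects in the layers before it, which is the same offset the per-layer bitmaps would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of one MIDX layer's pseudo-pack order: entry p
 * holds the layer-local lexicographic position of the object found at
 * pseudo-pack position p within that layer.
 */
struct layer_order {
	const uint32_t *order;
	uint32_t nr;
};

/*
 * Concatenate per-layer orders in chain order ($H1, $H2, ...). Each
 * position from layer k (both the pseudo-pack slot and the object it
 * names) is shifted by the number of objects in layers 0..k-1.
 * Returns a malloc'd array (caller frees) and stores its length in
 * *out_nr, or returns NULL if allocation fails. Overflow of the total
 * object count is not checked in this sketch.
 */
static uint32_t *concat_pseudo_pack_order(const struct layer_order *layers,
					  size_t layers_nr, uint32_t *out_nr)
{
	uint32_t total = 0, base = 0;
	uint32_t *out;

	for (size_t k = 0; k < layers_nr; k++)
		total += layers[k].nr;

	out = malloc(sizeof(*out) * (total ? total : 1));
	if (!out)
		return NULL;

	for (size_t k = 0; k < layers_nr; k++) {
		for (uint32_t p = 0; p < layers[k].nr; p++) {
			/* A layer-local position must stay inside its layer. */
			assert(layers[k].order[p] < layers[k].nr);
			out[base + p] = base + layers[k].order[p];
		}
		base += layers[k].nr;
	}

	*out_nr = total;
	return out;
}
```

For two toy layers with orders `{2, 0, 1}` and `{1, 0}`, the concatenated order comes out as `{2, 0, 1, 4, 3}`: the second layer's entries are shifted by 3, the size of the first layer.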

=== File layout

Instead of storing a single `multi-pack-index` file (with an optional
`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
MIDXs are stored in the following layout:

----
$GIT_DIR/objects/pack/multi-pack-index.d/
$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
----

The `multi-pack-index-chain` file contains a list of the incremental
MIDX files in the chain, in order. The above example shows a chain whose
`multi-pack-index-chain` file would contain the following lines:

----
$H1
$H2
$H3
----

The `multi-pack-index-$H1.midx` file contains the first layer of the
multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
the second layer of the chain, and so on.
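
A reader for the chain file only has to map each listed hash to its `multi-pack-index-$H.midx` path. The sketch below assumes the chain file's contents are already in memory, with exactly one hash per line and the oldest layer first; `midx_chain_layer_path()` is a hypothetical helper, and it performs no validation of the hashes themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the contents of a multi-pack-index-chain
 * file (one hash per line, oldest layer first), write the path of
 * layer number `n` (0-based) into `buf`. Blank lines are skipped.
 * Returns 0 on success, or -1 if the chain has fewer than n + 1
 * layers or the resulting path does not fit in `buf`.
 */
static int midx_chain_layer_path(const char *chain, size_t n,
				 const char *pack_dir,
				 char *buf, size_t bufsz)
{
	const char *line = chain;

	assert(chain && pack_dir && buf);

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* Count only non-empty lines as layers. */
		if (len && n-- == 0) {
			int ret = snprintf(buf, bufsz,
					   "%s/multi-pack-index.d/multi-pack-index-%.*s.midx",
					   pack_dir, (int)len, line);
			return (ret < 0 || (size_t)ret >= bufsz) ? -1 : 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

With the contents `aaa\nbbb\nccc\n` and a pack directory of `objects/pack`, asking for layer 2 yields `objects/pack/multi-pack-index.d/multi-pack-index-ccc.midx`, and asking for layer 3 fails because the chain has only three layers.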

=== Object positions for incremental MIDXs

In the original multi-pack-index design, we refer to objects via their
lexicographic position (by object IDs) within the repository's singular
multi-pack-index. In the incremental multi-pack-index design, we refer
to objects via their index into a concatenated lexicographic ordering
among each component in the MIDX chain.

If `objects_nr()` is a function that returns the number of objects in a
given MIDX layer, then the index of an object at lexicographic position
`i` within, say, $H3 is defined as:

----
objects_nr($H2) + objects_nr($H1) + i
----

(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
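
The position arithmetic can be modeled with a toy chain in which each layer records how many objects precede it, mirroring the `m->num_objects_in_base` field mentioned above. For brevity this sketch assumes object IDs are short strings held in sorted arrays; real layers store binary object IDs with fanout tables. `struct toy_midx`, `toy_midx_lookup()` and `toy_midx_layer_for()` are hypothetical names, not Git functions. Because a new layer only contains objects not present in earlier layers, a lookup can binary-search each layer in turn and stop at the first hit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layer: sorted object IDs plus a link to its base. */
struct toy_midx {
	const char **oids;            /* lexicographically sorted */
	uint32_t num_objects;
	uint32_t num_objects_in_base; /* objects in all earlier layers */
	const struct toy_midx *base;  /* previous layer, or NULL */
};

/*
 * Look up `oid` starting at the newest layer `m` and walking towards
 * the oldest. On success, store the global position
 * (i + m->num_objects_in_base) in *pos and return 1; else return 0.
 */
static int toy_midx_lookup(const struct toy_midx *m, const char *oid,
			   uint32_t *pos)
{
	assert(oid && pos);

	for (; m; m = m->base) {
		uint32_t lo = 0, hi = m->num_objects;

		/* Binary search within this layer only. */
		while (lo < hi) {
			uint32_t mid = lo + (hi - lo) / 2;
			int cmp = strcmp(oid, m->oids[mid]);

			if (!cmp) {
				*pos = m->num_objects_in_base + mid;
				return 1;
			}
			if (cmp < 0)
				hi = mid;
			else
				lo = mid + 1;
		}
	}
	return 0;
}

/*
 * Map a global position back to the layer holding it, or NULL if the
 * position lies past the end of the chain whose newest layer is `m`.
 */
static const struct toy_midx *toy_midx_layer_for(const struct toy_midx *m,
						 uint32_t pos)
{
	if (!m || pos >= m->num_objects_in_base + m->num_objects)
		return NULL;
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	return m;
}
```

With a base layer `{a1, c3, e5}` and a second layer `{b2, d4}` whose `num_objects_in_base` is 3, `d4` resolves to global position 3 + 1 = 4 and `c3` to 1, while `toy_midx_layer_for()` sends position 3 to the second layer and position 1 to the base layer.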

Future Work
-----------

2 changes: 2 additions & 0 deletions builtin/multi-pack-index.c
@@ -129,6 +129,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			   MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "progress", &opts.flags,
 			N_("force progress reporting"), MIDX_PROGRESS),
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("write multi-pack index containing only given indexes")),
 		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
8 changes: 2 additions & 6 deletions builtin/repack.c
@@ -1217,10 +1217,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!write_midx &&
 		    (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
 			write_bitmaps = 0;
-	} else if (write_bitmaps &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
-		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
@@ -1520,8 +1516,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
 		unsigned flags = 0;
-		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
-			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
+			flags |= MIDX_WRITE_INCREMENTAL;
 		write_midx_file(get_object_directory(), NULL, NULL, flags);
 	}
 
2 changes: 1 addition & 1 deletion ci/run-build-and-tests.sh
@@ -25,7 +25,7 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
-	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2