Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hard links in tarballs #10891

Merged

Conversation

edolstra
Copy link
Member

Motivation

Fixes #10395.

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added with-tests Issues related to testing. PRs with tests have some priority fetching Networking with the outside (non-Nix) world, input locking labels Jun 11, 2024
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't entirely rule out significant problems, as the relative path functionality appears to be untested; see comment.


// For each ../ component at the start, go up one directory.
std::string_view relTargetLeft(relTarget);
while (hasPrefix(relTargetLeft, "../")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to see whether all ..'s are guaranteed to come first when I realised: Is this logic needed? The test file doesn't have any relative paths, suggesting that we could just look up the files directly from the root.

While I like the idea of being lenient towards inputs, we run the risk that a bug here will go unnoticed for a long time, and putting reproducibility at stake.

As far as I can see (I haven't excluded a libarchive quirk), I think we could either

  • Figure out a way to make tar produce relative hard links (idk if it can)
  • Modify the tarball by hand to contain relative paths, see how GNU and BSD tar behave, and use the modified tarball in the test
  • Unit testing would potentially test the wrong behavior
  • Check that .. does not occur in any of our path components and throw UnimplementedError("Nix does not support tarballs with relative hard links yet. Please open an issue.") if it does. We may never have to implement it.

Copy link
Member Author

@edolstra edolstra Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path stored in the tarball and returned by libarchive is absolute (or to be precise, relative to the root of the archive). The relative path here comes from CanonPath::makeRelative(), which does indeed always put the .. elements at the start. The conversion to a relative path is just to deal with the git_tree_builders that are in progress (we can't just look up the path from the start of the root, since some parent directories may not have finished yet).

tests/functional/tarball.sh Show resolved Hide resolved
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
@Ericson2314
Copy link
Member

Do we need a new sink class? I suspect that every sink we have today can support this. The only other ones I know of from memory are writing to the OS filesystem and writing to the memory source accessor, and they both should support it. (The memory one can just copy.)

@roberth
Copy link
Member

roberth commented Jun 12, 2024

Do we need a new sink class? I suspect that every sink we have today can support this.

I find the separation insightful.

Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
@Ericson2314
Copy link
Member

I guess it doesn't really matter.

My thinking was that since we don't support mutation of files, ln = cp = cp --reflink i.e. its not an extension to the data model so much as an expedient way to make copies.

@edolstra
Copy link
Member Author

My thinking is that if you're just parsing a NAR, the interface shouldn't require you to handle features that don't exist in NARs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is hard to understand, making it hard to see whether the corner cases have been covered.
That's not to say that we shouldn't have this; we should, but it'd be very helpful to also have a unit test, which will be easier to understand, review and improve.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what is hard to understand about it?

We could get rid of tree.tar.gz by generating it in the test script, though we have to make sure the tar version used generates hard links. Not sure how a unit test would help in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to read a raw tarball. I've looked at the strings inside, to get an idea of what's going on, but without looking at the hexdump or using some sort of parsing tool, I can only guess what is approximately going on.

A unit test isolates us from that complexity, or opacity. It provides a clearly readable "script" of what we can expect to happen on the Nix side when a tarball is read.

We could get rid of tree.tar.gz by generating it

We should not remove the file, because it's our only "end to end" test and we know that it certainly contains hardlinks, even if it's not super easy to figure that out.
Generating it would solve part of the problem, but suffers from the problem you describe. It does make it easier to understand and modify.
A unit test wouldn't be "end to end", but that's mostly covered by keeping the tarball file.
The way it helps is to also make it easier to understand and modify, and besides that, it may let us construct corner cases more easily without other, complicated software getting in our way. Another benefit of a unit test is that it provides an abstracted view of the problem, which is very useful for the next person, whether that's for debugging, reviewing, ... (also it's the reason to keep the tarball file around)

@roberth
Copy link
Member

roberth commented Jul 1, 2024

I've opened DeterminateSystems#1 to add the first unit tests to this, but I haven't got around to completing the testing.

@roberth roberth self-assigned this Jul 3, 2024
@Ericson2314 Ericson2314 merged commit efd4bf6 into NixOS:master Jul 11, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fetching Networking with the outside (non-Nix) world, input locking regression Something doesn't work anymore with-tests Issues related to testing. PRs with tests have some priority
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

fetchTarball does not support hard links
4 participants