-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support hard links in tarballs #10891
Support hard links in tarballs #10891
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't entirely rule out significant problems, as the relative path functionality appears to be untested; see comment.
|
||
// For each ../ component at the start, go up one directory. | ||
std::string_view relTargetLeft(relTarget); | ||
while (hasPrefix(relTargetLeft, "../")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to see whether all ..
's are guaranteed to come first when I realised: Is this logic needed? The test file doesn't have any relative paths, suggesting that we could just look up the files directly from the root.
While I like the idea of being lenient towards inputs, we run the risk that a bug here will go unnoticed for a long time, and putting reproducibility at stake.
As far as I can see (I haven't excluded a libarchive quirk), I think we could either
- Figure out a way to make
tar
produce relative hard links (idk if it can) - Modify the tarball by hand to contain relative paths, see how GNU and BSD tar behave, and use the modified tarball in the test
Unit testingwould potentially test the wrong behavior- Check that
..
does not occur in any of our path components andthrow UnimplementedError("Nix does not support tarballs with relative hard links yet. Please open an issue.")
if it does. We may never have to implement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path stored in the tarball and returned by libarchive is absolute (or to be precise, relative to the root of the archive). The relative path here comes from CanonPath::makeRelative()
, which does indeed always put the ..
elements at the start. The conversion to a relative path is just to deal with the git_tree_builders
that are in progress (we can't just look up the path from the start of the root, since some parent directories may not have finished yet).
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Do we need a new sink class? I suspect that every sink we have today can support this. The only other ones I know of from memory are writing to the OS filesystem and writing to the memory source accessor, and they both should support it. (The memory one can just copy.) |
I find the separation insightful. |
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
I guess it doesn't really matter. My thinking was that since we don't support mutation of files, |
My thinking is that if you're just parsing a NAR, the interface shouldn't require you to handle features that don't exist in NARs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is hard to understand, making it hard to see whether the corner cases have been covered.
That's not to say that we shouldn't have this; we should, but it'd be very helpful to also have a unit test, which will be easier to understand, review and improve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what is hard to understand about it?
We could get rid of tree.tar.gz
by generating it in the test script, though we have to make sure the tar
version used generates hard links. Not sure how a unit test would help in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to read a raw tarball. I've looked at the strings inside, to get an idea of what's going on, but without looking at the hexdump or using some sort of parsing tool, I can only guess what is approximately going on.
A unit test isolates us from that complexity, or opacity. It provides a clearly readable "script" of what we can expect to happen on the Nix side when a tarball is read.
We could get rid of
tree.tar.gz
by generating it
We should not remove the file, because it's our only "end to end" test and we know that it certainly contains hardlinks, even if it's not super easy to figure that out.
Generating it would solve part of the problem, but suffers from the problem you describe. It does make it easier to understand and modify.
A unit test wouldn't be "end to end", but that's mostly covered by keeping the tarball file.
The way it helps is to also make it easier to understand and modify, and besides that, it may let us construct corner cases more easily without other, complicated software getting in our way. Another benefit of a unit test is that it provides an abstracted view of the problem, which is very useful for the next person, whether that's for debugging, reviewing, ... (also it's the reason to keep the tarball file around)
I've opened DeterminateSystems#1 to add the first unit tests to this, but I haven't got around to completing the testing. |
Motivation
Fixes #10395.
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.