Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make path manipulation play nice with symlinks by deferring symlink resolution #749

Merged
merged 1 commit into from
Jan 13, 2025

Conversation

sogartar
Copy link
Contributor

@sogartar sogartar commented Jan 3, 2025

pathlib.Path.resolve() would follow and resolve symlinks, which is usually not what we want. In most cases we want to defer symlink resolution.
This change makes all places use os.path.abspath(). When doing path manipulation we may care about the surrounding directory structure, which may be a collection of multiple symlinks, so doing a symlink resolution would put you in a different directory structure, which is probably not the intent.

Comment on lines 36 to 37
THIS_DIR = Path(__file__).parent.resolve()
THIS_DIR = Path(__file__).parent.absolute()
REPO_ROOT = THIS_DIR.parent.parent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a motivating example file structure and a sample of what errors you encounter with the current behavior? I can definitely imagine there are unhandled edge cases with symlinks in here, but this change seems counter to the recommendation at https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parent

If you want to walk an arbitrary filesystem path upwards, it is recommended to first call Path.resolve() so as to resolve symlinks and eliminate ".." components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this error when running tests and the repo directory had a symlink in the path.

INTERNALERROR>   File "/home/bpetkant/ws/sharktank/.venv/lib/python3.11/site-packages/_pytest/cacheprovider.py", line 339, in pytest_collection_modifyitems
INTERNALERROR>     res = yield
INTERNALERROR>           ^^^^^
INTERNALERROR>   File "/home/bpetkant/ws/sharktank/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>           ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/home/bpetkant/ws/sharktank/repo/sharktank/conftest.py", line 25, in pytest_collection_modifyitems
INTERNALERROR>     rel_path = item_path.relative_to(root_path)
INTERNALERROR>                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/lib/python3.11/pathlib.py", line 730, in relative_to
INTERNALERROR>     raise ValueError("{!r} is not in the subpath of {!r}"

I think usually we should treat symlinks as if they are regular directories/files. Isn't that the whole point? For example if you have a structure

/a/b -> /some/directory
/a/c -> /some/other/directory

If you resolve, you lose the context of the symlinks and the relationship and you don't treat them transparently.

In cases where you want to copy/package whole directory structures you may want to copy the link or the contents. But in most situations you should operate with the paths as if regular directories/files.

I actually did not know that Path.parent would treat .. lexically. In that case we can use os.path.abspath, which returns the normalized absolute path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale to treat symlinks specifically like that with Path.resolve?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure... in other projects I've just done

THIS_DIR = Path(__file__).parent

then only use .resolve() or .absolute() or .relative_to() for printing to stdout/stderr

Copy link
Contributor Author

@sogartar sogartar Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a two examples of prominent software.

In Bash when inside a symlink directory

cd ..

would put you in the directory where the link is. Not the target.

on the other hand

ls ..

would print the contents of the symlink's target parent.

I think the cd behavior is the correct one. Maintaining the context of the symlink. When printing paths we should in general maintain the symlink.

Copy link
Contributor Author

@sogartar sogartar Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more example.

which python

would print the symlink path and not the target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change to just THIS_DIR = Path(__file__).parent (no .resolve() or .absolute()), does your setup work?

This pattern looks like a mess to put in all files :P

THIS_DIR = Path(os.path.abspath(__file__)).parent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated here with Python 3.9 __file__ is absolute. Does not say normalized though, maybe it is assumed.

Copy link
Contributor Author

@sogartar sogartar Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed os.path.abspath where we assume it is normalized.

@sogartar sogartar requested a review from ScottTodd January 7, 2025 14:46
…esolution

pathlib.Path.resolve() would follow and resolve symlinks, which is
usually not what we want. In most cases we want to defer symlink
resolution.
When doing path manipulation we may care about the surrounding
directory structure, which may be a collection of multiple symlinks, so
doing a symlink resolution would put you in a different directory
structure, which is probably not the intent.

This change makes all path manipulations of __file__ not resolve first.
Since Python 3.9 __file__ is absolute
https://docs.python.org/3/whatsnew/3.9.html#other-language-changes
It is not stated explicitly that it should normalized as well but we
assume it.
We use os.path.abspath if required to get unresolved, absolute and
normalized path.
@sogartar sogartar force-pushed the do-not-use-path-resolve branch from e64b4a2 to d98ce18 Compare January 13, 2025 15:54
@sogartar sogartar merged commit 471e56a into nod-ai:main Jan 13, 2025
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants