Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRESOLVER-605] Linking support in file transporter #578

Merged
merged 12 commits into from
Oct 12, 2024

Conversation

cstamas
Copy link
Member

@cstamas cstamas commented Sep 15, 2024

Support linking for FileTransport. Still, as FileTransport is remote transport, due checksumming (via Listener) file does have to be read.

Big fat note: this change will cause that symlinks may land in local repository. This PR does NOT enforce rest of Maven plugins knows how to handle them, is out of scope for this PR.

This feature is EXPERIMENTAL.


https://issues.apache.org/jira/browse/MRESOLVER-605

Problem is that file transport is also remote transport, so things
like checksumming are expected to happen (just like for HTTP or
any other remote transport).
@cstamas cstamas self-assigned this Sep 15, 2024
@eolivelli
Copy link

This work will save lot of disk space for large projects and it is definitely worth to expllre this option.

I wonder how we can make this plugins that handle files and they don't expect links. But it only a matter of finding them and adding support.

We should also think about dealing with the compability of these new modes with existing projects and caches maybe without asking the user to clean everything ?
Thanks for building this POC

@cstamas
Copy link
Member Author

cstamas commented Sep 15, 2024

I'd like to know more about requirements, as transport-file is still transport (considered remote) with all the fluff remote transport is expected to do, like mentioned checksum calculation/comparison (and this Pr will work for smaller files, but as I read Javadoc, mapping size limit is 2GB?).
As immediate (and already existing) alternative to this is "chained local repository" for example, that is local and simply logically merges two or more local repositories...

Just like replaced `TestFileUtils.createTempFile`
just create new path, as WinFS is wonky
@tiagobento
Copy link

tiagobento commented Sep 15, 2024

@cstamas @eolivelli Wow! Thanks a lot for putting this together so quickly!

@cstamas I'm not sure if you're talking about "internal requirements" or if you'd like to understand more about this setup where having the symlink+file: transport option would be a game changer for me. Let me know if it's the latter.

Regarding this "chained local repository" alternative you mentioned, is this something exposed via configuration for Maven users or you're referring to using ChainedLocalRepositoryManager?

EDIT: Seems like I had some reading to do first :) Found Chained Local Repository Manager (CLRM) and I guess it's pretty clear now. I guess adding the file:// remote repo as a tail local repo would be analogous to soft-linking stuff to the head local repo? I Need to do some experimentation with it, but CLRM combined with this statement seems like it would. IIUC the head local repo is where things will be downloaded to, and the tails will act as a fallback mechanism before trying to actually go looking remotely.

I'm happy to help in any way I can, though! :)

@tiagobento
Copy link

tiagobento commented Sep 16, 2024

I did some extensive tests with -Dmaven.repo.local.tail and although is works well for compiling, testing etc, some plugins simply don't work (namely the Quarkus Maven Plugin in my case)... They seem to use maven-resolver internally to introspect the project's POM and I found in their code they're using SimpleLocalRepositoryManager.

My experiments were based on the fact that having N repos mapped as "tail" local repos seemed analogous to declaring N repositories via file://, but without all the execution, configuration, and disk usage overhead.

Unfortunately I think symlinks would be much easier for Maven plugin developers to adapt to. Symlinks might also even work out of the box in some cases. In fact, some of the tests I did with cp -r -s worked quite ok.

It seems this will be a trade-off between how unsupported -Dmaven.repo.local.tail is VS. how unsupported symlinks on local maven repos are.

@cstamas
Copy link
Member Author

cstamas commented Sep 17, 2024

Hej, sadly yes. The legacy bits in Maven effectively prevent ChainedMavenRepository "mainstream" use, see issue for explanation: https://issues.apache.org/jira/browse/MNG-7706 In short, all "modern" plugins are ok, but in a moment this legacy API is touched (is deprecated btw), the chained falls apart.

Quarkus plugin is different beast from that story above, as they already have their own chained implementation, moreover, they construct their own Resolver embedded in Maven...

And as for symlinking (as hardlinking fails on Windows for some reason), it may work and save disk space (with all the shortcomings already mentioned, like all the plugins should be able to handle symlinks in local repo) but speed wise it may not be huge boost, since as I said, "file" transport is remote transport, and hence, checksumming (and in future, things like signature validation) is must and expected.

@cstamas
Copy link
Member Author

cstamas commented Sep 17, 2024

On related note: due some strange habit, people wanting "latest and greatest" (features and compatibility) are NOT compiling against latest, and our effort to "signal" people to move off APIs (by deprecation) goes completely unnoticed, like here: repaint-io/maven-tiles#157

Hence we started adding "plugin warnings" that again, caused user uproar :) All that fuss was partially related to "move off deprecated ArtifactRepository classes" among other things, as we knew we cannot improve underlying LRM (in Resolver) without Maven cruft following it.

@cstamas
Copy link
Member Author

cstamas commented Sep 17, 2024

Have to add: this is big time ourselves committing a mistake as well, as we (as in "maven project") let "two parallel worlds" (Resolver and Maven2 inherited APIs) coexists for too long (two repository APIs, even to artifact and artifact version APIs), without giving any guidance to users "where to go". This is now changing, and Maven3 to Maven4 migration will have it's rough edges, and will require "vigilance" from our users too, but Maven4 goal is to keep lives of those migrating simple (so Maven3 plugins not touching into deepest internals will still work as before). Extensions are completely different story...

@tiagobento
Copy link

And as for symlinking (as hardlinking fails on Windows for some reason), it may work and save disk space (with all the shortcomings already mentioned, like all the plugins should be able to handle symlinks in local repo) but speed wise it may not be huge boost, since as I said, "file" transport is remote transport, and hence, checksumming (and in future, things like signature validation) is must and expected.

Got it. Makes total sense. I guess the only real performance gain then would be in not having to actually move bits around and write them as new files. Hard to say if it's worth it to have it or not, though.


As for all the complexity in managing deprecations and having people actually act on using the new stuff, I can only imagine how careful the considerations must be to change/evolve something so widely used like Maven, especially in a foundational part like Resolver. If I may, I guess the simple-enough repository structure and mental model that Maven has for dealing with dependencies kind of encourages people to endeavor a little more on its internals (i.e. "How hard can it be?" 😅), extending it in ways that seem safe but have all sorts of evolution and compatibility problems.. But that's just my limited perspective on it as a user 😝

I'm glad to hear that Maven4 will try and address some of those issues! I'm looking forward for the first stable release.


In the mean time, I couldn't let it go with the Quarkus Maven Plugin and... I got it working. I'm pretty sure this is not the whole thing, but at least my immediate problem was solved by this little patch: tiagobento/quarkus@86a25e0

I did it against the 3.8.4 tag (because that's what I'm using in my project, currently), but I'll try and send a PR targeting the latest stream. Hopefully I can get it to be part of a future patch release on the next LTS coming up alter this month (3.15).

@cstamas Thank you so much for pointing out that the CLRM mechanism exists. It may be the final missing piece on this whole thing I'm trying to do.


As for the symlinks, I guess my need would be a lot less latent now that I understand the gains may not be as great as I initially imagined. Of course, if you want to continue pursuing the feasibility of both symbolic and hard links, I'll be available for helping you test it if you need it!

@cstamas
Copy link
Member Author

cstamas commented Sep 18, 2024

@aloubyansky ping, especially the interesting patch tiagobento/quarkus@86a25e0

@cstamas cstamas changed the title FileOp for file transporter [MRESOLVER-605] Linking support in file transporter Sep 24, 2024
@cstamas cstamas marked this pull request as ready for review September 24, 2024 08:58
@cstamas cstamas added this to the 2.0.2 milestone Sep 24, 2024
@cstamas cstamas merged commit 8b0c7ff into apache:master Oct 12, 2024
5 checks passed
@cstamas cstamas deleted the fileop-tiago branch October 12, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants