Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NFS host and guest support #40

Open
wants to merge 48 commits into
base: master
Choose a base branch
from
Open

NFS host and guest support #40

wants to merge 48 commits into from

Conversation

djdv
Copy link
Owner

@djdv djdv commented Nov 27, 2023

Indirectly requested from ipfs/roadmap#83
Creating a draft before the bot locks the thread.

This is still a WIP but basic functionality seems to work for both hosting and attaching.
However, initial testing shows it is not reliable.
Looking at the logs, there seems to be some kind of continuity problem in the NFS server or client where 2 lookup requests back to back will be processed in the wrong order. (Likely some kind of race condition.)

2023/11/21 13:21:51 [TRACE] request: RPC #4290061661 (nfs.Lookup)
2023/11/21 13:21:51 [TRACE] request: RPC #4290061662 (nfs.Lookup)
err on: Qmc2fSA6chjZMbXRi7JGQZ7tXEmDbtG2XF9FXRLCtGLBQV [] xid did not mat ch, expected: ffb5255e, received: ffb5255d
err on: Qmc2fSA6chjZMbXRi7JGQZ7tXEmDbtG2XF9FXRLCtGLBQV [] xid did not mat ch, expected: ffb5255d, received: ffb5255e

The error stems from this section of the client.
and I noticed the server is handling sends on the socket within a goroutine (see references to this struct member).
The issue could lie within the NFS libraries or my own implementations, so I'll have to investigate that when I get the chance.


Example server/host invocation:
fs mount nfs pinfs /ip4/192.168.1.40/tcp/2049
I.e. fs mount nfs $guestAPI $nfsServerMaddr
(Assuming an IPFS node is running on this machine with default settings. Otherwise pass -pinfs-api $ipfsAPIMaddr before the final multiaddr argument.)

Example guest/client invocation:
fs mount fuse nfs -nfs-server /ip4/192.168.1.40/tcp/2049 I:
I.e. fs mount $host nfs $nfsServerMaddr $mountPoint


CC: @willscott
We spoke about utilizing this library for this purpose a very long time ago but I never got around to it until now.
Letting you know about its existence.
I'm planning on looking into the above error myself sometime, but you're more than welcome to hack around too. ;^]

@willscott
Copy link
Collaborator

Looking briefly at the client you linked I think that's right - the client is only coded to support 1 call occurring / outstanding at a time.

@djdv djdv force-pushed the j/nfs branch 2 times, most recently from 8c7cbcc to 4402e35 Compare January 11, 2024 22:23
@djdv
Copy link
Owner Author

djdv commented Jan 11, 2024

Had some time to look at this.
Ended up just making the client lock between operations to prevent any overlapping requests. (And generally fixing up and finishing other parts.)

Still a few things left to do but the server and client seem to work. At least well enough for a smoke test.
https://www.youtube.com/watch?v=19FkIxTzavY

@willscott
Copy link
Collaborator

try updating to the latest v0.0.2 of go-nfs and you may see the stale handle issue you ran into in the video go away.

This is very cool!

@djdv
Copy link
Owner Author

djdv commented Jan 13, 2024

try updating to the latest v0.0.2 of go-nfs and you may see the stale handle issue you ran into in the video go away.

Did a quick test and it seems like it did fix that. :^]
Some very minor changes to get it to build again, but afterwards the Go server and Linux client seemed to work alright.

However, willscott/go-nfs#121 broke some stuff on my end.
So I'm going to look into patching this branch to comply with the changes.

This is very cool!

Thanks!


Extra technical details on the breakage:

On the server side, within mountpoint.go:Mount(...), I'm basically taking an extended fs.FS and wrapping that in a billy.Basic implementation, which gets passed to polyfill, before being wrapped further with NFS specifics (NewNullAuthHandler+NewCachingHandler), and finally starts listening for connections.

The fs.FS implementations I've written, don't (yet) implement Lstat, so when I make target.Getattr calls from go-nfs-client1), they're returning NFS3ERR_NOTDIR (specifically the logs say: [ERROR] call to 0x7ff6e6f6e0a0 failed: Not a directory: lstat ".": unsupported operation which is coming from the fs.FS Lstat extension function I have.2)

To remedy this I can just implement Lstat methods on all the fs.FS implementations I have.
And also add a check for this up front in mountpoint.go:Mount, returning a clear error stating we require an Lstat method.
This is related to the last part in #41 expressing the need for something similar to billy's Capabilities concept so that different abstraction layers can better interoperate.

Footnotes

  1. I do this in the fs.FS' Open method

  2. polyfill's Lstat method also behaves this way.

@djdv
Copy link
Owner Author

djdv commented Jan 30, 2024

Added symbolic link support to the IPFS fs.FS systems.
Specifically Lstat, and Readlink; some systems can conditionally do Symlink (mklink) as well.
Open should now implicitly try to resolve links, as long as they are relative to the fs.FS itself.
Symlink support is not well tested yet, but these changes restore NFS support on go-nfs v0.2, which includes the fix for Linux's client.
We could have got away with just doing Lstat, but proper symlink support had to happen eventually anyway.

Other link related changes were made in other parts of the code too (filesystem interface pkg + FUSE host).
The NFS client implementation imposes similar link restrictions (as above). It will do a best effort inspection on link targets and try to resolve ones which are relative.
There's a constructor option that allows setting a path delimiter token, so if you know your links will be delimited by a token other than / (such as somewhere\relative) you can construct the FS with an option that tells it to replace them in the target string. So that they get split properly instead of treated as a filename character (as opposed to a path separator)

When I get a chance I'm going to clean up some remaining TODOS, and do some testing with IPFS link nodes.
After that should just need to review it and squash some commits.

@willscott
Copy link
Collaborator

happy to do a review pass when you're ready for it

@djdv djdv force-pushed the j/nfs branch 2 times, most recently from ae840b4 to 200d0d7 Compare February 3, 2024 19:48
@djdv
Copy link
Owner Author

djdv commented Feb 5, 2024

This is probably good enough for a first pass.
Will, I can't flag you as a reviewer, so I added you as a collaborator first.
If you accept that I should be able to put you on this one. :^]

Some context on various packages:

/filesystem contains extensions to fs.FS, adding any functionality not present in the standard, with extension functions to match the interface extensions. All of our "guest" systems intend to implement parts of these interfaces.

Its subpkg /filesystem/errors essentially extends fs.PathError, adding a Kind type. Guests can define a general error "kind" which hosts APIs can inspect and translate into their native error values.
E.g. an IPFS guest might return an fserror which can be picked up by a FUSE host, that can translate it into the specific errno value its API requires.

/nfs contains file system host & guest support utilizing the extensions in /filesystem.
/ipfs are guest implementations that can be used by other host implementations. They've been adapted to utilize symlinks in this patchset.

All the other changes could be ignored, but you're welcome to look at them too if you want.
Extra stuff below:


The general idea for /commands/mount.go is that it parses argv and transforms values into structures that hold parameters for the host and guest constructors. Those get marshaled and sent over the wire to a daemon process, which itself unmarshals the data, and constructs the guest fs.FS and passes it to the host instances Mount method.
There are reasons for doing things that way, but the implementation is too complicated as-is. So I'm planning on refactoring it later to reduce some duplication and other funny business.

There will still be a need to keep "mountpoint" data serializable, but I think it can be done better than it currently is.
1 reason is that we need to send our CLI args to the daemon process, and 2 is that we need to be able to store the mountpoint on disk so that we can restore it later, like Unix's fstab.
There's other interesting things you can do when the mount table is a living service in itself, such as changing instance parameters live, with or without needing to remount / migrate handles. As well as integrations with external tools, for example a context menu being able to know it's inside an IPFS mount point, and offer more functions like "copy CID", "copy gatewaylink" like an old shell-extension of mine
Some of this already works:
Desktop 2023.02.03 - 13.25.43.13.mp4, Desktop 2022.08.23 - 18.14.54.08.mp4)
but isn't well documented since it's likely going to change.

@djdv djdv marked this pull request as ready for review February 5, 2024 15:42
@djdv djdv requested a review from willscott February 5, 2024 16:04
HC mountCmdHost[HT, HM],
HM marshaller,
HT any,
](host filesystem.Host,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need the linebreak here after the return?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be gofumpt rules being enabled in my LSP settings.
If I remove it, it just inserts it again.
I like most of the decisions of gofumpt, but it usually does weird stuff with function signatures...
I'll leave this alone for now and hope the entire function gets blown away in a future refactor.

"github.com/multiformats/go-multiaddr"
)

type (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the generics and the combination of host and guest mount in this file make the logic a bit hard to follow - i wonder if some of the generics could be normalized to have a bit more repetition but be more readable.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
There are some benefits to having these generic command constructors, but even as the author I get tripped up on it. Especially with some of the bad constraint names, and huge signatures.
I've been considering alternate ways to deal with it but have been struggling to come up with something that is both good and compiler legal.
Thinking about trying to do a large refactor after this PR that splits a lot of this up, but I'm worried about creating some kind of dependency cycle somewhere. We'll see how it goes.

@@ -63,6 +64,7 @@ func NewIPNS(core coreiface.CoreAPI, ipfs fs.FS, options ...IPNSOption) (*IPNS,
readAll | executeAll,
},
nodeTimeout: 1 * time.Minute,
linkLimit: 40, // Arbitrary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise to constant somewhere?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. This is part of a larger problem as well, detailed below but can be ignored.


The way I handle mountpoint data (the parameters required to instantiate a file system) is currently weird.
And I was thinking about better ways to handle them.

The ipfs pkg has constructors for the IPFS fs.FS systems, but in /command/mount we create a struct containing values for that constructor.
(The mount command is just a client that sends that struct + other data to the fs daemon process, which translates that portion back into parameters for the fs.FS constructor which is calls in its own process space).

I'm putting thought into how to better deal with this, but I'm not sure yet.
One thing I was considering was just changing the constructors to accepting a settings struct, and exposing that. This way the constructor parameter is itself an exposed serializable type/format.
The other was going to be to keep them as-is, but expose all the default values in exported consts DefaultLinkLimit = 40, so that external packages could come up with their own serializable format if they need to, irrelevant of how the Go API is.
Or some other nonsense.

The main reason the constructor options need to be serializable, is so that we can implement something similar to Unix's fstab. We can just dump the current mountpoints to disk, and load them again at startup.
(This currently works, but there's no command to automate it. I want to fix this up first before doing that.)

internal/filesystem/ipfs/shared.go Show resolved Hide resolved
internal/filesystem/nfs/client.go Outdated Show resolved Hide resolved
internal/filesystem/nfs/client.go Show resolved Hide resolved
return closerFn, nil
}

func (*Guest) GuestID() filesystem.ID { return GuestID }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means you're only able to have one mount of this at a time because it'll always have the same global filesystem ID?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. It's just the name of the file system.
That interface method is probably poorly name. It used to be FSID iirc. Maybe it should be .Type or something else?

I forget all the places it's used, but I think the mount table uses it. The mount commands might use it, but I think not for "reasons".

Check this out, if we start the daemon with an exposed socket, it acts as a 9P server.
We can then instantiate a bunch of mount points, then as a 9P client, see what the actual mount table looks like.
Which shows those names being used in both the paths and mountpoint tags.

Windows:
fs daemon -api-server /ip4/192.168.1.40/tcp/564

fs mount nfs keyfs -api-server /ip4/192.168.1.40/tcp/564 /ip4/192.168.1.40/tcp/2050
fs mount nfs pinfs -api-server /ip4/192.168.1.40/tcp/564 /ip4/192.168.1.40/tcp/2051

fs mount nfs pinfs -api-server /ip4/192.168.1.40/tcp/564 /ip4/192.168.1.40/tcp/2049
fs.exe mount fuse nfs -api-server /ip4/192.168.1.40/tcp/564 -nfs-server /ip4/192.168.1.40/tcp/2049 I:

Linux:
mount -t 9p 192.168.1.40 /mnt/9 -o trans=tcp,port=564
tree /mnt/9

Screenshot 2024-02-06 105930

@djdv
Copy link
Owner Author

djdv commented Feb 8, 2024

I'm planning to look over these changes again when I get a chance, and afterwards will probably merge this in.
If people use this and find bugs with it of course they can be patched later.

Will, thanks for the help on this. I appreciate you going beyond just the NFS parts too.
I think we both agree that some of the parts of the codebase are complex and need some attention.
I'm not sure yet how much it can be improved, but I'm going to at least investigate it and prototype some stuff.
But that's going to be done in a separate PR later.

I'm in no rush to get this merged, so if you're still looking at the code, or plan on running some interactive tests against it, just let me know and I'll hold off on pressing the button.

@willscott
Copy link
Collaborator

I'm in no rush to get this merged, so if you're still looking at the code, or plan on running some interactive tests against it, just let me know and I'll hold off on pressing the button.

I wasn't planning to look at this again in the short term. Thanks for proving it out though, I'm excited it exists!

@djdv
Copy link
Owner Author

djdv commented Mar 1, 2024

Sorry for the silence on this.
Instead of reviewing it and merging it ASAP, I ended up trying to refactor a lot of the code related to mount.
(I've also been tied up with unrelated troubles.)

I did end up doing a rather large refactor of the code base, but I'm not actually sure if it's an improvement.
As such, I'm going to finish up that refactor, try to get this merged, then try to get the refactor merged after that.

Afterwards, I'm planning on recording a lengthy video orientation, with each section explaining each package of the code base, along with rationale for why it was done the way it is.
The purpose of this is twofold, 1) it should help people to understand how the program works in case they want to modify it, and 2) act as an open invitation for suggestions.
If someone watches it and thinks there's a better way to implement some part of the codebase, I'd be happy to hear it since this repo is rather complicated and I'm having a hard time reducing that complexity.

Personal note: I have to be gone for a few days but I'll act on this when I get back.


Edit (2024.03.19):
Still working on a large refactor but I have to disappear for a few days again.
Have a temporary commit up in this branch: j/mount-refactor, but it's probably not worth looking at yet.
Didn't have time to split the commits up yet so it's one giant blob for now. I'll remedy that later.

I did port the FUSE host and the IPFS guests over to a newer style on that branch. And most of the command line flag stuff was redone as well.
That branch is built on top of this one, but I still need to port the NFS host and guest support to the newer interfaces there.

Intended to have that finished this week, but got interrupted with something IRL again. Hopefully next week.
When I posted previously, the current iteration of the refactor was not looking much better. But I think the one I landed on is probably somewhat better. Hard to gauge it - since I wrote it, it makes sense to me right now but we'll see in time if the code holds up in terms of clarity.

djdv added 5 commits March 30, 2024 09:03
This was originally quoted to see if the CI was permuting the flags
correctly, but was never changed from a string into a process call.
This amends that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants