libp2p + HTTP an exploratory refactor #102
publisherHost, err := libp2phttp.New(
	libp2phttp.StreamHost(publisherStreamHost),
	libp2phttp.ListenAddrs([]multiaddr.Multiaddr{multiaddr.StringCast("/ip4/127.0.0.1/tcp/0/http")}),
)
req.NoError(err)

go publisherHost.Serve()
defer publisherHost.Close()
This is the "HTTP Host". It's like the libp2p "stream host" (aka core host.Host), but it uses HTTP semantics instead of stream semantics.
On creation you can pass options such as a stream host (to do HTTP over libp2p streams) and multiaddrs to listen on.
serverHTTPMa := publisherHost.Addrs()[1]
req.Contains(serverHTTPMa.String(), "/http")

publisherHost.SetHttpHandlerAtPath(protoID, "/ipni/", http.StripPrefix("/ipni/", publisher))
Here is where we attach our request handler. Note that we are mounting the "/ipni-sync/1" protocol at /ipni/. libp2phttp manages this mapping, and clients can learn about the mapping at .well-known/libp2p.
In this case we also want our HTTP handler to not even know about the prefix, so we use the stdlib http.StripPrefix.
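As a minimal sketch of the http.StripPrefix part only (standard library, no libp2phttp involved), the following mounts a prefix-unaware handler under /ipni/. The helper names `newMux` and `fetch` and the echo handler are illustrative assumptions, not code from this PR.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newMux mounts a handler that knows nothing about the /ipni/ prefix.
// (Hypothetical helper for illustration.)
func newMux() *http.ServeMux {
	// The inner handler only ever sees the path with the prefix removed.
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "handler saw %s", r.URL.Path)
	})
	mux := http.NewServeMux()
	mux.Handle("/ipni/", http.StripPrefix("/ipni/", inner))
	return mux
}

// fetch starts a throwaway server, requests path, and returns the body.
func fetch(path string) (string, error) {
	srv := httptest.NewServer(newMux())
	defer srv.Close()

	resp, err := http.Get(srv.URL + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetch("/ipni/head")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // prints "handler saw head"
}
```

Because the stripped prefix includes the trailing slash, the inner handler sees "head" rather than "/head". A handler that expects a leading slash would need the prefix given as "/ipni" instead.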
dagsync/httpsync/publisher_test.go
Outdated
clientHost, err := libp2phttp.New()
req.NoError(err)

c, err := clientHost.NamespacedClient(nil, protoID, peer.AddrInfo{Addrs: []multiaddr.Multiaddr{serverHTTPMa}})
This creates an http.Client that automatically knows about the prefix for a given protocol ID.
dagsync/httpsync/publisher_test.go
Outdated
clientHost, err := libp2phttp.New()
req.NoError(err)

c, err := clientHost.NamespacedClient(clientStreamHost, protoID, peer.AddrInfo{ID: publisherStreamHost.ID(), Addrs: []multiaddr.Multiaddr{serverStreamMa}})
This is the same thing as above, but it is going over a libp2p stream.
Note: I think I'll change the NamespacedClient API to not accept a stream host here, and instead use the one passed in as an option to libp2phttp.New.
}

for _, tc := range testCases {
	t.Run(tc.name, func(t *testing.T) {
Nothing special here. Just plumbing to set up the test.
Nice! LGTM
Neat, but it seems to me that the main trade-off here is that we have to rely on libp2p discovery semantics to figure out how to communicate? a client has to be I really like that the code is simpler in here. But unfortunately it's also only simpler if we get to remove the other form of this - but if we do that, then users on both ends need to opt-in to the libp2p version of talking on http. Either using this library, or writing it themselves. One nice thing about the current incarnation is that HTTP is simple, and writing the server (and client) is fairly straightforward, with all of the complexity being in the ipni protocol itself. This would add an additional layer of complexity.
Thanks for taking a look!
This is just one way to discover a peer's protocol mapping. IPNI decided on encoding this mapping in the multiaddr (which is fine), but I'd prefer not to have every application have to reimplement this. I think it makes sense to have this mapping at the server metadata layer rather than defined per application/protocol. It allows the protocols to be agnostic to their path prefix, gives peers a standard way they can learn about each other, and it's an HTTP native resource. That doesn't mean this is the only solution to create a peer's protocol mapping. IPNI can continue encoding the mapping in the multiaddr, and give that information to the HTTP host. That's not supported right now, but I think that's a great addition to the API (edit: just pushed this change).
I don't think this is a "libp2p version of talking on http". A goal here is to interoperate with existing systems. This is all plain old HTTP. Yes, we're defining the If this
Server:
mux.HandleFunc("/.well-known/libp2p", func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, `{"protocols":{"/ipni-sync/1":{"path":"/custom-prefix"}}}`)
})
Client:
const prefix = await fetch("<remote>/.well-known/libp2p")
	.then(resp => resp.json())
	.then(mapping => mapping.protocols["/ipni-sync/1"].path)
Alright, so you may ask "if it's so easy what's the point of libp2p+HTTP then?" A couple of reasons, but for IPNI, the biggest benefit is that it lets you delete
This API is all subject to change, so if you see something that could improve here I'm all ears!
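For readers who want the client half in Go rather than JavaScript, here is a hedged sketch using only the standard library. It assumes the JSON shape and the .well-known/libp2p path shown in the snippets above; the `wellKnown` type and the `discoverPrefix` and `demoDiscover` helpers are hypothetical names, not part of libp2phttp.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// wellKnown mirrors the JSON shape from the server snippet above.
// (Assumed shape; the real resource may differ.)
type wellKnown struct {
	Protocols map[string]struct {
		Path string `json:"path"`
	} `json:"protocols"`
}

// discoverPrefix fetches <base>/.well-known/libp2p and returns the path
// that the server advertises for protoID.
func discoverPrefix(c *http.Client, base, protoID string) (string, error) {
	resp, err := c.Get(base + "/.well-known/libp2p")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var wk wellKnown
	if err := json.NewDecoder(resp.Body).Decode(&wk); err != nil {
		return "", err
	}
	p, ok := wk.Protocols[protoID]
	if !ok {
		return "", fmt.Errorf("protocol %q not advertised", protoID)
	}
	return p.Path, nil
}

// demoDiscover serves the mapping from a throwaway server and looks up protoID.
func demoDiscover(protoID string) (string, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/.well-known/libp2p", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"protocols":{"/ipni-sync/1":{"path":"/custom-prefix"}}}`)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return discoverPrefix(srv.Client(), srv.URL, protoID)
}

func main() {
	prefix, err := demoDiscover("/ipni-sync/1")
	if err != nil {
		panic(err)
	}
	fmt.Println(prefix) // prints "/custom-prefix"
}
```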
Overall seems like a positive API addition to libp2p http, it does weaken the argument for
It makes me ponder the possibilities for protocol probing, we do all this juggling in lassie to figure out how we can talk to peers, some of it depends on inspecting the multiaddr and some involves opening a stream and probing: https://github.com/filecoin-project/lassie/blob/main/pkg/retriever/directcandidatefinder.go
Potentially, if the trustless gateway http protocol were to adopt this, we could do some versioning and make allowances for different root paths. We're already at the point of exploring a mix of OPTIONS, Accept headers, and params to do some of the work around different implementation functionality support. I can imagine this getting out of hand once we add more features to the protocol. I'm not sure what you have here would really solve any of this better, but maybe if we needed to do hard versioning breaks it would help. We're also stuck with the fixed
Thank you so much for writing this. This is very helpful for understanding how this can be used to make libp2p work easily as a transport for HTTP. Thank you!
I have some questions concerning the best way to adapt this for use in IPNI http-sync in the PR review.
clientHost, err := libp2phttp.New()
req.NoError(err)

c, err := clientHost.NamespacedClient(protoID, peer.AddrInfo{Addrs: []multiaddr.Multiaddr{serverHTTPMa}})
This appears to mean that a separate NamespacedClient is needed for each separate server that a request is sent to. This is somewhat different than the semantics of the standard http.Client where the same client instance can be used to send requests to servers at different URLs. Is this correct?
This matters because with the structure of the httpsync code as it currently is, the sync is a long-lived object (created once) that contains the http.Client which is reused for separate Syncer requests. What will need to change is that a new NamespacedClient will need to be associated with each per-sync Syncer instance and not with the Sync.
Can NamespacedClient instances be closed and removed from their clientHost?
Just a quick note: Connections are managed by the underlying RoundTripper. NamespacedClients are a thin wrapper around an underlying RoundTripper that adds a prefix path to HTTP requests. They don't change how connections are managed or cached.
This matters because with the structure of the httpsync code as it currently is, the sync is a long-lived object (created once) that contains the http.Client which is reused for separate Syncer requests. What will need to change is that a new NamespacedClient will need to be associated with each per-sync Syncer instance and not with the Sync.
Yep that makes sense to me. 👍
Can NamespacedClient instances be closed and removed from their clientHost?
Not necessary to remove them from the clientHost (the client http host doesn't track them). You may optionally close the idle connections on them with client.CloseIdleConnections, but it's not necessary. The HTTP transport or stream host is responsible for managing/closing these connections. Clients are relatively cheap to make.
		urls: nil,
		sync: s,
	}, nil
}
A Syncer instance is created for each sync operation and is responsible for making a request to a specific advertisement publisher. This means that each instance of a Syncer requires the addresses to connect to since these will not be known in advance.
I think instead of NewSyncerWithoutAddrs here, that NewSyncer should be modified to create a new clientHost.NamespacedClient if its parent Sync is using a libp2phttp client. The NamespacedClient can then be used as the http.Client in Syncer functions.
Will it be necessary to close each NamespacedClient instance and remove it from the clientHost and remove its peer info from the peerstore? Will that happen automatically after some time?
I think instead of NewSyncerWithoutAddrs here, that NewSyncer should be modified to create a new clientHost.NamespacedClient if its parent Sync is using a libp2phttp client. The NamespacedClient can then be used as the http.Client in Syncer functions.
👍 . I thought about this change, but wanted to make a less invasive change to demo. I like it though.
Will it be necessary to close each NamespacedClient instance and remove it from the clientHost and remove its peer info from the peerstore? Will that happen automatically after some time?
Happens automatically :) (just like the stock http.Client does it)
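The per-sync arrangement discussed above could look roughly like the following. This is a standard-library sketch under stated assumptions: the `Sync` and `Syncer` types here are simplified stand-ins with hypothetical fields, and a plain http.Client takes the place of the NamespacedClient. The long-lived Sync owns the transport (and so the connection pool), while each Syncer gets its own cheap client bound to one publisher.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Sync is long-lived and owns the shared transport. (Simplified stand-in.)
type Sync struct {
	transport http.RoundTripper
}

func NewSync() *Sync {
	return &Sync{transport: &http.Transport{}}
}

// Syncer is short-lived: one per sync, bound to a single publisher.
type Syncer struct {
	client  *http.Client
	baseURL string
}

// NewSyncer creates a fresh client for this publisher. Only the client is
// new; connections are still pooled by the transport owned by Sync.
func (s *Sync) NewSyncer(publisherURL string) *Syncer {
	return &Syncer{
		client:  &http.Client{Transport: s.transport},
		baseURL: publisherURL,
	}
}

// Get fetches a path relative to the publisher's base URL.
func (s *Syncer) Get(path string) (string, error) {
	resp, err := s.client.Get(s.baseURL + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served "+r.URL.Path)
	}))
	defer srv.Close()

	s := NewSync()
	syncer := s.NewSyncer(srv.URL)
	body, err := syncer.Get("/head")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // prints "served /head"
}
```

When a Syncer goes out of scope its client is simply garbage collected; idle connections stay with the shared transport and are reaped by it, which is the stock http.Client behavior the reply refers to.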
This is another version of the libp2p + HTTP refactor in #102. The key difference is that a new `NamespacedClient`, having the publisher's address, is created for each sync within the per-sync Syncer.
@MarcoPolo Please take a look at PR #103 as a modification of this PR, and see PR comments within code. cc @masih
Replying to @rvagg comments first:
Long term, I think we'll push this data closer to the "provider record" equivalent. You should know what datatransfer protocols a peer supports (and in the case of HTTP, where they are mounted). I believe the IPNI metadata already has some support for this (extra metadata for the application protocol, which if it is an HTTP-semantic one can include the prefix path). That would save us a round trip and possibly a wasted connection when we want to learn about a new peer. Note that this doesn't conflict with a
I wrote a comment around I think (I'm not super familiar here) that we could mount the existing
This package is incorporated into the changes in #113. Closing.
Hi folks!
I'm play-testing the new libp2p+HTTP api libp2p/go-libp2p#2438 so I wanted to try using it somewhere. I think the dagsync code here would benefit from libp2p+HTTP because:
dagsync/dtsync eventually (after clients upgrade).
a. This means your client and server are also simpler. You can toss all the prefix logic into the trash!
I've added a test case that shows a simple head fetch and sync over HTTP both on top of a native HTTP transport and a libp2p stream.
I'd love feedback on what you think about this API.
cc @masih