Track and measure number of Brave browser IPFS nodes #24

Open
autonome opened this issue Nov 28, 2022 · 14 comments

Comments

@autonome

Brave browser ships a feature which downloads and runs Kubo.

We want to measure the number of Brave IPFS nodes on the public network.

@lidel said they announce themselves as kubo/0.16.0/brave and that we could find them by:

  • collecting peerids from peer records on DHT
  • reading the agent version of each via ipfs id QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN | jq .AgentVersion
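
A minimal Python sketch of the two steps above, assuming a local `ipfs` binary on the PATH and that `ipfs id <peerid>` prints JSON with an `AgentVersion` field (as the `jq .AgentVersion` filter implies). The helper names `ipfs_id_json`, `agent_versions`, and `count_agents` are hypothetical, and how the peer IDs are collected is left out:

```python
import json
import subprocess
from collections import Counter


def ipfs_id_json(peer_id: str) -> str:
    """Run `ipfs id <peer_id>` and return its raw JSON output."""
    result = subprocess.run(
        ["ipfs", "id", peer_id],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout


def agent_versions(peer_ids, lookup=ipfs_id_json):
    """Map each peer ID to its AgentVersion, or None if the lookup fails.

    `lookup` is injectable so the logic can be exercised without a daemon.
    """
    versions = {}
    for peer_id in peer_ids:
        try:
            versions[peer_id] = json.loads(lookup(peer_id)).get("AgentVersion")
        except (subprocess.SubprocessError, OSError, json.JSONDecodeError):
            # Peer unreachable, binary missing, or unparseable output.
            versions[peer_id] = None
    return versions


def count_agents(versions):
    """Count how many peers reported each agent version string."""
    return Counter(v for v in versions.values() if v is not None)
```

Peers that cannot be dialled are recorded as `None` rather than dropped, so the share of unreachable peers stays visible when reporting the counts.
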

@aschmahmann

collecting peerids from peer records on DHT

This could be somewhat tricky. Collecting peer records from the DHT like that will be painful once the hydra nodes are spun down. This process seems to be starting soon https://discuss.ipfs.tech/t/dht-hydra-peers-dialling-down-non-bridging-functionality-on-2022-12-01/15567 and (given that the linked proposal is a protocol violation) hopefully the hydras will be fully dead soon.

In a hydra-less world some options include:

  • Spin up a small number of peers as DHT servers and just log inbound connections, since nodes that are DHT clients will eventually run into them. If going this route, it is probably worth talking things over with the probelab crew (cc @yiannisbot) about the probability of detection based on how long a peer is online.
  • Leverage the information from some existing infra peers that connect to a lot of peers (e.g. bootstrappers, gateways, etc.)

IIRC there are already some metrics collected for number of total peerIDs seen by infra nodes. This seems like a) a good time to audit those metrics for accuracy, and b) an easy opportunity to either export more information than just peerIDs or to take the peerIDs and feed them into ipfs id to get the results out.

@momack2

momack2 commented Dec 5, 2022

@aschmahmann - collecting peer records was done long before the hydras existed - and AFAIK the hydras actually don't collect this data at all anyway. This already exists.

@aschmahmann

IIRC there are already some metrics collected for number of total peerIDs seen by infra nodes.

@momack2 the existing data collection is what I meant by the above. IIRC it comes from aggregates of PL nodes https://github.com/ipfs/kubo/blob/master/plugin/plugins/peerlog/peerlog.go, although I don't recall which nodes participate in the logging. I recall seeing results in grafana and kibana although I'm not sure if they're the same.

I was mostly flagging that the proposal of scraping the DHT will not really work, and without leveraging the hydras you'd need a different approach, like the one we already have.

This already exists.

Yep, what we already have is likely fine (scraping inbound peerIDs and user agents). Although when tracking and reporting these numbers we should be clear how we're collecting them.

@momack2

momack2 commented Dec 5, 2022

Yep! just want to clarify that the hydras don't participate in this, and it will not change with the hydras being turned down. 👍

@yiannisbot
Member

After a brief discussion with @dennis-tra, he brought up the idea that the honeypot deployed for the NAT Hole Punching study could help with this. Indeed, the honeypot is acting as a DHT server and also makes itself known to others so that it can attract more clients: https://github.com/dennis-tra/punchr#honeypot. It has also been running for a few months now, so the information might already be there in the logs?

I've monitored the "Remote Peer Agent Version" reported at: https://punchr.dtrautwein.eu/grafana/d/F8qg0DP7k/punchr-performance?orgId=1 for a while, but haven't seen any kubo/0.16.0/brave peer showing up 🤔

@dennis-tra is the honeypot, or any of our other tools, able to get this information?

@dennis-tra
Contributor

Yeah, I was thinking that we could use data from the punchr honeypot but I just had a brief look at the data and was searching for the brave agent version. There is no entry in the database that contains the substring brave :/

image

image


However, I just fired up the brave built-in kubo node and requested its AgentVersion + supported protocols. This is the output:

protocols: [/ipfs/bitswap/1.0.0 /libp2p/dcutr /p2p/id/delta/1.0.0 /libp2p/circuit/relay/0.2.0/stop /ipfs/bitswap/1.2.0 /ipfs/bitswap/1.1.0 /ipfs/bitswap /x/ /ipfs/id/1.0.0 /ipfs/id/push/1.0.0 /ipfs/ping/1.0.0 /libp2p/circuit/relay/0.1.0]
agent: kubo/0.16.0/

You can see that the reported agent is kubo/0.16.0/ and not kubo/0.16.0/brave.

@yiannisbot
Member

Interesting! Thanks for digging in. So, good news and bad news:

  • Good: we have the infra set up to measure this pretty much directly.
  • Bad: we don't have a good identifier for those nodes (or it's somehow hidden?).

@dennis-tra if we had an identifier (e.g., if they indeed advertised as kubo/0.16.0/brave), do you think we would get an accurate picture of the number of Brave nodes in the network? Do we miss anything that we'd need to track?

@lidel @autonome can you double-check with Brave what agent version they're advertising?

@dennis-tra
Contributor

I think there must be a way to extrapolate the number of incoming connections to the whole network - perhaps based on the in-degree of the honeypot, which we could get from the crawls.

But this requires a bit of thinking 🤔

@momack2

momack2 commented Dec 8, 2022 via email

@autonome
Author

autonome commented Dec 8, 2022

Where Brave implemented this: brave/brave-browser#18505

@lidel

lidel commented Dec 8, 2022

The suffix was not applied to the user agent exposed via libp2p identify protocol 🙈
Fixed in ipfs/kubo#9457 and will ship with Kubo 0.18 (ipfs/kubo#9417).

We need to wait for Brave to update to Kubo 0.18 to see this. The good news is that they have a reliable automatic update mechanism, similar to IPFS Desktop, so no other action is required on their end.

@dennis-tra
Contributor

Quick update from our side. With the dashboard from our collaborators, we see the following numbers for the last 7 days:

image

This shows an estimated number of roughly 300 Brave nodes for the last 7 days. This looks way too low to us.

Our honeypot component (which arguably only covers a tiny bit of the keyspace) shows the following numbers for the last 7 days:

image

This shows the number of unique PeerIDs that have connected to the honeypot with a */brave agent version on a given day. Also, not as many as we would have expected. We're currently thinking about setting up dedicated infrastructure ourselves to measure client activity.
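
As a rough illustration of the per-day count described above, here is a Python sketch. It assumes connection events are available as `(unix_timestamp, peer_id, agent_version)` tuples and that "a */brave agent version" means the last non-empty `/`-separated segment is `brave`. Both the event shape and the function names are hypothetical and are not taken from the honeypot's actual schema:

```python
from collections import defaultdict
from datetime import datetime, timezone


def is_brave_agent(agent: str) -> bool:
    """True if the last non-empty '/'-separated segment is 'brave'."""
    segments = [s for s in agent.split("/") if s]
    return bool(segments) and segments[-1].lower() == "brave"


def daily_unique_brave_peers(events):
    """Count unique peer IDs with a */brave agent per UTC day.

    events: iterable of (unix_timestamp, peer_id, agent_version).
    Returns a dict mapping ISO date strings to unique peer counts.
    """
    peers_by_day = defaultdict(set)
    for ts, peer_id, agent in events:
        if agent and is_brave_agent(agent):
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
            peers_by_day[day].add(peer_id)
    return {day: len(peers) for day, peers in sorted(peers_by_day.items())}
```

Under this matching rule an agent string such as `kubo/0.16.0/` is not counted, which is consistent with the earlier database search finding no entry containing `brave`.
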

@yiannisbot
Member

Just to add that the thinking here is to extrapolate from the portion of the key space that the honeypot is covering to the entire network. Of course, this assumes that there is a uniform distribution of requests across the key space, which isn't very accurate, but gives us a ballpark.
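
The extrapolation can be sketched in Python as below, under the uniform-distribution assumption stated above. The function name and the example numbers are made up for illustration; neither the honeypot's real key space coverage nor its observed counts are given in this thread:

```python
def extrapolate_network_size(observed_peers: int, keyspace_fraction: float) -> float:
    """Scale a count seen in part of the key space to the whole network.

    Assumes requests are spread uniformly over the key space, so a node
    covering a fraction f of it sees about f of all clients.
    """
    if observed_peers < 0:
        raise ValueError("observed_peers must be non-negative")
    if not 0.0 < keyspace_fraction <= 1.0:
        raise ValueError("keyspace_fraction must be in (0, 1]")
    return observed_peers / keyspace_fraction


# Hypothetical numbers: 30 unique */brave peers seen while covering
# a quarter of the key space.
estimate = extrapolate_network_size(30, 0.25)
```

Because the result scales as 1/f, a small error in the coverage fraction moves the estimate a lot when f is tiny, so this gives only a ballpark figure.
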

The infrastructure we want to build (that Dennis mentions above) is to have more nodes and cover more of the key space. From that we should get a much more accurate view of the number of */brave client nodes.

@lidel

lidel commented May 29, 2023

I know we don't have visibility/extrapolation for the full network yet,
but was there any relative change in the data produced by our honeypot view since May 2 (https://brave.com/nft-pinning/)?

6 participants