DDEP-0001:hypercore-header-for-manifestfeed #17

Open
1 of 18 tasks
Tracked by #1
serapath opened this issue Mar 23, 2020 · 6 comments

Comments

@serapath (Member) commented Mar 23, 2020

@todo

  • write up first draft
  • start a DEP proposal according to 0001-dep-process
  • follow DDEP-0000
  • commit as DDEP-0001:hypercore-header-for-manifestfeed.md

motivation
There are many reasons why feeds need to be linked: parent to dependant, dependant to dependencies, dependencies to dependant, domain to content, feed to author, and related feeds amongst each other. I think it would be bad to have everyone (app/protocol/datastructure) make those things up instead of following a general standard.

ongoing discussions

  1. (see comments below)
  2. dat comm-comm discussion

messy, incomplete list of involved community/ecosystem members, in no particular order
(feel free to add/correct the list below by mentioning that in a comment)

  1. https://twitter.com/andrewosh
  2. https://twitter.com/mafintosh
  3. https://twitter.com/carax
  4. https://twitter.com/liminalpunk
  5. https://twitter.com/pvh
  6. https://twitter.com/substack
  7. https://twitter.com/hackergrrl
  8. https://twitter.com/elmasse && https://twitter.com/tinchoz49
  9. https://twitter.com/dan_mi_sun
  10. https://twitter.com/pfrazee
  11. https://twitter.com/heapwolf
  12. https://datpy.decentral1.se/
  13. https://datrs.yoshuawuyts.com/
  14. https://twitter.com/zootella
@serapath (Member Author) commented Mar 23, 2020

first draft (work in progress)

context

hypercore creates feeds (a.k.a. logs).
many feeds are not published via hyperswarm using their own discoveryKey,
but instead are indirectly published via hyperswarm using the discoveryKey of a related feed (see the sketch below this list)

  1. currently every protocol/structure has a custom way to communicate its related feeds
    • e.g. hyperdrive has a metafeed with a first message to link the contentfeed, and further messages form a hypertrie to link mounts, all signed by the author
      • => only the author can spam; they need to be trusted not to do that, and to protect against it a peer can simply reject feed updates
    • e.g. cabal clients just send their feedkeys to new clients for collection via a MANIFEST extension message
      • => anyone can spam with new feeds, and to protect against spam, subjective whitelisting or blacklisting of feeds can be used
  2. It requires a user to use the right client when joining a swarmkey to request feeds
  3. If the user copy/pasted the "dat address" into a "dat cloud service", it would now be up to the cloud to request the related feeds
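A minimal sketch of this indirection, assuming the 2020-era APIs (hypercore 9, hyperswarm 2); the JSON shape of message 0 is hypothetical (hyperdrive actually uses protobuf):

```js
// the contentfeed is never announced under its own discoveryKey; peers learn
// its key from message 0 of the metafeed and replicate both feeds over one
// stream, joined under the metafeed's discoveryKey
const hypercore = require('hypercore')
const hyperswarm = require('hyperswarm')

const metafeed = hypercore('./meta')
const contentfeed = hypercore('./content')

contentfeed.ready(() => {
  metafeed.ready(() => {
    // hyperdrive-style: message 0 links the related contentfeed
    metafeed.append(JSON.stringify({ contentFeedKey: contentfeed.key.toString('hex') }))

    const swarm = hyperswarm()
    swarm.join(metafeed.discoveryKey, { announce: true, lookup: true })
    swarm.on('connection', (socket, info) => {
      const stream = metafeed.replicate(info.client, { live: true })
      contentfeed.replicate(info.client, { stream, live: true }) // multiplexed on the same stream
      socket.pipe(stream).pipe(socket)
    })
  })
})
```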

data structures on top of hypercore

data structure types https://github.com/datprotocol/DEPs/blob/master/proposals/0007-hypercore-header.md
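For reference, a sketch of reading that header from entry 0 of a feed; the HypercoreHeader schema below follows my reading of DEP-0007 (a dataStructureType string plus an optional extension field):

```js
// sketch: detect the data structure type from entry 0, per DEP-0007
const hypercore = require('hypercore')
const protobuf = require('protocol-buffers')

const messages = protobuf(`
  message HypercoreHeader {
    required string dataStructureType = 1;
    optional bytes extension = 2;
  }
`)

const someDatAddress = Buffer.alloc(32) // placeholder: the key behind a "dat address"
const feed = hypercore('./storage', someDatAddress)

feed.get(0, (err, chunk) => {
  if (err) throw err
  const header = messages.HypercoreHeader.decode(chunk)
  console.log(header.dataStructureType) // e.g. 'hyperdrive', 'hypertrie', ...
})
```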

Most dataStructureTypes use either corestore or multifeed:

  1. multifeed: https://github.com/kappa-db/multifeed/blob/master/mux.js#L71

  2. corestore: https://github.com/andrewosh/corestore/blob/master/index.js

    • hyperdrive

1. hypercore

  • creates the basic type of feed
  • has only one main feed associated with a corresponding "dat address"
  • by default is not aware of any related feeds it might reference via messages

2. hypertrie

  • creates a special type of feed
  • has only one main feed associated with a corresponding "dat address"
  • by default is not aware of any related feeds it might reference via messages

3. hyperdrive / corestore

  • creates a special set of feeds
  • has a main metafeed associated with a corresponding "dat address"
    • in message 0 references a related contentfeed
    • it otherwise is a hypertrie
      • which gives all related feeds via:
      • => trie.list('mounts', { hidden: true }, cb)
  • the contentfeed is published through the swarm of the related metafeed and its discoveryKey
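A sketch of that lookup with hypertrie directly; the 'mounts' prefix and { hidden: true } come from the bullet above, while the node/value shape is an assumption:

```js
// sketch: list a hyperdrive's related feeds from its hypertrie
const hypertrie = require('hypertrie')

const metafeedKey = Buffer.alloc(32) // placeholder: the hyperdrive's "dat address"
const trie = hypertrie('./trie', metafeedKey, { valueEncoding: 'json' })

trie.list('mounts', { hidden: true }, (err, nodes) => {
  if (err) throw err
  for (const node of nodes) {
    // each node is assumed to carry the mounted feed's key in its value
    console.log(node.key, node.value)
  }
})
```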

4. corestore

  • ...
  • used by hyperdrive, ...and more

5. multifeed & multiplex / kappa-core

  • creates the basic type of feed
  • has only one main feed associated with a corresponding "dat address"
  • by default is not aware of any related feeds it might reference via messages
  • synthesizing materialized views from many hypercores
  • derives views from log and stores them in leveldb
    • e.g. sorted list of cabal messages per channel
      • => key: channel!timestamp, value: feed@seq
      • => leveldb range query pulls out channel msgs quickly for frontend
  • used by kappa-db, cabal-core, ...and more:
    • cabal
    • mapeo
    • cobox
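A sketch of that view layout, assuming a plain level instance and the key scheme shown above (channel!timestamp -> feed@seq); the message shape is hypothetical:

```js
// sketch: a kappa-style materialized view in leveldb
const level = require('level')
const db = level('./views')

// map phase: store one pointer per cabal message (hypothetical message shape)
function indexMessage (feedKey, seq, msg) {
  db.put(`${msg.channel}!${msg.timestamp}`, `${feedKey}@${seq}`)
}

// query phase: a range scan pulls one channel's messages out quickly for the frontend
db.createReadStream({ gte: 'general!', lt: 'general!\xff' })
  .on('data', ({ key, value }) => console.log(key, '->', value))
```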

6. ...

...

problem

a generic seeding service receives a "dat address" but doesn't know which kind of data structure is behind that address. It needs to know all related hypercores, how to retrieve them, and a standard way of making them available to anyone who wants to use the data and data structure behind that "dat address"

requirements

a generic seeding service should not need any data structure specific code to know how to seed the data. This means the seeding service should not need to know all currently existing data structures built on top of hypercore, nor need to be updated for every such data structure created in the future.

community suggestions

nettle:

this seeding service could also see what the first message they get from a peer is, and from there, figure out whether it's {multifeed,hyperdrive,etc} and do sync from there and just speak the major protocols, not as nice as a unified protocol though :)

  1. [connect to an address via hyperswarm]
  2. receive the first [protocol] message from a peer to derive the protocol type
    • e.g. multifeed, hyperdrive, ...
  3. support the "major protocols"
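A rough sketch of how such sniffing could look, assuming hypercore-protocol 8's stream-level extensions; the extension names and the dispatch are hypothetical, purely to illustrate "speak the major protocols":

```js
// sketch: dispatch on whatever protocol a peer speaks first
const Protocol = require('hypercore-protocol')

function sniff (socket, onprotocol) {
  const stream = new Protocol(false) // we are not the initiator
  // announce extension names of the "major protocols" we can seed and
  // dispatch on whichever one the remote peer uses first
  for (const name of ['multifeed', 'hyperdrive']) { // hypothetical names
    stream.registerExtension(name, {
      onmessage (message) { onprotocol(name, stream, message) }
    })
  }
  socket.pipe(stream).pipe(socket)
}
```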

solution proposal (for a DEP about RELATED FEEDS)

After reading through `mux.js`, it made me feel that `"manifest"`
might as well be a `manifestfeed`, and the `manifest handshake`
could share the `manifest feedkey`, so that any peer online
could share it, and eventually the related feeds, while the author is offline.

  1. we might want to use the DEP 0007 dataStructureType to identify whether a given structure is one of a few supported types or a generic type
  2. we might want to make up a convention using a DEP 0007 MyCustomHeader for what generic types are, where the main point would be to expect under AdditionalData an entry for manifest, which points to the feedkey of the manifestfeed, which is probably a hypertrie that somehow stores all the related feeds (a rough sketch follows below)
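A rough sketch of suggestion 2, reusing the DEP-0007 header; putting the manifest feedkey into the header's extension field as JSON is purely an assumption about what the convention could look like:

```js
// sketch: a generic-type header whose extension points at a manifestfeed
const hypercore = require('hypercore')
const protobuf = require('protocol-buffers')

const messages = protobuf(`
  message HypercoreHeader {
    required string dataStructureType = 1;
    optional bytes extension = 2;
  }
`)

const mainfeed = hypercore('./main')
const manifestfeed = hypercore('./manifest') // probably a hypertrie in practice

manifestfeed.ready(() => {
  mainfeed.append(messages.HypercoreHeader.encode({
    dataStructureType: 'generic', // hypothetical type name for "follow the manifest"
    extension: Buffer.from(JSON.stringify({
      manifest: manifestfeed.key.toString('hex') // where all related feeds are listed
    }))
  }))
})
```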

or alternatively

  1. a special EXTENSION MESSAGE with the main feedkey
  2. receiving peers respond with their manifestkey
    • if there is no single author (e.g. multifeed), many manifestkeys exist
  3. the manifestfeed(s) contain a list of related feedkeys to the main feedkey
    • each entry specifies the swarmkey for that related feed and whether its relation is
      • a part (e.g. contentfeed)
      • or a link (e.g. mounts)
      • and maybe an origin to cite reason/proof why a feed is related
        • (e.g. for a contentfeedkey, that could be the hyperdrive metakey + chunk 0 of the real author)
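A sketch of this alternative, assuming hypercore 9's per-feed registerExtension; the extension name 'manifest', the message shapes, and fetchManifest are hypothetical:

```js
// sketch: ask peers for the manifestkey(s) belonging to a main feedkey
const hypercore = require('hypercore')

const mainFeedKey = Buffer.alloc(32) // placeholder: the main "dat address"
const mainfeed = hypercore('./main', mainFeedKey)

const ext = mainfeed.registerExtension('manifest', {
  encoding: 'json',
  onmessage (message, peer) {
    // step 2: a peer responded with its manifestkey
    if (message.manifestkey) fetchManifest(message.manifestkey)
  }
})

// step 1: send the special extension message with the main feedkey
mainfeed.on('peer-open', (peer) => {
  ext.send({ feedkey: mainFeedKey.toString('hex') }, peer)
})

function fetchManifest (manifestkey) {
  // hypothetical: replicate the manifestfeed and read out the related
  // feedkeys (part/link relation plus swarmkey and origin per entry)
}
```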

@martinheidegger commented:

To my understanding, the reason why the hyperdrive structure is the way it is relates to performance and memory use. Particularly important is the number of round-trips necessary until a data structure can be explored. If we have to read the manifest data before we can connect to the feeds, that adds one entire initial roundtrip, which can easily mean 200ms of added delay. Having the url be the identifier for how the feeds are handled might be the better best-practice. Maybe we could have "patterns" of feed identification, i.e. the hypertrie pattern of hyperdrive 10 or the cabal pattern, which would be part of the url specification.

@serapath (Member Author) commented Mar 27, 2020

@RangerMauve:

I think it'd be cool if the related feeds mechanism was used just for cases when a peer doesn't know your data structure and wants to get related feeds for pinning and stuff, while keeping existing mechanisms that data structures use for performance.

Maybe related feeds is something you opt into loading after you get the initial data or through an extension that only certain peers will bother invoking?

@serapath (Member Author) commented May 8, 2020

@RangerMauve i hope this is fulfilled by the latest proposal update, which i perceive to be:
dat-ecosystem/comm-comm#134 (comment)

@serapath (Member Author) commented Jun 12, 2020

additional considerations:

  1. list not only the identifiers of related feeds, but also the latest known tree hash signatures. (e.g. web packages might need this feature to link to snapshots)
  2. There are two general approaches without involving blockchains I can see:
    1. point to a feed from a certificate feed so you can update it (e.g. key rotation to point to new certified replacement feeds)
    2. point from a feed to a certificate feed e.g. in chunk0 so key compromise can't undo that.

On first sight they might both be able to solve the same problem, but they have some nuances, especially if this general kind of pointer/link is used for different use cases.

  1. you might just want an option to update which feed you certify, but it doesn't prove that the writekey for the certified subfeed is controlled. Multiple entities can each certify a given feed
  2. on the other hand, you could start any feed with a chunk0 that contains among other things information about:
    1. an authorized certificate feed
    2. together with the publickey of the current feed
    3. signed by the writerkey of the certificate feed

That way, nobody can copy such a message to any other feed to falsely associate ownership of e.g. illegal content with somebody, because the publickey wouldn't match.
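A sketch of that chunk0 binding using hypercore-crypto; the chunk layout here is hypothetical, and the point is only that the feed's own publickey sits inside the signed payload:

```js
// sketch: chunk0 binds a feed to its certificate feed, replay-proof
const crypto = require('hypercore-crypto')

const certKeyPair = crypto.keyPair() // writerkey of the certificate feed

function makeChunk0 (feedPublicKey, certFeedKey) {
  const payload = Buffer.from(JSON.stringify({
    certificateFeed: certFeedKey.toString('hex'), // 1. the authorized certificate feed
    feedKey: feedPublicKey.toString('hex')        // 2. the publickey of the current feed
  }))
  return { payload, signature: crypto.sign(payload, certKeyPair.secretKey) } // 3. signed
}

function verifyChunk0 (chunk, expectedFeedKey) {
  const info = JSON.parse(chunk.payload)
  // a chunk0 copied to a different feed fails here: the publickey wouldn't match
  if (info.feedKey !== expectedFeedKey.toString('hex')) return false
  return crypto.verify(chunk.payload, chunk.signature, Buffer.from(info.certificateFeed, 'hex'))
}
```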


Whether people outsource maintenance of a certificate feed to third parties or run it themselves, in both cases a certificate feed could be HOT (especially if it is operated by some kind of trusted third-party CA), so that feed having its own parent certificate feed would be cool.

furthermore, it would be cool to be able to specify a ring of certificate feeds (controlled by the same entity or by different ones), where a majority of keys can vote out or vote in new keys to do key rotation, to further improve security in case of lost or stolen private keys for such feeds.


All these issues are technically linking together feeds.

  1. related feeds (like e.g. web packages and dependencies or complicated data structures and protocols that require multiple feeds)
  2. and feed revocation mechanisms
  3. a transfer of feeds to new owners would also be thinkable in this way

...it's just generally something that is much tougher to bolt onto things later on, I believe.
