DIP-267 Profile discoverability #267
Comments
Separating out Profile links, instead of requiring them to be part of batched announcements, carries a legal risk that is anywhere from slightly to much greater, simply because far more URLs would be stored on the consensus system. That increases the opportunities for corrupted or corrupting (i.e. changed after the fact) URLs. Content indexers would ignore these URLs, and the system would potentially lose many "pairs of eyes" on potentially problematic content. |
I agree that open URLs in consensus system storage are a risk. Let's consider using CIDs only (a la batch publications). Latency in finding profile documents on an external distributed file store (e.g. IPFS) could be mitigated by allowing providers to give hints in the form of URL templates that identify their preferred gateways. (For discussion.) |
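To make the URL-template idea concrete for discussion, here is a minimal sketch of how a consumer might expand provider hints for a given CID. The `{cid}` placeholder syntax, the `candidate_urls` function, and the gateway hostnames are all assumptions for illustration; nothing here is defined by the spec.

```python
from urllib.parse import quote


def candidate_urls(cid: str, templates: list[str]) -> list[str]:
    """Expand each provider hint into a concrete URL for the given CID.

    Templates without the (hypothetical) "{cid}" placeholder are skipped.
    """
    safe_cid = quote(cid, safe="")  # percent-encode anything unexpected in the CID
    return [t.replace("{cid}", safe_cid) for t in templates if "{cid}" in t]


# Hypothetical provider hints; the hostnames are placeholders.
hints = [
    "https://gateway.example/ipfs/{cid}",
    "https://{cid}.ipfs.example.net/",
    "https://no-placeholder.example/",
]
urls = candidate_urls("bafyexamplecid", hints)
```

Only the CID would live on the consensus system under this approach; the templates are hints that a consumer can try in order or ignore entirely, falling back to its own gateway.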
An expansion on some comments I made at the Community Call last week.
Layers
There are several layers here, and I want to lay them out again for clarity around this discussion.
Layers and this Discussion
The only layers being discussed here are really 2-3:
Constraints
As discussed above, the primary reason for not choosing direct User Data in the initial design is the cost of storage and churn of the additional data.
The current system is better assuming that some of that churn is able to be batched, but does have a longer term growth issue. |
To capture a side discussion: the number of updates required for profiles to be linked from User Data is estimated to be far fewer than the number of social graph updates. In that light, this proposal doesn't change the order of magnitude of consensus system changes significantly. |
Notes from DSNP Spec Community Call 2024-05-02
|
I'll let those closer to implementations look at optimal storage solutions, but compared to graph storage and updates I don't think profile CIDs would add significant requirements.

There's a separate discussion to be had about the different types of profile data as noted in the spec call comment. I think the goal at the moment would be to cover the existing (public Activity Content) profile definition, but provide a structure to enable future profile-linked formats via the file type enumeration.

The notion of context-based profiles is a deep one and gets into the notion of identity expression via personas. DSNP currently lacks a two-level structure for this, so it is assumed that a DSNP User Id represents a single persona. Structurally, we might consider a way for a user (e.g. a human) to participate in various social networking activities with their choice of personas (and/or participate anonymously in some activities). For separation of personas to be effective, we would like to create a system that (through cryptographic techniques, say) did not enable easy correlation of multiple personas belonging to an individual by a third party. For this reason I think alternate profile expressions are out of scope for this work item, but I'd be open to further discussion.

Change visibility: DSNP requires systems to generate and emit State Change Records whenever User Data changes. While systems are not required to make historical data available, observers can still build their own history set and watch for changes that impact their caches. |
@wesbiggs from a Frequency implementation view, we usually version Frequency Schemas via the schema identifier, as that is needed to parse the particular data. If we did this, I could see an on-chain data structure something like the one below, with the DSNP Profile version inside the data structure instead of outside as most are/should be.

    {
      type: "record",
      name: "UserProfilePointer",
      namespace: "org.dsnp",
      fields: [
        {
          name: "cid",
          type: "bytes",
        },
        {
          name: "version",
          type: "int",
        },
      ],
    }

It would be stored in the User data store (Stateful Storage):

    {
      model: profile,
      modelType: "AvroBinary",
      payloadLocation: "Paginated",
      settings: [],
      dsnpVersion: "1.x",
    }

It wouldn't handle the persona issue, which one could argue should be handled at the metadata level. But even if there were a persona setup in the generalized profile (instead of in group settings or similar), I think keeping that data in one file (with links to media such as profile pictures) still makes sense, because the profile is quite small even if it contains several duplicated pieces of data.

Side Note: I could also see the on-chain metadata including a length value for the expected IPFS file, as Frequency has for the IPFS Payload Location structure. This informs consumers before they attempt to download extremely large files that might not actually be real profiles. |
I'm not following, why is the version needed within the Avro? I think including the expected byte length is a useful addition. |
It isn't required, but it is an optimization for the search. Let's imagine a future where there are 10 non-backward-compatible versions. In this case, each lookup on Frequency for a profile could (assuming you wished to support all 10) require up to 10 queries to the chain to discover the profile. On Frequency, Stateful Storage is keyed by (User Id, Schema Id), and then by page or item. By shifting the version from effectively the Schema Id layer into the data layer, the content version can change as long as the metadata doesn't. |
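A toy model of the lookup cost described above, assuming storage behaves like a map keyed by (User Id, Schema Id). The dictionary, the function names, and the schema ids are hypothetical stand-ins, not Frequency APIs.

```python
# Hypothetical in-memory stand-in for stateful storage keyed by (user_id, schema_id).
Storage = dict[tuple[int, int], dict]


def find_with_version_in_schema(storage: Storage, user_id: int, schema_ids: list[int]):
    """One schema per profile version: probe each schema id until there is a hit."""
    queries = 0
    for schema_id in schema_ids:
        queries += 1
        record = storage.get((user_id, schema_id))
        if record is not None:
            return record, queries
    return None, queries


def find_with_version_in_data(storage: Storage, user_id: int, schema_id: int):
    """One schema for all versions: a single lookup, version read from the record."""
    return storage.get((user_id, schema_id)), 1


# Version in the schema layer: ten schema ids, user still on the oldest one.
per_version_schemas = list(range(110, 100, -1))  # probe newest first: 110 .. 101
storage_a: Storage = {(42, 101): {"cid": b"\x01\x02"}}
record_a, queries_a = find_with_version_in_schema(storage_a, 42, per_version_schemas)

# Version in the data layer: one schema id, version carried inside the record.
storage_b: Storage = {(42, 200): {"cid": b"\x01\x02", "version": 1}}
record_b, queries_b = find_with_version_in_data(storage_b, 42, 200)
```

The worst case for the first layout grows with the number of supported versions, while the second stays at one query no matter how many content versions exist.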
I was envisioning that the spec itself might add or replace profile document types over time, hence the inclusion of the I think if we do this we keep it open for future evolution of types while not suffering version-related issues, as the core data (cid, type, and size) remains consistent regardless of the target file type. |
Problem
=======
#267 discusses the advantages of DSNP profile documents being discoverable based on a user's identifier, rather than being temporal data that must be externally tracked and indexed to be useful.

Solution
========
This proposal adds a new User Data type, `profileResources`, which is currently defined for the Activity Content profile document type, but is extensible to other document types that may need to be user-centric in the future. A profile resource is simply a link type (as an integer) and a CID as a string plus byte length. We define link type 1 for AC profiles.

This change assumes we are versioning for 1.3, but does not include versioning updates, which are in other open PRs at this time.

Change summary:
---------------
- [ ] ~~Updated Spec Versions~~
- [x] Added definition for ProfileResource Avro type
- [x] Added entry in User Data page for `profileResource`
- [x] Updated navigation; added ProfileResource, moved Profile announcement to "Migrated" section

---------

Co-authored-by: Wes Biggs <wes.biggs@amplica.io>
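As a rough illustration of how a consumer might use the three pieces of data named in the commit message (link type, CID string, byte length), here is a sketch that picks the Activity Content profile entry and skips anything implausibly large before fetching. The class and field names, the `pick_profile` helper, and the size limit are assumptions for illustration; the merged spec text is the authority on the actual Avro field names.

```python
from dataclasses import dataclass
from typing import Optional

AC_PROFILE_TYPE = 1  # link type 1 is defined for Activity Content profiles
MAX_PROFILE_BYTES = 64 * 1024  # hypothetical consumer-side limit, not from the spec


@dataclass(frozen=True)
class ProfileResource:
    type: int    # link type
    cid: str     # content identifier of the profile document
    length: int  # expected size of the document in bytes


def pick_profile(
    resources: list[ProfileResource],
    wanted_type: int = AC_PROFILE_TYPE,
    max_bytes: int = MAX_PROFILE_BYTES,
) -> Optional[ProfileResource]:
    """Return the first resource of the wanted type whose declared size is acceptable."""
    for resource in resources:
        if resource.type == wanted_type and 0 < resource.length <= max_bytes:
            return resource
    return None


# Placeholder CIDs; the type 2 entry stands for some future, unknown document type.
user_resources = [
    ProfileResource(type=2, cid="bafyfutureformat", length=512),
    ProfileResource(type=1, cid="bafytoolarge", length=50_000_000),
    ProfileResource(type=1, cid="bafyacprofile", length=2_048),
]
chosen = pick_profile(user_resources)
```

Unknown link types are simply ignored, which is what lets new document types be added later without breaking existing consumers.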
This functionality has been integrated for DSNP 1.3. |
Should Profiles be discoverable without the aid of a content indexer? A Profile link type could be specified as User Data that references a URL and hash. In effect, we would be moving Profile announcements from offline batches to User Data.
Pros:
Cons:
Rejected design option:
Adjacent concerns:
Linked PR: