DIP-267 Profile discoverability #267

wesbiggs · 2023-12-14T15:58:29Z

Should Profiles be discoverable without the aid of a content indexer? A Profile link type could be specified as User Data that references a URL and hash. In effect, we would be moving Profile announcements from offline batches to User Data.

Pros:

Better alignment with usage patterns expected in social media.
Ability for applications to navigate to profile without the aid of other services.

Cons:

Higher storage cost for consensus system.

Rejected design option:

Putting user-generated "speech" content directly in a consensus system is an anti-goal of DSNP for legal risk reasons as well as technical storage requirement concerns.

Adjacent concerns:

The ability to derive a current URL for a Profile could be used by applications to activate Mention tags without outside knowledge; cf. DIP-266 Activity Content "Mention" alignment #266
Mastodon uses Person instead of Profile, though most of the fields have overlapping semantics.

Linked PR:

DIP-267 Provide a User Data mechanism for links to profiles #277

shannonwells · 2024-03-04T23:36:50Z

Separating out Profile links, instead of requiring them to be part of batched announcements, carries a slightly to much greater legal risk, simply because far more URLs would be stored on the consensus system. It would increase opportunities for corrupted or corrupting (i.e. changed after the fact) URLs. Content indexers would ignore these URLs, and the system would potentially lose many "pairs of eyes" on potentially problematic content.

wesbiggs · 2024-03-08T13:36:25Z

I agree that open URLs in consensus system storage are a risk. Let's consider using CIDs only (a la batch publications).

Latency in finding profile documents on an external distributed file store (e.g. IPFS) could be mitigated by allowing providers to give hints in the form of URL templates that identify their preferred gateways. (For discussion.)
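As a rough illustration of the gateway-hint idea, the sketch below expands provider-supplied URL templates for a given CID. The `{cid}` placeholder syntax, the function name, and the example gateway hosts are all assumptions made for illustration; none of them is defined by DSNP.

```python
from typing import List

PLACEHOLDER = "{cid}"  # assumed placeholder syntax, not part of any spec


def candidate_urls(cid: str, templates: List[str]) -> List[str]:
    """Expand each hint template into a concrete URL, keeping the provider's order."""
    urls = []
    for template in templates:
        if PLACEHOLDER not in template:
            continue  # ignore hints that cannot carry the CID
        urls.append(template.replace(PLACEHOLDER, cid))
    return urls


# Hypothetical hints a provider might publish (example hosts only).
HINTS = [
    "https://gateway.example/ipfs/{cid}",
    "https://{cid}.ipfs.provider.example/",
]

for url in candidate_urls("example-cid", HINTS):
    print(url)
```

Because only the CID would live in consensus storage, the hints affect where a client looks first, not what it accepts: whatever is fetched can still be checked against the CID.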

wilwade · 2024-03-13T17:15:38Z

An expansion on some comments I made at the Community Call last week.

Layers

There are several layers here, and I want to lay them out again for clarity around this discussion.

1. DSNP Id (aka the user account identifier)
2. Discovery of the existence of a user's profile
3. Discovery of the user's current profile document location
4. Retrieval of the user's current profile document
5. Validation of the user's current profile document
6. Parsing of the user's current profile document
7. Attributes on the user's current profile document
   - Public Existence, Public Content (currently the only type)
   - Public Existence, Content Protected (There could also be existence protected, but they wouldn't be in the profile document)
   - Additional/Conditional: Attributes that only show under some circumstances or secondary profile that overrides the "default" in some situations or such.
8. Retrieval of references from user's current profile document
   - Avatar, etc.

Layers and this Discussion

The only layers being discussed here are really 2-3:

2. Discovery of the existence of a user's profile
   - Current: Discovered by exhaustive search of all published Profiles
   - Proposed by @wesbiggs: Discovered by direct User Data query of the consensus system
3. Discovery of the user's current profile document location
   - Current: Most recent profile that returns
   - Proposed by @wesbiggs: Resolved URL or none

Constraints

As discussed above, the primary reason for not choosing direct User Data in the initial design is the cost of storage and churn of the additional data.

Assuming an IPFS CID, it is approximately 40 bytes per user. So for 100 million users that's about 4 GB. (Likely more, as the overhead isn't exact.)
With 100 million users, assuming only 10% new profiles each year and 50% churn, that's 60 million transactions per year, or ~2 per second. For a blockchain with a 6-second block time, that's about 12 transactions per block used just by profile changes.
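The back-of-the-envelope figures above can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted in the comment (100 million users, roughly 40 bytes per pointer, 10% new profiles plus 50% churn per year, 6-second blocks); the function and constant names are illustrative.

```python
USERS = 100_000_000
BYTES_PER_POINTER = 40          # approximate size of a stored CID
NEW_PROFILE_PERCENT = 10        # share of users creating a profile each year
CHURN_PERCENT = 50              # share of users updating a profile each year
BLOCK_TIME_SECONDS = 6
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def storage_bytes(users: int, bytes_per_pointer: int) -> int:
    """Raw pointer storage, ignoring per-record overhead."""
    return users * bytes_per_pointer


def yearly_transactions(users: int, new_percent: int, churn_percent: int) -> int:
    """Profile writes per year: new profiles plus updates."""
    return users * (new_percent + churn_percent) // 100


total_bytes = storage_bytes(USERS, BYTES_PER_POINTER)
tx_per_year = yearly_transactions(USERS, NEW_PROFILE_PERCENT, CHURN_PERCENT)
tx_per_second = tx_per_year / SECONDS_PER_YEAR
tx_per_block = tx_per_second * BLOCK_TIME_SECONDS

print(f"storage: {total_bytes / 1e9:.1f} GB")
print(f"transactions per year: {tx_per_year:,}")
print(f"transactions per second: {tx_per_second:.2f}")
print(f"transactions per block: {tx_per_block:.1f}")
```

Without rounding, the rate comes out just under 2 per second and a little over 11 per block; the figure of 12 per block follows from rounding the per-second rate up to 2 first.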

The current system is better assuming that some of that churn is able to be batched, but does have a longer term growth issue.

wesbiggs · 2024-04-04T16:06:23Z

To capture a side discussion: the number of updates required for profiles to be linked from User Data is estimated to be far fewer than the number of social graph updates. In that light, this proposal doesn't change the order of magnitude of consensus system changes significantly.

wilwade · 2024-05-02T17:27:41Z

Notes from DSNP Spec Community Call 2024-05-02

- Trend to move things into User Data from Announcement Data. We need to profile the data usage on the implementation side.
  - Various options for implementation, but it makes sense from the perspective of keeping the spec separate.
- What about private profile data?
- What about alternate profile expressions, such as into a group?
- Loss of the ability to know recently changed profiles
  - CID changes at least tell you about a change, given you have a cached version

wesbiggs · 2024-05-13T17:37:18Z

I'll let those closer to implementations look at optimal storage solutions, but compared to graph storage and updates I don't think profile CIDs would add significant requirements.

There's a separate discussion to be had about the different types of profile data as noted in the spec call comment. I think the goal at the moment would be to cover the existing (public Activity Content) profile definition, but provide a structure to enable future profile-linked formats via the file type enumeration.

The notion of context-based profiles is a deep one and gets into the notion of identity expression via personas. DSNP currently lacks a two-level structure for this, so it is assumed that a DSNP User Id represents a single persona. Structurally, we might consider a way for a user (e.g. a human) to participate in various social networking activities with their choice of personas (and/or participate anonymously in some activities). For separation of personas to be effective, we would like to create a system that (through cryptographic techniques, say) did not enable easy correlation of multiple personas belonging to an individual by a third party. For this reason I think alternate profile expressions are out of scope for this work item, but I'd be open to further discussion.

Change visibility: DSNP requires systems to generate and emit State Change Records whenever User Data changes. While systems are not required to make historical data available, observers can still build their own history set and watch for changes that impact their caches.
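One way an observer might use such change notifications is to keep a local cache and compare the stored CID with each incoming change. The sketch below assumes a simplified, hypothetical change record carrying only a user id and the new profile CID; the actual State Change Record format is defined by the spec and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProfileChange:
    """Hypothetical, simplified stand-in for a change notification."""
    user_id: int
    cid: Optional[str]  # None models removal of the profile link


class ProfileCache:
    """Tracks the last known profile CID per user."""

    def __init__(self) -> None:
        self._cid_by_user: Dict[int, str] = {}

    def apply(self, change: ProfileChange) -> bool:
        """Record the change; return True if a cached document is now stale."""
        previous = self._cid_by_user.get(change.user_id)
        if change.cid is None:
            self._cid_by_user.pop(change.user_id, None)
        else:
            self._cid_by_user[change.user_id] = change.cid
        return previous != change.cid


cache = ProfileCache()
for change in [ProfileChange(7, "cid-a"), ProfileChange(7, "cid-a"), ProfileChange(7, "cid-b")]:
    print(change.cid, "stale" if cache.apply(change) else "unchanged")
```

Since a CID is derived from the content, an unchanged CID means the cached document is still current, and only a changed CID requires a new fetch.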

wilwade · 2024-05-14T16:22:56Z

@wesbiggs from a Frequency implementation view, usually we build Frequency Schemas to be versioned via the schema identifier as that is needed to parse the particular data.

If we did this, I could see an on-chain data structure something like the one below, with the DSNP Profile version inside the data structure instead of outside, as most are/should be.

{
  type: "record",
  name: "UserProfilePointer",
  namespace: "org.dsnp",
  fields: [
    {
      name: "cid",
      type: "bytes",
    },
    {
      name: "version",
      type: "int",
    },
  ],
}

It would be stored in the User Data store (Stateful Storage), and the schemas-repo-style deploy config would be:

{
  model: profile,
  modelType: "AvroBinary",
  payloadLocation: "Paginated",
  settings: [],
  dsnpVersion: "1.x",
}
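To get a feel for how small such a pointer record would be, here is a minimal hand-rolled sketch of Avro binary encoding for the two fields in the `UserProfilePointer` structure above. Avro writes `int` values as zigzag variable-length integers and `bytes` as a length followed by the raw bytes. The function names and the placeholder CID bytes are assumptions for illustration; a real implementation would more likely use an Avro library.

```python
from typing import Tuple


def encode_zigzag_varint(n: int) -> bytes:
    """Encode a signed integer (64-bit range) as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_zigzag_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a zigzag varint starting at pos; return (value, next position)."""
    z = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_pointer(cid: bytes, version: int) -> bytes:
    """Fields are written in schema order: cid (bytes), then version (int)."""
    return encode_zigzag_varint(len(cid)) + cid + encode_zigzag_varint(version)


def decode_pointer(buf: bytes) -> Tuple[bytes, int]:
    length, pos = decode_zigzag_varint(buf, 0)
    cid = buf[pos:pos + length]
    version, _ = decode_zigzag_varint(buf, pos + length)
    return cid, version


placeholder_cid = bytes(range(36))  # stand-in bytes, not a real CID
encoded = encode_pointer(placeholder_cid, 1)
print(len(encoded), decode_pointer(encoded) == (placeholder_cid, 1))
```

With a 36-byte placeholder, the record is 38 bytes: one byte for the length prefix, the CID bytes themselves, and one byte for the version.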

It wouldn't handle the persona issue, which one could argue should be handled at the metadata level. But even if there were a persona setup in the generalized profile (instead of in group settings or such), I think keeping that data in one file (with links to media such as profile pictures) still makes sense, because the profile is quite small even with several duplicated pieces of data.

Side Note: I could also see the on-chain metadata including a length value for the expected IPFS file as Frequency has for the IPFS Payload Location structure as well. This informs consumers of the data before they attempt to download extremely large files that might not actually be real profiles.

wesbiggs · 2024-05-15T01:59:29Z

I'm not following, why is the version needed within the Avro?

I think including the expected byte length is a useful addition.

wilwade · 2024-05-15T13:34:28Z

> I'm not following, why is the version needed within the Avro?

It isn't required, but it is an optimization for the search.

Let's imagine a future where there are 10 non-backward compatible versions. In this case, each query to Frequency to look for a profile could (assuming you wished to support all 10) require up to 10 queries to the chain to discover the profile. On Frequency the map for Stateful storage is User Id, Schema Id. (Then page/item depending).

By shifting the version from effectively the Schema Id layer into the data layer, it allows the content version to shift as long as the metadata doesn't.
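The query-count argument can be made concrete with a toy model. The sketch below stands in for Stateful Storage with a plain dictionary keyed by (User Id, Schema Id); the schema id values, payloads, and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for chain state: (user id, schema id) -> stored payload.
State = Dict[Tuple[int, int], bytes]


def find_with_schema_per_version(
    state: State, user_id: int, schema_ids: List[int]
) -> Tuple[Optional[bytes], int]:
    """One schema id per profile version: probe each until something is found."""
    queries = 0
    for schema_id in schema_ids:
        queries += 1
        payload = state.get((user_id, schema_id))
        if payload is not None:
            return payload, queries
    return None, queries


def find_with_version_in_data(
    state: State, user_id: int, schema_id: int
) -> Tuple[Optional[bytes], int]:
    """Single schema id; the version travels inside the payload."""
    return state.get((user_id, schema_id)), 1


VERSION_SCHEMA_IDS = list(range(110, 100, -1))  # ten hypothetical ids, newest first
POINTER_SCHEMA_ID = 200                          # hypothetical single pointer schema

per_version_state: State = {(42, 101): b"profile-v1"}  # user still on the oldest version
single_schema_state: State = {(42, POINTER_SCHEMA_ID): b"pointer-with-version"}

print(find_with_schema_per_version(per_version_state, 42, VERSION_SCHEMA_IDS)[1])
print(find_with_version_in_data(single_schema_state, 42, POINTER_SCHEMA_ID)[1])
```

In the worst case the per-version layout needs one query per supported version, while the single-schema layout always needs one, at the cost of the reader having to inspect the payload to learn the version.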

wesbiggs · 2024-06-03T22:02:38Z

I was envisioning that the spec itself might add or replace profile document types over time, hence the inclusion of the type enum value for each link record. For example, the DSNP community might start with the existing Activity Content Profile JSON doc, but later decide that something like a Solid WebID Profile is important to enable for various use cases. The user data payload could include both or either of these.

I think if we do this we keep it open for future evolution of types while not suffering version-related issues, as the core data (cid, type, and size) remains consistent regardless of the target file type.
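A sketch of how a consumer might work with such link records follows. It assumes each record carries the three core values mentioned above (cid, type, and size); the class, the helper names, and the second type value are hypothetical, while type 1 for Activity Content profiles matches the definition in the linked PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

AC_PROFILE = 1          # link type defined for Activity Content profiles
OTHER_DOC_TYPE = 2      # hypothetical future document type, for illustration only


@dataclass(frozen=True)
class ProfileLink:
    type: int   # document type enum value
    cid: str    # content identifier of the document
    size: int   # expected byte length of the document


def select_link(links: Sequence[ProfileLink], preferred_types: Sequence[int]) -> Optional[ProfileLink]:
    """Return the first link whose type the client supports, in preference order."""
    for wanted in preferred_types:
        for link in links:
            if link.type == wanted:
                return link
    return None


def size_matches(link: ProfileLink, document: bytes) -> bool:
    """Cheap sanity check on a download; it does not replace verifying the CID."""
    return len(document) == link.size


links = [
    ProfileLink(type=OTHER_DOC_TYPE, cid="example-cid-2", size=2048),
    ProfileLink(type=AC_PROFILE, cid="example-cid-1", size=512),
]
chosen = select_link(links, [AC_PROFILE])
print(chosen)
```

A client that only understands Activity Content profiles skips unknown types, so new document types can be added without breaking existing readers.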

Problem
=======
#267 discusses the advantages of DSNP profile documents being discoverable based on a user's identifier, rather than being temporal data that must be externally tracked and indexed to be useful.

Solution
========
This proposal adds a new User Data type, `profileResources`, which is currently defined for the Activity Content profile document type, but is extensible to other document types that may need to be user-centric in the future. A profile resource is simply a link type (as an integer) and a CID as a string plus byte length. We define link type 1 for AC profiles. This change assumes we are versioning for 1.3, but does not include versioning updates, which are in other open PRs at this time.

Change summary:
---------------
- [ ] ~~Updated Spec Versions~~
- [x] Added definition for ProfileResource Avro type
- [x] Added entry in User Data page for `profileResource`
- [x] Updated navigation; added ProfileResource, moved Profile announcement to "Migrated" section

---------

Co-authored-by: Wes Biggs <wes.biggs@amplica.io>

wesbiggs · 2024-07-08T21:12:12Z

This functionality has been integrated for DSNP 1.3.

wesbiggs mentioned this issue Apr 15, 2024

DIP-267 Provide a User Data mechanism for links to profiles #277

Merged

4 tasks

wesbiggs changed the title ~~Discussion: Profile discoverability~~ DIP-267 Profile discoverability Jun 6, 2024

wesbiggs closed this as completed Jul 8, 2024
