First-Party Sets

This document proposes a new web platform mechanism to declare a collection of related domains as being in a First-Party Set.

A Work Item of the Privacy Community Group.

Editors:

Participate


Introduction

Browsers have proposed a variety of tracking policies and privacy models (Chromium, Edge, Mozilla, WebKit) which scope access to user identity to some notion of first-party. In defining this scope, we must balance two goals: the scope should be small enough to meet the user's privacy expectations, yet large enough to provide the user's desired functionality on the site they are interacting with.

One natural scope is the domain name in the top-level origin. However, the website the user is interacting with may be deployed across multiple domain names. For example, https://google.com, https://google.co.uk, and https://youtube.com are owned by the same entity, as are https://apple.com and https://icloud.com, or https://amazon.com and https://amazon.de.

We may wish to allow user identity to span related origins, where consistent with privacy requirements. For example, Firefox ships an entity list that defines lists of domains belonging to the same organization. This explainer discusses a mechanism to allow organizations to each declare their own list of domains, which is then accepted by a browser if the set conforms to its policy.

Goals

  • Allow related domain names to declare themselves as the same first-party.
  • Develop a coherent definition of "first-party" vs "third-party" for privacy mechanisms on the web platform.
  • Allow for browsers to understand the relationships between domains of multi-domain sites such that they can effectively present that information to the user.
  • Uphold existing web security principles such as the Same Origin Policy.

Non-goals

  • Expansion of capabilities beyond what is possible without recent browser-imposed privacy mitigations such as restrictions on third party cookies or cache partitioning.
  • Third-party sign-in between unrelated sites.
  • Information exchange between unrelated sites for ad targeting or conversion measurement.
  • Other use cases which involve unrelated sites.
  • Define specific UI treatment.

(Some of these use cases are covered by other explainers from the Privacy Sandbox.)

Use Cases

On the modern web, sites span multiple domains and many sites are owned & operated by the same organization. Organizations may want to maintain different top-level domains for:

  • App domains - a single application may be deployed over multiple domains, where the user may seamlessly navigate between them as a single session.
    • office.com, live.com, microsoft.com (reference)
    • lucidchart.com, lucid.co, lucidspark.com, lucid.app (reference)
  • Brand domains
    • uber.com, ubereats.com
  • Country-specific domains to enable localization
    • google.co.in, google.co.uk
  • Common eTLD
    • For example, gov.uk and service.gov.uk are on the Public Suffix List. UK government agencies and services are hosted on subdomains of these suffixes, so browsers treat them as separate registrable domains; yet they share services, such as consent management, that rely on access to cross-domain cookies.
  • Sandbox domains that users never directly interact with, but exist to isolate user-uploaded content for security reasons.
    • google.com, googleusercontent.com
    • github.com, githubusercontent.com
  • Service domains that users never directly interact with, but provide services across the same organization’s sites.
    • github.com, githubassets.com
    • facebook.com, fbcdn.net

Note: The above are illustrative real-world examples of collections of domains that are assumed to be owned by the same organization; not all of them have been validated with the site owners.

Without compatibility measures such as Firefox and Edge browsers’ use of Disconnect.me’s Entities list, blocking cross-site communication mechanisms such as access to third-party cookies breaks many first-party use-cases.

First-Party Sets is a proposal to standardize a mechanism that solves this issue in a coherent way by declaring a collection of domains as being part of the same site or 'party', so that they can be treated as one privacy boundary. This allows browsers to enable protections against tracking across this privacy boundary, while ensuring continued operation of existing functionality that would otherwise be broken by blocking cross-domain cookies (“third-party cookies”). It would support seamless operation of functionality such as:

  • Sign-in across owned & operated properties
    • bbc.com and bbc.co.uk
    • sony.com and playstation.com
  • Support for embedded content from across owned & operated properties (e.g. videos/documents/resources restricted to the user signed in on the top-level site)
  • Separation of user-uploaded content from other site content for security reasons, while allowing the sandboxed domain access to authentication (and other) cookies. For example, Google sequesters such content on googleusercontent.com, GitHub on githubusercontent.com, and CodePen on cdpn.io. Hosting untrusted or compromised content on the same domain where a user is authenticated may allow attackers to capture authentication cookies or login credentials (in the case of password managers that scope credentials to domains), and so cause harm to users.
  • Shared services, such as consent management, across domains with a common eTLD suffix such as gov.uk. Repeatedly asking for cookie consent on individual gov.uk sites may confuse users, erode trust in the website’s functioning, and cause fatigue, because users think of all subdomains as being part of one gov.uk website.
  • Analytics/measurement of user journeys across owned & operated (O&O) properties to improve quality of services.

Applications

In support of the various browser privacy models, first-party sets only control when embedded content that would otherwise be considered third-party can access its own state. Examples:

  • Sites may annotate individual cookies to be sent across same-party, cross-domain contexts by using the proposed SameParty cookie attribute.
  • Using the First-Party Set as the top-level key for partitioned cookies (a.k.a. “CHIPS”). This allows third-party sites (such as embedded SaaS providers) to provide access to the same user session across multiple top-level sites within the same first-party set (reference use-case).
  • Issuing WebID directed identifiers by First-Party Set, so the same account can be shared across multiple applications or services provided by the same first-party.
  • Applying Privacy Budget across an entire First-Party Set, in order to prevent fingerprinting entropy from being accumulated across domains that are able to communicate in an unconstrained manner due to access to cross-domain, same-party cookies.
  • Top and/or second level key for cache partitioning, potentially with site opt-in.

Site-Declared Sets in Browsers

Browsers should maintain a static list of site-declared groups of domains which meet UA (User Agent) policy, and ship it in the browser as a reliably updateable component. This is analogous to the list of domains owned by the same entity used by Edge and Firefox to control cross-site tracking mitigations.

The differences between this proposal and the use of the Disconnect entities list in Edge and Firefox are:

  • All sites with use-cases that depend on cross-domain, same-party communication will be required to declare a set for the corresponding group of sites. The Disconnect list, by contrast, only applies to sites classified as trackers.

  • Site authors must submit their First-Party Set declarations for acceptance (see UA Policy for proposed documented criteria).

  • Sets will expire after a prescribed period of time, and be required to undergo renewal. This prevents sets from becoming stale, in case domain ownership changes.

  • Each set is identified by an owner site and a list of member sites.

    { "owner": "https://fps-owner.example",
      "members": ["https://fps-member1.example",
                  "https://fps-member2.example"] }

Technical consistency and freshness checks must be performed on the list:

  • No domain can appear in more than one set.
  • Expired sets must be removed.
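
The two checks above can be sketched in a few lines of TypeScript. This is a minimal illustration, not a proposed implementation: the `FirstPartySet` shape, the `expires` field (milliseconds since the Unix epoch), and the function name `checkSetList` are assumptions made for the example; the explainer does not define how expiry is recorded.

```typescript
// Hypothetical shape of one entry in the browser's static list.
// `expires` is an assumed field; the explainer only says that sets expire.
interface FirstPartySet {
  owner: string;
  members: string[];
  expires: number; // expiry time, in milliseconds since the Unix epoch
}

interface CheckResult {
  valid: FirstPartySet[]; // sets that have not expired
  duplicates: string[];   // domains that appear in more than one valid set
}

// Drops expired sets, then reports every domain that appears (as owner or
// member) in more than one of the remaining sets.
function checkSetList(sets: FirstPartySet[], now: number): CheckResult {
  const valid = sets.filter((s) => s.expires > now);
  const firstSeenIn = new Map<string, number>(); // domain -> index of first set
  const duplicates = new Set<string>();
  valid.forEach((s, index) => {
    for (const domain of [s.owner, ...s.members]) {
      const first = firstSeenIn.get(domain);
      if (first === undefined) {
        firstSeenIn.set(domain, index);
      } else if (first !== index) {
        duplicates.add(domain);
      }
    }
  });
  return { valid, duplicates: Array.from(duplicates) };
}
```

A list would pass the consistency check only when `duplicates` is empty; what to do with a conflicting submission (reject the newer set, reject both) is a policy decision that this sketch leaves open.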

A different approach that does not involve consumption of a static list is discussed in the Alternative designs section.

Acceptance Process

This section proposes a possible model for a First-Party Set acceptance process that could be shared across all browsers. However, many aspects of the process and policy will need to be tuned based on feedback from the web ecosystem.

Submission

Sites will need to submit their proposed group of domains to a public tracker (such as a dedicated GitHub repository, like that of the Public Suffix List, and Disconnect’s entities list), along with information needed to satisfy the UA policy. Technical verification of the submitter’s control over the domains may also require a challenge to be served at a .well-known location on each of the domains in the set.

UA Policy

For a set of guiding principles in defining UA policy, we can look to how the various browser proposals describe first parties (emphasis added):

  • A Potential Privacy Model for the Web (Chromium Privacy Sandbox): "The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets. It is reasonable for the browser to relax its identity-sharing controls within that expanded notion, provided that the resulting identity scope is not too large and can be understood by the user."
  • Edge Tracking Protection Preview: "Not all organizations do business on the internet using just one domain name. In order to help keep sites working smoothly, we group domains owned and operated by the same organization together."
  • Mozilla Anti-Tracking Policy: "A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact."
  • WebKit Tracking Prevention Policy: "A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization." and, under "Unintended Impact", "Single sign-on to multiple websites controlled by the same organization."

In addition, the DNT specification defines “party” as: “a natural person, a legal entity, or a set of legal entities that share common owner(s), common controller(s), and a group identity that is easily discoverable by a user.”

We propose the following high level policy as an initial version for discussion, subject to change based on ecosystem feedback:

  • Domains must have a common owner, and common controller.
  • Domains must share a common group identity that is easily observable by users.
  • Domains must share a common privacy policy that is surfaced to the user via UI treatment.

We expect the UA policy to evolve over time as use cases and abuse scenarios come up. For instance, otherwise unrelated sites forming a consortium in order to expand the scope of their site identities would be considered abuse.

Verification Entity

An independent entity must verify that submissions conform to the documented UA policy before acceptance. The entity must also assign an expiration date, following which sets are removed from the browser-baked static lists.

The possibility of purely technical enforcement without a verification entity is discussed in the Alternative Designs section.

Administrative controls

For enterprise usage, browsers typically offer administrators options to control web platform behavior. UA policy is unlikely to cover private domains, so browsers might expose administrative options for locally-defined first-party sets.

UI Treatment

In order to provide transparency to users regarding the First-Party Set that a web page’s top-level domain belongs to, browsers may choose to present UI with information about the First-Party Set owner and the members list. One potential location in Chrome is the Origin/Page Info Bubble - this provides requisite information to discerning users, while avoiding the use of valuable screen real-estate or presenting confusing permission prompts. However, browsers are free to choose different presentation based on their UI patterns, or adjust as informed by user research.

Note that First-Party Sets also gives browsers the opportunity to group per-site controls (such as those at chrome://settings/content/all) by the “first-party” boundary instead of eTLD+1, which is not always the correct site boundary.

Domain Schemes

In accordance with the Fetch spec, user agents must "normalize" WebSocket schemes to HTTP(S) when determining whether a particular domain is a member of a First-Party Set. I.e. ws:// must be mapped to http://, and wss:// must be mapped to https://, before the lookup is performed.

User agents need not perform this normalization on the domains in their static lists; user agents may reject static lists that include non-HTTPS domains.
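
As a rough sketch of the lookup described above, the normalization can be applied to the site being looked up, leaving the static list untouched. The names `normalizeScheme`, `findOwner`, and the `ownerByMember` table are hypothetical, and plain string handling stands in for a real URL parser.

```typescript
// Maps WebSocket schemes to their HTTP(S) equivalents before a set lookup.
// Any other input is returned unchanged.
function normalizeScheme(site: string): string {
  const lower = site.toLowerCase();
  if (lower.startsWith("wss://")) return "https://" + site.slice("wss://".length);
  if (lower.startsWith("ws://")) return "http://" + site.slice("ws://".length);
  return site;
}

// Returns the owner of the set containing `site`, or undefined if the site is
// in no set. `ownerByMember` is an assumed lookup table built from the static
// list, mapping every owner and member to the set's owner.
function findOwner(
  site: string,
  ownerByMember: Map<string, string>,
): string | undefined {
  return ownerByMember.get(normalizeScheme(site));
}
```

With a table that maps https://fps-member1.example to https://fps-owner.example, a lookup for wss://fps-member1.example resolves to the same owner, while ws://fps-member1.example normalizes to an http:// site and finds nothing in an HTTPS-only list.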

Clearing Site Data on Set Transitions

Sites can change which First-Party Set they are a member of. We need to pay attention to these transitions so that they don’t link user identities across all the FPSs they’ve historically been in. In particular, we must ensure that a domain cannot transfer a user identifier from one First-Party Set to another when it changes its set membership.

In order to achieve this, site data needs to be cleared on certain transitions. The clearing should behave like Clear-Site-Data: "*", which includes cookies, storage, cache, as well as execution contexts (documents, workers, etc.). We don’t differentiate between different types of site data because:

  • A user identifier could be stored in any of these storage types.
  • Clearing just a few of the types would break sites that expect different types of data to be consistent with each other.

Since member sites can only add/remove themselves to/from FPSs with the consent of the owner, we look at first-party set changes as a site changing its FPS owner.

If a site’s owner changed:

  1. If this site had no FPS owner, the site's data won't be cleared.
    • Pro: Avoids adoption pain when a site joins an FPS.
    • Con: Unclear how this lines up with user expectations about access to browsing history prior to set formation.
  2. Otherwise, clear site data of this site.

Potential modification, which adds implementation complexity:

  1. If this site's new owner is a site that previously had the same FPS owner as the first site, the site's data won't be cleared.
    • Pro: Provides graceful transitions for examples (f) and (g).
    • Con: Multi-stage transitions, such as (h) to (i), are unaccounted for.

Examples


a. Site A and Site B create an FPS with Site A as the owner and Site B as the member. Site data will not be cleared.

b. Site C joins the existing FPS as a member site where Site A is the owner. Site data will not be cleared.


c. Given an FPS with owner Site A and members Site B and Site C, if Site D joins this FPS and becomes the new owner, the previous set will be dissolved and the browser will clear data for Site A, Site B and Site C.

d. Given an FPS with owner Site A and members Site B and Site C, if Site B leaves the FPS, the browser will clear site data for Site B.

e. Given two FPSs, FPS1 has owner Site A and members Site B and Site C and FPS2 has owner Site X and member Site Y, if they join together as one FPS with Site A being the owner, the browser will clear site data for Site X and Site Y.


With the potential modification allowing sites to keep their data if the new set owner was a previous member:

f. Given an FPS with owner Site A and members Site B and Site C, if no site is added or removed but Site C becomes the owner and Site A becomes a member, no site data will be cleared.

g. Given an FPS with owner Site A and members Site B and Site C, if Site A leaves the FPS and Site B becomes the owner, the browser will clear site data for Site A.

h. & i. Given the FPS with owner Site A and members Site B and Site C, if Site D joins this set as a member and later becomes the owner, site data of Site A, Site B and Site C is only preserved if the user happens to visit during the intermediate stage.
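
The transition rules and the examples above can be expressed as a small function that compares two snapshots of set membership. This is an illustrative sketch under stated assumptions: the `OwnerMap` representation (each site mapped to its owner, owners mapped to themselves, sites in no set absent), the helper `ownerMap`, and the flag `keepIfNewOwnerWasSameSet` for the potential modification are all names invented for the example.

```typescript
// Maps each site in a First-Party Set to its owner; an owner maps to itself.
// Sites that are in no set are simply absent from the map.
type OwnerMap = Map<string, string>;

// Builds an OwnerMap from a list of sets (hypothetical helper).
function ownerMap(...sets: { owner: string; members: string[] }[]): OwnerMap {
  const map: OwnerMap = new Map();
  for (const { owner, members } of sets) {
    for (const site of [owner, ...members]) map.set(site, owner);
  }
  return map;
}

// Returns the sites whose data should be cleared when membership changes from
// `before` to `after`. Sites with no previous owner are never cleared, because
// only sites present in `before` are examined.
function sitesToClear(
  before: OwnerMap,
  after: OwnerMap,
  keepIfNewOwnerWasSameSet = false,
): string[] {
  const cleared: string[] = [];
  before.forEach((oldOwner, site) => {
    const newOwner = after.get(site);
    if (newOwner === oldOwner) return; // owner unchanged: keep data
    const newOwnerWasInSameSet =
      newOwner !== undefined && before.get(newOwner) === oldOwner;
    if (keepIfNewOwnerWasSameSet && newOwnerWasInSameSet) return;
    cleared.push(site);
  });
  return cleared.sort();
}
```

For example (c), comparing `ownerMap({ owner: "A", members: ["B", "C"] })` with `ownerMap({ owner: "D", members: ["A", "B", "C"] })` yields A, B and C. With the flag enabled, example (f) yields an empty list and example (g) yields only A. The flag does not help when the browser never observes the intermediate stage of (h) to (i), since Site D then has no previous owner to compare against.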

Alternative designs

Signed Assertions and set discovery instead of static lists

Static lists are easy to reason about and easy for others to inspect. At the same time, they can develop deployment and scalability issues. Changes to the list must be pushed to each user's browser via some update mechanism. This complicates sites' ability to deploy new related domains, particularly in markets where network connectivity limits update frequency. They also scale poorly if the list gets too large.

The Signed Assertions based design proposes an alternative solution that involves the browser learning the composition of sets directly from the websites that the user visits. To prevent privacy risks from personalized sets and ensure policy conformance, they are still verified by an independent entity through a digital signature.

This design is significantly more complex than the consumption of a static list, especially when implementing discovery and fetching of sets in a privacy-preserving manner. As such, we prefer to start with the simpler static list approach, leaving the possibility of introducing a more complex alternative in the future.

Using EV Certificate information for dynamic verification of sets

Extended Validation (EV) Certificates, in addition to backing encrypted exchange of information on the web, require verification of the legal entity associated with the website a certificate is issued for and encode information about this legal entity in the certificate itself. It might be possible to match this information for sites presenting EV certificates (or use the subjectAltName on a single EV certificate) to build First-Party Sets. This could be used in place of Signed Assertions as part of a dynamic set discovery mechanism.

However, such an automatic mechanism would result in a very tight coupling of identity and feature exposure through First-Party Sets to the existing certificate infrastructure.

It's likely that this would negatively impact the deployment and use of encryption on the web, for example by forcing sites to obtain EV certificates as the only way to ensure continued functionality. A revocation of a certificate that is used for FPS would have grave implications (such as deletion of all local data through the Clear Site Data mechanism) and thus complicate the revocation process.

See Issue 12 for an extended discussion.

Self-attestation and technical enforcement

Instead of having a verification entity check conformance to policy, it may be possible to rely on a combination of:

  • Self-attestation of UA Policy conformance by submitter.
  • Technical consistency checks such as verifying control over domains, and ensuring that no domain appears in more than one set.
  • Transparency logs documenting all acceptances and deletions to enable accountability and auditability.
  • Mechanism/process for the general public to report potential violations of UA Policy.

However, at this time we do not believe it is possible to enforce against the formation of consortiums of unrelated entities by technical means alone, and thus we will require some form of verification entity to guard against that.

Origins instead of registrable domains

A first-party set is a collection of origins, but it is specified by registrable domains, which carries a dependency on the public suffix list. While this is consistent with the various proposed privacy models as well as cookie handling, the security boundary on the web is the origin, not registrable domain.

An alternate design would be to instead specify sets by origins directly. In this model, any https origin would be a possible first-party set owner, and each origin must individually join a set, rather than relying on the root as we do here. For continuity with the existing behavior, we would then define the registrable domain as the default first-party set for each origin. That is, by default, https://foo.example.com, https://bar.example.com, and https://example.com:444 would all be in a set owned by https://example.com. Defining a set explicitly would override this default set.

This would reduce the web's dependency on the public suffix list, which would mitigate various problems. For instance, a university may allow students to register arbitrary subdomains such as https://foo.university.example, but may not have placed university.example on the public suffix list, either due to compatibility concerns or oversight. With an origin-specified first-party set, individual origins could then detach themselves from the default set to avoid security problems with non-origin-based features such as cookies. (Note the __Host- cookie prefix also addresses this issue.)

This origin-defined approach has additional complications to resolve:

  • There are a handful of features (cookies, document.domain) which are scoped to registrable domains, not origins. Those features should not transitively join two different sets. For instance, we must account for one set containing https://foo.bar.example.com and https://example.com, but not https://bar.example.com. For cookies, we can say that cookies remember the set which created them and we match both the Domain attribute and the first-party set. Thus if https://foo.bar.example.com sets a Domain=example.com cookie, https://example.com can read it, but not https://bar.example.com. Other features would need similar updates.
  • The implicit state should be expressible explicitly, to simplify rollback and deployment, which means first-party set manifests must describe patterns of origins, rather than a simple bounded list of domains. In particular, we should support subtree patterns.
  • https://foo.example.com's implicit owner is https://example.com. If https://example.com then forms an explicit set which does not include https://foo.example.com, we need to change https://foo.example.com's implicit state, perhaps to a singleton set.
  • This complex set of patterns and implicit behaviors must be reevaluated against existing origins every time a first-party set is updated.
  • Certificate wildcards (which themselves depend on the public suffix list) don't match an entire subtree. This conflicts with wanting to express implicit states above.

These complexities are likely solvable while keeping most of this design, should browsers believe this is worthwhile.
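
To make the cookie rule from the first complication above concrete, the following sketch checks both the Domain attribute and the set that created the cookie. It assumes a hypothetical `setId` recorded on each cookie and a hypothetical `setOfHost` table mapping hosts to set identifiers; the domain match is simplified and ignores details such as IP addresses and the public suffix check.

```typescript
// A cookie that remembers which first-party set created it.
// `setId` is an assumed identifier; the explainer does not define one.
interface SetScopedCookie {
  name: string;
  domain: string; // value of the Domain attribute, e.g. "example.com"
  setId: string;
}

// Simplified domain match: the host equals the domain or is a subdomain of it.
function domainMatches(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

// A host may read the cookie only if the Domain attribute matches AND the host
// belongs to the same set that created the cookie.
function canRead(
  cookie: SetScopedCookie,
  host: string,
  setOfHost: Map<string, string>,
): boolean {
  return domainMatches(host, cookie.domain) && setOfHost.get(host) === cookie.setId;
}
```

With foo.bar.example.com and example.com in one set and bar.example.com in another, a Domain=example.com cookie created by foo.bar.example.com is readable by example.com but not by bar.example.com, even though both match the Domain attribute.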

Security and Privacy Considerations

Avoid weakening new and existing security boundaries

Changes to the web platform that tighten boundaries for increased privacy often have positive effects on security as well. For example, cache partitioning restricts cache probing attacks and third-party cookie blocking makes it much harder to perform CSRF by default. Where user agents intend to use First-Party Sets to replace or extend existing boundaries based on site or origin on the web, it is important to consider not only the effects on privacy, but also on security.

Sites in a common FPS may have greatly varying security requirements; for example, a set could contain a site storing user credentials and another hosting untrusted user data. Even within the same set, sites still rely on cross-site and cross-origin restrictions to stay in control of data exposure. Within reason, it should not be possible for a compromised site in an FPS to affect the integrity of other sites in the set.

This consideration will always involve a necessary trade-off between gains like performance or interoperability and risks for users and sites. User agents should facilitate additional mechanisms such as a per-origin opt-in or opt-out to manage this trade-off. Site owners should be aware of the potential security implications of creating an FPS and form only the smallest possible set of domains that encompasses user workflows/journeys across an application, especially when some origins in the set opt into features that may leave them open to potential attacks from other origins in the set.

Prior Art

Acknowledgements

This proposal includes significant contributions from previous co-editor, David Benjamin.

We are also grateful for contributions from Chris Fredrickson and Shuran Huang.

Appendix

SameParty Cookies and First-Party Sets

Sites may annotate individual cookies to be sent across same-party, cross-domain contexts by using the proposed SameParty cookie attribute.

To illustrate the above use cases, we'll suppose that https://member1.example and https://member2.example are in the same first-party set, and consider the following two pages.

[Figure: Cross-party and same-party embeddings]

Consider browsers where cross-site tracking protections are enabled. The first page, case a, is hosted on a third-party domain (https://other.example) and embeds an iframe from https://member1.example. We say that this iframe is in a cross-party context, since the top-level frame's domain is not in the same first-party set as the embedded iframe's domain. The second page, case b, is hosted on https://member2.example, and also embeds an iframe from https://member1.example. We say that this iframe is in a same-party context, since the top-level frame's domain is in the same first-party set as the iframe's domain. The aforementioned uses of first-party sets aim to grant a site access to its own state (e.g. cookies) when in a same-party context (case b), while blocking access when in a cross-party context (case a).

  • In case a, https://member1.example's SameParty cookie is not sent in the iframe's subresource request, since the iframe is in a cross-party context.
  • In case b, https://member1.example's SameParty cookie is sent in the iframe's subresource request, since the iframe is in a same-party context.

Note that First-Party Sets does not grant access to one domain's state to any other domain, regardless of the context, in this example. I.e., neither https://other.example nor https://member2.example ever has access to https://member1.example's cookies.

The above example (where access to a domain's own cookies is granted when embedded in certain domains, but is disallowed when embedded in others) is not possible without a proposal like First-Party Sets.
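
The distinction between case a and case b can be sketched as a comparison of set owners. This is a simplified illustration: it considers only the top-level frame and the embedded frame, as in the example above, and the owner https://owner.example, the `ownerByMember` table, and the function names are assumptions made for the sketch.

```typescript
// Returns the owner of the set containing `site`, or the site itself when it
// belongs to no set (treated here as a party of one).
function partyOf(site: string, ownerByMember: Map<string, string>): string {
  return ownerByMember.get(site) ?? site;
}

// True when the embedded frame's site and the top-level site belong to the
// same first-party set, i.e. the embed is in a same-party context.
function isSamePartyContext(
  topLevelSite: string,
  embeddedSite: string,
  ownerByMember: Map<string, string>,
): boolean {
  return partyOf(topLevelSite, ownerByMember) === partyOf(embeddedSite, ownerByMember);
}

// Hypothetical set containing the two member sites from the example.
const exampleSets = new Map<string, string>([
  ["https://owner.example", "https://owner.example"],
  ["https://member1.example", "https://owner.example"],
  ["https://member2.example", "https://owner.example"],
]);

// Case a: embedded under an unrelated top-level site (cross-party).
const caseA = isSamePartyContext("https://other.example", "https://member1.example", exampleSets);
// Case b: embedded under another member of the same set (same-party).
const caseB = isSamePartyContext("https://member2.example", "https://member1.example", exampleSets);
```

Here `caseA` is false and `caseB` is true. The check only decides whether https://member1.example's own SameParty cookie is attached to its own requests; it never exposes that cookie to the embedding site.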

This proposal is consistent with the same-origin policy. That is, Web Platform features must not use first-party sets to make one origin's state directly accessible to another origin in the set. For example, if a.example and b.example are in the same first-party set, the same-origin policy would still prevent https://a.example from accessing https://b.example's cookies or IndexedDB databases.