Validate the contents of identity centric metadata #8635

Open
5 of 6 tasks
dstufft opened this issue Sep 30, 2020 · 35 comments · Fixed by #16472
Labels
needs discussion: a product management/policy issue maintainers and users should discuss

Comments

@dstufft (Member) commented Sep 30, 2020

Currently, if I'm looking at a project on PyPI, it can be difficult to determine whether it's "real" or not. I can see the usernames of the users publishing the project, as well as certain key pieces of metadata such as the project home page, the source repository, etc.

Unfortunately, there's no way to verify that a project that has, say, https://github.com/pypa/pip as its home page is actually the real pip and not a fake impostor pip. The same goes for other URLs, email addresses, etc. It would therefore be useful to have some way to actually prove ownership of those URLs/emails, and to either differentiate them in the UI somehow or hide them completely unless they've been proven to be owned by one of the publishing users.


Metadata to verify:

@jamadden (Contributor)

I recall issues like this coming up at least once, if not a few times, over in pypi-support. Someone would fork a repository, change the name in setup.py, and then upload it to PyPI, with or without any other changes. All of the documentation and other links would remain pointed to the originals. This was confusing for users and frustrating to owners of the original package.

So I'm 👍 for some sort of blue verified checkmark or something from that perspective.

With my publisher hat on, though, I would hope this would be completely automated and I wouldn't have to do anything special to earn that blue checkmark.

@di added the needs discussion label on Sep 30, 2020
@ewjoachim (Contributor) commented Oct 2, 2020

One idea: we could add a blue checkmark for all links in the sidebar whose pages contain a link back to the project's PyPI page or a pip install <project-name> snippet. This would force us to load those links on the server, but it would be zero-effort for most packages.

That said, it wouldn't help if the links point to forks, but in that case the GitHub star count might be a tell.
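A minimal sketch of that backlink check, assuming the requests library; the function name and the pypi.org URL pattern are illustrative only, not an existing Warehouse helper:

import requests

def page_links_back(url: str, project_name: str, timeout: float = 5.0) -> bool:
    # Fetch the sidebar link and look for a reference back to the project's
    # PyPI page, or an obvious "pip install <project>" snippet.
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    body = resp.text.lower()
    name = project_name.lower()
    return f"pypi.org/project/{name}" in body or f"pip install {name}" in body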

@calebbrown

👍

Any progress on this issue?

I've been looking at malware from PyPI and it is common for the author_email to be "spoofed" (either pointing to nowhere, or using somebody else's email address).

Some related context is this HN discussion: https://news.ycombinator.com/item?id=33438678 Many commenters are asking about providing this sort of information.

I see some considerations that need discussion:

  • how do you support small personal projects (one or two maintainers) and large projects (big team, company) at the same time?
  • how do you provide verification signals without creating a false sense of security? (e.g. hide unvalidated data from the UI versus showing a blue checkmark against it)

Some validations are easier than others, too - e.g. email validation is pretty straightforward, but homepage validation would require something like the ACME protocol.

@ewjoachim (Contributor) commented Nov 9, 2022

Haha, rereading my two-year-old comment above about blue checkmarks, it resonates strangely in today's terms 😅

Who would have guessed...

@di (Member) commented Nov 15, 2022

My general thought here is that for metadata we can 'verify', we should probably elevate that metadata in the UI over 'unverified' metadata.

We can already validate email addresses that correspond to verified emails of maintainers. That won't include the ability to verify mailing-list-style emails, but that could potentially be added to organizations once that feature lands.

With #12465, we'll be able to 'validate' the source repository as well, so any metadata that references the given upstream source repository can be considered verified too.

I agree that domains/urls will need to use the ACME protocol or something similar. I think there's probably a UX question on how these would be done per-project, if we wanted to go that route.

@ewjoachim (Contributor)

Mastodon has a link verification system; that might be nice.

That's never going to be foolproof though.

@miketheman (Member)

Related: #8462 #10917

@jayaddison (Contributor)

From attempting to perform identity-assurance checks on packages manually: bidirectional references can be a reassuring indicator.

In context here: when a PyPI package points to a GitHub repository as its source code, that's interpretable as a useful but as-yet-untrusted statement. When up-to-date references are inspected within the contents of the cloned linked repository and they point back to the same original package on PyPI, confidence in the statement increases.

For reproducible-build-compliant packages the situation improves further: any third party can confirm not only that the source origin and package destination are in concordance, but also whether the published artifact at the destination is bit-for-bit genuine, by comparing it against a build-from-scratch of the corresponding raw origin source materials. This can be verified on both a historic and an ongoing basis.

So that's two orthogonal identity validation mechanisms:

  • Is-where-it-claims-to-be (it's possible to navigate a graph from the content distribution location A to the source at B and then from B back again to A).
  • Is-what-it-claims-to-be (the claimed source code for a package and version builds into the bit-for-bit identical content found at the content distribution location for the same package and version).

These don't prevent an attacker from copying the source in its entirety and creating a duplicate under a different name with an internally consistent reference graph. Given widespread free communication, I think it's reasonable to expect that enough of the package-consumer population will be (or become) aware of, and gravitate towards, the authentic package to solve that problem.
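As a rough illustration of the "is-where-it-claims-to-be" direction, here is a hedged sketch that inspects an already-cloned repository for a reference back to the PyPI package; it assumes a PEP 621 pyproject.toml and is not an existing PyPI/Warehouse check:

import tomllib
from pathlib import Path

def repo_points_back(repo_dir: Path, pypi_name: str) -> bool:
    # Look for a back-reference from the cloned repository to the PyPI project.
    pyproject = repo_dir / "pyproject.toml"
    if not pyproject.is_file():
        return False
    project = tomllib.loads(pyproject.read_text(encoding="utf-8")).get("project", {})
    # Simplest signal: the declared project name matches the published name.
    if project.get("name", "").lower() == pypi_name.lower():
        return True
    # Or any declared project URL points at the PyPI page.
    return any(
        f"pypi.org/project/{pypi_name.lower()}" in url.lower()
        for url in project.get("urls", {}).values()
    )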

@di (Member) commented Mar 19, 2024

Following up on my previous comment, here's a mockup of what I'm imagining to separate the metadata we can verify today (source repository, maintainer email, GitHub statistics, Owner/Maintainers) from the unverifiable metadata:

[mockup screenshot]

  • "Source" should only be verified as a project link if the project has a Trusted Publisher with a matching URL, and the publisher has been used at least once to publish to this project
  • "Metadata" should only be emails that are included in Author-Email or Maintainer-Email that are also verified user emails for any collaborator on the project
  • "GitHub Statistics" should only be included if "Source" is verified
  • "Owner"/"Maintainers" should always be included.

Over time we can move things from below the fold to above it, but this should be a big improvement as-is for now.

I pushed the diff for the mockup here; there's some hacky stuff in there just to get the mockup to look good, but it could be a good starting point.

@javanlacerda (Contributor) commented Mar 21, 2024

I'm starting to work on this, creating the verified section and adding "Owner"/"Maintainers" to it :)

@ewjoachim (Contributor) commented Mar 22, 2024

I wonder if it makes more sense to have verified details followed by unverified details, or to give each category a verified sub-section and an unverified sub-section. It feels weird to break the project links apart from one another. When your eyes have reached the place where the repository link is, it's not very clear that, if the documentation isn't there, you have to look somewhere else entirely for a different link section that might contain the link to the docs.

I'd even argue that in this case, the whole thing would look more readable if the project doesn't use trusted publishers, which is

What about something like this? (Not arguing it's better, just a suggestion for the discussion.)
[alternative mockup screenshot]
(also, this needs a link, or a hover infobox or something to lead people to the documentation that says what this means, what this certifies, and how to get the various parts of their metadata certified)

(Would @nlhkabu have an opinion on the matter?)

@di (Member) commented Aug 12, 2024

#16205 starts marking URLs as verified; this now just needs to be surfaced in the UI.

@facutuesca (Contributor)

#16205 starts marking URLs as verified; this now just needs to be surfaced in the UI.

I'm working on the UI part now

@di (Member) commented Aug 13, 2024

Reopening this: we have solved it for the subset of project URLs that relate to Trusted Publishing, but these two remain:

"Metadata" should only be emails that are included in Author-Email or Maintainer-Email that are also verified user emails for any collaborator on the project
"GitHub Statistics" should only be included if "Source" is verified

I think we also want to think about a solution for validating non-trusted publisher project URLs (e.g., ACME).

@di reopened this on Aug 13, 2024
@woodruffw (Member) commented Aug 14, 2024

I think we also want to think about a solution for validating non-trusted publisher project URLs (e.g., ACME).

I am a massive fan of this idea 🙂

To spitball a little bit, a given FQDN's (foo.example.com) resources could be considered verified for a project example if:

  1. foo.example.com is a secure origin (HTTPS)
  2. foo.example.com/.well-known/pypi exists and is JSON
  3. The contents of the pypi JSON resource are something like this:
{
  "version": 1,
  "packages": ["example"]
}

(where packages is plural since a FQDN may host links/resources for multiple projects.)
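Concretely, a hedged sketch of that check (the resource name and JSON shape simply follow the example above; neither is a settled spec):

import requests

def wellknown_verifies(fqdn: str, project: str, timeout: float = 5.0) -> bool:
    url = f"https://{fqdn}/.well-known/pypi"   # 1. secure origin, fixed path
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=False)
        if resp.status_code != 200:
            return False
        doc = resp.json()                      # 2. must be JSON
    except (requests.RequestException, ValueError):
        return False
    # 3. version/packages shape from the example above
    return doc.get("version") == 1 and project in doc.get("packages", [])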

Another lower-intensity option would be the rel="me" approach that Mastodon and similar services use. This approach has the benefit of being per-resource (meaning that the entire FQDN isn't considered valid, since the user may not control the entire FQDN), at the cost of requiring the user to tweak their HTML slightly. Using example again:

  1. foo.example.com is a secure origin
  2. /some/resource on foo.example.com is (X)HTML
  3. /some/resource contains a <link ...> as follows:
<head>
  <link rel="me" href="https://pypi.org/p/example">
</head>

Like with the .well-known approach, the user could include multiple <link> tags to assert ownership of multiple PyPI projects.

Alternatively, this could use meta instead to prevent implicit resolution of links:

<head>
  <meta rel="me" namespace="pypi.org" package="example">
</head>

...or even multiple in the same tag:

<head>
  <meta rel="me" namespace="pypi.org" package="example another-example">
</head>

Edit: one downside to the rel="me" approach is that PyPI needs to parse (X)HTML.

Ref on Mastodon's link verification: https://docs.joinmastodon.org/user/profile/#verification
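For the rel="me" variant, a hedged sketch of the (X)HTML parsing PyPI would need, using only the standard library; the accepted pypi.org URL forms are assumptions:

from html.parser import HTMLParser

class RelMeCollector(HTMLParser):
    # Collects href values from <link rel="me" ...> tags; HTMLParser tolerates
    # truncated input, which matters if the fetched page is size-capped.
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_tokens = (attrs.get("rel") or "").split()
        if tag == "link" and "me" in rel_tokens and attrs.get("href"):
            self.hrefs.append(attrs["href"])

def page_asserts_project(html_text: str, project: str) -> bool:
    parser = RelMeCollector()
    parser.feed(html_text)
    accepted = {
        f"https://pypi.org/p/{project}",
        f"https://pypi.org/project/{project}",
    }
    return any(href.rstrip("/") in accepted for href in parser.hrefs)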

@di (Member) commented Aug 14, 2024

Using .well-known would be interesting but I think this wouldn't cover enough use cases (I'm thinking things like Read the Docs pages, etc, where the user doesn't control the FQDN).

Using rel="me" makes sense to me. We probably also need to think about verifying these outside the upload loop via a task.

@woodruffw (Member)

Using .well-known would be interesting but I think this wouldn't cover enough use cases (I'm thinking things like Read the Docs pages, etc, where the user doesn't control the FQDN).

Yeah -- my thinking was that it'd be most useful for projects/companies with full-blown domains, e.g. company $foo might want to blanket-verify all of their PyPI project names without having to add them to each docs page/homepage. But that's probably a much more niche case than Read The Docs etc. 🙂

Using rel="me" makes sense to me. We probably also need to think about verifying these outside the upload loop via a task.

Makes sense! That reminded me: do we also want to consider periodic reverification? My initial thought is "no" since URLs are verified on a per-release basis, but I could see an argument for doing it as well (or at least giving project owners the ability to click a "reverify" button once per release or similar).

@di (Member) commented Aug 14, 2024

I think I would want to stick to the "verified at time of release" model, at least for now (not trying to reinvent Keybase here).

@ewjoachim (Contributor)

If we specifically plan for this to be used on ReadTheDocs, it makes sense to ensure that whatever format we decide on is easy to use with Sphinx & mkdocs.

I've made a small test with Sphinx:

.. meta::
   :rel=me namespace=pypi.org package=example: pypi

produces

<meta content="pypi" namespace="pypi.org" package="example" rel="me" />

Note: it's going to be much harder to have multiple values in package separated by a space as it would break the meta directive parsing.
It's easy to have multiple meta tags though:

.. meta::
   :rel=me namespace=pypi.org package=example: pypi
   :rel=me namespace=pypi.org package=other: pypi

Also, it's impossible to not have a content element, as far as I can tell. But then maybe instead of package= we should list the packages in content, which would solve both issues at once:

.. meta::
   :rel=me namespace=pypi.org: example other

which produces

<meta content="example other" namespace="pypi.org" rel="me" />

On mkdocs, it looks like one would need to extend the theme, which is a bit cumbersome. But I'm sure someone would make a plugin soon enough to solve this (it's probably the same with Sphinx, realistically).

@woodruffw (Member)

Thanks for looking into that @ewjoachim! Using content for the package(s) makes sense to me, given that 🙂

@facutuesca (Contributor) commented Aug 26, 2024

I'm working on this. One thing we should be aware of is that implementing this kind of verification, where each URL is accessed and parsed to see if it contains the meta tag specified above, means that Warehouse will start making a lot more outgoing requests.

I'm not sure of how many releases are uploaded per second (let's call it N), but assuming (conservatively) that each of those releases has 1 URL in its metadata, that means that PyPI will have at least N new outgoing requests per second, to arbitrary URLs.

We can have restrictions to reduce network activity and protect against DoS attacks (like what Mastodon does, limiting the response size to 1 MB), but we'll still need to handle all the new outgoing requests.

Since I don't know the number of releases uploaded per second, I'm leaving the question open here to see if PyPI's infrastructure can handle the extra network activity downloading those webpages would cause.

@woodruffw (Member)

Yeah, thanks for calling that out! I think there are a few things PyPI could do to keep the degree of uncontrolled outbound traffic to a minimum:

  • Limit the number of URLs allowed per release. I suspect that there are no (legit?) projects that expect to attach more than a dozen unique URLs to each release, but we should probably confirm that 🙂
  • Limit the number of unique FQDNs during verification, regardless of number of metadata URLs.
  • Limit the amount of allowed network traffic, like you said. In practice I think we can even do well under 1MB, since the relevant <meta> tag should be in the <head>.
  • "Collate" verified URLs, i.e. don't re-perform URL verification if another file in the same release has already verified the URL within a particular time window (~15-60 minutes?)

PyPI could do some or all of them; I suspect limiting unique FQDNs might be a little too extreme compared to the others.
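As a hedged sketch of the traffic-limiting points above (the byte cap, User-Agent string, and timeout are placeholders), reading only the first part of the response also pairs naturally with an HTML parser that tolerates truncated input:

import requests

MAX_BYTES = 128 * 1024  # well under 1 MB; the <meta> tag should be in <head>
USER_AGENT = "pypi-url-verifier/0.1 (+https://pypi.org)"  # placeholder UA

def fetch_head_capped(url: str, timeout: float = 5.0):
    # Explicit User-Agent, no redirects, short timeout, hard cap on bytes read.
    try:
        with requests.get(
            url,
            headers={"User-Agent": USER_AGENT},
            timeout=timeout,
            allow_redirects=False,
            stream=True,
        ) as resp:
            if resp.status_code != 200:
                return None
            chunks, total = [], 0
            for chunk in resp.iter_content(chunk_size=8192):
                chunks.append(chunk)
                total += len(chunk)
                if total >= MAX_BYTES:
                    break  # partial HTML is fine for a tolerant parser
            return b"".join(chunks).decode(resp.encoding or "utf-8", errors="replace")
    except requests.RequestException:
        return None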

@ewjoachim (Contributor) commented Aug 26, 2024

Also, as far as I can tell, pypi.org's DNS points to Fastly. If the IPs of the real servers behind Fastly were easy to discover, DDoS attacks could become easier. We need to make sure that the outgoing IP used to connect to these websites is not the same as the inbound IP for pypi.org. That's probably the case since the workers run on different machines, but it's worth a mental check by anyone who knows the infrastructure well enough. (Also, ensure that inbound traffic not coming from Fastly is firewalled; it's probably already the case, but if not, it's probably worth doing.)

Limit the number of unique FQDNs during verification, regardless of number of metadata URLs.

Also, limit the number of underlying IPs after DNS resolution, and/or the number of domain names, because if a server has a wildcard DNS entry, an attacker can generate infinitely many FQDNs that all hit the same server.

Oh, and we may want to ensure we set a reasonable timeout on the requests (reasonable for us).

Also, we may want to control (& advertise) the user agent we use to make those requests, and potentially also the outbound IPs if possible. Some large players (RTD, github.io, ...) might find that we make a large number of requests to them that fall under their own rate limits, but they might be inclined to add a bypass for us, and it's much easier if we make it clear how to identify our requests.

And maybe keep a log of all the requests we made and which release each is linked to? Could we end up in trouble if someone makes us request illegal content? Could the PyPI IPs end up on some FBI watchlist? (I wonder if using Cloudflare's 1.1.1.3 "for families" resolver, which blocks malware & adult content, could mitigate this risk... but I don't know if that's within its terms of use.)

Oh, we also need to protect ourselves from SSRF. Even though in this case we're not displaying what we requested back to the user, in the hopefully nonexistent but possible case that an internal GET request has a side effect, this could be catastrophic. E.g., we're on AWS: if the user publishes a package with the URL http://169.254.169.254/latest/meta-data/, then suddenly the page we fetch contains a usable AWS token from the machine that made the request. (That particular case isn't a problem, since we'd disregard the response for not containing the meta tag; it contains no tags at all, it's JSON.) We need to make sure that the URL contains a proper domain (not an IP, not localhost).

Oh, and MITM of course. We should only try to validate HTTPS URLs; validating an HTTP URL would only lead to an untrustworthy result.

Just for completeness: if the page is cut off because it's more than 1 MB but we still want to check the <head>, we'll need an HTML parser that doesn't crash on partial content.

Should we request <domain>/robots.txt and do something in case it disallows us?

I guess this is the kind of question everyone should ask when they implement a webhook or a crawler or anything. There surely is a resource out there from people who have solved these headaches already.

@woodruffw (Member)

Also, we may want to control (& advertise) the user agent we use to make those requests.

Yes, this makes a lot of sense to do!

And maybe keep a log of all the requests we made and which release each is linked to? Could we end up in trouble if someone makes us request illegal content?

This log would presumably just be the list of URLs listed in the project's JSON API representation/latest release page on PyPI, no? I'm personally wary of PyPI retaining any more data than absolutely necessary, especially since in this case we're not actually storing any data from the URL, only confirming that the URL is serving HTML with a particular <meta> tag at a particular point in time.

Oh, we also need to protect ourselves from SSRF. Even though in this case we're not displaying what we requested back to the user, in the hopefully nonexistent but possible case that an internal GET request has a side effect, this could be catastrophic.

For SSRF, I think the main thing we'll need to do is prevent server-controlled redirects. In other words: if the URL itself doesn't serve the <meta> tag itself, we won't allow it to redirect us anywhere else. I don't think PyPI should worry about GETs being non-idempotent -- any web service that allows that is simultaneously thoroughly out of spec and would be immediately broken by the first spider to crawl the project on PyPI anyways 🙂

We need to make sure that the URL contains a proper domain (not an IP, not localhost)

I could be convinced that we should add this restriction as a practical matter, but I'm not sure it's that important in terms of security? If the URL has an IP as its host but otherwise matches the secure origin rules (i.e. HTTPS), is there a reason we shouldn't validate it?

Oh, and MITM of course. We should only try to validate HTTPS URLs; validating an HTTP URL would only lead to an untrustworthy result.

FWIW, this one at least is covered under "must be a secure origin" in #8635 (comment).

@ewjoachim (Contributor) commented Aug 26, 2024

For SSRF, I think the main thing we'll need to do is prevent server-controlled redirects. In other words: if the URL itself doesn't serve the tag itself, we won't allow it to redirect us anywhere else. I don't think PyPI should worry about GETs being non-idempotent -- any web service that allows that is simultaneously thoroughly out of spec and would be immediately broken by the first spider to crawl the project on PyPI anyways 🙂

The danger of SSRF is internal URLs. The request will be made from within the PyPI infrastructure and may have access to network-protected endpoints that might not be accessible to random spiders.

FWIW, this one at least is covered under "must be a secure origin" in #8635 (comment).

Ah, you're right, sorry.

If the URL has an IP as its host but otherwise matches the secure origin rules (i.e. HTTPS), is there a reason we shouldn't validate it?

This could make sense indeed.

@ewjoachim (Contributor)

This could make sense indeed.

Hm, thinking again: if someone uses https://10.0.0.1, this means we ARE going to make the request, and if it just so happens that this IP is listening on 443, the request will go through and we will evaluate the result. It's probably not a big attack vector, but I'm not at all comfortable with the PyPI server being able to request any internal HTTPS URL (via a GET request where the path is the only thing controlled by the attacker).

@ewjoachim (Contributor)

Oh, btw, should we make sure the port is not overridden (or force it ourselves to 443)? I don't know if there are protocols out there where we could do nasty things just by opening a TCP connection. I hope not.

@woodruffw (Member)

The danger of SSRF is internal URLs. The request will be made from within the PyPI infrastructure and may have access to network-protected endpoints that might not be accessible to random spiders.

Ah, I see what you mean. Yeah, I think the expectation here would be that we deny anything that resolves to a local/private/reserved address range. IP addresses would be allowed only insofar as they represent public ranges (and serve HTTPS, per above).

Oh, btw, should we make sure the port is not overridden (or force it ourselves to 443)? I don't know if there are protocols out there where we could do nasty things just by opening a TCP connection. I hope not.

I think this falls under "PyPI isn't responsible if your thing breaks after you stuff its URL into your metadata," but this is another data point in favor of making our lives easy and simply not supporting anything other than HTTPS + domain names + port 443, with zero exceptions 🙂
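A hedged sketch of that "HTTPS + domain name + port 443, public addresses only" rule; it is illustrative only and does not address DNS rebinding between this check and the actual fetch:

import ipaddress
import socket
from urllib.parse import urlsplit

def url_is_fetchable(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.port not in (None, 443):
        return False
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        return False  # reject IP literals outright (10.0.0.1, 169.254.169.254, ...)
    except ValueError:
        pass  # not an IP literal; treat it as a domain name
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    for *_rest, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if not addr.is_global:
            return False  # private, loopback, link-local, reserved, ...
    return bool(infos)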

@facutuesca (Contributor)

"Collate" verified URLs, i.e. don't re-perform URL verification if another file in the same release has already verified the URL within a particular time window (~15-60 minutes?)

What if we only perform this kind of verification once per release? As in, during the upload of the first file that creates the release.

The reason why we currently re-verify URLs for each file upload of the same release is because Trusted Publisher verification means that some file uploads might come from the Trusted Publisher URL and some not. So it makes sense to re-verify: the first file upload might not come from a relevant Trusted Publisher, but subsequent ones might.

However, this is not the case for this type of verification, since we're accessing resources independent of the upload process and authentication. So checking the URLs once during release creation might be a simple way of limiting the amount of requests we make.

@ewjoachim (Contributor)

The reason why we currently re-verify URLs for each file upload of the same release is because Trusted Publisher verification means that some file uploads might come from the Trusted Publisher URL and some not. So it makes sense to re-verify: the first file upload might not come from a relevant Trusted Publisher, but subsequent ones might.

But the pages might change. I agree that once a page has been verified, it's probably fair to trust it for some amount of time, but if someone already has a URL set up, and learns about this feature and adds their meta tag and pushes a new version, we should recheck even if we've checked before.

@facutuesca (Contributor)

but if someone already has a URL set up, and learns about this feature and adds their meta tag and pushes a new version, we should recheck even if we've checked before.

Yes, that's what I meant by my comment: we should do this kind of verification (meta tag) once per release:

So checking the URLs once during release creation might be a simple way of limiting the amount of requests we make.

Maybe the confusion is because I'm using "release" to refer to a new version of a package, so I'm saying we should recheck every time the user uploads a new version.

@ewjoachim (Contributor)

Ah no, my bad, you said it right; I misunderstood. As far as I had understood, we had dismissed the idea of re-verifying a link, so that was already where I thought we were. I thought you were suggesting once per project, but that's my bad :)

@jpl-jengelke commented Oct 11, 2024

So how does this work for institutional projects, or are there any plans to implement it?

For instance, there's a team of developers and one DevOps staffer is tasked with releasing. After the staffer's email is verified (by clicking an emailed link), PyPI can publish that username under "Verified details" beneath the "Maintainers" header. But what about project URLs that point to a specific website not associated with the repository provider? Is there any mechanism to get the actual project website verified? (Concrete example: the repo is at github.com; the staffer is at staffer@emailer.biginstitution.com; the project contact is at project@emailer.biginstitution.com (mailing-list style); the project site is at project.biginstitution.com. The project site and project contact will be reported as "Unverified details".) I see that ACME-style verification is the approach in this ticket's description but I should add it is not always easy to get metadata injected into top-level institutional URLs which may be maintained by divergent teams.

Apparently, to get a project-focused email address reported as verified, an account must be opened at PyPI; it must be verified (by clicking an emailed link); and then the project-focused email address must be tied to the project. If it's a legitimate email shared amongst the team, that feels a little security-crunchy -- I wouldn't want lots of people with the capacity to reset the password of an account that holds PyPI rights, but that seems the only approach right now that might work.

Anyway, what about License details? "Unverified" when it's a custom institutional license. Project keywords in Setuptools are "Unverified". Python version (e.g. Requires) is unverified.

Maybe I missed something in the docs...

@jpl-jengelke commented Oct 11, 2024

This is a four-year-old ticket, and I know there is a lot of work here to automate detection of "Verified details", which is great. However, the current solutions may not cover all scenarios, specifically as described in my previous comment. A simple option would be to provide text fields on a project basis to add verified URLs and/or email addresses. Since the account holder has to be verified, shouldn't this be an acceptable addition? Better yet, restrict additional details to the parent domain of the account holder, e.g. staffer@foo.bar.biginstitution.com can only add emails and URLs that are at biginstitution.com or its subdomains.

@woodruffw (Member)

I see that ACME-style verification is the approach in this ticket's description but I should add it is not always easy to get metadata injected into top-level institutional URLs which may be maintained by divergent teams.

Yeah, that's why we haven't gone forward with full URL verification yet -- the design space is somewhat open, and there are tradeoffs to an ACME well-known approach, a <meta> approach, etc.

Apparently, to get a project-focused email address reported as verified, an account must be opened at PyPI; it must be verified (by clicking an emailed link); and then the project-focused email address must be tied to the project. If it's a legitimate email shared amongst the team, that feels a little security-crunchy -- I wouldn't want lots of people with the capacity to reset the password of an account that holds PyPI rights, but that seems the only approach right now that might work.

Could you say more about what you're expecting here? The point of the "verified" badge next to an email in the project view is only to provide a visual indicator that the email in the project's metadata matches one on an owning account, i.e. one entitled to publish for the account. PyPI could do non-user verifications of emails, but that would have some consequences:

  1. Email verification is fraught under good conditions; doing it outside of a user flow means needing to consider things like periodic reverification since there'll be no consistent delivery signal (from project emails)
  2. We'd need to figure out how to appropriately signal to users that some verified emails are verified because they're user emails (i.e. because they're empowered to publish that project) and others are verified because they were 'just' verified.

If what you want to do is share a non-owner contact email for a project, perhaps you could link to it from an already-verified page? That would accomplish the same thing in terms of transitive verification, I think.

Anyway, what about License details? "Unverified" when it's a custom institutional license. Project keywords in Setuptools are "Unverified". Python version (e.g. Requires) is unverified.

Everything that isn't currently "verifiable" (i.e. attributable to a verifiable external resource) is currently marked as "unverified." It would perhaps make sense to have a third category here for things that are "not verified because verifying them doesn't make sense," although it's not immediately clear where we'd order that in the current metadata pane 🙂

A simple option would be to provide text fields on a project basis to add verified URLs and/or email addresses.

This sounds simple, but it's not that simple in practice 🙂 -- PyPI's project view is generally careful not to conflate "project originated" and "index originated" metadata, except where the latter can explicitly verify the former. We could add a notion of free-form index-originated project metadata, but doing so would need to still verify those things, since verification is meant to be two-way.

Better yet, restrict additional details to the parent domain of the account holder, e.g. staffer@foo.bar.biginstitution.com can only add emails and URLs that are at biginstitution.com or its subdomains.

PyPI isn't actually aware of domains or subdomains at all, really -- there's some very basic specialized stuff in place around github.com and some other well-known URLs, but PyPI doesn't have a "structured" view of a domain or its subdomains, its email inboxes, etc.

There's been some discussion on that topic recently (and how it merges with PyPI's org feature), which you might find interesting + could benefit from a potential user's perspective.
