Read the official announcement on the PyPI blog as well!
For the past year, we’ve worked with the Python Package Index (PyPI) on a new security feature for the Python ecosystem: index-hosted digital attestations, as specified in PEP 740.
These attestations improve on traditional PGP signatures (which have been disabled on PyPI) by providing key usability, index verifiability, cryptographic strength, and provenance properties that bring us one step closer to holistic, cryptographically verifiable provenance for our software supply chains.
The good news: if you already publish packages to PyPI using Trusted Publishing, you likely won’t have to change a single thing: the official PyPI publishing workflow has attestation support built right in, enabled by default as of v1.11.0. In other words, so long as you already use (or upgrade to) v1.11.0 or newer of the PyPA publishing action with a Trusted Publisher, your packages will get build provenance by default!
Enablement by default was a key design constraint of ours: we wanted an attestation feature that could integrate with existing publishing identities, sidestepping the challenges of key and identity management that recur in traditional digital signature designs. Sigstore presented itself as the solution to these challenges: its support for identity-based keyless signing provides the publicly verifiable link between PyPI’s support for Trusted Publishing and package provenance.
Check out the official PyPI documentation for practical information about how to create and use index-hosted attestations, and read on here for our technical summary of how these attestations work and where we see them going in the future!
Background: Trusted Publishing
Last year, we worked with PyPI to design and implement Trusted Publishing, a new, more convenient, and more secure way to upload packages to PyPI. Thanks to its usability wins, we’ve seen Trusted Publishing become a huge success over the intervening 18 months: over 19,000 individual projects have registered a Trusted Publisher, and those projects have collectively published almost half a million files to PyPI using Trusted Publishing.
We have an entire separate blog post on Trusted Publishing and PyPI, but to briefly summarize:
- Trusted Publishing removes the need for a manually configured and scoped API token.
- Projects declare approved Trusted Publisher (GitHub, GitLab, Google Cloud Build, ActiveState, etc.) identities that can upload new releases.
- To ensure the authenticity of requests from those identities (i.e., the CI/CD workflows purporting to be them), Trusted Publishing uses public key cryptography via OpenID Connect (OIDC).
- The OIDC flow allows the Trusted Publisher to automatically obtain a PyPI API token without user intervention, reducing the opportunity for user errors like credential leaks and accidental over-scoping.
- The resulting tokens issued via this OIDC flow are short-lived and minimally scoped, reducing an attacker’s ability to hoard them for future use or pivot between different projects with a single credential.
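To make the identity-matching step in the list above more concrete, here is a minimal sketch of how an index might compare already-verified OIDC claims against a project’s registered Trusted Publisher. The `TrustedPublisher` record and the `matches` helper are hypothetical names invented for illustration, and the check is simplified; it assumes GitHub-style `repository`, `job_workflow_ref`, and `environment` claims and that the token’s signature, audience, and expiry were validated beforehand. It is not PyPI’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrustedPublisher:
    """Hypothetical, simplified record of a project's registered publisher."""

    repository: str                    # e.g. "example/example"
    workflow: str                      # e.g. "release.yml"
    environment: Optional[str] = None  # optional extra constraint


def workflow_filename(job_workflow_ref: str) -> str:
    """Extract 'release.yml' from 'owner/repo/.github/workflows/release.yml@ref'."""
    path, _, _ref = job_workflow_ref.partition("@")
    return path.rsplit("/", 1)[-1]


def matches(publisher: TrustedPublisher, claims: dict) -> bool:
    """Return True if already-verified OIDC claims match the registered publisher.

    Assumes signature, audience, and expiry checks happened before this call.
    """
    if claims.get("repository") != publisher.repository:
        return False
    if workflow_filename(claims.get("job_workflow_ref", "")) != publisher.workflow:
        return False
    if publisher.environment is not None:
        return claims.get("environment") == publisher.environment
    return True


publisher = TrustedPublisher(repository="example/example", workflow="release.yml")
claims = {
    "repository": "example/example",
    "job_workflow_ref": "example/example/.github/workflows/release.yml@refs/tags/v1.0.0",
}
print(matches(publisher, claims))
```

Only when a match like this succeeds would the index go on to mint the short-lived, project-scoped API token described above.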
Trusted Publishing’s success on PyPI has garnered interest from other ecosystems as well: RubyGems implemented it just a few months later, and Rust’s crates.io has an open RFC for it!
From Trusted Publishing to Sigstore
Trusted Publishing connects PyPI-hosted projects to cryptographically verifiable machine identities (such as release.yml @ github.com/example/example) that handle publishing.
This is fantastic for eliminating manual API token flows, but it also gives us something much more fundamental: provenance!
In particular, in the context of a GitHub (or GitLab, etc.) packaging workflow, the machine identity found in an OIDC credential gives us something resembling “publish provenance”: a set of claims about repository and workflow state corresponding to the time at which a package was published to PyPI.
However, in the form of an OIDC credential, this provenance isn’t immediately valuable to external users:
- PyPI can’t share the credential itself, since it’s fundamentally secret material. Even with appropriate controls (expiry and a fixed audience), there’s simply too much risk of PII disclosure and misbehaving JWT verifiers to justify sharing it for external (meaning non-PyPI) verification.
- PyPI could disclose the claims within the credential, such as by publishing metadata to the effect of “project sampleproject was published by a GitHub workflow pypi-publish.yml that ran from pypa/sampleproject.” This would result in a model where downstream users are forced to trust that PyPI honestly serves those claims.
This is where Sigstore comes in. We have another entire separate blog post on Sigstore and how it works, but the key part for our purpose is that Sigstore binds short-lived signing keys to machine identities via a free, publicly accessible, auditable certificate authority (Fulcio).
Fulcio accepts machine identities in the form of OIDC credentials, meaning that PyPI’s Trusted Publishing flow is implicitly compatible with Sigstore signing: all that the Trusted Publisher needs to do is submit a Certificate Signing Request to Fulcio with the OIDC credential and receive a signing certificate for subsequent use.
Fulcio will embed the appropriate claims from the OIDC credential into the public certificate, giving us a publicly verifiable source of provenance that doesn’t require disclosing the credential itself or unilaterally trusting PyPI to serve it correctly!
The steps involved in this can be a little hard to follow, so let’s visualize them. Here’s the “traditional” Trusted Publishing flow, before any involvement from Sigstore:
And then, with Sigstore in the loop:
Observe that, while there’s one more entity in the flow (Sigstore), nothing changes from the user’s perspective: all that’s needed from them is their one-time Trusted Publisher configuration, which comes from the original flow.
From Sigstore to attestations and provenance
Sigstore narrows the gap between Trusted Publishing and provenance by giving us a public, verifiable credential (in the form of an X.509 certificate) that binds an ephemeral key pair to a machine identity (such as a GitHub repository and workflow that publishes to PyPI).
However, there’s still one step left: the certificate issued by Sigstore is bound to the Trusted Publishing identity, but it doesn’t itself sign for the thing being published (i.e., the actual Python package distribution).
To cover the latter, we need to use our ephemeral key pair to sign over an attestation for our package distribution, cryptographically binding the distribution’s own identity (its name and digest) to its provenance (the GitHub repository or other source that actually produced it).
This is where PEP 740 comes in. PEP 740 weds Sigstore and Trusted Publishing to the actual package distribution through a fixed attestation payload, itself defined within the confines of the in-toto Attestation Framework.
Here’s an example of an actual attestation, as generated for sigstore v3.5.1:
These attestations then get signed by the private half of the ephemeral key pair, itself bound to the X.509 certificate, completing the full binding of distribution identity (filename and digest) to provenance (OIDC claims baked into the X.509 certificate) in a manner verifiable by PyPI itself (since the OIDC claims correspond to the Trusted Publisher identity registered by the user).
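As a rough illustration of the payload being signed, the sketch below builds an in-toto statement of the kind PEP 740 describes, binding a distribution’s filename to its SHA-256 digest. The `publish_statement` helper, the filename, and the file contents are all hypothetical; the statement type and predicate type strings reflect our understanding of the in-toto and PyPI conventions and should be checked against the specifications before being relied on. No signing happens here.

```python
import hashlib
import json


def publish_statement(filename: str, contents: bytes) -> dict:
    """Build an unsigned in-toto statement for one distribution file."""
    digest = hashlib.sha256(contents).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject is the distribution's identity: its name and digest.
        "subject": [{"name": filename, "digest": {"sha256": digest}}],
        # Assumed predicate type for a "publish" attestation with no predicate body.
        "predicateType": "https://docs.pypi.org/attestations/publish/v1",
        "predicate": None,
    }


# Placeholder bytes standing in for a real sdist; the filename is made up.
statement = publish_statement("example-1.0.0.tar.gz", b"not a real sdist")
print(json.dumps(statement, indent=2))
```

In the real flow, the serialized statement is what the ephemeral private key signs, and the resulting signature travels alongside the certificate that carries the OIDC claims.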
Of course, it isn’t enough to just generate attestations: these attestations also need to be stored so that users can verify them on their own! PEP 740 also defines this: distributions that are uploaded with attestations are given a provenance key in the JSON simple API and a corresponding data-provenance attribute in the PEP 503 index.
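For a sense of what consuming that key could look like, the following sketch collects provenance URLs from a JSON simple API response that has already been parsed into a dictionary. The `provenance_urls` helper and every URL and version number in the sample response are hypothetical; the sketch assumes each entry under `files` carries a `filename` and, when attestations exist, a `provenance` key.

```python
def provenance_urls(project_index: dict) -> dict[str, str]:
    """Map each filename to its provenance URL, skipping files without one."""
    return {
        entry["filename"]: entry["provenance"]
        for entry in project_index.get("files", [])
        if entry.get("provenance")
    }


# Hand-written stand-in for a parsed JSON simple API response.
example_index = {
    "name": "sampleproject",
    "files": [
        {
            "filename": "sampleproject-1.0.0.tar.gz",
            "url": "https://files.example/sampleproject-1.0.0.tar.gz",
            "provenance": "https://index.example/provenance/sampleproject-1.0.0.tar.gz",
        },
        {
            # An older upload without attestations has no provenance key.
            "filename": "sampleproject-0.9.0.tar.gz",
            "url": "https://files.example/sampleproject-0.9.0.tar.gz",
        },
    ],
}

for filename, url in provenance_urls(example_index).items():
    print(filename, "->", url)
```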
These fields contain URLs that point to a “provenance” object, which is a rollup of one or more attestation objects for each distribution, along with the Trusted Publisher identity that PyPI used to verify those attestations. We can poke through the guts of these to get back to our original payload, from above:
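One way to do that unpacking is sketched below. It assumes the provenance object layout we understand PEP 740 to define (`attestation_bundles`, each with a `publisher` and a list of `attestations`, each of which has an `envelope` holding a base64-encoded `statement`), and it runs against a hand-built placeholder object rather than real PyPI data. The `statements_from_provenance` helper is a hypothetical name, and the code only decodes payloads; it does not verify any signature or certificate.

```python
import base64
import json


def statements_from_provenance(provenance: dict) -> list[dict]:
    """Decode every attestation statement in a provenance object.

    This only unpacks payloads; it performs no signature verification.
    """
    statements = []
    for bundle in provenance["attestation_bundles"]:
        for attestation in bundle["attestations"]:
            raw = base64.b64decode(attestation["envelope"]["statement"])
            statements.append(json.loads(raw))
    return statements


# Placeholder data standing in for a real provenance object.
toy_statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "example-1.0.0.tar.gz", "digest": {"sha256": "0" * 64}}],
    "predicateType": "https://docs.pypi.org/attestations/publish/v1",
    "predicate": None,
}
toy_provenance = {
    "version": 1,
    "attestation_bundles": [
        {
            # Simplified, assumed shape for the publisher identity.
            "publisher": {
                "kind": "GitHub",
                "repository": "example/example",
                "workflow": "release.yml",
            },
            "attestations": [
                {
                    "version": 1,
                    # Empty placeholders; real objects carry a certificate
                    # and transparency log entries here.
                    "verification_material": {
                        "certificate": "",
                        "transparency_entries": [],
                    },
                    "envelope": {
                        "statement": base64.b64encode(
                            json.dumps(toy_statement).encode()
                        ).decode(),
                        "signature": base64.b64encode(b"not a real signature").decode(),
                    },
                }
            ],
        }
    ],
}

for statement in statements_from_provenance(toy_provenance):
    print(statement["subject"][0]["name"], statement["predicateType"])
```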
Where does this leave us?
As of October 29, attestations are the default for anyone using Trusted Publishing via the PyPA publishing action for GitHub. That means roughly 20,000 packages can now attest to their provenance by default, with no changes needed. We expect that number to go up over time as well, as more projects (especially newer ones) default to Trusted Publishing as both the user-friendly and more secure alternative to manually configured API tokens.
The total number of packages producing attestations is just one perspective, however, and arguably an incomplete one: the value of a package’s attestations is correlated closely to that package’s “importance”—that is, the number of users or downstreams that depend on it. PyPI doesn’t know a project’s dependencies, but total download counts are a strong proxy for a project’s relative importance in the ecosystem.
To gain insight into the latter, we’ve built Are We PEP 740 Yet?, which tracks the adoption of PEP 740 attestations by the 360 most-downloaded packages on PyPI:
So far, 5% of the 360 most-downloaded packages have attestations uploaded. But there’s a confounding factor: around two-thirds of the most-downloaded packages haven’t been updated at all since attestation enablement, meaning that we don’t yet know how many will have attestations once they make a new release!
Where do we go from here?
One thing is notably missing from all of this work: downstream verification.
As specified, PEP 740 concerns only the index itself: it tells PyPI how to receive and verify attestations for its own purposes as well as how to redistribute them on the public index endpoints, but it doesn’t mandate (or even define) a verification flow for installing clients (like pip and uv).
In practice, this means that the short-term impact of index-hosted attestations is limited: they introduce transparency to the Trusted Publisher identities used in PyPI, but downstream clients still need to trust PyPI itself to serve attestations honestly.
This isn’t an acceptable end state (cryptographic attestations have defensive properties only insofar as they’re actually verified), so we’re looking into ways to bring verification to individual installing clients. In particular, we’re currently working on a plugin architecture for pip that will enable users to load verification logic directly into their pip install flows.
Longer term, we can do even better: doing “one off” verifications means that the client has no recollection of which identities should be trusted for which distributions. To address this, installation tools need a notion of “trust on first use” for signing identities, meaning that subsequent installations can be halted and inspected by a user if the attesting identity changes (or the package becomes unattested between versions).
If that sounds like a lockfile problem to you, it’s because it is! We’re following PEP 751 closely, since it defines the metadata format that we’ll need to store expected distribution identities within. Once the Python ecosystem begins adopting standardized lockfiles, we’ll be able to use them to store and verify identities much like how hashes are used to verify distribution integrity today.
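The “trust on first use” idea can be sketched in a few lines. In the example below, a plain dictionary stands in for whatever identity storage a future lockfile format might provide, and identities are represented as simple strings; the `Verdict` enum and `check_identity` function are hypothetical names, and none of this reflects an existing pip, uv, or PEP 751 interface.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    UNATTESTED = "no attestation seen yet"
    PINNED = "identity recorded on first use"
    OK = "identity matches the pinned one"
    HALT_CHANGED = "attesting identity changed"
    HALT_MISSING = "previously attested package is now unattested"


def check_identity(
    pins: dict[str, str], project: str, identity: Optional[str]
) -> Verdict:
    """Compare a release's attesting identity against the pinned one.

    `identity` is None when the release has no (verified) attestation.
    """
    pinned = pins.get(project)
    if pinned is None:
        if identity is None:
            return Verdict.UNATTESTED
        pins[project] = identity  # trust on first use
        return Verdict.PINNED
    if identity is None:
        return Verdict.HALT_MISSING
    if identity != pinned:
        return Verdict.HALT_CHANGED
    return Verdict.OK


pins: dict[str, str] = {}
first = check_identity(pins, "example", "release.yml @ github.com/example/example")
second = check_identity(pins, "example", "release.yml @ github.com/example/example")
third = check_identity(pins, "example", "release.yml @ github.com/attacker/example")
print(first.name, second.name, third.name)
```

The two `HALT_*` verdicts correspond to the cases where an installer would stop and ask the user to inspect the change before proceeding.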
All in all, we have a bit to go before the common default installation flows are verifying attestations under the hood. But, unlike with earlier attempts at index-hosted signatures, we have a good idea of how to get there. In the meantime, however, there are demographics that can take early advantage of PyPI’s newly hosted attestations:
- Researchers: PEP 740 attestations are built on top of Sigstore and provide a key missing link, now verifiable, between source repositories and packages (as they appear on PyPI). This makes them a great source of data for security and supply chain research!
- Incident responders: When available, attestations drastically shorten and simplify some of the most annoying and error-prone parts of incident investigation: tracking a particular artifact back to its source, figuring out exactly when and how it was produced, and so forth.
- Users with full control over their build systems: If you maintain an open source or professional project that fully controls its Python package dependencies (i.e., doesn’t use pip or another tool for resolution and installation), then you can probably work attestation verification directly into your build process! Check out our pypi_attestations documentation for a starting point.