Quarantining would prevent anyone from building or installing new copies of the compromised software, so this utility would only help people who were a) monitoring the project, and b) running a local version installed pre-quarantine. That's a pretty narrow set of users, so now that I type all this out, I'm realizing the juice is likely not worth the squeeze.
> Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
It's not a bad idea; let GitHub know! From my interactions with them, their security team is very good.
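The core of that Dependabot-esque utility would just be matching requirement names against a quarantine list. A toy sketch of that check, assuming the list of quarantined project names is already available (the names here are made up; a real tool would fetch the list from the index):

```python
import re

# Hypothetical quarantine list, hard-coded for illustration only.
QUARANTINED = {"evil-package", "typo-squat"}

def flag_quarantined(requirements_text: str) -> list[str]:
    """Return the quarantined project names found in a requirements.txt body."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # The project name is everything before extras/version specifiers.
        name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0].lower()
        if name in QUARANTINED:
            flagged.append(name)
    return flagged

print(flag_quarantined("requests==2.31.0\nevil-package>=1.0\n# comment\n"))
# prints ['evil-package']
```

A real implementation would also need to normalize names (PyPI treats `-`, `_`, and `.` as equivalent), which is omitted here.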
Lowkey surprised that any well-resourced org would use it given the outsized risk profile and poor performance.
I’m curious, as I also deal with this tension. What (human and automated) processes do you have for the following scenarios?
1. Application developer wants to test (locally or in a development environment) and then use a net new third party package in their application at runtime.
2. Application developer wants to bump the version used of an existing application dependency.
3. Application developer wants to experiment with a large list of several third party dependencies in their application CI system (e.g. build tools) or a pre-production environment. The experimentation may or may not yield a smaller set of packages that they want to permanently incorporate into the application or CI system.
How, if at all, do you go about giving developers access via JFrog to the packages they need in those scenarios? Is it as simple as “you can pull anything you want, so long as X-ray scans it”, or is there some other process needed to get a package mirrored for developer use?
(This is a larger issue - or feature, depending on your perspective - with Python packaging. But it's important to understand that PyPI itself can't force `pip` or any other client to pick any particular resolution order between indices.)
But again: this has nothing to do with blessings or not. The fact that pip is the official installer and PyPI is the official index does not mean that everything about them stems from an official edict. That's not how Python's community is structured, and it's certainly not how the technical development on anything in Python packaging has ever progressed.
So it's the PSF's decision to document using pip, right? And they might decide to change that page without asking for anyone's consent, right?
So they also don't get to complain about pip's shortcomings because it is their decision to point users towards pip. Correct?
Have I been explicit enough for you now?
This doesn't somehow imply that Python has a top-down authority structure where the PSF dictates the development flow of PyPI, pip, or any other official/semi-official/blessed tooling or infrastructure. That's not how Python works as a community.
The PSF is also not complaining here about pip's shortcomings. I'm the only one here and I don't represent the PSF, nor am I complaining: I think pip is great. I'm trying to explain (apparently unsuccessfully) why pip (and PyPI's) behavior isn't always 100% congruous with a single stream of development thinking. As mentioned above there are more optimal structures given different kinds of governance and community, but this one aligns with the (IMO good) values that the Python community espouses.
But this final claim is essentially right: PSF can suggest things, and the community will (to some extent) accept those suggestions as blessed. But this doesn't mean that the PSF can dictate what would essentially be a significant breaking change to pip's behavior.
The PyPI and pip developers, of course.
(I think namespacing is a good idea regardless, if only because it eliminates artificial scarcity in a one-level namespace.)
If you disagree I would love to hear of a concrete way that solution would be vulnerable.
That’s the operative part of “authoritative.” It’s a distributed trust problem, and there’s no particular guarantee that your namespace on one index will be honored by another. Namespacing is great for eliminating scarcity on one index at a time; I don’t think it helps much with this kind of cross-index security.
Domains provide a globally unique namespace and ownership can be verified automatically with domain validation. Bluesky did an ok job of it, but they didn't do anything to account for domain ownership changes and re-validation is non-existent, which is disappointing to see from the first big adopter since the oversight will eventually invite criticism.
I've wanted domain validated namespaces for 5+ years. Here's a comment I made about using domain validated namespaces in package managers a couple of years ago [1]:
---
I think one possible solution to that would be to assume namespaces can have their ownership changed and build something that works with that assumption.
Think along the lines of having 'pypi.org/example.com' be a redirect to an immutable organization: 'pypi.org/abcd1234'. If a new domain owner wants to take over the namespace, they won't have access to the existing account, and re-validating to take ownership would force them to use a different immutable organization: 'pypi.org/ef567890'.
If you have a package locking system (like NPM), it would lock to the immutable organization and any updates that resolve to a new organization could throw a warning and require explicit approval. If you think of it like an organization lock:
v1:
pypi.org/example.com --> pypi.org/abcd1234
v2:
pypi.org/example.com --> pypi.org/ef123456
If you go from v1 to v2 you know there was an ownership change or, at the very least, an event that you need to investigate. Losing control of a domain would be recoverable because existing artifacts wouldn't be impacted, and you could use the immutable organization to publish the change, since that's technically the source of truth for the artifacts. Put another way, the immutable organization has a pointer back to the current domain validated namespace:
v1:
pypi.org/abcd1234 --> example.com
v2:
pypi.org/abcd1234 --> example.net
If you go from v1 to v2 you know the owner of the artifacts you want has moved from the domain example.com to example.net. The package manager could give a warning about this and let an artifact consumer approve it, but it's less risky than the change above because the owner of 'abcd1234' hasn't changed and you're already trusting them.

---
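The lock described above boils down to recording which immutable organization a namespace resolved to and flagging any later mismatch. A toy sketch of that check (all namespaces and organization ids are made up for illustration):

```python
# Lockfile maps a domain-validated namespace to the immutable organization
# it resolved to at lock time; ids here are illustrative only.
lockfile = {"pypi.org/example.com": "abcd1234"}

def check_resolution(namespace: str, resolved_org: str) -> str:
    """Compare a fresh resolution against the lock and classify the result."""
    locked = lockfile.get(namespace)
    if locked is None:
        lockfile[namespace] = resolved_org  # first use: record the lock
        return "locked"
    if locked != resolved_org:
        # v1 -> v2: the domain changed hands, or at minimum something
        # happened that the consumer needs to investigate and approve.
        return "ownership-change: approval required"
    return "ok"
```

Usage: `check_resolution("pypi.org/example.com", "abcd1234")` returns `"ok"`, while a resolution to `"ef123456"` surfaces the ownership-change warning instead of silently accepting new artifacts.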
But there is a better solution coming: PEP 708 was developed for this and is prototyped on pypi.org, so it's an overstatement to say we "don't even have a way to avoid dependency confusion attacks".
It is, however, a non-trivial problem, and more solutions will likely come over the years. Many Python packaging tools, like uv and poetry (and likely others), have ways to name indexes and pin specific packages to indexes, which looks like a promising UX.
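For example, uv supports declaring a named index in `pyproject.toml` and pinning a package to it, so that package can never be satisfied by a same-named project on the public index (the index name, URL, and package name below are made up for illustration):

```toml
# pyproject.toml -- illustrative index name, URL, and package name
[[tool.uv.index]]
name = "internal"
url = "https://pypi.internal.example.com/simple"
# explicit = true: this index is only used for packages pinned to it
explicit = true

[tool.uv.sources]
# our-private-package will only ever be resolved from "internal",
# so a dependency-confusion upload to PyPI can't be substituted for it
our-private-package = { index = "internal" }
```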
This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
However, installing from wheels just protects from arbitrary code execution at install time. If you do not trust the source and integrity of the package you install, you would still be subject to arbitrary code execution at import time.
Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.
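One concrete instance of integrity checking is pip's hash-checking mode (`--require-hashes`), which pins every artifact in a requirements file to a digest. The underlying check is just a digest comparison, sketched here (the path and hash in any real use are whatever your lockfile pins):

```python
import hashlib

def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest matches the pinned hash --
    essentially the per-artifact check pip performs in --require-hashes mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large wheels don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Note this only confirms the bytes match what was pinned; it says nothing about whether the pinned artifact was trustworthy in the first place, which is where provenance tracing (e.g. PyPI's attestations work) comes in.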
Again, it's not that it's OK just because others are worse, but I would cut them a little slack, especially given that having all packages somehow signed/audited would be a titanic task. And I guess I'm not willing to pay for it.
But then success attracts trust abusers and forces raising the fences (which comes with higher costs, both direct and indirect).
Direct costs in the people and infrastructure that must be dedicated to the task. Indirect costs in the frictions generated by complicating workflows.
It all points to the need for open source ecosystems to be taken more seriously by the economically able users who most benefit from this amazing development.
In their FAQ[1], they mention that they have plans to expand to PyPI as well.
Interesting, I didn’t know that. While I haven’t released anything obfuscated on PyPI, I’ve certainly written Python projects that include obfuscated code by necessity, namely scrapers packing duktape (embedded JS interpreter) and third party obfuscated JS blobs to generate signatures and stuff. I know for a fact there are projects like that on PyPI. I wonder if those are allowed.
(Come to think of it, those probably can be DMCAed if the targeted service provider is sufficiently motivated.)