PyPI Blog: Project Quarantine

92 points by miketheman 8 days ago | 60 comments

toomuchtodo 8 days ago |
Awesome work, kudos to the PyPI team. Will it be possible to receive notifications of project quarantines as a member of the public?
HanClinto 7 days ago |
Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
Quarantining would prevent anyone from building / installing new copies of the compromised software, so this utility would only help people who a) were monitoring the project and b) had a local version installed pre-quarantine. That's a pretty narrow scope of users, so now that I type all this out, I'm realizing that the juice is likely not worth the squeeze.
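The core matching step of such a utility is small. Below is a minimal sketch, assuming it is handed a set of quarantined project names from some feed (the thread does not describe any public feed, so the `quarantined` input and all function names here are hypothetical). Names are compared after PEP 503 normalization, the same rule under which the simple index treats `Foo_Bar` and `foo-bar` as one project.

```python
import re

def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does for the simple index."""
    return re.sub(r"[-_.]+", "-", name).lower()

# Leading project name of a requirement line, e.g. "Requests" in "Requests>=2.0".
_NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")

def requirement_names(text: str) -> list[str]:
    """Extract normalized project names from requirements.txt content.

    Deliberately simplified: skips comments, blank lines and pip options
    (lines starting with "-"); extras and markers are cut off by the regex;
    direct URL and VCS lines are not understood.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(normalize(match.group(1)))
    return names

def quarantined_requirements(text: str, quarantined: set[str]) -> list[str]:
    """Return requirement names that appear in the (hypothetical) quarantine set."""
    blocked = {normalize(n) for n in quarantined}
    return [n for n in requirement_names(text) if n in blocked]
```

A bot built on this would open an issue whenever `quarantined_requirements` returns a non-empty list.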
toomuchtodo 7 days ago |
One of my responsibilities is software supply chain security in a financial services org, so this signal would be valuable for vulnerability management of dependencies. I wouldn't call it "threat hunting" per se, but ground truth around threat actor patterns helps us build better defensive systems in this regard. Keeping the bad bits out is way easier than remediating once they've been ingested into systems.
> Your comment also has me dreaming about a Dependabot-esque utility that opens Github issues on repositories that have quarantined projects in their requirements.txt.
It's not a bad idea, let Github know! Their security team is very good from my interactions with them.
intelVISA 5 days ago |
That sounds quite daunting, Python and supply chain security are almost at odds with each other these days.
Lowkey surprised that any well-resourced org would use it given the outsized risk profile and poor performance.
toomuchtodo 4 days ago |
It’s not used in the core or for anything load bearing, but has some ancillary uses, and we strive for total coverage (as much as practical). If we use something, we want to secure it as best we can.
alsodumb 5 days ago |
Given how widespread PyPI usage is, I'm surprised they only have one full time security staff. I mean I guess it makes sense, usage doesn't always mean they get more donations/money, but damn.
spencerchubb 5 days ago |
companies that actually care about security have a more secure solution and don't allow devs to use pypi
cjalmeida 5 days ago |
You’d be surprised by the amount of companies handling critical infrastructure that are OK with using PyPI directly
LtWorf 5 days ago |
He said companies that care, not companies that should care but do not.
f1shy 5 days ago |
That is somewhat terrifying
spencerchubb 5 days ago |
Really depends on the company. My company cares a lot about security because it's a huge Fortune 50 company with sensitive data, and a lot of reputation could be lost with a security scandal.
davidshepherd7 5 days ago |
Could you give some examples of more secure solutions?
spencerchubb 5 days ago |
jfrog is the one my company uses
zbentley 5 days ago |
How do you decide what externally available packages to store/cache in Artifactory?
I’m curious, as I also deal with this tension. What (human and automated) processes do you have for the following scenarios?
1. Application developer wants to test (locally or in a development environment) and then use a net new third party package in their application at runtime.
2. Application developer wants to bump the version used of an existing application dependency.
3. Application developer wants to experiment with a large list of several third party dependencies in their application CI system (e.g. build tools) or a pre-production environment. The experimentation may or may not yield a smaller set of packages that they want to permanently incorporate into the application or CI system.
How, if at all, do you go about giving developers access via jfrog to the packages they need for those scenarios? Is it as simple as “you can pull anything you want, so long as Xray scans it”, or is there some other process needed to get a package mirrored for developer use?
CableNinja 5 days ago |
Where I am, every package repo (docker, pypi, rpm, deb, npm, and more) goes through Artifactory and is scanned. Packages are auto-pulled into Artifactory when a user requests the package and are scanned by Xray. Artifactory has a remote pull-through process that downloads once from the remote, and then never again unless you nuke the content. Vulnerable packages must have exceptions made in order to be used. Sadly, we put the burden of allowances on the person requesting the package, but it at least makes them stop and think before they approve it. Granting access to new external repos is easy, and we make requesting them pain-free, just making sure that we enable Xray. Artifactory also supports local repos so users can upload their packages and pull them down later.
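The pull-through flow described above (download once, scan, block unless an exception exists) can be modeled in a few lines. This is a toy sketch only: `fetch_remote` and `scan` are hypothetical callables standing in for the upstream index and a scanner, not Artifactory or Xray APIs. It rescans on every request so a cached version that later gets flagged is still blocked; real products may behave differently.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PullThroughCache:
    """Toy pull-through proxy: fetch from the remote once, scan before serving."""
    fetch_remote: Callable[[str, str], bytes]       # hypothetical upstream download
    scan: Callable[[str, str, bytes], bool]         # hypothetical scanner; True means clean
    exceptions: set[tuple[str, str]] = field(default_factory=set)
    _store: dict[tuple[str, str], bytes] = field(default_factory=dict, repr=False)
    remote_fetches: int = 0

    def get(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key not in self._store:
            # Only the first request reaches the remote; later ones hit the cache.
            self._store[key] = self.fetch_remote(name, version)
            self.remote_fetches += 1
        artifact = self._store[key]
        if not self.scan(name, version, artifact) and key not in self.exceptions:
            raise PermissionError(f"{name}=={version} failed scanning and has no exception")
        return artifact

    def purge(self, name: str, version: str) -> None:
        """'Nuke' a cached artifact so the next request downloads it again."""
        self._store.pop((name, version), None)
```

Putting the exception set next to the cache mirrors the comment's process: a flagged package is still mirrored, but nobody can pull it until someone explicitly approves it.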
f1shy 5 days ago |
For example, we have it behind a kind of transparent proxy, where you only get packages that were tested and scanned by a team of experts.
IshKebab 5 days ago |
They still don't even have a way to avoid dependency confusion attacks when using private package repos (other than also registering every single private package name you use on pypi.org). Blows my mind.
woodruffw 5 days ago |
Who is "they"? PyPI is an index; it doesn't control your installing client.
(This is a larger issue - or feature, depending on your perspective - with Python packaging. But it's important to understand that PyPI itself can't force `pip` or any other client to pick any particular resolution order between indices.)
LtWorf 5 days ago |
For all intents and purposes "pip" is the official client. It is referenced in the official documentation https://docs.python.org/3/installing/index.html
woodruffw 5 days ago |
The fact that pip is the official client isn’t in dispute. The point was that pip and PyPI are different entities, per a larger pattern of devolved ownership/control/standards-over-tools in Python packaging. PyPI has little to no say over how pip and other tools choose to handle resolutions across multiple indices.
LtWorf 5 days ago |
The PSF has a say in which is the default installer and how PyPI is run.
pyuser583 5 days ago |
PSF has little control over anything. The Python ecosystem is consensus-based.
woodruffw 5 days ago |
They have a say insofar as they can participate in the same standards process as everyone else. But no, the PSF has no unique say in how PyPI is run, or how pip behaves. This is a pretty fundamental aspect of how Python-qua-ecosystem works.
LtWorf 5 days ago |
They have a say in that if it doesn't behave like they want they can point the documentation to something that does. If pip is the tool linked in the documentation it's the official one that has the PSF's blessing, clearly.
woodruffw 5 days ago |
We're going in circles. PSF can't unilaterally change any documentation of particular relevance here; the most immediately relevant docs would be controlled by PyPA and PyPI itself. The former has a standards/community review process, and the latter is particular to PyPI.
But again: this has nothing to do with blessings or not. The fact that pip is the official installer and PyPI is the official index does not mean that everything about them stems from an official edict. That's not how Python's community is structured, and it's certainly not how the technical development on anything in Python packaging has ever progressed.
LtWorf 4 days ago |
Who controls the python.org domain that I linked? Am I wrong to think that PSF controls it?
woodruffw 4 days ago |
PSF controls that domain. But that domain doesn’t host PyPI (anymore) or the PyPA docs, so I’m not sure what connection you’re making there.
LtWorf 4 days ago |
https://docs.python.org/3/installing/index.html
so it's PSF's decision to document using pip, right? And they might decide to change that page without asking for anyone's consent right?
So they also don't get to complain about pip's shortcomings because it is their decision to point users towards pip. Correct?
Have I been explicit enough for you now?
woodruffw 4 days ago |
You've pointed to a stub document that basically explains Python packaging and why pip comes with Python (which, notably, distributions are fond of breaking).
This doesn't somehow imply that Python has a top-down authority structure where the PSF dictates the development flow of PyPI, pip, or any other official/semi-official/blessed tooling or infrastructure. That's not how Python works as a community.
The PSF is also not complaining here about pip's shortcomings. I'm the only one here and I don't represent the PSF, nor am I complaining: I think pip is great. I'm trying to explain (apparently unsuccessfully) why pip (and PyPI's) behavior isn't always 100% congruous with a single stream of development thinking. As mentioned above there are more optimal structures given different kinds of governance and community, but this one aligns with the (IMO good) values that the Python community espouses.
LtWorf 4 days ago |
I think the PSF has the means and opportunity to say "please don't use xxx, use yyy instead". Which is why conda and linux distributions are not seen on the same footing at all.
woodruffw 4 days ago |
I think it's more because Conda has explicitly positioned itself for a specific domain (scientific Python) and because Linux distributions aren't operating at the same level of specificity. The success of uv has demonstrated that Python packaging tools don't need PSF or PyPA affiliation to be extremely popular; they just have to be good.
But this final claim is essentially right: PSF can suggest things, and the community will (to some extent) accept those suggestions as blessed. But this doesn't mean that the PSF can dictate what would essentially be a significant breaking change to pip's behavior.
IshKebab 5 days ago |
> Who is "they"?
The PyPI and Pip developers of course.
woodruffw 5 days ago |
Those are largely disjoint sets, and the post in question is about PyPI.
IshKebab 5 days ago |
So? The issue requires coordination between Pip and PyPI. I don't see what point you're trying to make.
woodruffw 5 days ago |
The issue does not require coordination; that's the point. It's a behavioral aspect of `pip` that's completely opaque to PyPI, because all PyPI does is serve index responses to installers. It doesn't know how many indices the installer contacts, or the order in which it contacts them (and it has no good reason to know those things, ever).
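To make that concrete, here is a toy model (not pip's actual resolver) of an installer that merges candidates from every configured index with no priority between them, which is the setup dependency confusion exploits. Each index only answers "which versions of this name do you have?"; the decision happens entirely client-side. Versions are simplified to integer tuples instead of PEP 440 versions, and the index and project names are made up.

```python
Version = tuple[int, ...]

def pick_version_merged(name: str, indices: dict[str, dict[str, list[Version]]]) -> tuple[Version, str]:
    """Collect candidates for `name` from every index and take the highest version.

    No index is preferred over another, so whichever index advertises the
    highest version wins, even if it is not the one the user meant.
    """
    candidates = [
        (version, index_name)
        for index_name, projects in indices.items()
        for version in projects.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no index has {name!r}")
    return max(candidates)

if __name__ == "__main__":
    indices = {
        "private": {"acme-utils": [(1, 2, 0)]},
        "pypi": {"acme-utils": [(99, 0, 0)]},  # someone registered the same name publicly
    }
    print(pick_version_merged("acme-utils", indices))  # ((99, 0, 0), 'pypi')
```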
IshKebab 5 days ago |
The simplest way to fix this problem is to support namespaces, which PyPI absolutely does need to be aware of.
woodruffw 4 days ago |
This would not be meaningfully addressed by namespaces, since there's no authoritative, authenticated unique name system across indices. Two separate indices can (and will, based on what ecosystems like piwheels do[1]) advertise `foo/*`-namespaced packages, leaving installers where they are today.
(I think namespacing is a good idea regardless, if only because it eliminates artificial scarcity in a one-level namespace.)
[1]: https://www.piwheels.org/
IshKebab 4 days ago |
This absolutely would be meaningfully addressed by namespaces because the typical use case is PyPI + a private repo you control. Register the namespace in both repos and you're done.
If you disagree I would love to hear of a concrete way that solution would be vulnerable.
woodruffw 4 days ago |
> Register the namespace in both repos and you're done.
That’s the operative part of “authoritative.” It’s a distributed trust problem, and there’s no particular guarantee that your namespace on one index will be honored by another. Namespacing is great for eliminating scarcity on one index at a time; I don’t think it helps much with this kind of cross-index security.
IshKebab 4 days ago |
Right, so it would provide a solution to the specific problem I'm talking about.
ryan29 4 days ago |
> since there's no authoritative, authenticated unique name system across indices
Domains provide a globally unique namespace and ownership can be verified automatically with domain validation. Bluesky did an ok job of it, but they didn't do anything to account for domain ownership changes and re-validation is non-existent, which is disappointing to see from the first big adopter since the oversight will eventually invite criticism.
I've wanted domain validated namespaces for 5+ years. Here's a comment I made about using domain validated namespaces in package managers a couple of years ago [1]:
---
I think one possible solution to that would be to assume namespaces can have their ownership changed and build something that works with that assumption.
Think along the lines of having 'pypi.org/example.com' be a redirect to an immutable organization; 'pypi.org/abcd1234'. If a new domain owner wants to take over the namespace they won't have access to the existing account and re-validating to take ownership would force them to use a different immutable organization; 'pypi.org/ef567890'.
If you have a package locking system (like NPM), it would lock to the immutable organization and any updates that resolve to a new organization could throw a warning and require explicit approval. If you think of it like an organization lock:
v1: pypi.org/example.com --> pypi.org/abcd1234
v2: pypi.org/example.com --> pypi.org/ef123456
If you go from v1 to v2 you know there was an ownership change or, at the very least, an event that you need to investigate.
Losing control of a domain would be recoverable because existing artifacts wouldn't be impacted, and you could use the immutable organization to publish the change since that's technically the source of truth for the artifacts. Put another way, the immutable organization has a pointer back to the current domain-validated namespace:
v1: pypi.org/abcd1234 --> example.com
v2: pypi.org/abcd1234 --> example.net
If you go from v1 to v2 you know the owner of the artifacts you want has moved from the domain example.com to example.net. The package manager could give a warning about this and let an artifact consumer approve it, but it's less risky than the change above because the owner of 'abcd1234' hasn't changed and you're already trusting them.
---
1. https://news.ycombinator.com/item?id=32754029
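The organization-lock check in the quoted comment could look roughly like the following. This is a hypothetical sketch of the proposal, not a feature of PyPI or any existing package manager; `LockEntry`, `Change` and `check_lock_update` are made-up names, and the IDs reuse the comment's examples.

```python
from dataclasses import dataclass
from enum import Enum

@dataclass(frozen=True)
class LockEntry:
    """One locked namespace: the domain-validated name and the immutable org it resolved to."""
    domain: str  # e.g. "example.com"
    org_id: str  # e.g. "abcd1234"

class Change(Enum):
    NONE = "unchanged"
    DOMAIN_MOVED = "same organization, new domain: lower risk, ask the user to approve"
    ORG_CHANGED = "namespace resolves to a different organization: investigate, require approval"

def check_lock_update(old: LockEntry, new: LockEntry) -> Change:
    if old.org_id != new.org_id:
        # The immutable organization changed, so the artifacts may have a new owner.
        return Change.ORG_CHANGED
    if old.domain != new.domain:
        # The organization the consumer already trusts now points at a new domain.
        return Change.DOMAIN_MOVED
    return Change.NONE

if __name__ == "__main__":
    v1 = LockEntry("example.com", "abcd1234")
    print(check_lock_update(v1, LockEntry("example.com", "ef123456")).name)  # ORG_CHANGED
    print(check_lock_update(v1, LockEntry("example.net", "abcd1234")).name)  # DOMAIN_MOVED
```

Checking the organization ID first means a domain that changed hands is always treated as the higher-risk event, even if the domain string also changed.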
oblvious-earth 4 days ago |
If you're concerned about dependency confusion attacks you should host your own index and vet what goes on to it.
But there is a better solution coming: PEP 708 was developed for this and is in prototype on pypi.org, so it's an overstatement to say "don't even have a way to avoid dependency confusion attacks".
It is, however, a non-trivial problem, and more solutions will likely come over the years. Many Python packaging tools, like uv and Poetry (and likely others), have ways to name indexes and pin specific packages to them, which appears to be a promising UX.
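The resolution rule behind that "pin a package to a named index" UX can be sketched as below. This models the idea only; it is not the configuration syntax or code of uv or Poetry, and the index names, `pins` mapping and `default_index` parameter are assumptions for illustration. Versions are simplified to integer tuples.

```python
Version = tuple[int, ...]

def pick_version_pinned(
    name: str,
    indices: dict[str, dict[str, list[Version]]],
    pins: dict[str, str],
    default_index: str = "pypi",
) -> tuple[Version, str]:
    """Resolve `name` only against the index it is pinned to.

    Unpinned projects fall back to `default_index`. Other indices are never
    consulted, so a same-named upload elsewhere cannot win on version number.
    """
    index_name = pins.get(name, default_index)
    versions = indices.get(index_name, {}).get(name, [])
    if not versions:
        raise LookupError(f"{name!r} not found on pinned index {index_name!r}")
    return max(versions), index_name

if __name__ == "__main__":
    indices = {
        "private": {"acme-utils": [(1, 2, 0)]},
        "pypi": {"acme-utils": [(99, 0, 0)], "requests": [(2, 32, 3)]},
    }
    pins = {"acme-utils": "private"}
    print(pick_version_pinned("acme-utils", indices, pins))  # ((1, 2, 0), 'private')
```

Failing loudly when the pinned index lacks the project, rather than falling back to other indices, is what closes the confusion hole.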
xgstation 5 days ago |
The fact that `pip install` just runs whatever is in `setup.py` is still mind-boggling. Even if the author weren't malicious, `setup.py` can still do harm (say, delete a file by mistake); there really needs to be an official way of sandboxing its execution.
woodruffw 5 days ago |
It's not good, but it should also not be baffling: it's the exact same thing other ecosystems do (npm with install hooks/scripts, Rust with build.rs, Ruby with gemspecs, etc).
xgstation 5 days ago |
I know other ecosystems do the same, and those are baffling too, especially for newer languages like Rust, which is why https://internals.rust-lang.org/t/pre-rfc-sandboxed-determin... exists
woodruffw 5 days ago |
Sandboxing is a great idea. But the fact that this is a near-universal feature of language packaging reveals a preference that's going to be hard to counter: users do want effectively-arbitrary system access at build time, because that's the paradigm that's supported by the million-and-one different ways in which a build environment can be valid.
f1shy 5 days ago |
Notably also Common Lisp (quicklisp)
ogrisel 5 days ago |
Note that it's possible to disable that behavior with `pip install --only-binary :all:`.
This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
However, installing from wheels just protects from arbitrary code execution at install time. If you do not trust the source and integrity of the package you install, you would still be subject to arbitrary code execution at import time.
Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.
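If you want that flag applied consistently from a script, a small wrapper is enough. A minimal sketch: `pip_install_command` and `install` are hypothetical helper names; only `python -m pip install --only-binary :all:` comes from the comment above. As noted, this only removes build-time code execution; the installed package's own code still runs when imported.

```python
import subprocess
import sys

def pip_install_command(packages: list[str], binary_only: bool = True) -> list[str]:
    """Build a pip command line for the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if binary_only:
        # Refuse source distributions, so no build backend (setup.py etc.) runs at install time.
        cmd += ["--only-binary", ":all:"]
    return cmd + list(packages)

def install(packages: list[str], binary_only: bool = True) -> subprocess.CompletedProcess:
    # check=True raises CalledProcessError if pip fails, e.g. when some
    # dependency has no compatible wheel and binary_only is set.
    return subprocess.run(pip_install_command(packages, binary_only), check=True)
```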
xgstation 5 days ago |
I think sometimes the problem comes from accidental typos rather than a lack of trust: say one accidentally types `pip install requestss` instead of `pip install requests`; if `requestss` is malicious, then by the time one notices the typo, the setup.py could have already run and done the harm.
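One cheap guard against that kind of slip is a pre-install check for names that are suspiciously close to well-known ones. A minimal sketch using the standard library's `difflib.get_close_matches`; the `POPULAR` list and the 0.85 cutoff are assumptions for illustration (a real tool might use the most-downloaded projects), and it only catches near misses of names on the list.

```python
import difflib

# Hypothetical allow-list of names users commonly mean to type.
POPULAR = ["requests", "numpy", "pandas", "django", "flask", "urllib3"]

def typo_suspects(requested: str, known: list[str] = POPULAR, cutoff: float = 0.85) -> list[str]:
    """Return known names that `requested` closely resembles, unless it is an exact match."""
    name = requested.lower()
    if name in known:
        return []
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)

if __name__ == "__main__":
    print(typo_suspects("requestss"))  # ['requests']
```

A wrapper would ask for confirmation before calling pip whenever this returns anything.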
pjc50 5 days ago |
I don't think that makes much of a difference from the risk of bugs in the rest of the package when it's run.
f1shy 5 days ago |
I see some comments about the lack of security of PyPI. And they are totally right; I’m also concerned. But to be fair, many other languages don’t fare better in that arena. I don’t want to give examples, but everyone knows horror stories from other languages.
Again, it’s not that it’s OK because others are worse, but I would cut them a little slack, especially since having all packages somehow signed/audited would be a titanic task. And I guess I’m not willing to pay for it.
nathanmills 5 days ago |
Quarantining projects is just a band-aid. If you’re worried about malware, maybe stop letting random people upload code to the official package index. Or just write better docs so people stop using random packages in the first place.
openrisk 5 days ago |
It's always an interesting dynamic: assuming a high-trust society pays dividends. Python would be nowhere close to the success it has been without PyPI.
But then success attracts trust abusers and forces raising the fences (which comes with higher costs, both direct and indirect).
Direct costs in the people and infrastructure that must be dedicated to the task. Indirect costs in the frictions generated by complicating workflows.
It all points to the need for open source ecosystems to be taken more seriously by the economically able users who most benefit from this amazing development.
LtWorf 5 days ago |
They won't pay anything unless they are forced to. Basic capitalism leads to externalising costs onto society.
NeutralCrane 5 days ago |
Perhaps, but can you explain how an alternative to capitalism wouldn’t result in people not paying for a service they don’t have to pay for?
LtWorf 5 days ago |
In an alternative system you can get a salary from the government to work on open source software, and the companies pay for that in taxes. Of course you must embargo Malta, Netherlands and all the other countries that thrive on grabbing taxes from other countries.
yencabulator 5 days ago |
People in more communally oriented societies pay for things they "don't have to" pay for because there's a social obligation.
me_vinayakakv 5 days ago |
https://socket.dev/ does a good job in detecting malicious packages in npm.
In their FAQ[1], they mention that they have plans to expand to PyPI as well.
[1]: https://docs.socket.dev/docs/faq
oefrha 5 days ago |
> The one project cleared was a project containing obfuscated code, in violation of the PyPI Acceptable Use Policy.
Interesting, I didn’t know that. While I haven’t released anything obfuscated on PyPI, I’ve certainly written Python projects that include obfuscated code by necessity, namely scrapers packing duktape (embedded JS interpreter) and third party obfuscated JS blobs to generate signatures and stuff. I know for a fact there are projects like that on PyPI. I wonder if those are allowed.
(Come to think of it, those probably can be DMCAed if the targeted service provider is sufficiently motivated.)
IshKebab 5 days ago |
They also allow binary packages if you want an easy way of hiding malware.