Remove the experimental fast-deps feature #11478
Conversation
This feature did not yield significant speed-ups, possibly due to insufficient optimisation of the calls being performed.
|
I'm not sure yet if this is the route we should go down, FWIW, but I'm filing a PR to facilitate the discussion around this and simplify it to a go/no-go decision for us. :) |
|
Something to time: a lazier lazy wheel #11481 |
|
Given #10748 (comment), it may be good to keep. If we can make |
|
Agree with @casassg. Warehouse doesn't yet support PEP 658, and many people (myself included) use different package repositories which may not support PEP 658 even if Warehouse does. So I think it's at least worth profiling/looking over the existing implementation to see how it can be improved before removing it entirely. Speaking of... anyone have a good testcase? |
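Since the thread asks for a testcase: a rough timing harness, sketched here rather than taken from pip's test suite, that resolves one heavyweight package with and without fast-deps. It uses pip's real `--use-feature=fast-deps` flag and `--dry-run` (pip 22.2+) so nothing is installed; the package choice is an arbitrary assumption, and against an index that already serves PEP 658 metadata both runs may be equally fast.

```python
import subprocess
import sys
import time

PKG = "tensorflow"  # arbitrary choice: any project with large wheels works

def time_resolve(extra_args):
    """Resolve PKG without installing anything and return wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--dry-run",
         "--no-cache-dir", "--quiet", *extra_args, PKG],
        check=True,
    )
    return time.monotonic() - start

print(f"default:   {time_resolve([]):.1f}s")
print(f"fast-deps: {time_resolve(['--use-feature=fast-deps']):.1f}s")
```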
|
Hi 👋 xref #11512 (comment): the Warehouse PR is now merged. Is the plan still to remove `fast-deps`? |
|
@ddelange what I meant by that was that in #11111, pip extends the code paths and classes that we initially developed for the purposes of `fast-deps`. Analogy: optimizing
|
|
Just created #12184 to move forward on all the stuff in that comment instead of taking up space here. |
|
Also take a look at #12208, which I hope makes a convincing enough case for instead turning on `fast-deps` by default. |
|
I'm +1 for removing it. |
|
Personally, I feel like the flag would be useful as a transition to the "new" range-fetch approach (mentioned above by @cosmicexplorer). After we merge the entire addition, we can use the usual use-feature/use-deprecated mechanism to enable it by default.
But do we need that in a world where PyPI has backfilled PEP 658 metadata? |
|
This feature has new energy, and we don't yet live in a world where PyPI has backfilled PEP 658 metadata. It can also work with a simpler non-PyPI index. |
|
Do we have any indications from PyPI of the timescales for backfilling? Also, as @dholth mentions, fast-deps would be useful for indexes that don't support PEP 658. The PyTorch binaries, for example, are served from their own index: does that support PEP 658? I just checked https://download.pytorch.org/whl/cu117, and it doesn't seem to... |
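One quick way to answer "does this index serve PEP 658 metadata?" is to fetch a project page and look for the `data-dist-info-metadata` attribute (renamed `data-core-metadata` by PEP 714) on the file anchors. A small sketch, using the PyTorch index from the comment above with `torch` picked arbitrarily as the example project:

```python
from urllib.request import Request, urlopen

# Project page on the index under discussion; any project works.
url = "https://download.pytorch.org/whl/cu117/torch/"
html = urlopen(Request(url)).read().decode("utf-8", "replace")

# PEP 658 defined data-dist-info-metadata; PEP 714 renamed it.
markers = ("data-dist-info-metadata", "data-core-metadata")
if any(m in html for m in markers):
    print("index advertises PEP 658 metadata")
else:
    print("no PEP 658 metadata attributes found")
```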
|
We don't have a set schedule for backfilling, but we were talking about it last Friday and I believe we hope to do it soon. I think we may be waiting until a new version of packaging is released and we can pull in the packaging metadata API to validate the metadata at the same time (we obviously can't do anything about invalid metadata, but we can at least record which metadata is valid or invalid). |
That came out last weekend, so this should be unblocked from that perspective I think. :) |
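For reference, the metadata API mentioned here shipped in `packaging` 23.1 as `packaging.metadata`. A minimal sketch of the validate-on-parse step, with an illustrative inline METADATA snippet standing in for a real file:

```python
from packaging.metadata import Metadata

# Tiny illustrative METADATA; in the backfill this would come from a wheel.
RAW = b"Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"

try:
    # validate=True makes parsing raise if any field is invalid, which is
    # exactly the "record which metadata is valid or invalid" check.
    meta = Metadata.from_email(RAW, validate=True)
    print(f"valid: {meta.name} {meta.version}")
except Exception as exc:  # a group of InvalidMetadata errors on failure
    print(f"invalid metadata: {exc}")
```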
|
Given we have backfilling, and we have a released version of `packaging` with the metadata API, I did a quick side-by-side comparison of how the resolve works out. I'd be happy to pick it up if there is consensus, and nobody else has the bandwidth to do so. |
|
I personally implemented the PEP 658 support in pip (#11111) because, as you say, it is indeed superior, if we can expect it. I have had this PR open for multiple years: #12208, which conclusively describes the problem space and solution, with an initial investigation from @dholth which found PyPI had suddenly stopped supporting negative HTTP range requests. @radoering provided immensely helpful performance analysis from his adaptation of the technique to poetry, after I gave a lengthy conference talk on the subject which I believe you should find invigorating: https://web.archive.org/web/20250425181258/https://cfp.packaging-con.org/2023/talk/hpuhu7/ PEP 777 correctly describes the situation: https://peps.python.org/pep-0777/#backwards-compatibility
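The "negative HTTP range request" detail above is the heart of the lazy-wheel technique: fetch just enough of the remote wheel (a zip) to read its central directory and the `*.dist-info/METADATA` member. The sketch below is a rough illustration of the general idea, not pip's actual implementation; it uses a HEAD request plus absolute ranges (sidestepping the suffix ranges PyPI stopped honouring), assumes the server supports `Range`, omits error handling, and the wheel URL is a placeholder.

```python
import io
import zipfile
from urllib.request import Request, urlopen

class LazyRemoteZip(io.RawIOBase):
    """File-like object reading a remote file via HTTP range requests."""

    def __init__(self, url):
        self.url = url
        # One HEAD request to learn the total size; zipfile then seeks to
        # the end to find the central directory.
        head = urlopen(Request(url, method="HEAD"))
        self.length = int(head.headers["Content-Length"])
        self.pos = 0

    def seekable(self):
        return True

    def readable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.length + offset
        return self.pos

    def read(self, size=-1):
        if size < 0:
            size = self.length - self.pos
        if size == 0 or self.pos >= self.length:
            return b""
        end = min(self.pos + size, self.length) - 1
        req = Request(self.url, headers={"Range": f"bytes={self.pos}-{end}"})
        data = urlopen(req).read()
        self.pos += len(data)
        return data

# Placeholder URL: substitute any wheel served with Range support.
url = "https://example.com/packages/example-1.0-py3-none-any.whl"
with zipfile.ZipFile(LazyRemoteZip(url)) as zf:
    # zipfile only seeks/reads the central directory and this one member,
    # so a few kilobytes travel over the wire instead of the whole wheel.
    name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
    print(zf.read(name).decode())
```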
|
|
I have described repeatedly and at great length the rationale for accepting the corrected version of fast-deps: #12208
Furthermore, upon request I have described at great length how the metadata resolution process can be abstracted and extended: #12921. The result makes pip take 1-2 seconds for most resolves and supports very efficient parallel downloads as well. There has been no interest in accepting those PRs, which I spent several years repeatedly polishing to professional standards suitable for pip. I would be very interested in following up with anyone who can help me get these changes merged. I am also available to call/chat/pair on this topic. |
|
As I described in the conference talk, these years of work were intended to help the whole Python community by making everything instant. I got exactly one statement from PyPI about whether reducing bandwidth costs would be helpful to them: #12257 (comment). I know I personally benefit from using my pip fork because I have a metered internet connection with a monthly limit. I have provided very clear and direct benchmarks for speed and bandwidth for every PR in the metadata workstream. I have also now further indulged in a much greater rewrite of pip's
|
|
Given that I have been unable to get any statement from PyPI for years, and that PEP 658 metadata took so many years to backfill, again without any feedback, I don't know who I could contact to discuss e.g. a PEP to augment the Simple Repository API with the ability to filter results by upload time. There are further approaches we could consider too, in particular filtering by complex predicates like Python version, etc., but that would require PyPI and other repositories to support Python-specific logic, which would be a step change from mostly serving cache entries. Filtering by upload time alone should be unambiguous and easy enough to codify in any server backend.

Since you mentioned PEP 658 support, and seemed particularly interested in removing functionality from pip, I wanted to emphasize that while it is indeed strange that pip has kept the fast-deps flag unchanged for so long, that is not because there has been no work on the feature. Instead, the feature has been developed into a metadata subsystem by a professional packaging engineer over several years, and neither pip nor PyPI are interested. |
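The server-side upload-time filter proposed here remains hypothetical, but the JSON Simple API already exposes an optional per-file `upload-time` field (PEP 700), so the same predicate can be sketched client-side today. A rough illustration (assumes Python 3.11+ so `fromisoformat` accepts the trailing `Z`; pagination and error handling omitted):

```python
import json
from datetime import datetime, timezone
from urllib.request import Request, urlopen

def files_uploaded_after(project: str, cutoff: datetime):
    """Yield filenames for a project uploaded at or after the cutoff."""
    req = Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": "application/vnd.pypi.simple.v1+json"},
    )
    page = json.load(urlopen(req))
    for f in page["files"]:
        uploaded = f.get("upload-time")  # optional per PEP 700
        if uploaded and datetime.fromisoformat(uploaded) >= cutoff:
            yield f["filename"]

cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
for name in files_uploaded_after("pip", cutoff):
    print(name)
```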
|
Furthermore, the metadata API from PEP 658 should also generate JSON METADATA instead of the unspecified email format, which is vulnerable to text injection attacks (these can, for example, introduce hidden dependencies to download and install). This JSON-compatible format was codified in PEP 566.

I have not yet named my project forking pip. I began with string operations (URL quoting, XML/HTML encoding, JSON parsing) in a Rust module. I am currently implementing zstd in Rust, because as PEP 777 describes, there are immense opportunities available for compression we haven't yet standardized: https://codeberg.org/cosmicexplorer/corporeal

I have been drafting a tentative specification document for a new wheel format based on the above approach, which produces very high compression ratios on practical inputs in testing. This would of course reduce the burden on PyPI and on user bandwidth. I would love to not be alone in this. Please let me know if you're interested. |
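As a rough illustration of the email-to-JSON transform PEP 566 codifies: parse METADATA with the stdlib email parser, lower-case the keys with underscores, and collect multiple-use fields into lists. The `MULTI` set below is abbreviated for the sketch, not the full PEP 566 list.

```python
import json
from email.parser import HeaderParser

# Abbreviated set of multiple-use fields; PEP 566 enumerates the rest.
MULTI = {"classifier", "requires_dist", "provides_extra", "project_url"}

def metadata_to_json(text: str) -> str:
    """Convert email-format METADATA to the PEP 566 JSON-compatible form."""
    msg = HeaderParser().parsestr(text)
    out = {}
    for key, value in msg.items():
        key = key.lower().replace("-", "_")
        if key in MULTI:
            out.setdefault(key, []).append(value)
        else:
            out[key] = value
    if msg.get_payload():
        out["description"] = msg.get_payload()  # body holds the description
    return json.dumps(out, indent=2)

print(metadata_to_json(
    "Metadata-Version: 2.1\nName: example\n"
    "Requires-Dist: requests\nRequires-Dist: rich\n"
))
```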
|
@cosmicexplorer I appreciate you have a lot of context that you want to use to back your statements, but please try and keep PR comments short and to the point. Long posts, and a long series of posts, are difficult to tackle because it takes super-linear time to review, understand, and respond to posts based on their length (or at least the number of different points being brought up). This is especially true in PRs, where comments and reviews are ideally about the implementation, not the validity of the idea (which we should have an issue for). I'm going to be honest: I haven't spent any time yet looking at what the core idea of fast-deps is. I will put it on my 2026 to-do list to start to understand the core concept of `lazy_wheel`. |
Adding a cross reference to #7049 (comment), which links to other issues that are relevant for making a decision on what to do with this feature.
FWIW, we did reuse nearly all of the code for PEP 658 though, so the effort put toward this feature was certainly not in vain. It might just be that our implementation of `lazy_wheel` could've been optimised further, but given that PEP 658 is on the horizon, I reckon it's OK to trim this functionality since we're going to get similar semantics with the `dist-info-metadata` key.