-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefer closest conflicting causes when backtracking #12459
Conversation
a61383a
to
3ed5bb2
Compare
Seems new version of setuptools is causing a lot of test failures at the moment, I'll rebase once #12458 hopefully lands. |
3ed5bb2
to
0d9f8eb
Compare
Another thing to mention, with this optimization it is possible to remove backjumping and still stay performant on complex requirements, see sarugaku/resolvelib#142. |
0d9f8eb
to
8888ce9
Compare
I've broken up the previous function I beleive individually they are easier to read, their intent more obvious, will be easier to add tests, and actually improves the performance because of the way |
cfb8d33
to
97d7a36
Compare
It seems I got confused about how packaging handles the intersection of SpecifierSets, I thought it in some sense it "resolved" them and I could use that to work out if two SpecifierSets were mutually exclusive. This was incorrect. I've created a fairly simple I also need to retest all my performance results. |
97d7a36
to
9b19347
Compare
I've been working on, and exhaustively testing, a function that can determine if two specifiers are mutually exclusive. I think this would depend on upgrading the packaging library, because I don't want to think about legacy versions, also I think I may have found a bug in the packaging library (pypa/packaging#766). It occurs to me, that if possible, a much better approach to this would be to check if all the specifiers produce 0 possible packages from all possible packages, but I don't know if this is possible at provider resolver time, I will investigate. If this is obvious to anyone, feedback would be welcome. This PR is becoming a bit messy from me repeatedly updating it as I test and self-review. I may delete it and recreate it so it's easier to review when it's at that stage. |
9b19347
to
0bbfd8c
Compare
0bbfd8c
to
ec890bf
Compare
I was able to implement this, I replaced the function However this has the downside that it may cause more downloads to get candidate metadata and getting that metadata might throw an exception if an sdist fails to build. I have therefore made It may be worth adding back |
8218374
to
5c73c71
Compare
5c73c71
to
d5a569b
Compare
Interestingly the PR, in it's current state, seems to fix all known cases of #12317, this was not intentional but I do have some intuitive understanding of why it might have happened. I have written about it in a little more details here sarugaku/resolvelib#142 (comment) |
I think it's important that we try to make progress on some of the resolver improvements going on here, so I took a look at this PR. On a high level, though, I'm not 100% sure how the various changes that you have discussed hang together (the ones to do with backjumping, this one, and possibly others). So ideally, it would be helpful if you could put together a unified summary of what you are actually proposing, and what are superseded approaches. Otherwise, I'm concerned that by looking at PRs individually, I'll miss the "big picture". It's quite possible it's actually very simple - my view of the work is based on a flow of github notifications from the pip and resolvelib projects, which is not the most organised way of getting information!!! If so, that's great, and I apologise for my confusion. Having said that, my biggest concern with this PR is not actually with the PR itself, but with the associated resolvelib change. You're introducing a new So what I'd like here from pip's side of things is much clearer documentation in resolvelib on what the The reason I'm asking for this is that my biggest concern here is that we could end up with some sort of heuristic that no-one really understands, and which mostly "just works" as long as no-one touches it. That would be a nightmare for maintenance. The idea of the new resolver being based on resolvelib was to make sure that the pip maintainers didn't have to manage the details of the resolution process, and conversely to ensure that if someone wanted to write a standalone package resolver, they would not need to copy chunks of pip's logic to do so. |
You haven't overlooked anything; the situation is a bit tangled right now. Real-world examples revealing the limitations of the current approach have come in slowly, giving me different ideas on how to speed up the resolution over time, and when making this PR I found some of my assumptions faulty and had to pivot parts of the approach accordingly. I'm thinking of replacing the current issue and PR and create a new issue outlining the overall approach more coherently and following up with a clean PR showcasing the implementation attempt. Would you find this approach more helpful or more frustrating?
Introducing the This is crucial when it would make sense to short-circuit checking remaining requirements, or when the checks are not independent of checking other requirements, which even in Pip currently, results in duplicated work (and O(n^2) situations). The approach in this PR doesn't strictly need
I agree that clearer guidance on
Currently, resolvelib and pip segregate the generic resolution aspects to resolvelib and the Python packaging nuances to pip. However, addressing the intricacies of Python packaging requires accounting for many of those nuances, and solving complex backtracking efficiently, using a DFS-style algorithm, demands additional insights. I don't see how these nuances could be moved to resolvelib, to move them out of Pip I think you would need to make another abstracting library, e.g. a Python packaging specific resolvelib. |
Sorry if it sounded like I was suggesting it was. What I was trying to say was that it's important that using that method (whether in pip, or in some other client of resolvelib) shouldn't be a matter of guesswork. That was the problem with
Sorry to be blunt here, but that sounds awfully hand-wavy. I'm not suggesting that resolvelib knows about Python packaging - definitely not, having it be a generic resolver library is important IMO. But conversely, resolvelib should be able to describe its API in its own terms, without referring to Python packaging. And that means basically describing "what does this knob do" so that when clients want to tweak it, they can do so in an informed way, rather than just (for example) "let's prefer packages with lots of dependencies over packages with only a few and see what that does" (which is how I always felt I was reasoning about Maybe it'd be better to wait and see the resolvelib PR - debating principles is probably not very productive at this point. |
I do not see resolvelib adding a new API unless there is agreement from Pip maintainers it would be a good idea. Some changes on the resolvelib side are currently stuck, awaiting input from Pip maintainers: sarugaku/resolvelib#134 (comment). I would really prefer that the outcome of these discussions not lead to a circular lock.
Well, as I didn't design or implement resolvelib or the new resolver, I can't really take offense. I'm just describing to you the current situation. The simplest way I can think to answer the question of "what does In the DFS, how do you pick which child node should be chosen? You implement a method for choosing them, and in resolvelib, this is done with " Now, each node has conditions on what the final global state should be, and you're trying to select a collection of nodes so none of them disagree about the final global state. To do that performantly, you should understand the conditions different nodes have and the relationships between different conditions, and select child nodes that you think would find this selection of satisfying conditions quicker. resolvelib, at least currently, does not assume almost anything about those conditions and how they may or may not contradict. It is, therefore, a poor algorithm to solve for complex dependency graphs in a performant manner, e.g. taking less time than the heat death of the universe. The provider must, therefore, encode what it knows about these conditions and guide resolvelib, and this currently happens entirely via |
This PR makes Pip far more efficent at resolving when there are many possible causes of as part of a backtracking conflict. E.g.
sphinx sphinx-rtd-theme sphinx-toolbox myst-parser sphinxcontrib-bibtex nbsphinx
currently produces "ResolutionTooDeep" but solves quickly under this PR.This PR comes in three commits:
get_preference
to a new methodfilter_unsatisfied_names
, which solves this users issue: #10621 (comment)causes_with_conflicting_parent
andcauses_with_no_candidates
functions and uses them as part of the newfilter_unsatisfied_names
to prefer conflicting causesStill to do:
Confirm correctly testing emptyness of specifier set: packaging#760causes_with_conflicting_parent
andcauses_with_no_candidates