Package ordering and duplication substantially affects solve times #3058
Comments
On my machine, moving shapely and pydeck to the very top increases solve time by only 10% (from 6 s to 6.6 s with Micromamba).
Thanks Jonas! 🙏
This is admittedly a complicated question. We discussed this a bit offline with @msarahan as well and didn't immediately arrive at a general solution. I thought it was worthwhile to discuss with everyone here to see if we might arrive at an improved approach. cc @jaimergp @mbargull (who may also have insights into this question)
Yeah, Vyas worked pretty hard to simplify the original performance issue into an MRE. Admittedly that can make the changes less dramatic. The original use case involved multiple channels and a whole bunch of packages (explicitly listed with constraints). Vyas might still have the original use case floating around if that is of interest too. That said, I think the fact that order affects performance in this way is the more interesting bit, which raises the question: could we order the dependencies in a more consistent fashion for better performance?
I think it would be nice if you guys could provide an example where ordering shows a more dramatic effect than a 10% difference. 50% or more would be nice. I don't think 10% is worth investigating.
The performance difference was more like 400% as of Friday. My guess is that some package in the environment released a new version over the weekend that is causing different solves. Perhaps we could reproduce the same performance by inspecting the total environment and seeing which packages released a new version over the weekend and then constraining that one? I was still observing this behavior as of about 4 PST on Friday Dec 8.
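One way to narrow down "which packages released a new version over the weekend" is to filter a channel's repodata by upload time. The snippet below is only a sketch: it assumes an already-downloaded and parsed `repodata.json` (a dict with `packages` and `packages.conda` sections whose records carry `name` and a millisecond `timestamp`), and the helper name `packages_updated_since` is made up for illustration.

```python
from datetime import datetime, timezone


def packages_updated_since(repodata, cutoff):
    """Return sorted package names with at least one record newer than `cutoff`.

    `repodata` is assumed to be a parsed repodata.json dict; records without
    a `timestamp` field are treated as old and skipped.
    """
    cutoff_ms = cutoff.timestamp() * 1000  # repodata timestamps are assumed to be in ms
    names = set()
    for section in ("packages", "packages.conda"):
        for record in repodata.get(section, {}).values():
            if record.get("timestamp", 0) >= cutoff_ms:
                names.add(record["name"])
    return sorted(names)
```

Intersecting the returned names with the packages in the solved environment would give a short list of candidates to pin one at a time.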
No specific data, unfortunately. My intuition would be to put packages with very tight constraints early, since that forces the solver to choose only acceptable versions of other packages on the first try later on. Putting packages with loose constraints first means something could be chosen that is not compatible with a package listed later, resulting in additional backtracking. I think that's also what @msarahan suggested.
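The "tight constraints first" idea could be sketched roughly as below. This is a made-up heuristic for illustration only, not what mamba or conda-libmamba-solver actually implement: `strictness` and `order_specs` are hypothetical names, the scoring buckets are arbitrary, and the parsing only handles simple `name constraint` strings.

```python
import re

# Splits "name<constraint>" into the package name and whatever follows it.
_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$")


def strictness(spec):
    """Hypothetical score: 0 = name only, 3 = exact pin."""
    match = _SPEC.match(spec)
    constraint = match.group(2).strip() if match else ""
    if not constraint:
        return 0  # no version constraint at all
    if constraint.startswith("=="):
        return 3  # exact pin
    if constraint.startswith("!="):
        return 1  # excludes a single version only
    has_lower = ">" in constraint
    has_upper = "<" in constraint
    if has_lower and has_upper:
        return 2  # bounded range such as ">=1.5,<2"
    if has_lower or has_upper:
        return 1  # one-sided bound
    return 2  # fuzzy pin such as "=3.10" or "1.2.*"


def order_specs(specs):
    """Sort strictest first; ties are broken alphabetically for determinism."""
    return sorted(specs, key=lambda s: (-strictness(s), s.lower()))
```

For example, `order_specs(["shapely", "pydeck >=0.8", "python=3.10"])` would put `python=3.10` first and the unconstrained `shapely` last. The alphabetical tie-break only exists so that the same input always yields the same order.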
FWIW I didn't see any new release in the packages explicitly listed in the environment.yml, so whatever changed must be in some dependency. Also, to clarify what John said above, the original example (with hundreds of packages and 6 channels) also solves quickly now. The minimal example I posted exhibited nearly the same performance characteristics; they weren't ameliorated in any way by shrinking the environment. The less severe impact now is almost certainly due to some other change in the ecosystem of cf packages since Friday.
Funny. I also observed multiple environments magically solve much faster today. We must have gotten a new release of a very core package... It would be interesting to understand what exactly made it harder to solve previously.
For sure, I'd be interested as well. The differences have been dramatic; the worst-case examples I had on Friday were taking 15 minutes, and now even those take ~10 s.
See PR description at conda/conda-libmamba-solver#381 (comment). I also saw this and decided to sort by spec strictness (strictest first), because I assume the user is really interested in those. I didn't go for alphabetic order because that usually sends
I also know that @wolfv and @BastianZim have introduced some tricks in
For the latest 100 updates in conda-forge: https://conda.anaconda.org/conda-forge/rss.xml
For all package names since last Thursday:
You can also use this script to roll back repodata to a given moment in time and see what changed.
We've also seen that some specific orders (but I don't know which ones) make libsolv crash 😂 xref conda/conda-libmamba-solver#391
Good to know that the conda libmamba solver is doing similar things! Less excited to find that libsolv has order-dependent behavior that leads to crashes (not just performance changes)...
I'm adding #3163 to arbitrarily sort the input MatchSpecs before passing them to the solver.
Sounds great, thank you! I have not noticed anything like this recently, which is good. Congrats on the 2.0 release!
Troubleshooting docs
Anaconda default channels
How did you install Mamba?
Mambaforge or latest Miniforge
Search tried in issue tracker
is:issue slow order; is:issue order; is:issue solve order
Latest version of Mamba
Tried in Conda?
Not applicable
Describe your issue
I have an environment where the solve is very slow and, oddly, the solve time differs dramatically depending on the order in which packages are specified (see the env file pasted below). Moving shapely and pydeck to the end of the environment list dramatically reduces the solve time (from ~400 seconds to ~80 seconds). I originally had a much larger environment but whittled it down to this minimal one a couple of days ago. Unfortunately, when I tried the same thing today the solve had become very fast, so I'm guessing that a new package release has made it much easier for the solver to converge quickly.
I realize that since mamba uses a backtracking solver, order dependence is to be expected, so there may not be much that can be done here. One minor suggestion might be to sort and deduplicate the dependency list for the sake of predictability. While reducing the file I also found that duplicates could cause odd behavior, again depending on where those duplicates sit relative to other dependencies in the list.
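As a rough illustration of the "sort and deduplicate" suggestion, something like the following could be applied to the `dependencies` list before it reaches the solver. It is a sketch under simple assumptions: every entry is a plain string (nested entries such as a `pip:` sub-list are not handled), only exact duplicates (after whitespace normalization) are dropped, and `normalize_dependencies` is a hypothetical helper name.

```python
def normalize_dependencies(deps):
    """Drop exact duplicate specs and return the rest in alphabetical order."""
    seen = set()
    unique = []
    for dep in deps:
        # Collapse runs of whitespace so "pydeck  >=0.8" equals "pydeck >=0.8".
        spec = " ".join(dep.split())
        if spec not in seen:
            seen.add(spec)
            unique.append(spec)
    # Case-insensitive sort so the result does not depend on input order.
    return sorted(unique, key=str.lower)
```

Two different specs for the same package (say `pydeck` and `pydeck >=0.8`) are both kept here; merging those would need real MatchSpec parsing, which this sketch deliberately avoids.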
I marked this problem as N/A for the "Tried in conda" dropdown above because the conda solve never converged in either case, so I couldn't get useful metrics (that was without the new libmamba solver, and unfortunately I didn't think to try the libmamba solver until today when the mamba issues also seem to have vanished). I also did not try forcing strict channel priority (my default is currently flexible).
mamba info / micromamba info
Logs
No response
environment.yml
~/.condarc