-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce entropy for fastestmirror option #324
Conversation
Traditionally the fastestmirror option has relied on taking the provided mirrorlist from the repo CDN and measured the latency to each mirror by timing the libCurl socket open time. This has resulted in several undesirable properties for the fastestmirror option: 1. Socket open latency is not always a good proxy for download bandwidth performance, so if the nearest mirror to a user happens to be particularly slow, they will always experience a very slow download despite having fastestmirror enabled. 2. Mirror operators don't appreciate fastestmirror being used by high density deployments (i.e. data centers) where thousands to millions of hosts with fastestmirror enabled all dogpile the single lowest latency mirror despite there being several mirrors within almost as close proximity to the site This change introduces entropy into the mirror selection process by taking all mirrors with latency measurements less than twice that of the nearest mirror and shuffling those mirrors evenly, then simply appending all other mirrors to the end of the list in strict latency order.
Any thoughts on this? The CI failures seem like they have nothing to do with the code I touched? |
Thanks and sorry for the delay. I think this is a great idea!
I agree, 2x seems reasonable. As you said, latency to begin with is not a perfect proxy for overall mirror speed; it's quite likely to have 10ms mirror that is slower (less available bandwidth) than a 20ms mirror. One unit test is failing on my end ( I'd like to hear a second opinion from someone else on our team before merging this, there might be some use case I'm not thinking of, and so we might want to provide a config option for users to opt-out of this behavior. |
Signed-off-by: Kenneth Finnegan <kennethfinnegan2007@gmail.com>
Thanks Evan. I was just worried that no one wanted to touch this knob and the discussion was going to get memory-holed. Good changes. My C rust is showing. Cherry-picked and signed-off. The only non-consumer/home user use case for fastestmirror that I could come up with was using it to prefer the on-campus mirror as an alternative for the traffic engineering knobs available in Mirror Manager. Picking typical numbers out of the air, the local mirror being 0.8ms away vs public mirrors being 4ms away means there would be zero behavior change here. But I would also love to hear other devs thoughts on how much flexibility we have on the fastestmirror behavior in general. |
Thanks. What if this behavior could be configurable by another option, called We could make |
My biggest concern is the moral hazard of making the option available. A layman reading the documentation would say "well of course I always want the fastest mirror; why wouldn't I?" and every "top 10 things to do right after installing Fedora" article would just start including I think there's upsides to users for this change if the lowest latency mirror happens to be low performance, but I think there is also value for the mirror operators here that this protects mirrors from any one large install base (or network stud, i.e. countries without mirrors) being able to turn this on and dogpile a random mirror they've never heard of. (Granted, they can always explicitly configure a single URL for a repo, but that takes quite a bit more intent) As a mirror operator I've been working with VPS providers who built their golden OS images with bad settings, so I'd rather continue to have fastestmirror be a heuristic blackbox and if users really need more fine grained traffic engineering controls I think there is/should be other avenues for those problem. I'm trying to imagine who would be using fastestmirror for fine grained traffic engineering instead of "eh, just pick something good", who would would be impacted by this change, who shouldn't just be talking to the distro about routing their subnet/ASN to the ideal mirror in the first place. |
Good points. We've decided to merge this as-is and put off deciding about the I'll create PRs to update the documentation of the The CI failures are unrelated to this patch. |
ea8ce15
into
rpm-software-management:master
Thanks. For completeness on this thread, I did try and drum up some use cases on Reddit and Mastodon. The only use case anyone mentioned was being in country codes with no mirror coverage and the need to not use geographically close mirrors in other countries that route via another continent and back. So the difference is already between bad and worse when there isn't local mirrors, and this confirms my suspicion that this change will make DNF pick other mirrors than the one furthest south in Florida to serve all of south america. I'm looking forward to hearing field experience with this change once it trickles its way into general deployment. |
Howdy,
I do want to caveat this PR with the fact that this is the first time I've touched the librepo codebase, and I haven't gotten the test suite running on my local dev box (didn't attempt to after moderate issues just getting the library to build) so I have not run this through all the unit tests. If any of the unit tests expect fastestmirror to always return a stable list of mirrors, that test will break. I didn't see any test definitions that looked to be testing this in a cursory review... I did build and test this change locally on my Alma9 box.
Traditionally the fastestmirror option has relied on taking the provided mirrorlist from the repo CDN and measured the latency to each mirror by timing the libCurl socket open time. This has resulted in several undesirable properties for the fastestmirror option:
This change introduces entropy into the mirror selection process by taking all mirrors with latency measurements less than twice that of the nearest mirror and shuffling those mirrors evenly, then simply appending all other mirrors to the end of the list in strict latency order.
I selected 2x as the upper cutoff for shuffling in mirrors based on nothing analytical. I expect that 2x would do a good job of including all mirrors in the same metro if there's other nearby mirrors since the latency is likely dominated by last mile connectivity. For other users who are already seeing significant >100ms latency to the best mirror, picking <200ms mirrors is going to possibly be noticeably worse, but users 100ms away from the closest mirror are going to be having a poor user experience regardless of which mirror they select.