Recursor latency increased 600% after updating from 4.8 to 5.1 #14832
Comments
What is measuring "Query latency"?
Apologies, just updated the issue. This is measured by Prometheus via the
Any clues in the pdns_recursor_answers* metrics (or the auth4/auth6 separate ones)?
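Not the original graphs, but as a sketch: those buckets can be pulled into one Prometheus graph with a regex on the metric name. The pdns_recursor_ prefix is assumed from the built-in exporter:

```
# per-second rate of answers in each latency bucket, over 5-minute windows
rate({__name__=~"pdns_recursor_answers.*"}[5m])
```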
Please show your configuration. It might be a case of https://docs.powerdns.com/recursor/performance.html#threading-and-distribution-of-queries (specifically the Imbalance section). Please show some lines of the reported periodic stats. There were also changes in the way metrics are computed between 4.8 and 5.1, but we need more data to decide what is going on. Do you have any measures of latency seen by clients?
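For reference, the knobs that the linked threading/imbalance section covers look roughly like this in an old-style recursor.conf; the values below are only illustrative, not a recommendation:

```
# illustrative values only
threads=4                    # worker threads
pdns-distributes-queries=no  # let SO_REUSEPORT spread incoming queries instead of a distributor thread
reuseport=yes                # one listening socket per worker
```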
phonedph1: looks like a lot more queries started to time out after the upgrade? I'm not sure. omoerbeek: I don't measure latency seen by clients. Config is fairly minimal - I've updated the issue with the config I'm using. Note that I'm also using a Lua script, but that's been there since 4.8. I don't have any logs from the day I upgraded, but here are some from this morning:
I edited the stats to make them more readable. They look reasonable at first sight. Can you push stats to our public metronome server?
This allows us to see the most relevant stats over time; often it reveals something. If that does not reveal anything I'm going to try to reproduce (that will be tomorrow at the earliest). I might need your sqlite DB for proper reproduction.
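For completeness, pushing stats to the public metronome instance is just the carbon settings in recursor.conf; a minimal sketch, where the server address and name are placeholders and the actual endpoint is in the PowerDNS docs:

```
# placeholders only; see the carbon/metrics documentation for the real endpoint
carbon-server=<public-metronome-address>
carbon-ourname=my-recursor   # keep this free of dots
carbon-interval=30           # seconds between pushes
```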
Oh, and please show `rec_control top-timeouts`.
Sure, I've set that up.

```
$ rec_control top-timeouts
Over last 2070 entries:
4.30% 205.251.193.96
4.06% 205.251.197.100
3.82% 205.251.193.127
3.57% 205.251.197.116
3.19% 205.251.192.166
2.80% 205.251.197.213
2.75% 205.251.192.248
2.61% 205.251.197.38
2.37% 205.251.196.72
2.32% 205.251.193.178
1.93% 205.251.192.153
1.74% 205.251.197.248
1.59% 205.251.192.159
1.55% 47.241.207.18
1.55% 205.251.196.174
1.50% 205.251.192.235
1.45% 205.251.196.48
1.40% 47.241.207.16
1.40% 205.251.197.41
1.30% 64.98.148.18
52.80% rest
```
I've uploaded the adblock.db to Backblaze in case you need it.
Almost all of those IPs (the ones in the 205.251 range) are Amazon Route 53 nameservers. For example, does querying one of them directly with dig return anything? Or what does an mtr to one of them show? I don't think the upgrade itself is causing these timeouts.
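Something like the following against one of the addresses from the top-timeouts list would show whether they are reachable at all; the query name is just an example, and any reply at all (even REFUSED) versus a timeout is the interesting part:

```
# direct query to one of the timing-out addresses
dig @205.251.193.96 example.com +norecurse

# path check to the same address
mtr -r -w -c 20 205.251.193.96
```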
I don't see your server coming through yet on https://metronome1.powerdns.com/
hmm, interesting. I'm not getting anything back from that IP. I cancelled the mtr because there was no output
omoerbeek: I restarted the service but for some reason it didn't pick up the new config 🤔 I did a start/stop just now and it looks like it appears on the list now! Edit: no it doesn't, let me change it to something else.
Ok, I think it didn't like dots in the name (which the docs warn against: "Be careful not to include any dots in this setting, unless you know what you are doing."). 😅 I changed it to just a plain name without any dots.
It's unusual to be unable to dig or ping/mtr one of the world's larger DNS services. Do only some zones and nameservers fail, or all of them? Is there a firewall blocking ICMP, or intercepting some DNS traffic? Or a BGP routing issue with Amazon? Hard to imagine network issues being related to a PowerDNS upgrade, but it is odd.
Fixing the reachability of the Route 53 servers is important of course. My hypothesis is that the network issues already existed with the 4.8 recursor, but are now more visible in the metrics.
My firewall is pretty basic and I'm not blocking any hosts explicitly. I can receive replies if I ping hosts on the internet. I'll dig a bit more to see exactly which hosts are failing. Yeah, potentially. I've raised this with my ISP so they can take a look too.
Did a couple of test runs with both 4.8 and 5.1
I saw nothing like the increase in reported latency between 4.8 and 5.1 that you did. Actually, in all test runs 5.1 had slightly lower reported latency, so I'm not sure what is causing your reported latency to be higher. Some guesses:
Probably it's best to leave this for a moment until the routing to Route 53 is fixed and see how that influences the reported latency.
Sure. I also did a quick test on
OK, in that case (forwarding to an authoritative server) you should normally use
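I can't tell from the thread which setting was meant here, but assuming it is forward-zones, an entry for an authoritative target would look roughly like this (zone and address are placeholders); forward-zones sends queries without the recursion-desired bit set, which is what an authoritative server expects:

```
# placeholder zone and address; forward-zones-recurse would instead set the RD bit (for forwarding to another resolver)
forward-zones=example.internal=192.0.2.53
```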
This is not a support question; I have read about open source support and will send support questions to the IRC channel, GitHub Discussions or the mailing list.
I have read and understood the 'out in the open' support policy
Program: Recursor
Issue type: Bug report
Short description
Hello! 👋
First of all - big thanks for making this software. I just wanted to drop a quick note on a possible performance regression after updating the Debian repo package from 4.8 to 5.1. Query latency increased from about 4 ms to about 35 ms on average after the upgrade, with no changes in config on my side.
Environment
Steps to reproduce
Expected behaviour
I expected latency to stay roughly the same
Actual behaviour
Latency increased 600%
Other information
Before the update, latency hovers at around 4 ms
After the update, latency does not go below 15 ms and peaks at 40 ms
Answer buckets
Prometheus queries for the graphs:
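If it helps, a query along these lines reproduces the average-latency graph; the metric name is assumed from the built-in exporter, and qa-latency is reported in microseconds:

```
# assumed metric name from the built-in Prometheus exporter; convert microseconds to milliseconds
pdns_recursor_qa_latency / 1000
```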
recursor.conf
adblock.lua