Can log phase plugins impact receive time of backend API proxy responses? #7754
-
Let's say we had this plugin enabled globally: https://github.com/Optum/kong-kafka-log
I think the Kafka team restarted their brokers or something, which caused the plugin to fail to log. We disabled the plugin because a few consumers reported increased latency; specifically, in our Splunk logs we could see some heavy receive time (which is supposed to be the time spent receiving the backend's response, etc.). https://github.com/Optum/kong-splunk-log
Log example:
With the Kong Kafka logging plugin now disabled, the higher receive times seem to have decreased across the gateway. Could something about a log phase plugin doing retries and storing messages in memory potentially cause this? I have receive time documented as: "
So it seems weird that log phase plugins could hurt such a field's latency results.
Replies: 1 comment
-
Hey @jeremyjpj0916 👋

I realize this is an old thread, but I wanted to drop my $0.02 in here just in case.

The log phase runs after the response headers and body have already been sent, so it shouldn't be able to cause latency between Kong and the downstream client--not directly, at least.

Not super relevant, but maybe of interest: any plugin that ships logs over the network actually schedules almost all of its work in a timer that runs after the log phase is complete--largely because the cosocket API (ngx.socket.tcp/ngx.socket.udp) is disabled during the log phase.

Most of the work that is done in shipping log data over the network is non-blocking, so it certainly could be the case that the downed Kafka brokers caused enough back pressure to gunk things up and cause NGINX/Kong to start struggling. If your rate of traffic was really high, maybe you could have amassed enough running log timers to consume sufficient CPU time to block requests. That sounds really unlikely, though; I can't imagine it happening without blowing up all of the other latency metrics.

I'm curious what the OS-level metrics had to say about the state of the system during this window (CPU saturated? Network interface saturated? Swapping? Maxing out file descriptors?). Maybe there were some transient networking issues that could have caused both the logging errors and the slow receive time?
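For illustration, here's a minimal sketch of that deferral pattern, assuming a Kong-style Lua plugin handler. Everything in it is hypothetical (the handler name, the conf fields broker_host/broker_port/timeout, the send_entry helper, and the plain-TCP "log sink" standing in for the real Kafka protocol); it is not the kong-kafka-log implementation, it just shows why trouble reaching the log sink surfaces inside a timer rather than in the log phase itself:

```lua
-- Hypothetical log-phase plugin handler (not the kong-kafka-log code).
-- Shows the common pattern: serialize in the log phase, then ship the
-- entry over the network from a zero-delay timer, because cosockets
-- are disabled during the log phase itself.
local cjson = require "cjson.safe"

local MyLogHandler = {
  PRIORITY = 5,
  VERSION = "0.1.0",
}

-- Runs in a timer context, where the cosocket API is available again.
-- The "log sink" here is a plain TCP endpoint for simplicity; a real
-- Kafka plugin would speak the Kafka wire protocol via a client library.
local function send_entry(premature, conf, entry)
  if premature then
    return -- worker is shutting down; drop the entry
  end

  local sock = ngx.socket.tcp()
  sock:settimeout(conf.timeout or 10000)

  local ok, err = sock:connect(conf.broker_host, conf.broker_port)
  if not ok then
    -- A downed broker surfaces here. The timer (and the serialized
    -- entry it holds in memory) sticks around until the connect
    -- attempt times out.
    ngx.log(ngx.ERR, "failed to connect to log sink: ", err)
    return
  end

  sock:send(entry .. "\n")
  sock:setkeepalive()
end

-- Log phase: no network I/O happens here, just serialization and
-- scheduling of the timer that does the actual sending.
function MyLogHandler:log(conf)
  local entry = cjson.encode(kong.log.serialize())

  local ok, err = ngx.timer.at(0, send_entry, conf, entry)
  if not ok then
    ngx.log(ngx.ERR, "failed to create log timer: ", err)
  end
end

return MyLogHandler
```

If you ever suspect timer pileup like the scenario above, ngx.timer.pending_count() and ngx.timer.running_count() (available in the lua-nginx-module bundled with recent Kong/OpenResty versions) give a rough view of how many of these deferred jobs are outstanding at any given moment.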