Can log phase plugins impact receive time of backend API proxy responses? #7754
-
Let's say we had this plugin enabled globally: https://github.com/Optum/kong-kafka-log
I think the Kafka team restarted their brokers or something, which caused the plugin to fail to log. We disabled the plugin because a few consumers reported increased latency; specifically, in our Splunk logs we could see some heavy receive time (which is supposed to be the time spent receiving the backend's response, etc.). https://github.com/Optum/kong-splunk-log
Log example:
With the Kong Kafka logging plugin now disabled, the higher receive times seem to have decreased across the gateway. Could something about a log phase plugin doing retries and storing messages in memory potentially cause this? I have receive time documented as: "
So it seems weird that log phase plugins could hurt such a field's latency results.
Replies: 1 comment
-
Hey @jeremyjpj0916 👋

I realize this is an old thread, but I wanted to drop my $0.02 in here just in case.

The log phase runs after the response headers and body have already been sent, so it shouldn't be able to cause latency between Kong and the downstream client--not directly, at least.

Not super relevant, but maybe of interest: any plugin that ships logs over the network actually schedules almost all of its work in a timer that runs after the log phase is complete--largely because the cosocket API (ngx.socket.tcp/ngx.socket.udp) is disabled during the log phase.

Most of the work that is done in shipping log data over the network is non-blocking, so it certainly could be the case that the downed Kafka brokers caused enough back pressure to gunk things up and cause NGINX/Kong to start struggling. If your rate of traffic was really high, maybe you could have amassed enough running log timers to consume sufficient CPU time to block requests. That sounds really unlikely, though; I can't imagine it happening without blowing up all of the other latency metrics.

I'm curious what the OS-level metrics had to say about the state of the system during this window (CPU saturated? Network interface saturated? Swapping? Maxing out file descriptors?). Maybe there were some transient networking issues that could have caused both the logging errors and the slow receive time?
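For illustration, here's a minimal sketch of that deferral pattern, assuming a Kong-style Lua plugin handler. Everything in it is hypothetical (the handler name, the conf fields broker_host/broker_port/timeout, the send_entry helper, and the plain-TCP "log sink" standing in for the real Kafka protocol); it is not the kong-kafka-log implementation, it just shows why trouble reaching the log sink surfaces inside a timer rather than in the log phase itself:

```lua
-- Hypothetical log-phase plugin handler (not the kong-kafka-log code).
-- Shows the common pattern: serialize in the log phase, then ship the
-- entry over the network from a zero-delay timer, because cosockets
-- are disabled during the log phase itself.
local cjson = require "cjson.safe"

local MyLogHandler = {
  PRIORITY = 5,
  VERSION = "0.1.0",
}

-- Runs in a timer context, where the cosocket API is available again.
-- The "log sink" here is a plain TCP endpoint for simplicity; a real
-- Kafka plugin would speak the Kafka wire protocol via a client library.
local function send_entry(premature, conf, entry)
  if premature then
    return -- worker is shutting down; drop the entry
  end

  local sock = ngx.socket.tcp()
  sock:settimeout(conf.timeout or 10000)

  local ok, err = sock:connect(conf.broker_host, conf.broker_port)
  if not ok then
    -- A downed broker surfaces here. The timer (and the serialized
    -- entry it holds in memory) sticks around until the connect
    -- attempt times out.
    ngx.log(ngx.ERR, "failed to connect to log sink: ", err)
    return
  end

  sock:send(entry .. "\n")
  sock:setkeepalive()
end

-- Log phase: no network I/O happens here, just serialization and
-- scheduling of the timer that does the actual sending.
function MyLogHandler:log(conf)
  local entry = cjson.encode(kong.log.serialize())

  local ok, err = ngx.timer.at(0, send_entry, conf, entry)
  if not ok then
    ngx.log(ngx.ERR, "failed to create log timer: ", err)
  end
end

return MyLogHandler
```

If you ever suspect timer pileup like the scenario above, ngx.timer.pending_count() and ngx.timer.running_count() (available in the lua-nginx-module bundled with recent Kong/OpenResty versions) give a rough view of how many of these deferred jobs are outstanding at any given moment.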