trino operator restarts at 22-23 every day at k8s #32
Replies: 2 comments
-
Hey @tolstykh-da, are you talking about the Trino operator or the Trino pods (coordinator, worker)? The Trino pods are restarted every ~24h because of TLS certificate rotation (the self-generated certificates are currently only valid for 24h). Cheers,
-
I would like to add a question: did you just observe this behavior, or does it cause trouble for you? One thing that really helps in this situation is a graceful shutdown of the workers; we have stackabletech/trino-operator#429 for this. This would solve the
-
How can I overcome this problem? Trino restarts approximately every 22-23 hours. In the logs it looks something like this:
"2023-08-09T13:09:37.657Z\tINFO\tmain\tio.airlift.bootstrap.LifeCycleManager\tLife cycle starting..."
"2023-08-09T13:15:00.859Z\tWARN\thttp-client-memoryManager-69\tio.trino.memory.RemoteNodeMemory\tError fetching memory info from http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/memory: Server refused connection: http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/memory"
"2023-08-09T13:15:06.705Z\tINFO\tnode-state-poller-0\tio.trino.metadata.DiscoveryNodeManager\tPreviously active node is missing: ada15b36-96a0-47e3-bc86-20a3d9ca93ca (last seen at trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local)"
"2023-08-09T13:15:07.857Z\tWARN\thttp-client-memoryManager-scheduler-1\tio.trino.memory.RemoteNodeMemory\tError fetching memory info from http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/memory: Failed communicating with server: http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/memory"
"2023-08-09T13:15:06.704Z\tWARN\tnode-state-poller-0\tio.trino.metadata.RemoteNodeState\tNode state update request to http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/info/state has not returned in 10.00s"
"2023-08-09T13:15:06.706Z\tWARN\thttp-client-node-manager-56\tio.trino.metadata.RemoteNodeState\tError fetching node state from http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/info/state: Failed communicating with server: http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/info/state"
"2023-08-09T13:15:06.704Z\tWARN\thttp-client-node-manager-scheduler-1\tio.trino.metadata.RemoteNodeState\tError fetching node state from http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/info/state: Failed communicating with server: http://trino-worker-default-0.trino-worker-default.dwh.svc.cluster.local:8080/v1/info/state"
"2023-08-09T13:15:15.496Z\tINFO\tThread-77\tio.airlift.bootstrap.LifeCycleManager\tJVM is shutting down, cleaning up"
"copying /stackable/config to /stackable/rwconfig"
"OpenJDK 64-Bit Server VM warning: Option UseBiasedLocking was deprecated in version 15.0 and will likely be removed in a future release."
"INFO: Java version: 17.0.7"
"2023-08-09T13:15:10.557Z\tINFO\tmain\tBootstrap\tInitializing logging"
"2023-08-09T13:15:11.551Z\tINFO\tmain\tBootstrap\tPROPERTY DEFAULT RUNTIME DESCRIPTION"
"2023-08-09T13:15:11.551Z\tINFO\tmain\tBootstrap\tservice-inventory.update-interval 10.00s 10.00s Service inventory update interval"
"2023-08-09T13:15:11.552Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.use-blocking-connect false false"
"2023-08-09T13:15:11.552Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.http2.session-receive-window-size 16MB 16MB Initial size of session's flow control receive window for HTTP/2"
"2023-08-09T13:15:11.552Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.http-proxy ---- ----"
"2023-08-09T13:15:11.554Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.idle-timeout 1.00m 1.00m"
"2023-08-09T13:15:11.554Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.key-store-path /stackable/internal_tls/keystore.p12 /stackable/internal_tls/keystore.p12"
"2023-08-09T13:15:11.554Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.log.enabled false false"
"2023-08-09T13:15:11.555Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.log.max-size 1GB 1GB"
"2023-08-09T13:15:11.555Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.max-connections 200 200"
"2023-08-09T13:15:11.555Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.max-requests-queued-per-destination 1024 1024"
"2023-08-09T13:15:11.556Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.record-request-complete true true"
"2023-08-09T13:15:11.556Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.response-buffer-size 16kB 16kB"
"2023-08-09T13:15:11.556Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.selector-count 2 2"
"2023-08-09T13:15:11.557Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.timeout-threads 1 1 Total number of timeout threads"
"2023-08-09T13:15:11.557Z\tINFO\tmain\tBootstrap\tdiscovery.http-client.https.hostname-verification true true Verify that server hostname matches the server certificate"
"2023-08-09T13:15:11.557Z\tINFO\tmain\tBootstrap\tnode-manager.http-client.connect-timeout
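To check whether the restarts really follow the roughly daily certificate rotation described in the reply, the gap between shutdown entries in a longer log export can be measured. The following is only a minimal sketch. It assumes the layout seen in the excerpt (quoted entries with five fields separated by `\t`, possibly as a literal backslash escape), and the helper names `parse_line`, `events`, and `intervals` are made up for illustration; the two marker strings are taken from the log lines in this thread.

```python
from datetime import datetime

# Message fragments copied from the log excerpt in this thread.
SHUTDOWN_MARKER = "JVM is shutting down"
STARTUP_MARKER = "Life cycle starting"


def parse_line(raw):
    """Split one log entry into (timestamp, message), or return None.

    Assumption: timestamp, level, thread, logger, and message are separated
    by a tab, which may show up as the two characters backslash + t.
    """
    line = raw.strip().strip('"').replace("\\t", "\t")
    parts = line.split("\t", 4)
    if len(parts) != 5:
        return None  # unstructured line, e.g. "copying /stackable/config ..."
    try:
        ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return ts, parts[4]


def events(lines, marker):
    """Return the sorted timestamps of entries whose message contains marker."""
    found = []
    for raw in lines:
        parsed = parse_line(raw)
        if parsed and marker in parsed[1]:
            found.append(parsed[0])
    return sorted(found)


def intervals(timestamps):
    """Return the gaps between consecutive timestamps as timedelta objects."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Entries copied from the excerpt; a real check needs several days of logs.
sample = [
    r'"2023-08-09T13:09:37.657Z\tINFO\tmain\tio.airlift.bootstrap.LifeCycleManager\tLife cycle starting..."',
    r'"2023-08-09T13:15:15.496Z\tINFO\tThread-77\tio.airlift.bootstrap.LifeCycleManager\tJVM is shutting down, cleaning up"',
    '"copying /stackable/config to /stackable/rwconfig"',
]

shutdowns = events(sample, SHUTDOWN_MARKER)
print("shutdowns:", shutdowns)
print("gaps between shutdowns:", intervals(shutdowns))
```

With only the excerpt as input there is a single shutdown entry, so the list of gaps is empty. Fed with several days of logs from one pod, gaps clustering just under 24 hours would be consistent with the certificate lifetime explanation.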