[Meta][Heartbeat] NodeJS process leak in elastic-agent-complete #32386

Closed · emilioalvap opened this issue on Jul 18, 2022 · 2 comments
Labels: bug, Team:obs-ds-hosted-services

emilioalvap (Collaborator) commented on Jul 18, 2022

Summary

After reports of memory leaks in elastic-agent-complete containers running on AWS Fargate, we've identified a couple of scenarios in which elastic-agent-complete spawns unresponsive or orphaned node processes:

  • The node process occasionally becomes unresponsive and never exits.
  • Adding a new Fleet integration triggers an elastic-agent-issued restart while a synthetics monitor is still running.

Details

NodeJS goes unresponsive and doesn't exit

On some containers used for testing, node processes become unresponsive seemingly at random. Once a process is unresponsive, Heartbeat (HB) blocks that monitor from running again while it waits for node to exit.
Even though node is unresponsive, it keeps allocating memory over time until the kernel issues an OOM kill or V8 reaches its max heap size:

[Screenshot: memory usage of the unresponsive node process growing over time]

/// APM logs
11:07:16.802 elastic_agent.filebeat [elastic_agent.filebeat][info] Non-zero metrics in the last 30s
11:07:36.268 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 
11:07:36.268 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: <--- Last few GCs --->
11:07:36.269 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: [51102:0x5e2e970] 87623078 ms: Scavenge 2026.0 (2080.2) -> 2019.5 (2080.2) MB, 4.5 / 0.0 ms  (average mu = 0.992, current mu = 0.994) allocation failure 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: [51102:0x5e2e970] 87623104 ms: Scavenge 2026.5 (2080.2) -> 2019.8 (2080.2) MB, 4.7 / 0.0 ms  (average mu = 0.992, current mu = 0.994) allocation failure 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: [51102:0x5e2e970] 87623132 ms: Scavenge 2027.3 (2080.2) -> 2020.2 (2096.2) MB, 5.2 / 0.0 ms  (average mu = 0.992, current mu = 0.994) allocation failure 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: <--- JS stacktrace --->
11:07:36.270 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  1: 0xb09c10 node::Abort() [node]
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  2: 0xa1c193 node::FatalError(char const*, char const*) [node]
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  3: 0xcf8dbe v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  4: 0xcf9137 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
11:07:36.271 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  5: 0xeb09d5  [node]
11:07:36.272 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  6: 0xec069d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
11:07:36.272 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  7: 0xec339e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
11:07:36.272 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  8: 0xe848da v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11:07:36.272 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr:  9: 0x11fd626 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11:07:36.272 elastic_agent.heartbeat [elastic_agent.heartbeat][info] stderr: 10: 0x15f2099  [node]
11:07:36.634 elastic_agent.heartbeat [elastic_agent.heartbeat][warn] error scanning synthetics runner results read |0: file already closed
11:07:36.634 elastic_agent.heartbeat [elastic_agent.heartbeat][warn] could not scan stdout events from synthetics: read |0: file already closed
11:07:36.635 elastic_agent.heartbeat [elastic_agent.heartbeat][warn] error scanning synthetics runner results read |0: file already closed
11:07:36.635 elastic_agent.heartbeat [elastic_agent.heartbeat][warn] could not scan stderr events from synthetics: read |0: file already closed
11:07:36.635 elastic_agent.heartbeat [elastic_agent.heartbeat][info] Command has completed(-1): /usr/share/elastic-agent/.node/node/bin/elastic-synthetics elastic-synthetics --screenshots on --throttling 5d/3u/20l --inline --rich-events
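For context, here is a minimal sketch of the failure mode, not Heartbeat's actual implementation: a runner that simply waits on the synthetics child process is stuck indefinitely if node hangs instead of exiting, whereas running the command under a context with a deadline bounds that wait. The binary path and flags are copied from the "Command has completed" log line above; the timeout value is an illustrative assumption.

```go
// Illustrative sketch only (not Heartbeat's code): if node hangs, a plain
// wait on the child never returns and the monitor stays blocked. A context
// deadline guarantees the child is eventually killed.
package main

import (
	"context"
	"os/exec"
	"time"
)

func runSyntheticsOnce(parent context.Context) error {
	// Timeout value is an assumption for illustration purposes.
	ctx, cancel := context.WithTimeout(parent, 10*time.Minute)
	defer cancel()

	// Binary path and flags taken from the log output above.
	cmd := exec.CommandContext(ctx,
		"/usr/share/elastic-agent/.node/node/bin/elastic-synthetics",
		"--screenshots", "on", "--throttling", "5d/3u/20l", "--inline", "--rich-events")

	// Without the context, this call never returns for a hung node process,
	// which is what blocks the monitor from running again.
	return cmd.Run()
}
```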

Fleet integration policy update

When a new Synthetics integration is added to an agent's policy, elastic-agent triggers a restart of all beats processes. At the moment, heartbeat exits without checking whether any synthetics monitors are still running.

Since the monitors run under a different PID, when HB exits these node processes are orphaned and end up reparented as children of PID 1. These children are never reaped and never release the resources already allocated to them:

[Screenshot: orphaned node processes reparented to PID 1]
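One possible mitigation, sketched below under the assumption that the node children run on Linux, is to start the synthetics child in its own process group and signal the whole group on shutdown, so node (and any browser processes it spawned) are not silently reparented to PID 1. This is not the actual Heartbeat implementation, just an illustration of the mechanism.

```go
// Hypothetical mitigation sketch (not Heartbeat's implementation): give the
// synthetics child its own process group so the whole group can be signalled
// when heartbeat shuts down, instead of leaving orphans behind.
package main

import (
	"os/exec"
	"syscall"
)

func startSynthetics() (*exec.Cmd, error) {
	cmd := exec.Command(
		"/usr/share/elastic-agent/.node/node/bin/elastic-synthetics",
		"--inline")
	// Place the child in its own process group (Linux).
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// stopSynthetics signals the entire process group; the negative PID addresses
// the group rather than just the node process itself.
func stopSynthetics(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
}
```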

Tracking

These are the individual issues:

@emilioalvap added the bug and Team:obs-ds-hosted-services labels on Jul 18, 2022
elasticmachine (Collaborator) commented

Pinging @elastic/uptime (Team:Uptime)

@emilioalvap changed the title from "[Meta] NodeJS process leak in elastic-agent-complete" to "[Meta][Heartbeat] NodeJS process leak in elastic-agent-complete" on Jul 18, 2022
paulb-elastic (Contributor) commented

Closing to leave the two remaining separate issues to be tracked on their own.
