[Bug] Zookeeper OutOfMemoryError After Upgrading to Pulsar 3.3.0 with Zookeeper 3.9.2 #23348

jamesvsshark · 2024-09-25T01:10:18Z

Search before asking

I searched in the issues and found nothing similar.

Read release policy

I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar version: 3.3.0
Zookeeper version: 3.9.2
Kubernetes environment: Helm chart deployment
Zookeeper resource configuration:
Request/limit: 6GB memory, 2 CPU
Heap settings: -Xms5632m -Xmx5632m
GC settings:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:+DoEscapeAnalysis
-XX:+DisableExplicitGC
-XX:+ExitOnOutOfMemoryError
-XX:+PerfDisableSharedMem
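
As a rough sanity check on the settings above, the following sketch computes how much of the container limit is left for non-heap memory (thread stacks, metaspace, direct buffers, GC structures) once the heap is fully used. It assumes that "6GB" means 6 GiB (6144 MiB), as a Kubernetes `6Gi` limit would; the function name is hypothetical and used only for illustration.

```python
# Sketch: estimate non-heap headroom inside a container memory limit.
# Assumption: the "6GB" limit is 6 GiB (6144 MiB), as with a Kubernetes "6Gi" limit.

def non_heap_headroom_mib(limit_mib: int, max_heap_mib: int) -> int:
    """Return the memory (MiB) left in the container once the heap is fully used."""
    if max_heap_mib > limit_mib:
        raise ValueError("heap is larger than the container limit")
    return limit_mib - max_heap_mib

limit_mib = 6 * 1024   # assumed 6 GiB container limit
max_heap_mib = 5632    # from -Xmx5632m

headroom = non_heap_headroom_mib(limit_mib, max_heap_mib)
print(f"non-heap headroom: {headroom} MiB ({headroom / limit_mib:.1%} of the limit)")
```

Under that assumption, only 512 MiB remains for everything outside the heap, which may be tight if the number of threads or the amount of native memory grows over time.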

Minimal reproduce step

Upgrade Pulsar to version 3.3.0 and Zookeeper to 3.9.2.
Deploy the Zookeeper quorum using default GC and memory settings from the Helm chart.
Observe memory consumption and monitor for crashes after a few days of running.

What did you expect to see?

Zookeeper should run without constantly increasing memory usage or exhausting resources.

What did you see instead?

Error observed:

java.lang.OutOfMemoryError: unable to create a native thread: possibly out of memory or process/resource limits reached.
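
This error means the JVM could not start another native thread, which is a different failure from Java heap exhaustion. One way to check whether the thread count keeps growing is to read the `Threads:` field from `/proc/<pid>/status`. The following is a minimal sketch, assuming a Linux container with `/proc` mounted; the helper names are hypothetical.

```python
# Sketch: read the thread count of a process from /proc/<pid>/status (Linux only).
from pathlib import Path

def parse_thread_count(status_text: str) -> int:
    """Extract the value of the 'Threads:' field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")

def thread_count(pid: str = "self") -> int:
    """Return the number of threads of the given process (default: this process)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())

# Only query /proc when it is available; pass the Zookeeper JVM's PID
# instead of "self" when running this inside the pod.
if Path("/proc/self/status").exists():
    print("threads:", thread_count())
```

Sampling this value periodically for the Zookeeper process could show whether the thread count climbs steadily before the crash or stays flat.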

Anything else?

Additional Context:

After downgrading Zookeeper to version 3.2.2, the OOM issue stopped, and no pod restarts occurred.
Autorecovery pods running 3.3.0 are also encountering Java heap memory issues.
After reviewing the Pulsar 3.3.0 release notes and PIP-324 (#22054), I suspect that the switch to the Alpine base image could be affecting thread creation and memory management.

Possible Solution:

It may be necessary to modify the Dockerfile to increase the stack size by setting the PTHREAD_STACK_MIN environment variable:

ENV PTHREAD_STACK_MIN=2097152

Are you willing to submit a PR?

I'm willing to submit a PR!


jamesvsshark added the type/bug label Sep 25, 2024

lhotari added the release/blocker label Sep 26, 2024

yuweisung mentioned this issue Sep 27, 2024

[fix][alpine] thread stack size #23361 (Open)

This was referenced Sep 28, 2024

[fix] EXPERIMENT: Switch docker base image to avoid mixing musl & glibc libraries at runtime #23366 (Draft)

[fix] EXPERIMENT: Switch to use minideb base image to fix glibc compatibility issues #23376 (Open)
