[Bug] Zookeeper OutOfMemoryError After Upgrading to Pulsar 3.3.0 with Zookeeper 3.9.2 #23348

Open
3 tasks done
jamesvsshark opened this issue Sep 25, 2024 · 0 comments
Labels
release/blocker: Indicate the PR or issue that should block the release until it gets resolved
type/bug: The PR fixed a bug or issue reported a bug

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar version: 3.3.0
Zookeeper version: 3.9.2
Kubernetes environment: Helm chart deployment
Zookeeper resource configuration:
Request/limit: 6GB memory, 2 CPU
Heap settings: -Xms5632m -Xmx5632m
GC settings:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:+DoEscapeAnalysis
-XX:+DisableExplicitGC
-XX:+ExitOnOutOfMemoryError
-XX:+PerfDisableSharedMem
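
As a back-of-the-envelope check on the settings above, the sketch below estimates how much container memory remains outside the Java heap. It assumes the 6GB limit means 6 GiB (6144 MiB) and a 1 MiB thread stack; both values are assumptions rather than figures confirmed in this report, and the function names are hypothetical. Only the 5632 MiB heap size comes from the flags listed here.

```python
# Rough native-memory headroom estimate. Only the heap size (-Xmx5632m) comes
# from the issue; the 6144 MiB limit and 1 MiB stack size are assumptions.

def native_headroom_mib(container_limit_mib: int, max_heap_mib: int) -> int:
    """Memory left in the container for everything outside the Java heap."""
    return container_limit_mib - max_heap_mib


def max_threads_by_stack(headroom_mib: int, stack_mib: int = 1) -> int:
    """Pessimistic bound on threads if the whole headroom were thread stacks."""
    return headroom_mib // stack_mib


headroom = native_headroom_mib(container_limit_mib=6144, max_heap_mib=5632)
print(headroom)                        # 512
print(max_threads_by_stack(headroom))  # 512
```

This is deliberately crude: thread stacks are committed lazily, and metaspace, the code cache, and direct buffers share the same headroom, so the real ceiling may be higher or lower. It only shows that a heap sized at roughly 92% of the limit leaves little room for native allocations.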

Minimal reproduce step

  1. Upgrade Pulsar to version 3.3.0 and Zookeeper to 3.9.2.
  2. Deploy the Zookeeper quorum using default GC and memory settings from the Helm chart.
  3. Observe memory consumption and monitor for crashes after a few days of running.

What did you expect to see?

Zookeeper should run without constantly increasing memory usage or exhausting resources.

What did you see instead?

Error observed:

java.lang.OutOfMemoryError: unable to create a native thread: possibly out of memory or process/resource limits reached.
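
Because the message mentions process/resource limits, it may help to compare the live thread count of the Zookeeper process against the container's PID limit. The following is a minimal diagnostic sketch, not something taken from this report: it assumes a Linux container where the JVM is PID 1 and a cgroup v2 layout with `pids.max` under `/sys/fs/cgroup`; both paths and all function names are assumptions.

```python
from pathlib import Path
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Return the 'Threads:' field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def parse_pids_max(text: str) -> Optional[int]:
    """Parse a cgroup pids.max value; 'max' means unlimited (returns None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def report(pid: int = 1, cgroup_dir: str = "/sys/fs/cgroup") -> str:
    """Summarise thread usage for one process; the default paths are assumptions."""
    threads = parse_thread_count(Path(f"/proc/{pid}/status").read_text())
    limit = parse_pids_max(Path(cgroup_dir, "pids.max").read_text())
    shown = "unlimited" if limit is None else limit
    return f"threads={threads} pids.max={shown}"
```

Calling `report()` periodically inside the pod would show whether the thread count climbs toward the limit over the days before a crash; on a cgroup v1 node the `pids.max` file lives in a different directory, so `cgroup_dir` would need adjusting.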

Anything else?

Additional Context:

After downgrading Zookeeper to version 3.2.2, the OOM issue stopped, and no pod restarts occurred.
Autorecovery pods running 3.3.0 are also encountering Java heap memory issues.
Reviewing the Pulsar 3.3.0 release notes and PIP-324 (#22054), I suspect changes to the Alpine base image could be affecting thread creation and memory management.

Possible Solution:

It may be necessary to modify the Dockerfile to increase the stack size by setting the PTHREAD_STACK_MIN environment variable:

ENV PTHREAD_STACK_MIN 2097152

Are you willing to submit a PR?

  • I'm willing to submit a PR!
jamesvsshark added the type/bug label on Sep 25, 2024
lhotari added the release/blocker label on Sep 26, 2024