Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After ZookKeeper cluster failure no broker became leader #128

Open
eolivelli opened this issue Mar 2, 2018 · 3 comments
Open

After ZookKeeper cluster failure no broker became leader #128

eolivelli opened this issue Mar 2, 2018 · 3 comments
Labels

Comments

@eolivelli
Copy link
Contributor

Issue reported by QA integration tests.
After several restarts of zookeeper server no broker (in a cluster of 3 brokers) became leader

This is the log at sviluppo05 server, other clients and brokers are thinking that sviluppo05 is the LEADER but actually it is not acting a such role

18-03-02-11-44-29 SEVERE my own discoverypath path is /majordodo/discoverypath/brokers0000006047
18-03-02-11-44-32 SEVERE create failed at /majordodo/leader, code:NODEEXISTS, a LEADER already exists
18-03-02-11-44-32 SEVERE setting watch at /majordodo/leader
18-03-02-11-45-13 SEVERE process event NodeDeleted, at /majordodo/leader, state SyncConnected
18-03-02-11-45-13 SEVERE create failed at /majordodo/leader, code:NODEEXISTS, a LEADER already exists
18-03-02-11-45-13 SEVERE setting watch at /majordodo/leader
18-03-02-11-47-31 SEVERE process event NodeDeleted, at /majordodo/leader, state SyncConnected
18-03-02-11-47-31 SEVERE create failed at /majordodo/leader, code:NODEEXISTS, a LEADER already exists
18-03-02-11-47-31 SEVERE setting watch at /majordodo/leader
18-03-02-11-48-13 SEVERE process event NodeDeleted, at /majordodo/leader, state SyncConnected
18-03-02-11-48-13 SEVERE create success at /majordodo/leader, code:OK, I'm the new LEADER
18-03-02-11-48-13 SEVERE setting watch at /majordodo/leader
18-03-02-11-48-14 SEVERE save new ledgers list LedgersInfo{activeLedgers=[360032, 360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330], firstLedger=46029, zkVersion=14598}
18-03-02-11-54-09 SEVERE process event None, at null, state Disconnected
18-03-02-11-54-09 SEVERE ZK event: WatchedEvent state:Disconnected type:None path:null
18-03-02-11-54-15 SEVERE process event None, at null, state SyncConnected
18-03-02-11-54-15 SEVERE ZK event: WatchedEvent state:SyncConnected type:None path:null
18-03-02-11-54-15 SEVERE process event None, at null, state SaslAuthenticated
18-03-02-11-54-15 SEVERE ZK event: WatchedEvent state:SaslAuthenticated type:None path:null
18-03-02-11-55-08 SEVERE process event None, at null, state Disconnected
18-03-02-11-55-08 SEVERE ZK event: WatchedEvent state:Disconnected type:None path:null
18-03-02-11-55-09 SEVERE process event None, at null, state SyncConnected
18-03-02-11-55-09 SEVERE ZK event: WatchedEvent state:SyncConnected type:None path:null
18-03-02-11-55-09 SEVERE process event None, at null, state SaslAuthenticated
18-03-02-11-55-09 SEVERE ZK event: WatchedEvent state:SaslAuthenticated type:None path:null
18-03-02-14-00-30 SEVERE ZK event: WatchedEvent state:SyncConnected type:None path:null
18-03-02-14-00-30 SEVERE ZK event: WatchedEvent state:SaslAuthenticated type:None path:null
18-03-02-14-00-30 SEVERE my own discoverypath path is /majordodo/discoverypath/brokers0000006052
18-03-02-14-00-33 SEVERE create failed at /majordodo/leader, code:NODEEXISTS, a LEADER already exists
18-03-02-14-00-33 SEVERE setting watch at /majordodo/leader
18-03-02-14-01-41 SEVERE process event NodeDeleted, at /majordodo/leader, state SyncConnected
18-03-02-14-21-14 SEVERE ZK event: WatchedEvent state:SyncConnected type:None path:null
18-03-02-14-21-14 SEVERE ZK event: WatchedEvent state:SaslAuthenticated type:None path:null
18-03-02-14-21-14 SEVERE my own discoverypath path is /majordodo/discoverypath/brokers0000006056
18-03-02-14-21-16 SEVERE create success at /majordodo/leader, code:OK, I'm the new LEADER
18-03-02-14-21-16 SEVERE setting watch at /majordodo/leader
18-03-02-14-21-18 SEVERE save new ledgers list LedgersInfo{activeLedgers=[360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330, 361332, 361334], firstLedger=46029, zkVersion=14601}
:

@eolivelli eolivelli added the bug label Mar 2, 2018
@eolivelli
Copy link
Contributor Author

Other logs on sviluppo05
18-03-02-14-25-18 SEVERE error while writing to ledger org.apache.bookkeeper.client.LedgerHandle@3f38c519
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:69)
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:237)
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:359)
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1384)
at org.apache.bookkeeper.client.LedgerHandle$3.safeRun(LedgerHandle.java:438)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)

18-03-02-14-25-18 SEVERE bookkeeper failure
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:69)
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:237)
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:359)
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1384)
at org.apache.bookkeeper.client.LedgerHandle$3.safeRun(LedgerHandle.java:438)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)

18-03-02-14-25-18 SEVERE leadershipLost

@eolivelli
Copy link
Contributor Author

18-03-02-14-34-16 SEVERE no leader is present
18-03-02-14-34-16 SEVERE Processing snapshot file: snapshots/361332_16946.snap.json.gz
18-03-02-14-34-16 SEVERE File snapshots/361332_16946.snap.json.gz contains snapshot, ledgerId:361332,sequenceNumber:16946
18-03-02-14-34-16 SEVERE ActualLedgersList from ZK: LedgersInfo{activeLedgers=[360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330, 361332, 361334], firstLedger=4602
9, zkVersion=14601}
18-03-02-14-34-16 SEVERE Loading snapshot from snapshots/361332_16946.snap.json.gz
18-03-02-14-34-17 SEVERE Snapshot has been taken at ledgerId=361332, sequenceNumber=16946
18-03-02-14-34-17 SEVERE Actual ledgers list:LedgersInfo{activeLedgers=[360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330, 361332, 361334], firstLedger=46029, zkVe
rsion=14601}
18-03-02-14-34-17 SEVERE Latest snapshotSequenceNumber:361332,16946
18-03-02-14-34-17 SEVERE Skipping ledger 360070
18-03-02-14-34-17 SEVERE Skipping ledger 360078
18-03-02-14-34-17 SEVERE Skipping ledger 360890
18-03-02-14-34-17 SEVERE Skipping ledger 361300
18-03-02-14-34-17 SEVERE Skipping ledger 361304
18-03-02-14-34-17 SEVERE Skipping ledger 361314
18-03-02-14-34-17 SEVERE Skipping ledger 361326
18-03-02-14-34-17 SEVERE Skipping ledger 361328
18-03-02-14-34-17 SEVERE Skipping ledger 361330
18-03-02-14-34-17 SEVERE Recovering from latest snapshot ledger 361332, starting from entry 16946
18-03-02-14-34-17 SEVERE Recovering from ledger 361332, first=16946
18-03-02-14-34-17 SEVERE From entry 16,946, to entry 16,946 (0 %)
18-03-02-14-34-19 SEVERE Fatal error during recovery
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:69)
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:217)
at org.apache.bookkeeper.client.LedgerHandle$4.onFailure(LedgerHandle.java:712)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

18-03-02-15-02-52 SEVERE actual leader is at sviluppo06-cs7.sviluppo.dna:7862

@eolivelli
Copy link
Contributor Author

18-03-02-15-02-52 SEVERE error while reading snapshot from network
majordodo.network.BrokerNotAvailableException: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connessione rifiutata: sviluppo06-cs7.sviluppo.dna/10.168.10.117:7862
at majordodo.network.netty.GenericNettyBrokerLocator.connect(GenericNettyBrokerLocator.java:78)
at majordodo.replication.ReplicatedCommitLog.downloadSnapshotFromMaster(ReplicatedCommitLog.java:152)
at majordodo.replication.ReplicatedCommitLog.loadBrokerStatusSnapshot(ReplicatedCommitLog.java:816)
at majordodo.task.BrokerStatus.recover(BrokerStatus.java:915)
at majordodo.task.Broker.start(Broker.java:219)
at majordodo.embedded.EmbeddedBroker.start(EmbeddedBroker.java:234)
at magnews.majordodo.broker.MajordodoBrokerService.start(MajordodoBrokerService.java:210)
at magnews.broker.JMSBrokerServer.start(JMSBrokerServer.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at magnews.server.ServiceStarter.start(ServiceStarter.java:383)
at magnews.server.ServiceManager.main(ServiceManager.java:86)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connessione rifiutata: sviluppo06-cs7.sviluppo.dna/10.168.10.117:7862
at io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
Caused by: io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connessione rifiutata
... 1 more

18-03-02-15-02-52 SEVERE Processing snapshot file: snapshots/361332_16946.snap.json.gz
18-03-02-15-02-52 SEVERE File snapshots/361332_16946.snap.json.gz contains snapshot, ledgerId:361332,sequenceNumber:16946
18-03-02-15-02-52 SEVERE ActualLedgersList from ZK: LedgersInfo{activeLedgers=[360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330, 361332, 361334], firstLedger=46029, zkVersion=14601}
18-03-02-15-02-52 SEVERE Loading snapshot from snapshots/361332_16946.snap.json.gz
18-03-02-15-02-52 SEVERE Snapshot has been taken at ledgerId=361332, sequenceNumber=16946
18-03-02-15-02-52 SEVERE Actual ledgers list:LedgersInfo{activeLedgers=[360070, 360078, 360890, 361300, 361304, 361314, 361326, 361328, 361330, 361332, 361334], firstLedger=46029, zkVersion=14601}
18-03-02-15-02-52 SEVERE Latest snapshotSequenceNumber:361332,16946

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant