This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

When CDH switches NA IP address from standby to active -> SSM should also switch to active NA #1813

Open
aurorahunter opened this issue Jun 8, 2018 · 1 comment

@aurorahunter
Collaborator

SSM should auto-detect NameNode IP addresses and should not require manual intervention.
Error signature:

2018-06-08 13:10:42,358 ERROR org.smartdata.hdfs.metric.fetcher.InotifyFetchAndApplyTask.run 63: Inotify Apply Events error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1835)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1513)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1751)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1036)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1543)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy15.getEditsFromTxid(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1506)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy16.getEditsFromTxid(Unknown Source)
    at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:109)
    at org.smartdata.hdfs.metric.fetcher.InotifyFetchAndApplyTask.run(InotifyFetchAndApplyTask.java:53)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
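
For illustration: the trace above shows the inotify poll (`getEditsFromTxid`) being rejected by a NameNode in standby state. Below is a minimal sketch of the requested behavior, not SSM's actual code. It tries each known NameNode address in turn and uses the first one that does not reject the read as standby. `ActiveNameNodeSketch`, `NameNodeRead`, and the local `StandbyException` are hypothetical stand-ins so the sketch compiles without Hadoop on the classpath.

```java
import java.util.List;

final class ActiveNameNodeSketch {

    /** Hypothetical stand-in for Hadoop's StandbyException, defined locally for this sketch. */
    public static class StandbyException extends Exception {
        public StandbyException(String msg) { super(msg); }
    }

    /** Hypothetical read operation against one NameNode address, e.g. an inotify poll. */
    public interface NameNodeRead<T> {
        T call(String address) throws StandbyException;
    }

    /**
     * Tries the operation against each candidate address in order and returns
     * the first result that does not fail with StandbyException.
     * If every candidate is standby, the last StandbyException is rethrown.
     */
    public static <T> T callActive(List<String> addresses, NameNodeRead<T> op)
            throws StandbyException {
        StandbyException last = null;
        for (String address : addresses) {
            try {
                return op.call(address);
            } catch (StandbyException e) {
                last = e; // this NameNode is standby; try the next one
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no NameNode addresses given");
        }
        throw last;
    }
}
```

With a logical nameservice URI and `dfs.client.failover.proxy.provider.<nameservice>` configured, the stock HDFS client normally performs this kind of retry itself, so a sketch like this would only matter where a single fixed NameNode address is used.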

My configuration:


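    <property>
        <name>smart.dfs.namenode.rpcserver</name>
        <value>hdfs://sjsstaging001.sj.adas.intel.com/134.191.230.46:9000</value>
        <description>Namenode rpcserver</description>
    </property>
    <property>
        <name>smart.hadoop.conf.path</name>
        <value>file:///etc/hadoop/conf</value>
        <description>Hadoop main cluster configuration file path</description>
    </property>
    <property>
        <name>smart.tidb.enable</name>
        <value>false</value>
        <description>This decides whether Tidb is enabled.</description>
    </property>
    <property>
        <name>smart.security.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>smart.server.keytab.file</name>
        <value>/etc/security/ssm_deploy/ssm_deploy.keytab</value>
    </property>
    <property>
        <name>smart.server.kerberos.principal</name>
        <value>ssm_deploy@ADAS.INTEL.COM</value>
    </property>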

To reproduce:

Our IGK-DevOps cluster is shared with you. You can use it to reproduce the issue.
Start the SSM server.
Then restart the HDFS service, or manually switch the standby NameNode to active.
SSM fails with the error signature above.

Thanks

@aurorahunter
Collaborator Author

Hi Qiyuan,

I tried your suggestion: keeping just the Hadoop conf directory and not using the NameNode property in the XML.
But SSM still errors out.
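
For illustration, a minimal sketch of how NameNode addresses could be discovered from the Hadoop conf directory alone. It assumes the directory holds an `hdfs-site.xml` with the standard HDFS HA properties `dfs.ha.namenodes.<nameservice>` and `dfs.namenode.rpc-address.<nameservice>.<id>`. `HaAddressSketch` and its methods are hypothetical names, not SSM or Hadoop APIs, and the sketch uses only the JDK's XML parser.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class HaAddressSketch {

    /** Reads name/value pairs from a Hadoop-style configuration XML stream. */
    public static Map<String, String> readProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> out = new HashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            // Skip malformed entries that lack a name or a value.
            if (names.getLength() > 0 && values.getLength() > 0) {
                out.put(names.item(0).getTextContent().trim(),
                        values.item(0).getTextContent().trim());
            }
        }
        return out;
    }

    /** Lists the RPC address of every NameNode declared for the given HA nameservice. */
    public static List<String> nameNodeRpcAddresses(Map<String, String> conf, String nameservice) {
        List<String> addresses = new ArrayList<>();
        String ids = conf.getOrDefault("dfs.ha.namenodes." + nameservice, "");
        for (String id : ids.split(",")) {
            String addr = conf.get("dfs.namenode.rpc-address." + nameservice + "." + id.trim());
            if (addr != null) {
                addresses.add(addr);
            }
        }
        return addresses;
    }
}
```

If this returns fewer than two addresses for the cluster's nameservice, the conf directory probably does not describe an HA pair, which would be worth checking before looking further into SSM itself.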

hdfs_ssm_ha_debug.zip

I have attached the following in the zipped folder:

  1. SSM logs
  2. SSM conf directory
  3. Hadoop conf directory (/etc/hadoop/conf)

Let me know if you need any other information. Appreciate your time and help.
