Our monitoring system was recently rebuilt on Phoenix + HBase, and lately the monitoring in the demo environment kept failing; a first look showed that HBase had gone down.
Going through the logs, the causes were: CentOS 7.0 does not ship the fuser command by default, so the Hadoop HA failover failed and the Hadoop cluster went down;
the NameNode itself crashed because the ZooKeeper timeout was set too low.

The detailed troubleshooting process follows.

1. First, check the hbase-master log. It shows that HBase shut down because it could not connect to the Hadoop cluster.

2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: stopping server hadoop171,60000,1501497159832; zookeeper connection closed.
2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: master/hadoop171/172.16.31.171:60000 exiting
2017-08-01 11:46:46,307 ERROR [Thread-7] hdfs.DFSClient: Failed to close inode 63944
java.net.ConnectException: Call From hadoop171/172.16.31.171 to hadoop171:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1408)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:404)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
    at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
    at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1704)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1500)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)
Caused by: java.net.ConnectException: 拒绝连接
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
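The 拒绝连接 ("Connection refused") on hadoop171:9000 points at the NameNode RPC port. Before reading further logs, a quick sanity check on the HDFS side confirms whether the NameNode is really gone; the commands below are a generic sketch, with the host and port 9000 taken from the log above:

# On hadoop171: is a NameNode JVM still running?
jps | grep -i namenode

# Is anything still listening on the NameNode RPC port?
ss -lntp | grep ':9000' || echo "nothing listening on 9000"

# Ask HDFS itself; this fails quickly if no NameNode is reachable
hdfs dfsadmin -report | head -n 20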

2. Next, check the hadoop-namenode log. It shows the NameNode went down because of a timeout (ZooKeeper/journal quorum).
The cluster is configured for HA; normally, when one NameNode goes down, the standby should take over.

2017-08-03 05:31:30,999 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for startLogSegment(562081). Succeeded so far: [172.16.31.171:8485]
2017-08-03 05:31:31,984 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 562081 failed for required journal (JournalAndStream(mgr=QJM to [172.16.31.171:8485, 172.16.31.172:8485, 172.16.31.173:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1206)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1175)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1249)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6422)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1003)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2017-08-03 05:31:31,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-03 05:31:31,996 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop171/172.16.31.171
************************************************************/
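With HA in place, it is worth checking what state each NameNode reports before digging into zkfc. Assuming the HA service IDs are nn1 and nn2 (placeholders; the real IDs come from dfs.ha.namenodes.<nameservice> in hdfs-site.xml):

hdfs haadmin -getServiceState nn1   # expected: active or standby
hdfs haadmin -getServiceState nn2

In this incident the failover never completed, which is what the zkfc logs below explain.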

3. Next, examine the zkfc logs.
The hadoop171 zkfc log shows that 171 quit the master election; the relevant entries follow.

2017-08-03 05:31:32,700 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at hadoop171/172.16.31.171:9000: java.io.EOFException End of File Exception between local host is: "hadoop171/172.16.31.171"; destination host is: "hadoop171":9000; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop171/172.16.31.171:9000 entered state: SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,704 WARN org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Can't get local NN thread dump due to 拒绝连接
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop171/172.16.31.171:9000 and marking that fencing is necessary
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019b closed
2017-08-03 05:31:32,756 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x35d9be43dc1019b
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:34,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

The hadoop172 zkfc log shows that the HA failover must first fence hadoop171.
During fencing it reported "fuser: 未找到命令" (fuser: command not found); CentOS 7 does not ship fuser by default, so fencing kept failing and the failover was stuck in a loop.
Reference: http://f.dataguru.cn/hadoop-707120-1-1.html

2017-08-03 05:31:32,812 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:32,813 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:32,816 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:33,822 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:33,921 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
java.net.ConnectException: Call From hadoop172/172.16.31.172 to hadoop171:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1408)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
    at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:511)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: 拒绝连接
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
    at org.apache.hadoop.ipc.Client.call(Client.java:1447)
    ... 14 more
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2017-08-03 05:31:34,092 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop171...
2017-08-03 05:31:34,096 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop171 port 22
2017-08-03 05:31:34,104 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_6.6.1
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2017-08-03 05:31:34,123 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2017-08-03 05:31:35,373 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2017-08-03 05:31:35,430 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2017-08-03 05:31:35,431 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2017-08-03 05:31:35,447 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2017-08-03 05:31:35,450 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop171' (RSA) to the list of known hosts.
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2017-08-03 05:31:35,456 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2017-08-03 05:31:35,457 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2017-08-03 05:31:35,459 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2017-08-03 05:31:35,460 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2017-08-03 05:31:35,628 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to hadoop171
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2017-08-03 05:31:35,840 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: 未找到命令
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop171 port 22
2017-08-03 05:31:35,847 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2017-08-03 05:31:35,847 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2017-08-03 05:31:35,847 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2017-08-03 05:31:35,905 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop171/172.16.31.171:9000
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:530)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-03 05:31:35,906 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2017-08-03 05:31:35,967 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019c closed
2017-08-03 05:31:36,968 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop171:2181,hadoop172:2181,hadoop173:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6562a9e9
2017-08-03 05:31:36,973 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop173/172.16.31.173:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-03 05:31:37,731 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.31.172:52192, server: hadoop173/172.16.31.173:2181
2017-08-03 05:31:37,952 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop173/172.16.31.173:2181, sessionid = 0x35d9be43dc1021b, negotiated timeout = 5000
2017-08-03 05:31:37,955 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:37,956 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2017-08-03 05:31:38,047 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:38,054 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:38,056 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:39,061 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:39,064 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
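The decisive line above is "bash: fuser: 未找到命令" with rc: 127: the sshfence fencing method SSHes into the old active node and runs "PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000" to kill whatever still owns the NameNode RPC port. With fuser missing, fencing can never succeed, so the standby refuses to take over and zkfc keeps retrying. A typical hdfs-site.xml fencing setup that takes this code path looks roughly like the following (the concrete values are assumptions, not read from this cluster's config):

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- assumed example path for the hadoop user's SSH key -->
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>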

4. At this point the root cause is clear; the fix is to install the package that provides fuser.

Install fuser on both the active and standby NameNode hosts (DataNode hosts do not need it):

[root@server101 ~]# yum -y install psmisc
[root@server102 ~]# yum -y install psmisc
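After installing psmisc, a quick check on each NameNode host confirms that fencing can now find the binary:

[root@server101 ~]# which fuser            # should now resolve, typically /usr/sbin/fuser
[root@server101 ~]# fuser -v -n tcp 9000   # lists the process on the NameNode RPC port once HDFS is back up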

To address the timeout risk as well, increase the timeout from 20000 ms to 50000 ms; the configuration sketch below shows which properties this likely corresponds to.
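The 20000 ms in the NameNode log is the quorum-journal write timeout, and the zkfc sessions above negotiated a 5000 ms ZooKeeper session timeout. The exact property that was raised is not recorded here, so the following hdfs-site.xml / core-site.xml sketch is only an assumption about which knobs the 20000 ms -> 50000 ms change maps to:

<!-- hdfs-site.xml: how long the NameNode waits for a JournalNode quorum (default 20000 ms) -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>50000</value>
</property>
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>50000</value>
</property>

<!-- core-site.xml: ZKFC's ZooKeeper session timeout (default 5000 ms) -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>50000</value>
</property>

The NameNode and ZKFC processes have to be restarted for these settings to take effect.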

5. A later, closer read of the logs showed that shortly before Hadoop ran into trouble, HBase had executed a balance operation.
http://openinx.github.io/2016/06/21/hbase-balance/

2017-08-03 05:30:04,345 TRACE [hadoop171,60000,1501568949531_ChoreService_2] access.AccessController: Access allowed for user hadoop; reason: Global check allowed; remote address: ; request: balance; context: (user=hadoop, scope=GLOBAL, action=ADMIN)
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Connecting to hadoop172/172.16.31.172:60020
2017-08-03 05:30:05,994 DEBUG [hadoop171,60000,1501568949531_ChoreService_2] balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 1646ms to try 73600 different iterations.  Found a solution that moves 16 regions; Going from a computed cost of 402.57591499759764 to a new cost of 87.37110284715531
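If the balancer's region moves are suspected of adding load while HDFS was already unstable, it can be switched off temporarily from the hbase shell and re-enabled once the cluster is healthy again (a simple precaution, not something the logs above prove was necessary):

hbase(main):001:0> balance_switch false   # disable automatic balancing; returns the previous state
hbase(main):002:0> balance_switch true    # re-enable after the cluster is stable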

Done.
