Background:

During a master-replica switchover drill, we observed that the client application throws errors in the window after the master goes offline, while a slave is being elected as the new master and until the node topology has been fully refreshed.

The default command timeout is 60 s; here it is configured to 300 ms (set via TimeoutOptions, see the client-side configuration at the end):

io.lettuce.core.RedisCommandTimeoutException: Command timed out after 300 millisecond(s)
at io.lettuce.core.ExceptionFactory.createTimeoutException(ExceptionFactory.java:51)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:114)
at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:123)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy9.setex(Unknown Source)
at com.xueqiu.infra.redis4.RedisClusterImpl$171.apply(RedisClusterImpl.java:2336)
at com.xueqiu.infra.redis4.RedisClusterImpl$171.apply(RedisClusterImpl.java:2333)
at com.xueqiu.infra.redis4.RedisClusterImpl.executeSync(RedisClusterImpl.java:543)
at com.xueqiu.infra.redis4.RedisClusterImpl.setex(RedisClusterImpl.java:2333)
at com.xueqiu.infra.redis4.RedisMetricsTest.testMutilSet(RedisMetricsTest.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)

During this window the cluster topology goes through the following states:

disconnected -> fail? -> fail -> all-node topology consistent

Below is the server log; it took nearly 30 s for the cluster to complete the failover and return to normal.

Much of the material found online targets the Lettuce bundled with Spring Boot; most authors claim the client's ClusterTopologyRefreshOptions cannot be configured and that a version upgrade is required.

However, with Xueqiu's RedisCluster4 component, the default configuration does refresh the topology automatically after the cluster recovers from a failover (the versions discussed online may differ from ours, or those claims are overstated).

48977:S 24 Feb 11:16:09.477 # Connection with master lost.
48977:S 24 Feb 11:16:09.477 * Caching the disconnected master state.
48977:S 24 Feb 11:16:09.941 * Connecting to MASTER 10.10.200.30:16389
48977:S 24 Feb 11:16:09.941 * MASTER <-> SLAVE sync started
48977:S 24 Feb 11:16:09.941 # Error condition on socket for SYNC: Connection refused
48977:S 24 Feb 11:16:10.943 * Connecting to MASTER 10.10.200.30:16389
48977:S 24 Feb 11:16:10.943 * MASTER <-> SLAVE sync started
……
48977:S 24 Feb 11:16:38.058 # Error condition on socket for SYNC: Connection refused
48977:S 24 Feb 11:16:39.064 * Connecting to MASTER 10.10.200.30:16389
48977:S 24 Feb 11:16:39.064 * MASTER <-> SLAVE sync started
48977:S 24 Feb 11:16:39.064 # Error condition on socket for SYNC: Connection refused
48977:S 24 Feb 11:16:39.713 * FAIL message received from 36ddadb3dbc4a981fe5415c9996754add0f18711 about 486112f7a7d8f52cb2143738a802b49421d0efe0
48977:S 24 Feb 11:16:39.765 # Start of election delayed for 725 milliseconds (rank #0, offset 4548).
48977:S 24 Feb 11:16:40.071 * Connecting to MASTER 10.10.200.30:16389
48977:S 24 Feb 11:16:40.071 * MASTER <-> SLAVE sync started
48977:S 24 Feb 11:16:40.071 # Error condition on socket for SYNC: Connection refused
48977:S 24 Feb 11:16:40.572 # Starting a failover election for epoch 29.
48977:S 24 Feb 11:16:40.620 # Failover election won: I'm the new master.
48977:S 24 Feb 11:16:40.620 # configEpoch set to 29 after successful failover
48977:M 24 Feb 11:16:40.620 # Setting secondary replication ID to f223244de2f9a22b323274f3ac4cabfc61096bbb, valid up to offset: 4549. New replication ID is 0b64827f015621e379f1e0063821c6bcae6ece69
48977:M 24 Feb 11:16:40.620 * Discarding previously cached master state.

However, when the client keeps reading and writing during the switchover, several attempts showed that the cluster takes noticeably longer to recover (the server-side cause of this has not been investigated yet):

48999:S 24 Feb 11:24:24.741 # Connection with master lost.
48999:S 24 Feb 11:24:24.741 * Caching the disconnected master state.
48999:S 24 Feb 11:24:25.222 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:24:25.222 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:24:25.222 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:24:26.225 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:24:26.225 * MASTER <-> SLAVE sync started
……
48999:S 24 Feb 11:24:55.361 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:24:56.262 * Marking node afc1b251151003a099388c26e9dd3fc90f84e413 as failing (quorum reached).
48999:S 24 Feb 11:24:56.362 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:24:56.362 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:24:56.362 # Start of election delayed for 535 milliseconds (rank #0, offset 191410).
48999:S 24 Feb 11:24:56.362 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:24:56.966 # Starting a failover election for epoch 30.
48999:S 24 Feb 11:24:57.369 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:24:57.369 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:24:57.369 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:24:58.370 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:24:58.371 * MASTER <-> SLAVE sync started
……
48999:S 24 Feb 11:25:29.487 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:25:30.490 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:25:30.491 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:25:30.491 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:25:31.293 # Currently unable to failover: Waiting for votes, but majority still not reached.
48999:S 24 Feb 11:25:31.494 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:25:31.494 * MASTER <-> SLAVE sync started
……
48999:S 24 Feb 11:25:55.580 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:25:56.582 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:25:56.582 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:25:56.583 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:25:56.983 # Currently unable to failover: Failover attempt expired.
48999:S 24 Feb 11:25:57.586 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:25:57.594 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:25:57.594 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:25:58.596 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:25:58.596 * MASTER <-> SLAVE sync started
……
48999:S 24 Feb 11:26:55.784 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:26:56.786 * Connecting to MASTER 10.10.200.30:17389
48999:S 24 Feb 11:26:56.786 * MASTER <-> SLAVE sync started
48999:S 24 Feb 11:26:56.786 # Error condition on socket for SYNC: Connection refused
48999:S 24 Feb 11:26:56.987 # Start of election delayed for 547 milliseconds (rank #0, offset 191410).
48999:S 24 Feb 11:26:57.087 # Currently unable to failover: Waiting the delay before I can start a new failover.
48999:S 24 Feb 11:26:57.588 # Starting a failover election for epoch 31.
48999:S 24 Feb 11:26:57.641 # Currently unable to failover: Waiting for votes, but majority still not reached.
48999:S 24 Feb 11:26:57.641 # Failover election won: I'm the new master.
48999:S 24 Feb 11:26:57.642 # configEpoch set to 31 after successful failover
48999:M 24 Feb 11:26:57.642 # Setting secondary replication ID to 0b64827f015621e379f1e0063821c6bcae6ece69, valid up to offset: 191411. New replication ID is ca9d8deacfb6b77a3a2f8edeebda701a0e2ca86c
48999:M 24 Feb 11:26:57.642 * Discarding previously cached master state.

Root-cause analysis:

When the master goes down, the slave election proceeds as follows:

1. The slave notices that its master has entered the FAIL state.
2. Before starting the election, the slave increments its epoch (currentEpoch), then asks the other masters to vote for it by broadcasting a FAILOVER_AUTH_REQUEST packet to every master in the cluster.
3. After requesting votes, the slave waits up to 2 * NODE_TIMEOUT for the results, but no less than 2 seconds regardless of NODE_TIMEOUT.
4. A master that grants its vote replies with FAILOVER_AUTH_ACK and will not vote for another slave of the same master within 2 * NODE_TIMEOUT.
5. The slave discards any FAILOVER_AUTH_ACK whose epoch is smaller than its own. Once it has received FAILOVER_AUTH_ACKs from a majority of the masters, it declares itself the election winner.
6. If the slave does not win within 2 * NODE_TIMEOUT (at least 2 seconds), it abandons this round and starts a new election after 4 * NODE_TIMEOUT (at least 4 seconds).

How to check NODE_TIMEOUT:

127.0.0.1:16389> snblconfig get cluster-node-timeout
1) "cluster-node-timeout"
2) "30000"

As shown, this parameter directly shapes the slave election and thus the cluster's recovery time; neither a very long nor a very short value works well, so it must be tuned to the actual usage scenario of the Redis Cluster.

The election is forcibly delayed by at least 0.5 s to make sure the master's FAIL state has propagated across the whole cluster; otherwise only a few masters might know about it, and a master only votes for the slaves of a master that it considers to be in the FAIL state.

Delay-calculation formula:
DELAY = 500ms + random(0 ~ 500ms) + SLAVE_RANK * 1000ms
SLAVE_RANK is the slave's rank by the amount of data it has already replicated from the master: the lower the rank, the fresher its data. In this way the slave holding the most recent data tends to start the election first (in theory).
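
Below is a minimal Java sketch of this formula, for illustration only (the class and method names are mine, not Redis source code):

import java.util.concurrent.ThreadLocalRandom;

public class ElectionDelaySketch {

    // DELAY = 500ms + random(0 ~ 500ms) + SLAVE_RANK * 1000ms
    static long electionDelayMillis(int slaveRank) {
        long fixedDelay = 500L;                                   // lets the FAIL state propagate cluster-wide
        long jitter = ThreadLocalRandom.current().nextLong(501);  // random(0 ~ 500ms), desynchronizes the slaves
        return fixedDelay + jitter + slaveRank * 1000L;
    }

    public static void main(String[] args) {
        // rank #0, as in the log line "Start of election delayed for 725 milliseconds (rank #0, ...)"
        System.out.println("rank 0: " + electionDelayMillis(0) + " ms");
        System.out.println("rank 2: " + electionDelayMillis(2) + " ms");
    }
}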

Improvement measures:

Server side:

When migrating nodes, do not forcibly shut down or kill the master.

Also, running the CLUSTER FAILOVER command on a replica does complete the node switch, but its data consistency still needs to be verified (some of the master's data may not yet have been replicated to the slave, and that data would be lost).

The correct approach is:

https://redis.io/commands/cluster-setslot (this still needs an online drill; offline testing passed)

With this procedure the cluster completes the node switch; after the master changes, the server answers clients with redirection replies, which trigger the client-side topology refresh. The process is self-adapting.
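
Below is a minimal sketch of that slot-migration flow against the Lettuce sync API (the slot number, node IDs, and the target host/port are placeholders; error handling is omitted):

import java.util.List;

import io.lettuce.core.api.sync.RedisCommands;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class SlotMigrationSketch {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://10.10.200.30:16389");
        StatefulRedisClusterConnection<String, String> conn = client.connect();
        try {
            int slot = 1234;                       // slot to move (placeholder)
            String sourceId = "<source-node-id>";  // node IDs come from CLUSTER NODES
            String targetId = "<target-node-id>";
            String targetHost = "10.10.200.31";    // placeholder target address
            int targetPort = 16389;

            RedisCommands<String, String> source = conn.getConnection(sourceId).sync();
            RedisCommands<String, String> target = conn.getConnection(targetId).sync();

            // 1. Mark the slot as importing on the target, then migrating on the source.
            target.clusterSetSlotImporting(slot, sourceId);
            source.clusterSetSlotMigrating(slot, targetId);

            // 2. Move the keys in batches; during this phase the source answers
            //    requests for already-moved keys with ASK redirects.
            List<String> keys;
            while (!(keys = source.clusterGetKeysInSlot(slot, 100)).isEmpty()) {
                for (String key : keys) {
                    source.migrate(targetHost, targetPort, key, 0, 5000);
                }
            }

            // 3. Assign the slot to the target. Clients still holding the old
            //    topology get MOVED redirects, which trigger their topology refresh.
            target.clusterSetSlotNode(slot, targetId);
            source.clusterSetSlotNode(slot, targetId);
        } finally {
            conn.close();
            client.shutdown();
        }
    }
}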

Note: unless it is really necessary, do not use the CLUSTER FAILOVER command to switch nodes.

https://redis.io/commands/cluster-failover emphasizes:

Implementation details and notes
CLUSTER FAILOVER, unless the TAKEOVER option is specified, does not execute a failover synchronously. It only schedules a manual failover, bypassing the failure detection stage, so to check if the failover actually happened, CLUSTER NODES or other means should be used in order to verify that the state of the cluster changes some time after the command was sent.

In short: it only schedules a manual failover, bypassing the failure-detection stage, so you must verify afterwards (e.g. via CLUSTER NODES) whether the failover actually happened.
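
A minimal verification sketch with Lettuce, assuming a connection pinned to the replica to be promoted (the node ID is a placeholder):

import io.lettuce.core.api.sync.RedisCommands;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class ManualFailoverSketch {

    public static void main(String[] args) throws InterruptedException {
        RedisClusterClient client = RedisClusterClient.create("redis://10.10.200.30:16389");
        StatefulRedisClusterConnection<String, String> conn = client.connect();
        try {
            // connection pinned to the replica that should be promoted
            RedisCommands<String, String> replica = conn.getConnection("<replica-node-id>").sync();

            // schedules a manual failover; returns OK immediately, the switch happens later
            replica.clusterFailover(false);

            // poll CLUSTER NODES: the "myself" line carries the node's current role
            long deadline = System.currentTimeMillis() + 30_000;
            boolean promoted = false;
            while (!promoted && System.currentTimeMillis() < deadline) {
                for (String line : replica.clusterNodes().split("\n")) {
                    if (line.contains("myself") && line.contains("master")) {
                        promoted = true;
                    }
                }
                if (!promoted) {
                    Thread.sleep(500);
                }
            }
            System.out.println(promoted ? "failover completed" : "failover not observed within 30 s");
        } finally {
            conn.close();
            client.shutdown();
        }
    }
}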

It is acceptable only in rare special cases, such as the initial creation of the cluster.

Client side:

On the client, custom topology-refresh behavior can be configured through ClusterTopologyRefreshOptions.

For example, the README at http://git.snowballfinance.com/lib/redis-cluster4 explains the SDK's design:

import java.time.Duration;

import io.lettuce.core.SocketOptions;
import io.lettuce.core.TimeoutOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;

ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        // refresh on adaptive triggers (e.g. MOVED/ASK redirects, reconnects)
        .enableAllAdaptiveRefreshTriggers()
        // enable periodic topology refresh
        .enablePeriodicRefresh(true)
        // refresh interval; tune it to the workload: this is a disaster-recovery
        // safeguard, so refreshing too often hurts P99 while refreshing too
        // rarely hurts the SLA under high QPS
        .refreshPeriod(Duration.ofSeconds(60))
        .build();

ClusterClientOptions options = ClusterClientOptions.builder()
        .autoReconnect(true)
        .pingBeforeActivateConnection(true)
        // synchronous command await time
        .timeoutOptions(TimeoutOptions.builder().fixedTimeout(Duration.ofMillis(300)).build())
        .socketOptions(SocketOptions.builder()
                .keepAlive(true)
                .connectTimeout(Duration.ofMillis(300))
                .build())
        .topologyRefreshOptions(topologyRefreshOptions)
        .build();

That said, the topology-refresh settings can only shorten how long the client works with a stale topology; while the master is down and the slave election is in progress, the exceptions will still occur.

References:

Redis Cluster master outage case: https://blog.csdn.net/ankeway/article/details/100136675/

Official Redis Cluster failover command: https://redis.io/commands/cluster-failover

Redis Cluster specification: https://redis.io/topics/cluster-spec/
