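Background: when a colleague initialized this group of Redis instances, the monitor command was renamed. Today these instances ran into a problem and I needed monitor to help analyze it, only to find that the command was unavailable. Because renamed commands cannot be changed at runtime, I decided to remove the rename of monitor from the configuration and quickly restart the Redis slave. After the restart, the master-slave replication link stayed in the down state, so I started troubleshooting.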

First, the log of the Redis slave:

43037:S 20 Apr 06:12:38.047 MASTER <-> SLAVE sync started
43037:S 20 Apr 06:12:38.047 Non blocking connect for SYNC fired the event.
43037:S 20 Apr 06:12:38.048 Master replied to PING, replication can continue...
43037:S 20 Apr 06:12:38.048 Partial resynchronization not possible (no cached master)
43037:S 20 Apr 06:12:39.112 Full resync from master: 96f2ae75d50e1f8b69737509d5b32b2da660e7c0:885061114038
43037:S 20 Apr 06:19:16.258 MASTER <-> SLAVE sync: receiving 20819568576 bytes from master
43037:S 20 Apr 06:20:08.203 MASTER <-> SLAVE sync: Flushing old data
43037:S 20 Apr 06:24:36.036 MASTER <-> SLAVE sync: Loading DB in memory
43037:S 20 Apr 06:28:18.538 MASTER <-> SLAVE sync: Finished with success
43037:S 20 Apr 06:28:19.782 Background append only file rewriting started by pid 173002
43037:S 20 Apr 06:28:19.982 # Connection with master lost.
43037:S 20 Apr 06:28:19.982 Caching the disconnected master state.
43037:S 20 Apr 06:28:20.984 Connecting to MASTER 10.93.157.52:6385
43037:S 20 Apr 06:28:20.985 MASTER <-> SLAVE sync started
43037:S 20 Apr 06:28:20.985 Non blocking connect for SYNC fired the event.
43037:S 20 Apr 06:28:20.985 Master replied to PING, replication can continue...
43037:S 20 Apr 06:28:20.986 Trying a partial resynchronization (request 96f2ae75d50e1f8b69737509d5b32b2da660e7c0:885062375607).
43037:S 20 Apr 06:28:22.073 Full resync from master: 96f2ae75d50e1f8b69737509d5b32b2da660e7c0:885240485270
43037:S 20 Apr 06:28:22.073 Discarding previously cached master state.
43037:S 20 Apr 06:33:30.800 # Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.
43037:S 20 Apr 06:33:30.801 Connecting to MASTER 10.93.157.52:6385
43037:S 20 Apr 06:33:30.802 MASTER <-> SLAVE sync started
43037:S 20 Apr 06:33:30.802 Non blocking connect for SYNC fired the event.
43037:S 20 Apr 06:33:30.802 Master replied to PING, replication can continue...
43037:S 20 Apr 06:33:30.803 Partial resynchronization not possible (no cached master)
43037:S 20 Apr 06:34:27.458 AOF rewrite child asks to stop sending diffs.
173002:C 20 Apr 06:34:27.470 Parent agreed to stop sending diffs. Finalizing AOF...
173002:C 20 Apr 06:34:27.470 Concatenating 1.19 MB of AOF diff received from parent.
173002:C 20 Apr 06:34:27.477 SYNC append only file rewrite performed
173002:C 20 Apr 06:34:28.757 AOF rewrite: 119 MB of memory used by copy-on-write
43037:S 20 Apr 06:34:29.961 Background AOF rewrite terminated with success
43037:S 20 Apr 06:34:29.961 Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
43037:S 20 Apr 06:34:29.961 Background AOF rewrite finished successfully
43037:S 20 Apr 06:35:08.471 Full resync from master: 96f2ae75d50e1f8b69737509d5b32b2da660e7c0:885393456775
43037:S 20 Apr 06:41:49.489 MASTER <-> SLAVE sync: receiving 20821399746 bytes from master
43037:S 20 Apr 06:42:28.911 # I/O error trying to sync with MASTER: connection lost
43037:S 20 Apr 06:42:32.642 Connecting to MASTER 10.93.157.52:6385
43037:S 20 Apr 06:42:32.646 MASTER <-> SLAVE sync started
43037:S 20 Apr 06:42:32.646 Non blocking connect for SYNC fired the event.
43037:S 20 Apr 06:42:32.647 Master replied to PING, replication can continue...
43037:S 20 Apr 06:42:32.647 Partial resynchronization not possible (no cached master)
43037:S 20 Apr 06:42:33.755 Full resync from master: 96f2ae75d50e1f8b69737509d5b32b2da660e7c0:885541422071
43037:S 20 Apr 06:49:15.956 MASTER <-> SLAVE sync: receiving 20821403571 bytes from master
43037:S 20 Apr 06:50:16.781 MASTER <-> SLAVE sync: Flushing old data
43037:S 20 Apr 06:54:23.078 MASTER <-> SLAVE sync: Loading DB in memory
43037:S 20 Apr 06:58:10.123 MASTER <-> SLAVE sync: Finished with success
43037:S 20 Apr 06:58:11.317 Background append only file rewriting started by pid 223387
43037:S 20 Apr 06:58:11.536 # Connection with master lost.
43037:S 20 Apr 06:58:11.536 * Caching the disconnected master state.

Next, the log of the Redis master:

304369:M 20 Apr 05:13:00.197 Slave 10.93.157.16:6383 asks for synchronization
304369:M 20 Apr 05:13:00.197 Full resync requested by slave 10.93.157.16:6383
304369:M 20 Apr 05:13:00.197 Starting BGSAVE for SYNC with target: disk
304369:M 20 Apr 05:13:00.902 Background saving started by pid 366254
366254:C 20 Apr 05:19:14.460 DB saved on disk
366254:C 20 Apr 05:19:14.961 RDB: 4613 MB of memory used by copy-on-write
304369:M 20 Apr 05:19:15.579 Background saving terminated with success
304369:M 20 Apr 05:20:18.303 Synchronization with slave 10.93.157.16:6383 succeeded
304369:M 20 Apr 05:22:32.216 # Client id=1768461 addr=10.93.157.16:26912 fd=10 name= age=572 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=11450 omem=278945640 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
304369:M 20 Apr 05:22:32.216 # Connection with slave 10.93.157.16:6383 lost.
304369:M 20 Apr 05:28:27.651 Slave 10.93.157.16:6383 asks for synchronization
304369:M 20 Apr 05:28:27.651 Unable to partial resync with slave 10.93.157.16:6383 for lack of backlog (Slave request was: 884116305579).
304369:M 20 Apr 05:28:27.651 Starting BGSAVE for SYNC with target: disk
304369:M 20 Apr 05:28:28.356 Background saving started by pid 396084
304369:M 20 Apr 05:32:37.471 # Client id=1768945 addr=10.93.157.16:21854 fd=22 name= age=250 idle=250 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16366 oll=11126 omem=271085384 events=r cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
304369:M 20 Apr 05:32:37.538 # Connection with slave 10.93.157.16:6383 lost.

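The logs show that when the slave asks to resynchronize, the master first generates an RDB file with BGSAVE and then sends it to the slave. Until the slave has received and loaded that file, the master keeps the writes it continues to receive in the output buffer of the slave's connection (the client with flags=S; cmd=psync is simply the last command that client issued). This is where the problem lies: client-output-buffer-limit was set too small, so the buffer grew past the limit (omem=278945640 and omem=271085384, roughly 266 MB and 259 MB), the master closed the connection, and the synchronization failed. After each failure the slave requested synchronization again and failed again, so the master kept generating RDB files over and over. Although each BGSAVE runs in a forked child process, it still has some impact on the master's throughput (the log reports 4613 MB of copy-on-write memory for a single save).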
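To make the comparison concrete, here is a minimal Python sketch that pulls the key=value fields out of the first "Client ..." line in the master log and compares omem with the hard limit for slaves (268435456 bytes, the value reported by config get further down). The helper name client_fields and the constants are made up for this illustration; they are not part of Redis.

```python
import re

# The first "Client ..." line from the master log, joined onto one line.
LOG_LINE = (
    "304369:M 20 Apr 05:22:32.216 # Client id=1768461 addr=10.93.157.16:26912 "
    "fd=10 name= age=572 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 "
    "qbuf-free=32768 obl=0 oll=11450 omem=278945640 events=rw cmd=psync "
    "scheduled to be closed ASAP for overcoming of output buffer limits."
)

# Hard limit of the "slave" class at the time of the incident (256 MB).
HARD_LIMIT_BYTES = 268435456


def client_fields(line):
    """Return the key=value pairs of a client description as a dict of strings."""
    return dict(re.findall(r"(\w[\w-]*)=(\S*)", line))


fields = client_fields(LOG_LINE)
omem = int(fields["omem"])   # bytes held in the client's output buffer
age = int(fields["age"])     # seconds since the connection was opened

print(f"flags={fields['flags']} age={age}s omem={omem / 2**20:.1f} MB")
print("over hard limit:", omem > HARD_LIMIT_BYTES)
```

The same check on the second "Client ..." line (omem=271085384) also comes out above the limit, which matches the two disconnects recorded in the master log.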

Now let's fix the problem. First, check the current value of client-output-buffer-limit:
127.0.0.1:6385> config get client-output-buffer-limit
1) "client-output-buffer-limit"
2) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
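The returned value packs three client classes into one string, each followed by a hard limit, a soft limit, and the number of seconds the soft limit may be exceeded. The following Python sketch splits it into a dictionary so the slave numbers are easier to read; the function name parse_output_buffer_limits and the dictionary keys are chosen only for this example.

```python
def parse_output_buffer_limits(value):
    """Split a client-output-buffer-limit string into per-class limits.

    The string is a sequence of groups: <class> <hard> <soft> <seconds>.
    """
    tokens = value.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of: class hard soft seconds")
    limits = {}
    for i in range(0, len(tokens), 4):
        client_class, hard, soft, seconds = tokens[i:i + 4]
        limits[client_class] = {
            "hard_bytes": int(hard),
            "soft_bytes": int(soft),
            "soft_seconds": int(seconds),
        }
    return limits


# Value copied from the config get output above.
current = parse_output_buffer_limits(
    "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
)
slave = current["slave"]
print(slave["hard_bytes"] // 2**20, "MB hard limit")
print(slave["soft_bytes"] // 2**20, "MB soft limit for", slave["soft_seconds"], "seconds")
```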
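As shown, the slave class currently has a hard limit of 256 MB and a soft limit of 64 MB sustained for 60 seconds. From the master log above, the data buffered for the slave (omem) was clearly larger than 256 MB. We raise the limit with the following setting: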
config set client-output-buffer-limit 'slave 1073741824 268435456 60'
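After the change, we kept watching and the slave's replication status soon became up.
The adjustment above enlarges the replication output buffer. Another option is to remove the limit on the replication output buffer altogether: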
config set client-output-buffer-limit 'slave 0 0 0'
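
The adjustment and the follow-up check can also be scripted. The sketch below assumes the redis-py client, whose config_set() and info() methods are used here; the function names, the polling interval, and the timeout are illustrative choices, not something the original procedure prescribes.

```python
import time


def raise_slave_buffer_limit(master, hard_bytes, soft_bytes, soft_seconds):
    """Send CONFIG SET client-output-buffer-limit for the slave class."""
    value = f"slave {hard_bytes} {soft_bytes} {soft_seconds}"
    master.config_set("client-output-buffer-limit", value)
    return value


def wait_for_link_up(slave, timeout_s=3600, poll_s=5, sleep=time.sleep):
    """Poll INFO replication on the slave until master_link_status is 'up'."""
    waited = 0
    while waited <= timeout_s:
        if slave.info("replication").get("master_link_status") == "up":
            return True
        sleep(poll_s)
        waited += poll_s
    return False


# Usage against live instances (requires `pip install redis`; the slave
# address is taken from the master log above):
#
#   import redis
#   master = redis.Redis(host="10.93.157.52", port=6385)
#   slave = redis.Redis(host="10.93.157.16", port=6383)
#   raise_slave_buffer_limit(master, 1073741824, 268435456, 60)
#   print("link up:", wait_for_link_up(slave))
```

Note that CONFIG SET only changes the running instance. To keep the new limit across restarts, redis.conf has to be updated as well (or CONFIG REWRITE run). Setting the limit to 0 0 0 also means a slow slave can make the master buffer an unbounded amount of data, so the larger finite limit is the more cautious of the two options.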

Reposted from: https://blog.51cto.com/chenql/2106074
