First, look at the errors in the logs:

# Errors on the publisher:
2022-04-01 10:18:23.812 CST,"postgres","hank",4666,"10.4.9.250:49138",624660ef.123a,1,"idle",2022-04-01 10:18:23 CST,10/0,0,ERROR,55006,"replication slot ""sub"" is active for PID 4575",,,,,,,,,"sub"
2022-04-01 10:18:28.826 CST,"postgres","hank",4681,"10.4.9.250:49152",624660f4.1249,1,"idle",2022-04-01 10:18:28 CST,10/0,0,ERROR,55006,"replication slot ""sub"" is active for PID 4575",,,,,,,,,"sub"
2022-04-01 10:18:34.115 CST,"postgres","hank",4695,"10.4.9.250:49162",624660f9.1257,1,"idle",2022-04-01 10:18:33 CST,6/0,0,LOG,00000,"starting logical decoding for slot ""sub""","Streaming transactions committing after 7/B700E080, reading WAL from 7/6BECE9A0.",,,,,,,,"sub"
2022-04-01 10:18:34.124 CST,"postgres","hank",4695,"10.4.9.250:49162",624660f9.1257,2,"idle",2022-04-01 10:18:33 CST,6/0,0,LOG,00000,"logical decoding found consistent point at 7/6BECE9A0","There are no running transactions.",,,,,,,,"sub"

# Errors on the subscriber:
2022-04-01 10:18:23.804 CST,,,6432,,624660ef.1920,1,,2022-04-01 10:18:23 CST,4/399230,0,LOG,00000,"logical replication apply worker for subscription ""sub"" has started",,,,,,,,,""
2022-04-01 10:18:23.812 CST,,,6432,,624660ef.1920,2,,2022-04-01 10:18:23 CST,4/0,0,ERROR,XX000,"could not start WAL streaming: ERROR:  replication slot ""sub"" is active for PID 4575",,,,,,,,,""
2022-04-01 10:18:23.813 CST,,,6356,,60ecf808.18d4,496876,,2021-07-13 10:18:48 CST,,0,LOG,00000,"background worker ""logical replication worker"" (PID 6432) exited with exit code 1",,,,,,,,,""
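When this error loops, both sides log it repeatedly, and the useful detail is which PID still holds the slot. The following is a minimal sketch for pulling those entries out of csvlog text. It assumes the standard csvlog column order (process ID in column 4, severity in column 12, SQLSTATE in column 13, message in column 14, counting from 1); the function name is hypothetical.

```python
import csv
import io
import re

# Assumed 0-based column positions in PostgreSQL's csvlog format. Newer
# releases append extra columns at the end, which should not shift these.
PID_COL = 3
SEVERITY_COL = 11
SQLSTATE_COL = 12
MESSAGE_COL = 13

# Matches messages like: replication slot "sub" is active for PID 4575
SLOT_BUSY_RE = re.compile(r'replication slot "([^"]+)" is active for PID (\d+)')


def find_busy_slot_errors(log_text):
    """Return (pid, sqlstate, slot_name, holder_pid) for each 'slot is active' ERROR."""
    results = []
    for row in csv.reader(io.StringIO(log_text)):
        # Skip blank lines and anything that is not an ERROR entry.
        if len(row) <= MESSAGE_COL or row[SEVERITY_COL] != "ERROR":
            continue
        match = SLOT_BUSY_RE.search(row[MESSAGE_COL])
        if match:
            results.append(
                (int(row[PID_COL]), row[SQLSTATE_COL], match.group(1), int(match.group(2)))
            )
    return results
```

Run against the lines above, it reports the publisher-side error (SQLSTATE 55006) and the subscriber-side wrapper error (XX000), both pointing at PID 4575 as the slot holder. The LOG entries are ignored.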

The error occurs because wal_receiver_timeout on the subscriber was set too short. For this test, I set it to 3 seconds.

# The publisher has a table hank.tb1
postgres=# \c  hank
You are now connected to database "hank" as user "postgres".
hank=# \dRp+
                         Publication pub
 Owner | All tables | Inserts | Updates | Deletes | Truncates
-------+------------+---------+---------+---------+-----------
 hank  | f          | t       | t       | t       | t
Tables:
    "hank.tb1"

# Table definition:
hank=# \d hank.tb1
                Table "hank.tb1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           | not null |
 b      | text    |           |          |
Indexes:
    "tb1_pkey" PRIMARY KEY, btree (a)
Publications:
    "pub"
Tablespace: "pg_default"

# Insert test data
hank=# insert into hank.tb1 select generate_series(1,10000000),generate_series(1,10000000)||'apple';
INSERT 0 10000000
hank=# \dt+ hank.tb1
                   List of relations
 Schema | Name | Type  | Owner |  Size  | Description
--------+------+-------+-------+--------+-------------
 hank   | tb1  | table | hank  | 490 MB |
(1 row)

# Change the column type. Widening int to bigint requires a table rewrite, which gives us the effect this test needs
hank=# alter table hank.tb1 alter COLUMN a type bigint;
ALTER TABLE

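The table rewrite took longer than 3 seconds, so with wal_receiver_timeout set to 3 seconds the subscriber terminated the replication connection as inactive. The restarted apply worker then reconnected while the slot was still held on the publisher by PID 4575 (presumably the previous walsender), which produces the errors above, and the cycle kept repeating. The parameter is intended to detect a failure of the publisher host or of the network.
However, operations that rewrite a table, such as adding a column filled with a default value, widening a column type (some of these require a rewrite), or VACUUM FULL, can trigger this error as well, so lengthen the timeout as appropriate for your workload.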
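To see why a 3-second limit trips while a 30-minute one does not, it helps to treat wal_receiver_timeout as an inactivity timer. The sketch below is a deliberately simplified model, not PostgreSQL's implementation: it ignores keepalive requests and assumes the receiver hears nothing from the publisher for the whole rewrite. The function name and the message timestamps are hypothetical; a limit of 0 disables the check, matching the documented behavior of wal_receiver_timeout = 0.

```python
def first_timeout(message_times, timeout_s, end_time):
    """Simplified inactivity-timeout model (hypothetical, not PostgreSQL code).

    message_times: sorted times in seconds at which any message arrived.
    timeout_s: inactivity limit in seconds; 0 or less disables the check.
    end_time: end of the observation window.
    Returns the time the connection would be dropped, or None.
    """
    if timeout_s <= 0:
        return None  # like wal_receiver_timeout = 0: never time out
    last = 0.0  # the connection starts at t = 0
    for t in list(message_times) + [end_time]:
        if t - last > timeout_s:
            return last + timeout_s  # silence exceeded the limit
        last = t
    return None
```

For example, with messages every second until t=5, a silent rewrite until t=20, and then traffic again, `first_timeout([1, 2, 3, 4, 5, 20, 21], 3, 25)` drops the connection at t=8, while a 1800-second limit never fires.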

In addition, because the subscriber cannot apply the changes, WAL also accumulates on the publisher:

# Although the slot is shown as active (t), restart_lsn does not advance, so WAL still accumulates
hank=# SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as replicationSlotLag, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as confirmedLag, active
FROM pg_replication_slots;
           slot_name           | replicationslotlag | confirmedlag | active
-------------------------------+--------------------+--------------+--------
 sub                           | 1218 MB            | 17 MB        | t
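The pattern above (slot active, but restart_lsn frozen while the publisher keeps writing WAL) can be detected by comparing two samples of the same query. This is a minimal sketch under stated assumptions: an LSN string such as `7/6BECE9A0` is two 32-bit hexadecimal halves, so the byte distance matches what `pg_wal_lsn_diff` computes; the sampling approach and the function names are hypothetical, and the sample values are illustrative except the restart LSN taken from the log.

```python
def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN string such as '7/6BECE9A0' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a, b):
    """Byte distance a - b, analogous to pg_wal_lsn_diff(a, b)."""
    return lsn_to_int(a) - lsn_to_int(b)


def slot_looks_stuck(before, after):
    """Hypothetical check on two samples of (current_wal_lsn, restart_lsn, active).

    Flags a slot that is still active but whose restart_lsn did not move
    while the server kept generating WAL.
    """
    cur0, restart0, _ = before
    cur1, restart1, active1 = after
    wal_grew = lsn_diff(cur1, cur0) > 0
    restart_moved = lsn_diff(restart1, restart0) > 0
    return bool(active1 and wal_grew and not restart_moved)
```

For instance, `slot_looks_stuck(("7/B0000000", "7/6BECE9A0", True), ("7/C0000000", "7/6BECE9A0", True))` returns `True`, because WAL grew while restart_lsn stayed at `7/6BECE9A0`.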

After changing wal_receiver_timeout, temporarily raising it to 30 minutes, the error went away.
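Both values in this test use PostgreSQL's time-unit syntax, and a wal_receiver_timeout given without a unit is read as milliseconds. The helper below is an illustrative sketch, not a reimplementation of PostgreSQL's parser: it is a hypothetical function that accepts only integers with an optional `ms`, `s`, `min`, `h`, or `d` suffix, which is enough to compare the two settings.

```python
import re

# A subset of PostgreSQL's time units, expressed in milliseconds.
UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def timeout_to_ms(value):
    """Convert a setting such as '3s' or '30min' to milliseconds (no unit = ms)."""
    m = re.fullmatch(r"\s*(\d+)\s*(ms|s|min|h|d)?\s*", value)
    if not m:
        raise ValueError(f"unsupported value: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "ms"
    return number * UNIT_MS[unit]
```

With it, `timeout_to_ms("3s")` gives 3000 and `timeout_to_ms("30min")` gives 1800000, a 600-fold increase in the tolerated silence.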

# The parameter on the subscriber after the change:
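hank=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 30min
(1 row)

# On the publisher, the slot no longer shows a WAL backlog; after inserting data on the publisher, the subscriber consumes it normally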
hank=# SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as replicationSlotLag, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as confirmedLag, active
FROM pg_replication_slots;
           slot_name           | replicationslotlag | confirmedlag | active
-------------------------------+--------------------+--------------+--------
 sub                           | 56 bytes           | 0 bytes      | t
(3 rows)

hank=# insert into hank.tb1 values (10000002);
INSERT 0 1

# On the subscriber, refresh and check: the new row has been consumed normally
hank=# alter subscription sub refresh publication ;
hank=# select max(a) from hank.tb1;
   max
----------
 10000002
(1 row)

Summary:

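  1. Pay attention to the wal_receiver_timeout and wal_sender_timeout settings. Depending on the situation, lengthen them or set them to 0, which disables the timeout.
  2. Table-rewriting operations, such as widening a column, adding a column and filling it with values, or VACUUM FULL, can also cause timeouts that leave the subscriber unable to consume changes. Even though the replication slot looks normal, restart_lsn does not advance, and WAL accumulates on the publisher.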

References:
https://www.postgresql.org/docs/13/runtime-config-replication.html
