KingbaseES V8R6集群管理运维案例之---repmgr standby switchover故障
案例说明:
在KingbaseES V8R6集群备库执行“repmgr standby switchover”时,切换失败,并且在执行过程中,伴随着“repmr standby follow”操作,本案例详细记录了解决此问题的过程。
适用版本: KingbaseES V8R6
集群节点信息:
一、备库执行switchover操作
1、执行switchover切换
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
WARNING: following problems with command line parameters detected:database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node101" (ID: 1)
ERROR: local node "node101" (ID: 1) is not a downstream of demotion candidate primary "node102" (ID: 2)
DETAIL: local node has no registered upstream node
HINT: execute "repmgr standby register --force" to update the local node's metadata
2、切换失败信息
3、查看集群节点状态
[kingbase@node101 bin]$ ./repmgr cluster showID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------1 | node101 | standby | running | | default | 100 | 6 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=32 | node102 | primary | * running | | default | 100 | 6 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=如下所示,standby节点的upstream为空,无法执行switchover。=
二、配置standby节点的upstream(repmgr standby follow)
1、执行“repmgr standby follow”
[kingbase@node101 bin]$ ./repmgr standby follow -h 192.168.1.102 -U esrep -d esrep
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/CE004F50, follow target lsn is 1/CE004F50
ERROR: slot "repmgr_slot_1" already exists as an active slotNOTICE: STANDBY FOLLOW failed
DETAIL: slot "repmgr_slot_1" already exists as an active slot# standby的replication slot是active状态test=# select * from sys_replication_slots;slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confir
med_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+-------
--------------repmgr_slot_1 | | physical | | | f | t | 8596 | 1917 | | 1/CE005038 |
(1 row)
2、停止数据库删除standby的replication slot
# 关闭备库数据库服务
[kingbase@node101 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped# 注释kingbase.auto.conf中slot参数
[kingbase@node101 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
enable_upper_colname = 'on'
client_idle_timeout = '0'
synchronous_standby_names = ''
wal_retrieve_retry_interval = '5000'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
recovery_target_timeline = 'latest'
# primary_slot_name = 'repmgr_slot_1'# 查看slot状态
test=# select * from sys_replication_slots;slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confir
med_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+-------
--------------repmgr_slot_1 | | physical | | | f | f | | 1922 | | 1/CE005038 |
(1 row)# 删除备库replication slot
test=# select sys_drop_replication_slot('repmgr_slot_1');sys_drop_replication_slot
---------------------------(1 row)
3、启动数据库服务执行"repmgr standby follow"
[kingbase@node101 bin]$ ./sys_ctl start -D ../data
waiting for server to start....2022-08-09 10:39:50.600 CST [6829] WARNING: enable_upper_colname can only be opened
......
server started[kingbase@node101 bin]$ ./repmgr standby follow -h 192.168.1.102 -U esrep -d esrep
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/CE0052E0, follow target lsn is 1/CE0052E0
NOTICE: setting node 1's upstream to node 2
NOTICE: begin to stopp server at 2022-08-09 10:39:55.101228
NOTICE: stopping server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -w -t 90 -m fast stop"
NOTICE: stopp server finish at 2022-08-09 10:39:55.205646
NOTICE: begin to start server at 2022-08-09 10:39:55.205705
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2022-08-09 10:39:55.316793
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node102" (ID: 2)# 集群节点状态正常
[kingbase@node101 bin]$ ./repmgr cluster showID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------1 | node101 | standby | running | node102 | default | 100 | 6 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=32 | node102 | primary | * running | | default | 100 | 6 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
三、执行‘repmgr standby switchover’
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
WARNING: following problems with command line parameters detected:database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node101" (ID: 1)
INFO: The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=0
"
INFO: pausing repmgrd on node "node101" (ID 1)
INFO: pausing repmgrd on node "node102" (ID 2)
NOTICE: local node "node101" (ID: 1) will be promoted to primary; current primary "node102" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node102" (ID: 2)
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 1/D0000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
NOTICE: issuing CHECKPOINT
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 1/D0000028; rejoin target node's fork point: 1/D00000A0
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2022-08-09 10:46:36.382548
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2022-08-09 10:46:36.488870
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
NOTICE: switchover was successful
DETAIL: node "node101" is now primary and node "node102" is attached as standby
INFO: unpausing repmgrd on node "node101" (ID 1)
INFO: unpause node "node101" (ID 1) successfully
INFO: unpausing repmgrd on node "node102" (ID 2)
INFO: unpause node "node102" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully# 集群节点状态信息
[kingbase@node101 bin]$ ./repmgr cluster showID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------1 | node101 | primary | * running | | default | 100 | 7 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=32 | node102 | standby | running | node101 | default | 100 | 7 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=如上所示,switchover切换完成。=
KingbaseES V8R6集群管理运维案例之---repmgr standby switchover故障相关推荐
- KingbaseES V8R6集群运维案例---数据块故障自动修复(auto_bmr)
案例说明: 在Oracle11.2版本之后,DataGuard 若搭建实时应用日志的物理备库,那么在主库数据文件少 量坏块的情况下,可以利用ABCR技术快速修复坏块. Starting in Orac ...
- KingbaseES V8R6集群外部备份案例
案例说明: 本案例采用sys_backup.sh执行物理备份,备份使用如下逻辑架构:集群采用CentOS 7系统,repo采用kylin V10 Server. 一主一备+外部备份 此场景为主备双机常 ...
- rabbitmq基础5——集群节点类型、集群基础运维,集群管理命令,API接口工具
文章目录 一.集群节点类型 1.1 内存节点 1.2 磁盘节点 二.集群基础运维 2.1 剔除单个节点 2.1.1 集群正常踢出正常节点 2.1.2 服务器异常宕机踢出节点 2.1.3 集群正常重置并 ...
- 10 Kafka集群与运维
Kafka集群与运维 10.1 集群应用场景 10.1.1 消息传递 Kafka可以很好地替代传统邮件代理.消息代理的使用有多种原因(将处理与数据生产者分离,缓冲未处理的消息等).与大多数邮件系统相比 ...
- 4.2.5 Kafka集群与运维(集群的搭建、监控工具 Kafka Eagle)
Kafka集群与运维 文章目录 Kafka集群与运维 1.集群的搭建 1.1 搭建zookeeper集群 1.1.1 上传JDK到linux,安装并配置JDK 1.1.2. Linux 安装Zooke ...
- RocketMQ 集群平滑运维
前言 在 RocketMQ 集群的运维实践中,无论线上 Broker 节点启动和关闭,还是集群的扩缩容,都希望是平滑的,业务无感知.正所谓 "随风潜入夜,润物细无声" ,本文以实际 ...
- KingbaseES V8R6集群运维案例之---sys_rewind应用分析
案例说明: sys_rewind是用于在数据库cluster的时间线分叉以后,同步一个 KingbaseES 数据库cluster 和同一数据库cluster另一份拷贝的工具.一种典型的场景是在失效 ...
- KingbaseES V8R6 集群运维系列 -- 命令行部署repmgr管理集群+switchover测试
本次部署未使用securecmd/kbha工具,无需普通用户到root用户的互信. 一.环境准备 1.创建OS用户 建立系统数据库安装用户组及用户,在所有的节点执行. root用户登陆服务器,创建用户 ...
- 4.2.9 Kafka集群与运维, 应用场景, 集群搭建, 集群监控JMX(度量指标, JConsole, 编程获取, Kafka Eagle)
目录 3.1 集群应用场景 1 消息传递 2 网站活动路由 3 监控指标 4 日志汇总 5 流处理 6 活动采集 7 提交日志 总结 3.2 集群搭建 3.2.1 Zookeeper集群搭建 3.2. ...
最新文章
- 八皇后的一个回溯递归解法
- Java字符类型练习
- 安装debian总结以及编译linux内核
- java将数据封装为树结构_JAVA代码实现多级树结构封装对象
- SAP Cloud for Customer和SAP Fiori系统里的OData测试工具
- java break 在if 中使用_java中使用国密SM4算法详解
- 计算机辐射对人体影响吗,电脑屏幕辐射对人体的危害怎么解决?
- 深度模型压缩论文(02)- BlockSwap: Fisher-guided Block Substitution for Network Compression
- linux中printf命令,Linux中printf命令使用实例
- 测试老司机一起聊聊性能测试是怎么一回事?
- STM32工作笔记0078---UCOSIII任务挂起和恢复
- 瀑布流的布局(功能还没有完善)
- Gym102174 (The 14-th BIT Campus Programming Contest)
- ArcGIS加载Excel数据连接到数据库失败的解决办法
- C++二维vector初始化大小方法
- 【花雕动手做】有趣好玩的音乐可视化系列小项目(22)--LED无限魔方
- Django视图层模版层全面解析全网最细的教程
- Devexpress Xtrareport 创建主从报表
- 给定一个整数数组 nums和一个整数目标值 target,请你在该数组中找出 和为目标值 target 的那两个整数,并返回它们的数组下标。
- P4414 [COCI2006-2007#2] ABC
热门文章
- 末位淘汰!985高校硕士毕业拟新规:强制20%不通过或需大改?
- 综合案例:实现注册、登录和跳转到主页面的功能。
- 《人月神话》(The Mythical Man-Month)2人和月可以互换吗?人月神话存在吗?
- 电视直播(CCTV5)
- 无需编程,DIY自己智能小车的Android蓝牙遥控软件(二)
- 【保姆级】扫雷游戏的设计与实现【C语言】
- vscode常用插件 - Path Autocomplete
- 使用opencv实现通过摄像头自动输入阿里云身份宝验证码
- Spring 集成与分片详解
- 2019全国c语言二级考试题库,2019年全国计算机二级考试试题题库(附答案)【精选】.docx...