HG_REPMGR autofailover: Automatic Failover
Contents
Purpose
Details
Purpose
Reference for configuring automatic failover with HG_REPMGR.
Details
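To configure automatic failover for a cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.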
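1. Configure postgresql.replication.conf (all nodes)
Add the following parameter to the existing postgresql.replication.conf:
    shared_preload_libraries = 'repmgr'
Alternatively, set it with ALTER SYSTEM, keeping any libraries that are already preloaded:
    alter system set shared_preload_libraries = pg_pathman,timescaledb,repmgr;
Restart the database:
    pg_ctl restart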
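After the restart it can be useful to confirm that repmgr really is in the preload list. The following is a minimal sketch, not part of the original procedure: the helper name has_preload_lib is hypothetical, and the psql call in the usage comment assumes that psql can connect with its default settings.

```shell
#!/bin/sh
# has_preload_lib LIST NAME
# Succeeds if NAME is one of the entries in the comma-separated LIST.
# (Hypothetical helper for illustration.)
has_preload_lib() {
    # Remove spaces and quotes, then wrap in commas so only whole names match.
    list=$(printf '%s' "$1" | tr -d " '\"")
    case ",$list," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage against a live server (assumes default psql connection settings):
#   has_preload_lib "$(psql -Atc 'SHOW shared_preload_libraries')" repmgr \
#       && echo "repmgr is preloaded"
```

Matching whole comma-delimited names avoids a false positive from a library whose name merely contains "repmgr".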
2. Configure hg_repmgr.conf (all nodes)
Add the following parameters to the existing hg_repmgr.conf file:
    failover=automatic
    promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'
    follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'
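Because these parameters must be present on every node, a quick check of each node's file can catch omissions before repmgrd is started. The sketch below is an illustration only: conf_get and check_failover_conf are hypothetical helper names, and the parsing assumes simple key=value lines like those shown above.

```shell
#!/bin/sh
# conf_get FILE KEY
# Print the value assigned to KEY in FILE (last assignment wins),
# with surrounding single quotes removed. (Hypothetical helper.)
conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | tail -n 1 \
        | sed "s/^'\(.*\)'\$/\1/"
}

# check_failover_conf FILE
# Print "ok" if the three failover settings are present, else name the problem.
check_failover_conf() {
    [ "$(conf_get "$1" failover)" = "automatic" ] \
        || { echo "failover is not set to automatic"; return 1; }
    for key in promote_command follow_command; do
        [ -n "$(conf_get "$1" "$key")" ] \
            || { echo "$key is missing"; return 1; }
    done
    echo "ok"
}

# Usage:
#   check_failover_conf /opt/highgo/5.6.1/conf/hg_repmgr.conf
```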
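To send repmgr's log output to a fixed log file, add the log_file parameter, as follows:
    log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'
To keep this log file from growing without bound, configure the system's logrotate for it. (Detailed steps omitted.)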
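The original omits the logrotate steps. As a minimal sketch under stated assumptions, the function below prints one possible logrotate stanza for the log file. The function name repmgr_logrotate_conf, the weekly schedule, the retention of four files, and the use of copytruncate (chosen so that no signal has to be sent to repmgrd) are illustrative choices, not settings from the source.

```shell
#!/bin/sh
# repmgr_logrotate_conf LOGFILE
# Print a logrotate stanza for LOGFILE. All values are illustrative assumptions.
repmgr_logrotate_conf() {
    printf '%s\n' \
        "$1 {" \
        "    weekly" \
        "    rotate 4" \
        "    compress" \
        "    missingok" \
        "    notifempty" \
        "    copytruncate" \
        "}"
}

# Usage (as root; the target file name is an assumption):
#   repmgr_logrotate_conf /opt/highgo/5.6.1/conf/data/log/hg_repmgr.log \
#       > /etc/logrotate.d/hg_repmgr
```

With copytruncate, logrotate copies the log and then truncates the original in place, so a few lines written during the copy may be lost; that trade-off is usually acceptable for a monitoring log.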
3. Start the repmgrd process (all nodes)
    repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
In the session below, the first attempt on dbrs omits -f: repmgrd connects with an empty connection string and reports that the repmgr extension is not installed in database "highgo".
    [highgo@dbrs conf]$ repmgrd -d -p /tmp/hg_repmgrd.pid
    [2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up
    [2019-05-06 14:02:42] [INFO] connecting to database ""
    [2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node
    [2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"
    [2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster
    [highgo@dbrs conf]$
The extension was then created in the highgo database:
    highgo=# \c
    You are now connected to database "highgo" as user "highgo".
    create extension repmgr;
Started again with -f, repmgrd on the primary (dbrs) comes up and begins monitoring:
    [highgo@dbrs conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
    [2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up
    [2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
    [highgo@dbrs conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
    [2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)
    [2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)
On the standby (dbrs2):
    [highgo@dbrs2 conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
    [2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up
    [2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
    [highgo@dbrs2 conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
    [2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)
    [2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)
Confirm that the PID file exists on both nodes:
    [highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid
    -rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
    [highgo@dbrs conf]$
    [highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid
    -rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
    [highgo@dbrs2 conf]$
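Note: does this background process have to be started manually every time the server is rebooted?
Developer's reply: currently yes; a later release will start it automatically.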
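Until automatic startup is provided, the manual start can be wrapped in a small script. The sketch below is an assumption-laden illustration rather than a vendor-supplied method: repmgrd_running and ensure_repmgrd are hypothetical names, and the script only checks that the PID in the file belongs to some live process, not that the process is actually repmgrd.

```shell
#!/bin/sh
# repmgrd_running PIDFILE
# Succeeds if PIDFILE is non-empty and holds the PID of a live process.
# Note: this does not verify that the process is repmgrd.
repmgrd_running() {
    [ -s "$1" ] || return 1
    pid=$(cat "$1")
    kill -0 "$pid" 2>/dev/null
}

# ensure_repmgrd CONFFILE PIDFILE
# Start repmgrd with the options used in this document unless it is already running.
ensure_repmgrd() {
    if repmgrd_running "$2"; then
        echo "repmgrd already running"
    else
        repmgrd -f "$1" -d -p "$2"
    fi
}

# Usage, e.g. from a boot-time job run as the highgo user once the database is up:
#   ensure_repmgrd /opt/highgo/5.6.1/conf/hg_repmgr.conf /tmp/hg_repmgrd.pid
```

The database must already be accepting connections when this runs, since repmgrd connects to it at startup.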
Check the cluster status
    [highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+------------------------------------------------------------
     1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
     2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
    [highgo@dbrs conf]$
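For scripted checks, the table printed by cluster show can be filtered for the node that is currently a running primary. This is a minimal sketch that assumes the pipe-separated column layout shown in this document (Name in column 2, Role in column 3, Status in column 4); the function name running_primaries is hypothetical.

```shell
#!/bin/sh
# running_primaries
# Read `cluster show` output on stdin and print the name of every node whose
# role is "primary" and whose status is "running".
running_primaries() {
    awk -F'|' '
        NF >= 4 {
            name = $2; role = $3; status = $4
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", role)
            gsub(/[ \t*]/, "", status)   # drop padding and the "*" marker
            if (role == "primary" && status == "running") print name
        }'
}

# Usage:
#   repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show | running_primaries
```

A healthy cluster should yield exactly one name; zero or several names indicate a condition worth investigating.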
Simulate a primary node failure
1) On node1, stop the database:
    pg_ctl stop
2) On node2, check the cluster status:
    [highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+------------------------------------------------------------
     1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
     2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

    WARNING: following issues were detected
      - unable to connect to node "dbrs" (ID: 1)
    [highgo@dbrs2 conf]$
At this point node2 has been promoted to primary.
Log output on node2:
    [highgo@dbrs2 conf]$ [2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)
    [2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts
    [2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt
    [2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts
    [2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt
    [2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts
    [2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt
    [2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts
    [2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt
    [2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts
    [2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt
    [highgo@dbrs2 conf]$ [2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts
    [2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts
    [2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself
    [2019-05-06 14:25:04] [INFO] promote_command is:
      "repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"
    NOTICE: promoting standby to primary
    DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"
    DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
    NOTICE: STANDBY PROMOTE successful
    DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary
    [2019-05-06 14:25:10] [INFO] switching to primary monitoring mode
    [2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
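The log shows how long the standby waits before promoting itself: it makes 6 connection attempts and sleeps 10 seconds between them, so the decision comes 50 seconds after the first failed check (14:24:14 to 14:25:04). In repmgr these values correspond to the reconnect_attempts and reconnect_interval parameters; the source does not set them explicitly, so they are presumably defaults here. The arithmetic can be sketched as follows, with failover_detection_delay being a hypothetical helper name.

```shell
#!/bin/sh
# failover_detection_delay ATTEMPTS INTERVAL
# Seconds between the first failed check and the promotion decision:
# ATTEMPTS checks with INTERVAL seconds of sleep between consecutive checks.
failover_detection_delay() {
    echo $(( ($1 - 1) * $2 ))
}

failover_detection_delay 6 10   # values seen in the log; prints 50
```

The promotion itself adds further time on top of this delay (about 6 seconds in the log, finishing at 14:25:10), so lowering either parameter shortens the outage at the cost of reacting to shorter network interruptions.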
For more details, log in to the HighGo Technical Support Platform (瀚高技术支持平台).