HG_REPMGR Automatic Failover

Purpose

Configuration reference for HG_REPMGR automatic failover.

Details

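To configure automatic failover for a cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.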

1. Configure postgresql.replication.conf (all nodes)

Building on the existing postgresql.replication.conf, add the following parameter:

shared_preload_libraries = 'repmgr'

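Or, if other libraries are already preloaded (pg_pathman and timescaledb in this example), set the complete list with ALTER SYSTEM:

ALTER SYSTEM SET shared_preload_libraries = pg_pathman, timescaledb, repmgr;

Restart the database: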

pg_ctl restart
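
After the restart it can be worth confirming that repmgr really is in the preload list. The following is a minimal sketch, not part of the original procedure: has_repmgr is a hypothetical helper name, and the psql invocation in the comment assumes that psql can connect with its default settings.

```shell
# Return success if the comma-separated library list in $1 contains repmgr.
# has_repmgr is a hypothetical helper, not something shipped with HG_REPMGR.
has_repmgr() {
    # Strip spaces and wrap the list in commas so that only whole names match.
    case ",$(printf '%s' "$1" | tr -d ' ')," in
        *,repmgr,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against a running server (assumes psql connects with default settings):
#   has_repmgr "$(psql -Atc 'SHOW shared_preload_libraries;')" && echo "repmgr is preloaded"
```

Wrapping the list in commas keeps a name such as repmgr_x from being mistaken for repmgr.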

2. Configure hg_repmgr.conf (all nodes)

Add the following parameters to the existing hg_repmgr.conf file:

failover=automatic

promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'

follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'
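
Because the three parameters above must be present on every node, a quick check per node can catch a missed edit. This is a sketch under assumptions: missing_failover_params is a hypothetical helper, and it only tests that each key is assigned somewhere in the file, not that the value is valid.

```shell
# Print each required failover parameter that is not assigned in file $1.
# missing_failover_params is a hypothetical helper name.
missing_failover_params() {
    for key in failover promote_command follow_command; do
        # Match "key=" at the start of a line, allowing surrounding whitespace.
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$1" || echo "$key"
    done
}

# Usage on a node:
#   missing_failover_params /opt/highgo/5.6.1/conf/hg_repmgr.conf
```

No output means all three parameters were found.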

To send the repmgr log output to a fixed log file, add the log_file parameter, as follows:

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

To keep this log file from growing without bound, configure the system's logrotate. (Detailed steps omitted.)
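
Since the logrotate steps are omitted, here is one possible rule, written as a small shell helper so that it can be reviewed before being installed. Everything in it is an assumption rather than the vendor's recommended setup: the helper name write_logrotate_rule, the weekly schedule, the retention of 8 files, and the use of copytruncate (chosen so that repmgrd does not need to be signalled to reopen its log).

```shell
# Write a logrotate rule for the repmgr log file to the path given in $1.
# Schedule, retention and copytruncate are illustrative assumptions.
write_logrotate_rule() {
    cat > "$1" <<'EOF'
/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Usage (as root), then dry-run the rule with logrotate's debug mode:
#   write_logrotate_rule /etc/logrotate.d/hg_repmgr
#   logrotate -d /etc/logrotate.d/hg_repmgr
```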

3. Start the repmgrd process (all nodes)

repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid

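In the session below, the first attempt omits the -f option; repmgrd then connects with an empty connection string and reports that the repmgr extension is not installed in database "highgo". After the extension is created and repmgrd is started with -f, monitoring starts normally on both nodes.

[highgo@dbrs conf]$ repmgrd -d -p /tmp/hg_repmgrd.pid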

[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:02:42] [INFO] connecting to database ""

[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node

[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"

[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster

[highgo@dbrs conf]$

highgo=# \c

You are now connected to database "highgo" as user "highgo".

create extension repmgr;

[highgo@dbrs conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)

[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)

[highgo@dbrs2 conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs2 conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)

[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)

[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs conf]$

[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs2 conf]$

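Note: does this background process have to be started manually after every server reboot?

Developer reply: currently yes; this will be changed to start automatically later.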
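
Until automatic startup is available, a small wrapper can make the manual start safe to repeat. This is a sketch, not a supported startup script: pid_alive and start_repmgrd_if_needed are hypothetical helper names, and the paths are the ones used in this example.

```shell
# Return success if the PID stored in pidfile $1 belongs to a live process.
# Caveat: a stale pidfile whose PID was reused can give a false positive.
pid_alive() {
    [ -s "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# Start repmgrd only when no live process is recorded in the pidfile.
start_repmgrd_if_needed() {
    conf=/opt/highgo/5.6.1/conf/hg_repmgr.conf
    pidfile=/tmp/hg_repmgrd.pid
    if pid_alive "$pidfile"; then
        echo "repmgrd already running"
    else
        repmgrd -f "$conf" -d -p "$pidfile"
    fi
}
```

One option is to call start_repmgrd_if_needed from a script that the highgo user runs at boot (for example through a cron @reboot entry, where the cron implementation supports it), after the database itself has started.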

Check the cluster status

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

[highgo@dbrs conf]$

Simulate a primary node failure

1) Stop the database on node1

pg_ctl stop

2) Check the cluster status on node2

[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

WARNING: following issues were detected

- unable to connect to node "dbrs" (ID: 1)

[highgo@dbrs2 conf]$

At this point node2 has been promoted to primary.

Log

[highgo@dbrs2 conf]$ [2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)

[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts

[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts

[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts

[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts

[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts

[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt

[highgo@dbrs2 conf]$ [2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts

[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts

[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself

[2019-05-06 14:25:04] [INFO] promote_command is:

"repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"

NOTICE: promoting standby to primary

DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"

DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary

[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode

[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
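
The log shows six reconnection attempts spaced 10 seconds apart before the standby promotes itself. In upstream repmgr 4.x these values are set by the reconnect_attempts and reconnect_interval parameters; that HG_REPMGR uses the same parameter names is an assumption here. The detection delay can be estimated as below, where failover_delay is a hypothetical helper.

```shell
# Seconds between the first failed check and the promotion decision,
# given the number of attempts ($1) and the sleep between attempts ($2).
failover_delay() {
    attempts=$1
    interval=$2
    # The first attempt happens immediately, so only attempts-1 sleeps occur.
    echo $(( (attempts - 1) * interval ))
}

failover_delay 6 10   # prints 50
```

This matches the transcript: the first check is logged at 14:24:14 and the sixth at 14:25:04, 50 seconds later. The promotion itself then took a few more seconds, with primary monitoring resuming at 14:25:10.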

For more details, log in to the Highgo Technical Support Platform (瀚高技术支持平台).
