Contents

Purpose

Details

Purpose

Configuration reference for HG_REPMGR automatic failover.

Details

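To configure automatic failover for the cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.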

1. Configure postgresql.replication.conf (all nodes)

Add the following parameter to the existing postgresql.replication.conf:

shared_preload_libraries = 'repmgr'

Or, using ALTER SYSTEM (this example also keeps pg_pathman and timescaledb in the preload list):

alter system set shared_preload_libraries =pg_pathman,timescaledb,repmgr;

Restart the database:

pg_ctl restart
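
After the restart it can be useful to confirm that repmgr really is in the preload list. The following is a minimal sketch, not part of the original procedure: `has_preload_lib` is a hypothetical helper that checks a comma-separated library list for a given name.

```shell
# has_preload_lib LIST NAME
# Succeeds if NAME is one of the comma-separated entries in LIST.
# Hypothetical helper for illustration; not part of repmgr or HighGo.
has_preload_lib() {
    list=$1
    name=$2
    # Strip spaces and quotes, then wrap in commas so only whole names match.
    normalised=",$(printf '%s' "$list" | tr -d " '\""),"
    case "$normalised" in
        *",$name,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Assuming psql is on the PATH and can connect as the current user, it could be called as `has_preload_lib "$(psql -Atc 'SHOW shared_preload_libraries')" repmgr && echo "repmgr is preloaded"`.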

2. Configure hg_repmgr.conf (all nodes)

Add the following parameters to the existing hg_repmgr.conf file:

failover=automatic

promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'

follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'

To send repmgr's log output to a fixed log file, add the log_file parameter as follows:

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

To keep this log file from growing without bound, configure the system's logrotate. (Detailed steps omitted.)
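
Because the logrotate steps are omitted here, the following is a minimal sketch of what such a configuration might look like. The function name `print_logrotate_conf`, the rotation schedule (weekly, four copies) and the use of `copytruncate` are all assumptions chosen for illustration; adjust them to local policy.

```shell
# print_logrotate_conf LOGFILE
# Prints a logrotate stanza for LOGFILE on standard output.
# Hypothetical helper; the rotation settings below are assumptions.
print_logrotate_conf() {
    logfile=$1
    cat <<EOF
$logfile {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

As root, the output could be written to a file under /etc/logrotate.d/ (for example a file named hg_repmgr, a hypothetical name) by running `print_logrotate_conf /opt/highgo/5.6.1/conf/data/log/hg_repmgr.log > /etc/logrotate.d/hg_repmgr`. `copytruncate` is used in this sketch so that the running repmgrd does not need to be signalled to reopen its log file.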

3. Start the repmgrd process (all nodes)

repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

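In this example, a first attempt without the -f option fails because repmgrd does not find the repmgr extension:

[highgo@dbrs conf]$ repmgrd  -d  -p /tmp/hg_repmgrd.pid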

[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:02:42] [INFO] connecting to database ""

[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node

[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"

[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster

[highgo@dbrs conf]$

highgo=# \c

You are now connected to database "highgo" as user "highgo".

create extension repmgr;

[highgo@dbrs conf]$ repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)

[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)

[highgo@dbrs2 conf]$ repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs2 conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)

[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)

[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs conf]$

[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs2 conf]$

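Note: does this background process have to be started manually after every server reboot?

Developer's reply: currently yes; a later release will start it automatically.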
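
Until repmgrd is started automatically, a small wrapper can make the manual start safe to repeat. The following is a sketch under assumptions: `repmgrd_running` and `ensure_repmgrd` are hypothetical helper names, and the check relies only on the pid file written by the -p option shown above.

```shell
# repmgrd_running PIDFILE
# Succeeds if PIDFILE exists, is non-empty, and names a live process.
repmgrd_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null
}

# ensure_repmgrd CONF PIDFILE
# Starts repmgrd only when the pid file does not point at a live process.
ensure_repmgrd() {
    if repmgrd_running "$2"; then
        echo "repmgrd already running"
    else
        repmgrd -f "$1" -d -p "$2"
    fi
}
```

For example, `ensure_repmgrd /opt/highgo/5.6.1/conf/hg_repmgr.conf /tmp/hg_repmgrd.pid` could be run by hand after a reboot, or (as an assumption, not something the source describes) from an @reboot crontab entry once the database itself is up. A stale pid file whose number has been reused by another process would be reported as running, so this check is only a rough guard.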

Check the cluster status

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Connection string

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2

2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

[highgo@dbrs conf]$

Simulate a primary node failure

1) Stop the database on node1

pg_ctl stop

2) Check the cluster status on node2

[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Connection string

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2

2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

WARNING: following issues were detected

- unable to connect to node "dbrs" (ID: 1)

[highgo@dbrs2 conf]$

At this point node2 has been promoted to primary.
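
When the check needs to be scripted rather than read by eye, the `cluster show` table can be filtered for the row whose role is primary and whose status is `* running`. This is a minimal sketch that assumes the column layout shown above; `running_primary` is a hypothetical helper name.

```shell
# running_primary
# Reads "repmgr cluster show" output on standard input and prints the
# name of each node whose role is "primary" and status is "* running".
# Assumes the column order ID | Name | Role | Status | ... shown above.
running_primary() {
    awk -F'|' '
        NF >= 4 {
            name = $2; role = $3; status = $4
            # Trim leading and trailing blanks from each field.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (role == "primary" && status == "* running") print name
        }'
}
```

For example, `repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show | running_primary` would print dbrs2 for the output above, since the dbrs row is marked `- failed`.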

Log

[highgo@dbrs2 conf]$ [2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)

[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts

[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts

[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts

[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts

[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts

[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt

[highgo@dbrs2 conf]$ [2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts

[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts

[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself

[2019-05-06 14:25:04] [INFO] promote_command is:

"repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"

NOTICE: promoting standby to primary

DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl  -w -D '/opt/highgo/5.6.1/data' promote"

DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary

[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode

[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
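
The log shows six reconnection attempts ten seconds apart before promotion. As a rough sketch of the arithmetic, the time between the first failed check and the promotion decision is (attempts - 1) × interval, because no sleep follows the last attempt. The helper name `failover_delay` is hypothetical; in upstream repmgr the two values are set by the reconnect_attempts and reconnect_interval parameters, and it is an assumption that HG_REPMGR uses the same names.

```shell
# failover_delay ATTEMPTS INTERVAL
# Prints the seconds between the first failed check and the last one,
# i.e. the minimum delay before repmgrd decides to promote a standby.
failover_delay() {
    attempts=$1
    interval=$2
    echo $(( (attempts - 1) * interval ))
}

failover_delay 6 10   # prints 50
```

This matches the log above: the first attempt is at 14:24:14 and the sixth at 14:25:04, 50 seconds later. The promotion itself then completes at 14:25:10.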
