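mysqlfailover is a tool written in Python by the MySQL team and shipped as part of the MySQL Utilities package. Its main purpose is to keep MySQL highly available. It checks the health of the nodes at a regular interval; when the master becomes unavailable, it automatically fails over to a slave and repoints the remaining slaves at the newly promoted master. How data consistency is preserved is explained in the analysis below.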

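Prerequisites for mysqlfailover:

1. GTID mode must be enabled. With this setup replication lag is small: a stress test showed about 3 seconds of lag. The lag depends on how many SQL (applier) threads are configured; for about 10,000 inserts per second, 16 threads is a reasonable setting.

2. The configuration file must contain: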

report-host=

report-port=

master-info-repository=TABLE

relay-log-info-repository=TABLE

These settings allow the slaves to be discovered.

3. Privileges:

The account used by mysqlfailover must hold its privileges WITH GRANT OPTION.
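
The configuration prerequisites above can be checked mechanically before starting the tool. The following is a minimal sketch, not part of MySQL Utilities: the function name `failover_config_problems` and the `REQUIRED` table are hypothetical, and the expected value `ON` for `gtid_mode` is an assumption based on the GTID requirement in item 1. It parses the text of a my.cnf file and reports which of the listed settings are missing or wrong.

```python
import configparser

# Settings taken from the prerequisites list. None means "any non-empty value".
# gtid_mode=ON is an assumption derived from the GTID requirement.
REQUIRED = {
    "gtid_mode": "ON",
    "report_host": None,
    "report_port": None,
    "master_info_repository": "TABLE",
    "relay_log_info_repository": "TABLE",
}


def failover_config_problems(cnf_text, section="mysqld"):
    """Return a list of problems found in the given section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return ["missing [%s] section" % section]

    # MySQL accepts '-' and '_' interchangeably in option names.
    found = {
        key.replace("-", "_").lower(): (value or "").strip()
        for key, value in parser.items(section)
    }

    problems = []
    for name, expected in REQUIRED.items():
        value = found.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif expected is not None and value.upper() != expected:
            problems.append("%s is %s, expected %s" % (name, value, expected))
    return problems
```

Calling `failover_config_problems(open("/etc/my.cnf").read())` on each slave returns an empty list when all five settings are present; the path is only an example, and files that use `!include` directives before the first section header would need to be preprocessed first.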

Installation is straightforward.

Download the MySQL Utilities package: https://downloads.mysql.com/archives/utilities/

unzip mysql-utilities-1.6.5.zip

cd mysql-utilities-1.6.5

python ./setup.py build

python ./setup.py install

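Installation is now complete.

Usage:

Set up master/slave replication first (the steps are omitted here), then start the monitor:

mysqlfailover --master=failover:123456@'192.168.0.106':3306 --discover-slaves-login=failover:123456 --daemon=start --log=/data/failover.log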

Verifying that transactions survive the failover:

sysbench is used here to bulk-insert data.

sysbench --test=oltp --mysql-db=test --mysql-user=root --mysql-password=123456 --oltp-table-size=1000000000 --oltp-num-tables=15 prepare

sysbench 0.4.12.10: multi-threaded system evaluation benchmark

No DB drivers specified, using mysql

Creating table 'sbtest1'...

Creating table 'sbtest5'...

Creating table 'sbtest4'...

Creating table 'sbtest8'...

Creating table 'sbtest9'...

Creating table 'sbtest6'...

Creating table 'sbtest2'...

Creating table 'sbtest'...

Creating table 'sbtest3'...

Creating table 'sbtest14'...

Creating table 'sbtest10'...

Creating table 'sbtest12'...

Creating table 'sbtest11'...

Creating table 'sbtest7'...

Creating table 'sbtest13'...

Creating 1000000000 records in table 'sbtest11'...

Creating 1000000000 records in table 'sbtest6'...

Creating 1000000000 records in table 'sbtest4'...

Creating 1000000000 records in table 'sbtest5'...

Creating 1000000000 records in table 'sbtest8'...

Creating 1000000000 records in table 'sbtest14'...

Creating 1000000000 records in table 'sbtest3'...

Creating 1000000000 records in table 'sbtest13'...

Creating 1000000000 records in table 'sbtest9'...

Creating 1000000000 records in table 'sbtest10'...

Creating 1000000000 records in table 'sbtest1'...

Creating 1000000000 records in table 'sbtest12'...

Creating 1000000000 records in table 'sbtest'...

Creating 1000000000 records in table 'sbtest7'...

Creating 1000000000 records in table 'sbtest2'...

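After a few minutes, kill the MySQL processes on the master:

kill -9 17448

kill -9 18350

mysqlfailover then performs the failover automatically. Its output shows that the master role has moved to a slave: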

Q-quit R-refresh H-health G-GTID Lists U-UUIDs

Failed to reconnect to the master after 3 attemps.

Failover starting in 'auto' mode...

# Checking eligibility of slave 192.168.0.109:3306 for candidate.

# GTID_MODE=ON ... Ok

# Replication user exists ... Ok

# Candidate slave 192.168.0.109:3306 will become the new master.

# Checking slaves status (before failover).

# Preparing candidate for failover.

WARNING: IP lookup by name failed for 44,reason: Unknown host

WARNING: IP lookup by address failed for 192.168.0.109,reason: Unknown host

WARNING: IP lookup by address failed for 192.168.0.112,reason: Unknown host

# Missing transactions found on 192.168.0.112:3306. SELECT gtid_subset() = 0

# LOCK STRING: FLUSH TABLES WITH READ LOCK

# Read only is ON for 192.168.0.112:3306.

# Connecting candidate to 192.168.0.112:3306 as a temporary slave to retrieve unprocessed GTIDs.

# Change master command for 192.168.0.109:3306

# CHANGE MASTER TO MASTER_HOST = '192.168.0.112', MASTER_USER = 'backup', MASTER_PASSWORD = '123456', MASTER_PORT = 3306, MASTER_AUTO_POSITION=1

# Read only is OFF for 192.168.0.112:3306.

# UNLOCK STRING: UNLOCK TABLES

# Waiting for candidate to catch up to slave 192.168.0.112:3306.

# Slave 192.168.0.109:3306:

# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('c142ca67-b898-11e8-86e8-000c29367e64:1', 300)

# Return Code = 3

# Slave 192.168.0.109:3306:

# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('c777e02f-b898-11e8-86a0-000c29c6f346:1-4', 300)

# Return Code = 0

# Creating replication user if it does not exist.

# Stopping slaves.

# Performing STOP on all slaves.

WARNING: IP lookup by name failed for 44,reason: Unknown host

WARNING: IP lookup by address failed for 192.168.0.109,reason: Unknown host

WARNING: IP lookup by address failed for 192.168.0.112,reason: Unknown host

# Executing stop on slave 192.168.0.109:3306 WARN - slave is not configured with this master

# Executing stop on slave 192.168.0.109:3306 Ok

WARNING: IP lookup by address failed for 192.168.0.106,reason: Unknown host

# Executing stop on slave 192.168.0.112:3306 WARN - slave is not configured with this master

# Executing stop on slave 192.168.0.112:3306 Ok

WARNING: IP lookup by name failed for 44,reason: Unknown host

WARNING: IP lookup by address failed for 192.168.0.109,reason: Unknown host

# Switching slaves to new master.

# Change master command for 192.168.0.112:3306

# CHANGE MASTER TO MASTER_HOST = '192.168.0.109', MASTER_USER = 'backup', MASTER_PASSWORD = '123456', MASTER_PORT = 3306, MASTER_AUTO_POSITION=1

# Disconnecting new master as slave.

# Execute on 192.168.0.109:3306: RESET SLAVE ALL

# Starting slaves.

# Performing START on all slaves.

# Executing start on slave 192.168.0.112:3306 Ok

# Checking slaves for errors.

# 192.168.0.112:3306 status: Ok

# Failover complete.

# Discovering slaves for master at 192.168.0.109:3306

Failover console will restart in 5 seconds.

# Attempting to contact 192.168.0.109 ... Success

# Attempting to contact 192.168.0.112 ... Success

MySQL Replication Failover Utility

Failover Mode = auto Next Interval = Sat Sep 15 14:15:30 2018

Master Information

------------------

Binary Log File Position Binlog_Do_DB Binlog_Ignore_DB

mysql-bin.000001 657

GTID Executed Set

b5c5054c-b898-11e8-8670-000c299e1daf:1 [...]

# Attempting to contact 192.168.0.109 ... Success

# Attempting to contact 192.168.0.112 ... Success

Replication Health Status

+----------------+-------+---------+--------+------------+---------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+-----------+----------------+------------+---------------+

| host | port | role | state | gtid_mode | health | version | master_log_file | master_log_pos | IO_Thread | SQL_Thread | Secs_Behind | Remaining_Delay | IO_Error_Num | IO_Error | SQL_Error_Num | SQL_Error | Trans_Behind |

+----------------+-------+---------+--------+------------+---------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+-----------+----------------+------------+---------------+

| 192.168.0.109 | 3306 | MASTER | UP | ON | OK | 5.7.22-log | mysql-bin.000001 | 657 | | | | | | | | | |

| 192.168.0.112 | 3306 | SLAVE | UP | ON | OK | 5.7.22-log | mysql-bin.000001 | 657 | Yes | Yes | 0 | No | 0 | | 0 | | 0 |

+----------------+-------+---------+--------+------------+---------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+-----------+----------------+------------+---------------+

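Analysis:

After the tool detects that the master has stopped, it does the following:

1. Checks that the candidate server is healthy and that GTID mode is enabled.

2. Locks tables on the other slave (FLUSH TABLES WITH READ LOCK) so that transactions committed during the switch cannot cause inconsistency.

3. Handles read_only automatically: it turns read_only on for the other slave while the lock is held and back off afterwards, and first runs CHANGE MASTER TO on the candidate so that it replicates from the other slave and fetches any transactions it is missing.

4. Unlocks the tables and waits until the candidate has the same transactions as the other slave.

5. Checks the candidate's GTIDs, then stops replication on all slaves with STOP SLAVE.

6. Switches to the new master, that is, the candidate: every slave is pointed at the candidate, and the candidate drops its connection to the old master with RESET SLAVE ALL.

7. Runs START SLAVE on the slaves so replication resumes. All slaves now point at the new master and the failover is complete.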
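
The "Missing transactions found ... gtid_subset() = 0" line in the log is the heart of steps 2 to 4: the candidate's executed GTID set is not a superset of the other slave's, so the candidate has to catch up before it is promoted. The sketch below illustrates that set logic in plain Python. It is not the tool's implementation (the tool asks the server to evaluate this in SQL); the function names are hypothetical, and expanding intervals into Python sets is only practical for small examples.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-4:7,uuid2:1' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def gtid_subset(subset, superset):
    """True if every transaction in `subset` also appears in `superset`."""
    a, b = parse_gtid_set(subset), parse_gtid_set(superset)
    return all(nums <= b.get(uuid, set()) for uuid, nums in a.items())


def gtid_missing(executed_on_other, executed_on_candidate):
    """Transactions the other slave has executed but the candidate has not."""
    other = parse_gtid_set(executed_on_other)
    candidate = parse_gtid_set(executed_on_candidate)
    missing = {}
    for uuid, nums in other.items():
        diff = nums - candidate.get(uuid, set())
        if diff:
            missing[uuid] = sorted(diff)
    return missing
```

For example, if the other slave had executed `c777e02f-b898-11e8-86a0-000c29c6f346:1-4` and the candidate only `...:1-2` (illustrative values, not taken from the test run), `gtid_missing` would report transactions 3 and 4, which is what the candidate retrieves while it is temporarily attached to the other slave.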

Now dump the binary log on the master to see which transaction was inserted last:

mysqlbinlog --base64-output=decode-rows -v mysql-bin.000005 > ~/bin.log

vim ~/bin.log

The last part of the output:

### INSERT INTO `test`.`sbtest8`

### SET

### @1=289999

### @2=0

### @3=''

### @4='qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt'

### INSERT INTO `test`.`sbtest8`

### SET

### @1=290000

### @2=0

### @3=''

### @4='qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt'

# at 265373582

#180901 15:41:10 server id 1 end_log_pos 265373613 CRC32 0xa53bca62 Xid = 7014

COMMIT/*!*/;

SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

DELIMITER ;

# End of log file

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

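The last insert on the master went into table sbtest8 in the test database, with 290000 as the value of the first column, which is the id column.

Now switch to the slave and query sbtest8 to see whether this transaction was replicated: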

mysql> use test

Database changed

mysql> select * from sbtest8 where id = '290000';

+--------+---+---+----------------------------------------------------+

| id | k | c | pad |

+--------+---+---+----------------------------------------------------+

| 290000 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |

+--------+---+---+----------------------------------------------------+

1 row in set (0.00 sec)

The row is there. Next, check whether it is the last row, that is, whether the slave rolled back any uncommitted transaction:

mysql> select * from sbtest8 where id = '290001';

Empty set (0.00 sec)

mysql> select * from sbtest8 order by id desc limit 1;

+--------+---+---+----------------------------------------------------+

| id | k | c | pad |

+--------+---+---+----------------------------------------------------+

| 290000 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt |

+--------+---+---+----------------------------------------------------+

1 row in set (0.00 sec)

mysql> \q

Bye

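id=290000 is indeed the last transaction. Any transaction that had not committed would have been rolled back, so the transactions committed on the master and replicated to the slave were not lost.

Finally, mysqldiff can be used to check for differences between master and slave (it compares object definitions; mysqldbcompare, also in MySQL Utilities, compares row data as well):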

[root@node2 data]# mysqldiff --server1=failover:123456@192.168.0.109:3306 --server2=failover:123456@192.168.0.112:3306 --difftype=sql test:test

# WARNING: Using a password on the command line interface can be insecure.

# server1 on 192.168.0.109: ... connected.

# server2 on 192.168.0.112: ... connected.

# Comparing `test` to `test` [PASS]

# Comparing `test`.`sbtest` to `test`.`sbtest` [PASS]

# Comparing `test`.`sbtest1` to `test`.`sbtest1` [PASS]

# Comparing `test`.`sbtest10` to `test`.`sbtest10` [PASS]

# Comparing `test`.`sbtest11` to `test`.`sbtest11` [PASS]

# Comparing `test`.`sbtest12` to `test`.`sbtest12` [PASS]

# Comparing `test`.`sbtest13` to `test`.`sbtest13` [PASS]

# Comparing `test`.`sbtest14` to `test`.`sbtest14` [PASS]

# Comparing `test`.`sbtest2` to `test`.`sbtest2` [PASS]

# Comparing `test`.`sbtest3` to `test`.`sbtest3` [PASS]

# Comparing `test`.`sbtest4` to `test`.`sbtest4` [PASS]

# Comparing `test`.`sbtest5` to `test`.`sbtest5` [PASS]

# Comparing `test`.`sbtest6` to `test`.`sbtest6` [PASS]

# Comparing `test`.`sbtest7` to `test`.`sbtest7` [PASS]

# Comparing `test`.`sbtest8` to `test`.`sbtest8` [PASS]

# Comparing `test`.`sbtest9` to `test`.`sbtest9` [PASS]

# Success. All objects are the same.

Together with the row check above, this indicates that no transactions were lost despite the replication lag.

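Notes:

mysqlfailover is suited to plain replication topologies with a single write node.

It is not suitable when a slave doubles as a test database, an audit server, or anything else that writes to it. No slave may receive any writes.

When using mysqlfailover, it is best to enable read_only on every slave to keep the data consistent.

In a multi-slave topology, the old master can be brought back into the topology after a failure and made master again. In the steps below, server1 is the old master and server2 is the master promoted by the failover.

1. Stop the mysqlfailover tool, then start the old master instance (server1).

2. Make the old master a slave of the current master so that transaction and binary log completeness can be checked: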

mysqlreplicate --master=failover:123456@192.168.88.196:3307 --slave=failover:123456@192.168.88.194:3307 --rpl-user=backup:123456

3. Use mysqlrpladmin to make the old master the new master of the whole topology:

mysqlrpladmin --master=failover:123456@192.168.88.196:3307 --new-master=failover:123456@192.168.88.194:3307 --discover-slaves-login=failover:123456 --demote-master switchover

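4. Start mysqlfailover again. The --force option is required here; it overrides the check for a console instance that is still registered on the master.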
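
Steps 2 to 4 can be generated from the two server addresses so the hosts are not mixed up when typing the commands by hand. This is a minimal sketch under stated assumptions: `rejoin_commands` is a hypothetical helper, it only builds argument lists and executes nothing, and the final mysqlfailover line is an assumption that combines the options used earlier in this article with --force.

```python
import shlex


def rejoin_commands(current_master, old_master, admin_login, rpl_user):
    """Build the command lines for steps 2-4 as argv lists.

    current_master and old_master are 'host:port' strings; admin_login and
    rpl_user are 'user:password' strings. All values are caller-supplied.
    """
    cur = "%s@%s" % (admin_login, current_master)
    old = "%s@%s" % (admin_login, old_master)
    return [
        # Step 2: attach the old master as a slave of the current master.
        ["mysqlreplicate", "--master=" + cur, "--slave=" + old,
         "--rpl-user=" + rpl_user],
        # Step 3: switch the master role back to the old master.
        ["mysqlrpladmin", "--master=" + cur, "--new-master=" + old,
         "--discover-slaves-login=" + admin_login,
         "--demote-master", "switchover"],
        # Step 4: restart monitoring against the restored master (assumed form).
        ["mysqlfailover", "--master=" + old,
         "--discover-slaves-login=" + admin_login, "--force"],
    ]


if __name__ == "__main__":
    for argv in rejoin_commands("192.168.88.196:3307", "192.168.88.194:3307",
                                "failover:123456", "backup:123456"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning argument lists rather than one shell string keeps the commands safe to pass to `subprocess.run` if passwords contain shell metacharacters.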

Do not repost without permission.
