MHA Failover Testing (Part 1)

TL;DR

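Case	ping_type=CONNECT	ping_type=INSERT
master too many connections	no failover triggered	no failover triggered
master hang	no failover triggered	failover triggered and succeeds
only the manager cannot reach the master	no failover triggered	no failover triggered
manager cannot reach the master and cannot ssh to slave1	no failover triggered	no failover triggered
manager cannot reach the master and cannot ssh to slave1 or slave2	no failover triggered	no failover triggered
manager cannot reach the master; slave1 (reached over ssh) cannot reach the master either	no failover triggered	no failover triggered
manager cannot reach the master; slave1 and slave2 (reached over ssh) both cannot reach the master	failover triggered and succeeds	failover triggered and succeeds (only after the persistent connection is dropped)
slave1 was already down before the master went down	failover triggered, but fails	failover triggered, but fails
master down; slave1's io_thread had been stopped beforehand	failover triggered and succeeds	failover triggered and succeeds
master down; slave1's io_thread had hit an error beforehand	failover triggered and succeeds	failover triggered and succeeds
master down; slave1's sql_thread had been stopped beforehand	failover triggered and succeeds	failover triggered and succeeds
master down; slave1's sql_thread had hit an error beforehand	failover triggered, but fails	failover triggered, but fails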
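
The connectivity rows of the table follow a simple pattern: the secondary check only lets a failover proceed when the manager has lost the master and every secondary-check host, reached over ssh, also reports that it cannot connect to the master. The following is a minimal sketch of that rule as observed in these tests, not MHA's actual code; the names `Probe` and `secondary_check_allows_failover` are hypothetical. It is a necessary condition only: as the "too many connections" case shows, the manager can still decline to fail over after the secondary check has passed.

```python
from enum import Enum


class Probe(Enum):
    """Result for one secondary-check host (hypothetical names, not MHA's)."""
    HOST_UNREACHABLE = "manager cannot ssh to this host"
    MASTER_REACHABLE = "this host can still connect to the master"
    MASTER_UNREACHABLE = "this host cannot connect to the master"


def secondary_check_allows_failover(manager_ping_ok, probes):
    """Return True only when every vantage point agrees the master is gone.

    Assumes at least one secondary-check host, as in the tests here.
    """
    if manager_ping_ok:
        return False
    return all(p is Probe.MASTER_UNREACHABLE for p in probes)


# The five connectivity rows of the table, in order (slave1, slave2).
rows = [
    [Probe.MASTER_REACHABLE, Probe.MASTER_REACHABLE],
    [Probe.HOST_UNREACHABLE, Probe.MASTER_REACHABLE],
    [Probe.HOST_UNREACHABLE, Probe.HOST_UNREACHABLE],
    [Probe.MASTER_UNREACHABLE, Probe.MASTER_REACHABLE],
    [Probe.MASTER_UNREACHABLE, Probe.MASTER_UNREACHABLE],
]
for probes in rows:
    print(secondary_check_allows_failover(False, probes))
```

Only the last row prints True, which matches the single connectivity case in the table where a failover was triggered.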

Environment

master: 172.16.120.10 centos-1 master + proxysql
slave1: 172.16.120.11 centos-2 slave + proxysql
slave2: 172.16.120.12 centos-3 slave + proxysql
manager: 172.16.120.13 centos-4 mha manager

MHA configuration

# cat /etc/masterha/conf/masterha_default.cnf
[server default]
# MySQL user and password; the password must not be quoted here
user=mha
password=xxxx
# replication user
repl_user=repler
repl_password=xxxx
# check the master every 3 seconds
ping_interval=3
# ping type: CONNECT uses a short-lived connection per check; the default is a persistent connection
ping_type=INSERT
#ping_type=CONNECT
# both types are tested below
# ssh user
ssh_user=root
# script that sends the email report
report_script=/etc/masterha/scripts/send_report
# working directory on each node
remote_workdir=/masterha/

# cat /etc/masterha/conf/cls_new.cnf
[server default]
#workdir on the management server
manager_workdir=/masterha/cls_new/
manager_log=/masterha/cls_new/manager.log
#workdir on the node for mysql server
master_binlog_dir=/data/mysql_3358/data/
# script called to move the VIP on automatic failover
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128
# script called to move the VIP on manual switchover
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change_vip --vip=172.16.120.128
# checks whether the master is available
secondary_check_script=masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12

[server1]
hostname=172.16.120.10
port=3358
candidate_master=1

[server2]
hostname=172.16.120.11
port=3358
candidate_master=1

[server3]
hostname=172.16.120.12
port=3358
candidate_master=1
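
To make the relationship between these settings easier to see, here is a small sketch that reads an abbreviated copy of the configuration above and lists the ping type, the hosts passed to the secondary check with `-s`, and the server entries. It uses Python's `configparser` purely for illustration; MHA itself is written in Perl and does not parse its files this way, and the helper name `summarize` is hypothetical.

```python
import configparser
import shlex

# Abbreviated copy of the settings shown above (two config files merged).
SAMPLE = """\
[server default]
ping_interval=3
ping_type=INSERT
secondary_check_script=masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12

[server1]
hostname=172.16.120.10
port=3358
candidate_master=1

[server2]
hostname=172.16.120.11
port=3358
candidate_master=1

[server3]
hostname=172.16.120.12
port=3358
candidate_master=1
"""


def summarize(text):
    """Return (ping_type, secondary-check hosts, {section: (host, port)})."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    defaults = parser["server default"]
    args = shlex.split(defaults.get("secondary_check_script", ""))
    # Every value that directly follows a "-s" flag is a secondary-check host.
    check_hosts = [args[i + 1] for i, a in enumerate(args[:-1]) if a == "-s"]
    servers = {
        name: (section["hostname"], section.getint("port"))
        for name, section in parser.items()
        if name.startswith("server") and name != "server default"
    }
    return defaults["ping_type"], check_hosts, servers


ping_type, check_hosts, servers = summarize(SAMPLE)
print(ping_type)
print(check_hosts)
print(servers)
```

The output makes it explicit that the two slaves double as the secondary-check vantage points, which is why the later test cases vary ssh and MySQL connectivity from slave1 and slave2.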

[Test case] master too many connections

ping_type=CONNECT

root@localhost 11:43:29 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time   | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |      7 |                                                               | NULL             |         0 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |      4 |                                                               | NULL             |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |      2 |                                                               | NULL             |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |     14 |                                                               | NULL             |         0 |             0 |
| 1256 | repler   | 172.16.120.11:59594 | NULL               | Binlog Dump GTID | 952922 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1257 | repler   | 172.16.120.12:56540 | NULL               | Binlog Dump GTID | 952902 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |      2 |                                                               | NULL             |         1 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |    120 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |     58 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     17 |                                                               | NULL             |         1 |             0 |
| 1943 | root     | localhost           | dbms_monitor       | Query            |      0 | starting                                                      | show processlist |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)

root@localhost 11:43:30 [dbms_monitor]> show global variables like '%max_connec%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| extra_max_connections | 1       |
| max_connect_errors    | 1000000 |
| max_connections       | 1024    |
+-----------------------+---------+
3 rows in set (0.01 sec)

root@localhost 11:49:34 [dbms_monitor]> set global max_connections=5;
Query OK, 0 rows affected (0.01 sec);

Conclusion: no failover.

Fri Oct  9 11:42:57 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 11:42:57 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 11:42:57 2020 - [info]  OK.
Fri Oct  9 11:42:57 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 11:42:57 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 11:42:57 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 11:42:57 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 11:42:57 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 11:49:51 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Too many connections at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
1040 (Too many connections)
Fri Oct  9 11:49:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 11:49:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 11:49:51 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct  9 11:49:52 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 11:49:54 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:49:54 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:49:57 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:49:57 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:50:00 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:50:00 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..

ping_type=INSERT

root@localhost 11:55:13 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time   | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |      1 |                                                               | NULL             |         0 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |     18 |                                                               | NULL             |         1 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |     16 |                                                               | NULL             |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |      8 |                                                               | NULL             |         0 |             0 |
| 1256 | repler   | 172.16.120.11:59594 | NULL               | Binlog Dump GTID | 953626 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1257 | repler   | 172.16.120.12:56540 | NULL               | Binlog Dump GTID | 953606 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |      6 |                                                               | NULL             |         1 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |    103 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |     41 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |      1 |                                                               | NULL             |         1 |             0 |
| 1943 | root     | localhost           | dbms_monitor       | Query            |      0 | starting                                                      | show processlist |         0 |             0 |
| 2160 | mha      | 172.16.120.13:34660 | NULL               | Sleep            |      2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

root@localhost 11:55:14 [dbms_monitor]> show global variables like '%max_connec%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| extra_max_connections | 1       |
| max_connect_errors    | 1000000 |
| max_connections       | 1024    |
+-----------------------+---------+
3 rows in set (0.04 sec)

root@localhost 11:55:19 [dbms_monitor]> set global max_connections=5;
Query OK, 0 rows affected (0.00 sec)

root@localhost 11:55:25 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time   | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |      6 |                                                               | NULL             |         0 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |      3 |                                                               | NULL             |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |     31 |                                                               | NULL             |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |      3 |                                                               | NULL             |         1 |             0 |
| 1256 | repler   | 172.16.120.11:59594 | NULL               | Binlog Dump GTID | 953641 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1257 | repler   | 172.16.120.12:56540 | NULL               | Binlog Dump GTID | 953621 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |      0 |                                                               | NULL             |         0 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |    118 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |     56 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |      6 |                                                               | NULL             |         1 |             0 |
| 1943 | root     | localhost           | dbms_monitor       | Query            |      0 | starting                                                      | show processlist |         0 |             0 |
| 2160 | mha      | 172.16.120.13:34660 | NULL               | Sleep            |      2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

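ping_type=INSERT uses a persistent connection, so the manager's already-open session (Id 2160 above) keeps working and does not notice the "too many connections" condition.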
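
The difference between the two ping types here can be pictured with a toy model. This is a minimal sketch, not real MySQL and not MHA code: `ToyServer` and `short_lived_ping` are hypothetical names, and the only behaviour assumed is that a connection cap rejects new connections while leaving already-open ones usable.

```python
class ToyServer:
    """Toy stand-in for a server with a connection cap (not real MySQL)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = set()
        self._next_id = 1

    def connect(self):
        # New connections are refused once the cap is reached.
        if len(self.open_connections) >= self.max_connections:
            raise ConnectionError("1040 (Too many connections)")
        conn_id = self._next_id
        self._next_id += 1
        self.open_connections.add(conn_id)
        return conn_id

    def close(self, conn_id):
        self.open_connections.discard(conn_id)

    def query(self, conn_id):
        # An already-open connection keeps working regardless of the cap.
        if conn_id not in self.open_connections:
            raise ConnectionError("2006 (MySQL server has gone away)")
        return "ok"


def short_lived_ping(server):
    """CONNECT-style check: open a connection, query, close."""
    conn = server.connect()
    try:
        return server.query(conn)
    finally:
        server.close(conn)


server = ToyServer(max_connections=1024)
for _ in range(11):
    server.connect()             # other sessions already connected
persistent = server.connect()    # INSERT-style check keeps this one open
server.max_connections = 5       # like: set global max_connections=5

print(server.query(persistent))  # the persistent session still works
try:
    short_lived_ping(server)
except ConnectionError as err:
    print(err)                   # a fresh connection is refused

server.close(persistent)         # like: kill <id of the manager session>
try:
    server.query(persistent)
except ConnectionError as err:
    print(err)                   # the next ping on the killed session fails
```

In the model, as in the test, the persistent check only starts failing once its session is killed, and every reconnect attempt after that runs into the connection cap.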

Manually kill the mha connection:

root@localhost 11:55:29 [dbms_monitor]> kill 2160;
Query OK, 0 rows affected (0.01 sec)

Conclusion: no failover.

Fri Oct  9 11:54:48 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 11:54:48 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 11:54:48 2020 - [info]  OK.
Fri Oct  9 11:54:48 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 11:54:48 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 11:54:48 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 11:54:48 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 11:54:48 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 11:56:42 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 11:56:42 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 11:56:42 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 11:56:43 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
ERROR 1040 (HY000): Too many connections
Monitoring server 172.16.120.11 is reachable, Master is not writable from 172.16.120.11. OK.
ERROR 1040 (HY000): Too many connections
Monitoring server 172.16.120.12 is reachable, Master is not writable from 172.16.120.12. OK.
Fri Oct  9 11:56:43 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct  9 11:56:45 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:56:45 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:56:48 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:56:48 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:56:51 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:56:51 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:56:54 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:56:54 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:56:57 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct  9 11:56:57 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct  9 11:57:00 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)

[Test case] master hang

ping_type=CONNECT

A master hang is hard to reproduce directly, so it is simulated indirectly here. First, the SELECT 1 executed by ping_select has to be changed into a query against an InnoDB table:

sub ping_select($) {
  my $self = shift;
  my $log  = $self->{logger};
  my $dbh  = $self->{dbh};
  my ( $query, $sth, $href );
  eval {
    $dbh->{RaiseError} = 1;
    #$sth = $dbh->prepare("SELECT 1 As Value");
    $sth = $dbh->prepare("SELECT 1 As Value from infra.chk_masterha limit 1");

Then change the value of innodb_thread_concurrency:

root@localhost 12:25:34 [dbms_monitor]> set global innodb_thread_concurrency=1;
Query OK, 0 rows affected (0.00 sec)

Manually run a query against an InnoDB table so that MHA's select is blocked:

root@localhost 12:25:45 [dbms_monitor]> select sleep(600) from infra.chk_masterha limit 1;

root@localhost 12:29:09 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time   | State                                                         | Info                                              | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |     16 |                                                               | NULL                                              |         1 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |      4 |                                                               | NULL                                              |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |      1 |                                                               | NULL                                              |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |      4 |                                                               | NULL                                              |         1 |             0 |
| 1256 | repler   | 172.16.120.11:59594 | NULL               | Binlog Dump GTID | 955662 | Master has sent all binlog to slave; waiting for more updates | NULL                                              |         0 |             0 |
| 1257 | repler   | 172.16.120.12:56540 | NULL               | Binlog Dump GTID | 955642 | Master has sent all binlog to slave; waiting for more updates | NULL                                              |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |     11 |                                                               | NULL                                              |         1 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |     96 |                                                               | NULL                                              |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |     34 |                                                               | NULL                                              |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |      6 |                                                               | NULL                                              |         0 |             0 |
| 1943 | root     | localhost           | dbms_monitor       | Query            |     21 | User sleep                                                    | select sleep(600) from infra.chk_masterha limit 1 |         0 |             0 |
| 2260 | root     | localhost           | dbms_monitor       | Query            |      0 | starting                                                      | show processlist                                  |         0 |             0 |
| 2303 | mha      | 172.16.120.13:34982 | NULL               | Query            |     20 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2305 | mha      | 172.16.120.13:34988 | NULL               | Query            |     17 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2308 | mha      | 172.16.120.13:34994 | NULL               | Query            |     14 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2310 | mha      | 172.16.120.13:34998 | NULL               | Query            |     11 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2312 | mha      | 172.16.120.13:35002 | NULL               | Query            |      8 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2314 | mha      | 172.16.120.13:35006 | NULL               | Query            |      5 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
| 2317 | mha      | 172.16.120.13:35010 | NULL               | Query            |      2 | Sending data                                                  | SELECT 1 As Value from infra.chk_masterha limit 1 |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
19 rows in set (0.00 sec)
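
The pile-up of blocked mha queries in the process list above can be pictured with a toy model in which a single ticket stands in for innodb_thread_concurrency=1. This is only a sketch of the blocking effect, assuming the long-running query holds the one available slot; it does not reproduce InnoDB's real scheduling, and `run_query` is a hypothetical helper.

```python
import threading

# Toy model: one ticket stands in for innodb_thread_concurrency=1.
tickets = threading.Semaphore(1)


def run_query(timeout):
    """Try to enter the engine; True if the query ran, False if the wait timed out."""
    if not tickets.acquire(timeout=timeout):
        return False
    tickets.release()
    return True


# The long-running session (the sleep(600) query) holds the only ticket.
tickets.acquire()
blocked = run_query(timeout=0.1)    # the health-check query waits, then gives up

# Terminating the long query (Ctrl+C) frees the ticket again.
tickets.release()
unblocked = run_query(timeout=0.1)

print(blocked, unblocked)  # False True
```

With ping_type=CONNECT each check is a fresh connection, so every timed-out check leaves another blocked query behind, which is what the growing list of mha sessions shows.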

Conclusion: no failover; the mha manager may exit with an error.

Fri Oct  9 12:28:44 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 12:28:44 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 12:28:44 2020 - [info]  OK.
Fri Oct  9 12:28:44 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 12:28:44 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 12:28:44 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 12:28:44 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 12:28:44 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 12:28:53 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:28:53 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 12:28:53 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 12:28:53 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 12:28:53 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct  9 12:28:53 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 12:28:56 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:28:56 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 12:28:59 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:28:59 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
...
Fri Oct  9 12:30:47 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:30:47 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

After select sleep(600) from infra.chk_masterha limit 1 was terminated manually with Ctrl+C, the mha manager exited with an error:

Fri Oct  9 12:30:49 2020 - [warning] Got error when monitoring master:  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 489.
Fri Oct  9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln491] Target master's advisory lock is already held by someone. Please check whether you monitor the same master from multiple monitoring processes.
Fri Oct  9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln511] Error happened on health checking.  at /usr/local/bin/masterha_manager line 50.
Fri Oct  9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Oct  9 12:30:49 2020 - [info] Got exit code 1 (Not master dead).

ping_type=INSERT

A master hang is hard to reproduce directly, so it is simulated indirectly here. Change the value of innodb_thread_concurrency:

root@localhost 12:25:34 [dbms_monitor]> set global innodb_thread_concurrency=1;
Query OK, 0 rows affected (0.00 sec)

Manually run a query against an InnoDB table so that MHA's insert is blocked:

root@localhost 12:35:21 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time   | State                                                         | Info                                                                                                 | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |      3 |                                                               | NULL                                                                                                 |         1 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |      1 |                                                               | NULL                                                                                                 |         1 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |      8 |                                                               | NULL                                                                                                 |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |      1 |                                                               | NULL                                                                                                 |         0 |             0 |
| 1256 | repler   | 172.16.120.11:59594 | NULL               | Binlog Dump GTID | 956039 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                                                 |         0 |             0 |
| 1257 | repler   | 172.16.120.12:56540 | NULL               | Binlog Dump GTID | 956019 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                                                 |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |     28 |                                                               | NULL                                                                                                 |         1 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |    113 |                                                               | NULL                                                                                                 |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |     51 |                                                               | NULL                                                                                                 |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     13 |                                                               | NULL                                                                                                 |         0 |             0 |
| 1943 | root     | localhost           | dbms_monitor       | Query            |     15 | User sleep                                                    | select sleep(600) from infra.chk_masterha limit 1                                                    |         0 |             0 |
| 2260 | root     | localhost           | dbms_monitor       | Query            |      0 | starting                                                      | show processlist                                                                                     |         0 |             0 |
| 2395 | mha      | 172.16.120.13:35206 | NULL               | Query            |     13 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
| 2398 | mha      | 172.16.120.13:35208 | NULL               | Query            |     11 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
| 2400 | mha      | 172.16.120.11:32908 | NULL               | Query            |     10 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
| 2401 | mha      | 172.16.120.13:35216 | NULL               | Query            |      8 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
| 2403 | mha      | 172.16.120.12:58066 | NULL               | Query            |      7 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
| 2404 | mha      | 172.16.120.13:35222 | NULL               | Query            |      5 | update                                                        | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
18 rows in set (0.00 sec)

Conclusion: failover is triggered.

Fri Oct  9 12:35:00 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 12:35:01 2020 - [info] GTID failover mode = 1
Fri Oct  9 12:35:01 2020 - [info] Dead Servers:
Fri Oct  9 12:35:01 2020 - [info] Alive Servers:
Fri Oct  9 12:35:01 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:01 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 12:35:01 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 12:35:01 2020 - [info] Alive Slaves:
Fri Oct  9 12:35:01 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:01 2020 - [info]     GTID ON
Fri Oct  9 12:35:01 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:01 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:01 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:01 2020 - [info]     GTID ON
Fri Oct  9 12:35:01 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:01 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:01 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:01 2020 - [info] Checking slave configurations..
Fri Oct  9 12:35:01 2020 - [info] Checking replication filtering settings..
Fri Oct  9 12:35:01 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 12:35:01 2020 - [info]  Replication filtering check ok.
Fri Oct  9 12:35:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 12:35:01 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 12:35:01 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 12:35:01 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 12:35:01 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 12:35:01 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 12:35:01 2020 - [info]  OK.
Fri Oct  9 12:35:01 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 12:35:01 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 12:35:01 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 12:35:01 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 12:35:01 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 12:35:16 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:35:16 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 12:35:16 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 12:35:17 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 12:35:19 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:35:19 2020 - [warning] Connection failed 2 time(s)..
Monitoring server 172.16.120.11 is reachable, Master is not writable from 172.16.120.11. OK.
Fri Oct  9 12:35:22 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:35:22 2020 - [warning] Connection failed 3 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not writable from 172.16.120.12. OK.
Fri Oct  9 12:35:23 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct  9 12:35:25 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct  9 12:35:25 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 12:35:25 2020 - [warning] Master is not reachable from health checker!
Fri Oct  9 12:35:25 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct  9 12:35:25 2020 - [warning] SSH is reachable.
Fri Oct  9 12:35:25 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct  9 12:35:25 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct  9 12:35:25 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 12:35:25 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 12:35:27 2020 - [info] GTID failover mode = 1
Fri Oct  9 12:35:27 2020 - [info] Dead Servers:
Fri Oct  9 12:35:27 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:27 2020 - [info] Alive Servers:
Fri Oct  9 12:35:27 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 12:35:27 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 12:35:27 2020 - [info] Alive Slaves:
Fri Oct  9 12:35:27 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:27 2020 - [info]     GTID ON
Fri Oct  9 12:35:27 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:27 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:27 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:27 2020 - [info]     GTID ON
Fri Oct  9 12:35:27 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:27 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:27 2020 - [info] Checking slave configurations..
Fri Oct  9 12:35:27 2020 - [info] Checking replication filtering settings..
Fri Oct  9 12:35:27 2020 - [info]  Replication filtering check ok.
Fri Oct  9 12:35:27 2020 - [info] Master is down!
Fri Oct  9 12:35:27 2020 - [info] Terminating monitoring script.
Fri Oct  9 12:35:27 2020 - [info] Got exit code 20 (Master dead).
Fri Oct  9 12:35:27 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct  9 12:35:27 2020 - [info] Starting master failover.
Fri Oct  9 12:35:27 2020 - [info]
Fri Oct  9 12:35:27 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct  9 12:35:27 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] GTID failover mode = 1
Fri Oct  9 12:35:28 2020 - [info] Dead Servers:
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info] Alive Servers:
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 12:35:28 2020 - [info] Alive Slaves:
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info] Starting GTID based failover.
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct  9 12:35:28 2020 - [info] Executing master IP deactivation script:
Fri Oct  9 12:35:28 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 172.16.120.10
start down vip
RTNETLINK answers: Cannot assign requested address
Fake!!! Old master rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct  9 12:35:28 2020 - [info]  done.
Fri Oct  9 12:35:28 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct  9 12:35:28 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:827239
Fri Oct  9 12:35:28 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:8822-10390
Fri Oct  9 12:35:28 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:827239
Fri Oct  9 12:35:28 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:8822-10390
Fri Oct  9 12:35:28 2020 - [info] Oldest slaves:
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] Searching new master from slaves..
Fri Oct  9 12:35:28 2020 - [info]  Candidate masters from the configuration file:
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 12:35:28 2020 - [info]     GTID ON
Fri Oct  9 12:35:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 12:35:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 12:35:28 2020 - [info]  Non-candidate masters:
Fri Oct  9 12:35:28 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct  9 12:35:28 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct  9 12:35:28 2020 - [info] Starting master failover..
Fri Oct  9 12:35:28 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info]  Waiting all logs to be applied..
Fri Oct  9 12:35:28 2020 - [info]   done.
Fri Oct  9 12:35:28 2020 - [info] Getting new master's binlog name and position..
Fri Oct  9 12:35:28 2020 - [info]  mysql-bin.000008:811243
Fri Oct  9 12:35:28 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct  9 12:35:28 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 811243, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-10390,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct  9 12:35:28 2020 - [info] Executing master IP activate script:
Fri Oct  9 12:35:28 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! New master rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct  9 12:35:28 2020 - [info]  OK.
Fri Oct  9 12:35:28 2020 - [info] ** Finished master recovery successfully.
Fri Oct  9 12:35:28 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct  9 12:35:28 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 44798. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009123527.log if it takes time..
Fri Oct  9 12:35:29 2020 - [info]
Fri Oct  9 12:35:29 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct  9 12:35:29 2020 - [info]
Fri Oct  9 12:35:28 2020 - [info]  Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct  9 12:35:28 2020 - [info]  Executed CHANGE MASTER.
Fri Oct  9 12:35:28 2020 - [info]  Slave started.
Fri Oct  9 12:35:28 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-10390,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct  9 12:35:29 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct  9 12:35:29 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct  9 12:35:29 2020 - [info] All new slave servers recovered successfully.
Fri Oct  9 12:35:29 2020 - [info]
Fri Oct  9 12:35:29 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct  9 12:35:29 2020 - [info]
Fri Oct  9 12:35:29 2020 - [info] Resetting slave info on the new master..
Fri Oct  9 12:35:29 2020 - [info]  172.16.120.11: Resetting slave info succeeded.
Fri Oct  9 12:35:29 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 12:35:29 2020 - [info] ----- Failover Report -----
cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded
Master 172.16.120.10(172.16.120.10:3358) is down!
Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 12:35:29 2020 - [info] Sending mail..

None of the following errors triggers a failover, not even a manual failover run with --master_state=dead:

our @ALIVE_ERROR_CODES = (
  1040,    # ER_CON_COUNT_ERROR                  -- too many connection
  1042,    # ER_BAD_HOST_ERROR                   -- Can't get hostname for your address
  1043,    # ER_HANDSHAKE_ERROR                  -- Bad handshake
  1044,    # ER_DBACCESS_DENIED_ERROR            -- Access denied for user '%s'@'%s' to database '%s'
  1045,    # ER_ACCESS_DENIED_ERROR              -- Access denied for user '%s'@'%s' (using password: %s)
  1129,    # ER_HOST_IS_BLOCKED                  -- Host '%s' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
  1130,    # ER_HOST_NOT_PRIVILEGED              -- Host '%s' is not allowed to connect to this MySQL server
  1203,    # ER_TOO_MANY_USER_CONNECTIONS        -- User %s already has more than 'max_user_connections' active connections
  1226,    # ER_USER_LIMIT_REACHED               -- User '%s' has exceeded the '%s' resource (current value: %ld)
  1251,    # ER_NOT_SUPPORTED_AUTH_MODE          -- Client does not support authentication protocol requested by server; consider upgrading MySQL client
  1275,    # ER_SERVER_IS_IN_SECURE_AUTH_MODE    -- Server is running in --secure-auth mode, but '%s'@'%s' has a password in the old format; please change the password to the new format
);
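
The effect of this list can be sketched in a few lines of Python. This is only an illustrative model, not MHA's Perl code: the helper name `master_is_alive` and the way the error number is passed in are assumptions made for the example; only the error numbers themselves come from the list above.

```python
# Error numbers copied from the @ALIVE_ERROR_CODES list above.
ALIVE_ERROR_CODES = frozenset({
    1040, 1042, 1043, 1044, 1045, 1129, 1130, 1203, 1226, 1251, 1275,
})


def master_is_alive(mysql_errno):
    """Return True if a failed connection attempt still shows mysqld is running.

    Hypothetical helper: a server that answers with one of these errors is
    up and responding, so the health check does not count it as dead.
    """
    return mysql_errno in ALIVE_ERROR_CODES


if __name__ == "__main__":
    # 1040 = too many connections, 2003 = can't connect to MySQL server
    for errno in (1040, 2003):
        state = "alive, no failover" if master_is_alive(errno) else "possibly dead"
        print(errno, state)
```

Under this model, error 1040 (too many connections) counts as proof of life, while client-side error 2003 (cannot connect) does not, which matches the first row of the TL;DR table.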

For details, see "MHA: why doesn't too many connection trigger a failover?"

[Test case] Network failure between master and MHA manager, 1

Manager <-- unreachable --> Master
Manager <-- OK --> S1 <-- OK --> master
Manager <-- OK --> S2 <-- OK --> master

ping_type=CONNECT

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

Conclusion: no failover is triggered.

Fri Oct  9 15:29:50 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 15:29:51 2020 - [info] GTID failover mode = 1
Fri Oct  9 15:29:51 2020 - [info] Dead Servers:
Fri Oct  9 15:29:51 2020 - [info] Alive Servers:
Fri Oct  9 15:29:51 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:29:51 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 15:29:51 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:29:51 2020 - [info] Alive Slaves:
Fri Oct  9 15:29:51 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:29:51 2020 - [info]     GTID ON
Fri Oct  9 15:29:51 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:29:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:29:51 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:29:51 2020 - [info]     GTID ON
Fri Oct  9 15:29:51 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:29:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:29:51 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:29:51 2020 - [info] Checking slave configurations..
Fri Oct  9 15:29:51 2020 - [info] Checking replication filtering settings..
Fri Oct  9 15:29:51 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 15:29:51 2020 - [info]  Replication filtering check ok.
Fri Oct  9 15:29:51 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 15:29:51 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 15:29:51 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 15:29:51 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:29:51 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 15:29:51 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 15:29:51 2020 - [info]  OK.
Fri Oct  9 15:29:51 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 15:29:51 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 15:29:51 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 15:29:51 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 15:29:51 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:32:56 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:32:56 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:32:56 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct  9 15:32:56 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:33:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:00 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:33:01 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct  9 15:33:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:03 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:33:06 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:06 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:33:06 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:33:09 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:09 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:33:09 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:33:09 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct  9 15:33:09 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:33:12 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:12 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:33:14 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct  9 15:33:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:15 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:33:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:18 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:33:18 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:33:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:21 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:33:21 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:33:21 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct  9 15:33:21 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:33:24 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:24 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:33:26 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct  9 15:33:27 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:27 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:33:30 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:30 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:33:30 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:33:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:33:33 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:33:33 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:33:33 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct  9 15:33:34 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
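
The pattern in the log above (the manager cannot connect, 172.16.120.11 still can, so failover is refused) can be summarised with a small Python model. It is a sketch of the behaviour observed in these tests, not the source of masterha_secondary_check: the `Probe` enum and `should_failover` function are hypothetical names, and the rule that an unreachable monitoring server also blocks failover is taken from the TL;DR table rather than from MHA's code.

```python
from enum import Enum


class Probe(Enum):
    """Outcome of checking the master from one monitoring server (hypothetical names)."""
    MASTER_REACHABLE = "master reachable"
    MASTER_UNREACHABLE = "master unreachable"
    MONITOR_UNREACHABLE = "cannot ssh to monitoring server"


def should_failover(manager_can_reach_master, probes):
    """Model of the decision seen in the tests.

    Failover starts only when the manager itself has lost the master AND
    every monitoring server confirms that the master is unreachable.
    """
    if manager_can_reach_master:
        return False
    return all(p is Probe.MASTER_UNREACHABLE for p in probes)


if __name__ == "__main__":
    # Only the manager is cut off: both slaves still see the master.
    print(should_failover(False, [Probe.MASTER_REACHABLE, Probe.MASTER_REACHABLE]))
    # The manager is cut off and slave1 cannot be reached over ssh.
    print(should_failover(False, [Probe.MONITOR_UNREACHABLE, Probe.MASTER_UNREACHABLE]))
    # Neither the manager nor the two slaves can reach the master.
    print(should_failover(False, [Probe.MASTER_UNREACHABLE, Probe.MASTER_UNREACHABLE]))
```

Only the last call returns True, matching the single "failover is triggered" row among the network-partition cases in the TL;DR table.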

ping_type=INSERT

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

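At this point the manager keeps working normally: with ping_type=INSERT the health check reuses a persistent connection that was opened before the rules were applied, and the rules above only drop new TCP SYN packets.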
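
Why the persistent connection survives can be illustrated by evaluating the rules above against two kinds of packets. The following Python model is a simplification under stated assumptions: the rules are assumed to be loaded on the master, the INPUT chain's default policy is assumed to be ACCEPT, and `syn=True` stands for the first packet of a new TCP connection. The `Packet`, `Rule` and `verdict` names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    proto: str
    src: str
    syn: bool = False  # True only for the first packet of a new TCP connection


@dataclass(frozen=True)
class Rule:
    proto: str
    target: str
    src: Optional[str] = None   # None means "any source"
    syn_only: bool = False      # models the --syn match

    def matches(self, pkt):
        if pkt.proto != self.proto:
            return False
        if self.src is not None and pkt.src != self.src:
            return False
        if self.syn_only and not pkt.syn:
            return False
        return True


# The INPUT rules from the script above, in order.
INPUT_RULES = [
    Rule("icmp", "ACCEPT"),
    Rule("tcp", "ACCEPT", src="172.16.120.10"),
    Rule("tcp", "ACCEPT", src="172.16.120.11"),
    Rule("tcp", "ACCEPT", src="172.16.120.12"),
    Rule("tcp", "DROP", syn_only=True),
]


def verdict(pkt, rules=INPUT_RULES, policy="ACCEPT"):
    """First matching rule wins; otherwise the (assumed) default policy applies."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.target
    return policy


if __name__ == "__main__":
    manager = "172.16.120.13"
    print(verdict(Packet("tcp", manager, syn=True)))   # new connection from the manager
    print(verdict(Packet("tcp", manager, syn=False)))  # packet on an established connection
```

In this model a new connection from the manager (172.16.120.13) is dropped, while packets on a connection that already exists fall through to the default policy and are accepted, so the INSERT ping keeps succeeding until that connection is closed.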

Kill the manager's connection on the master:

root@localhost 15:39:31 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time  | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |     4 |                                                               | NULL             |         0 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |    22 |                                                               | NULL             |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |     9 |                                                               | NULL             |         1 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |     2 |                                                               | NULL             |         1 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |    19 |                                                               | NULL             |         0 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |   111 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |    49 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     4 |                                                               | NULL             |         1 |             0 |
| 2409 | repler   | 172.16.120.11:32918 | NULL               | Binlog Dump GTID | 10898 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2411 | repler   | 172.16.120.12:58084 | NULL               | Binlog Dump GTID | 10873 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2627 | root     | localhost           | dbms_monitor       | Query            |     0 | starting                                                      | show processlist |         0 |             0 |
| 2836 | mha      | 172.16.120.13:35810 | NULL               | Sleep            |     2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)
root@localhost 15:39:39 [dbms_monitor]> kill 2836;
Query OK, 0 rows affected (0.00 sec)
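The connection killed here is the one opened by the MHA manager host (172.16.120.13) as user mha. A minimal sketch of picking that Id out of pasted `show processlist` output follows. The function name `manager_connection_ids` is hypothetical, and it assumes the ASCII table layout shown above with Id, User and Host as the first three columns.

```python
from typing import List


def manager_connection_ids(processlist: str, user: str = "mha",
                           manager_ip: str = "172.16.120.13") -> List[int]:
    """Return the Ids of connections opened by `user` from `manager_ip`."""
    ids = []
    for line in processlist.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 3 or not cells[0].isdigit():
            continue  # skip the header row
        conn_id, conn_user, host = cells[0], cells[1], cells[2]
        # Host is "ip:port" for TCP clients and "localhost" for socket clients.
        if conn_user == user and host.split(":")[0] == manager_ip:
            ids.append(int(conn_id))
    return ids


sample = """
| Id   | User | Host                | db                 |
| 1343 | mha  | 172.16.120.11:59758 | information_schema |
| 2627 | root | localhost           | dbms_monitor       |
| 2836 | mha  | 172.16.120.13:35810 | NULL               |
"""
print(manager_connection_ids(sample))
```

The other mha sessions in the listing come from the slaves (172.16.120.11 and 172.16.120.12), so filtering on the manager's address avoids killing those.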

Conclusion: failover is not triggered

Fri Oct  9 15:37:54 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 15:37:55 2020 - [info] GTID failover mode = 1
Fri Oct  9 15:37:55 2020 - [info] Dead Servers:
Fri Oct  9 15:37:55 2020 - [info] Alive Servers:
Fri Oct  9 15:37:55 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:37:55 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 15:37:55 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:37:55 2020 - [info] Alive Slaves:
Fri Oct  9 15:37:55 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:37:55 2020 - [info]     GTID ON
Fri Oct  9 15:37:55 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:37:55 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:37:55 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:37:55 2020 - [info]     GTID ON
Fri Oct  9 15:37:55 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:37:55 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:37:55 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:37:55 2020 - [info] Checking slave configurations..
Fri Oct  9 15:37:55 2020 - [info] Checking replication filtering settings..
Fri Oct  9 15:37:55 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 15:37:55 2020 - [info]  Replication filtering check ok.
Fri Oct  9 15:37:55 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 15:37:55 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 15:37:55 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 15:37:55 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:37:55 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 15:37:55 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 15:37:55 2020 - [info]  OK.
Fri Oct  9 15:37:55 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 15:37:55 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 15:37:55 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 15:37:55 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 15:37:55 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:39:46 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 15:39:46 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:39:46 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Master is reachable from 172.16.120.11!
Fri Oct  9 15:39:47 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:39:51 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct  9 15:39:52 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:39:52 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:39:55 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:39:55 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:39:58 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:39:58 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:39:58 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:40:01 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:40:01 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:40:01 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 15:40:01 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct  9 15:40:01 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:40:04 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:40:04 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:40:06 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct  9 15:40:07 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:40:07 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:40:10 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:40:10 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:40:10 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:40:13 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:40:13 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:40:13 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:40:13 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Master is reachable from 172.16.120.11!
Fri Oct  9 15:40:13 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 15:40:14 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:40:14 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
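Logs like the one above are long, so it can help to reduce them to a few counters. The sketch below assumes the timestamp and `[level]` layout seen in these manager logs. The `summarize` function and its counter names are made up for illustration and are not part of MHA.

```python
import re
from collections import Counter

# Matches lines such as "Fri Oct  9 15:39:46 2020 - [warning] message".
LOG_LINE = re.compile(
    r"^\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4} - \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def summarize(log_text: str) -> dict:
    """Count ping errors, vetoed failovers and successful pings in a manager log."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if not m:
            continue  # output of the secondary check script has no timestamp
        msg = m.group("msg")
        if msg.startswith("Got error on MySQL"):
            counts["ping_errors"] += 1
        elif "Failover should not happen" in msg:
            counts["failover_vetoed"] += 1
        elif msg.startswith("Ping(") and "succeeded" in msg:
            counts["ping_succeeded"] += 1
    return dict(counts)


sample = (
    "Fri Oct  9 15:39:46 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)\n"
    "Master is reachable from 172.16.120.11!\n"
    "Fri Oct  9 15:39:47 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.\n"
    "Fri Oct  9 15:40:14 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..\n"
)
print(summarize(sample))
```

A run where `failover_vetoed` is non-zero and the log ends with a successful ping corresponds to the "failover is not triggered" outcome recorded for this case.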

[Test case] Network failure between master and MHA manager (2)

Manager <-- unreachable --> Master
Manager <-- unreachable --> S1 <-- OK --> Master
Manager <-- OK --> S2 <-- OK --> Master

ping_type=CONNECT

slave-1

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

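At this point the manager can no longer open connections to slave-1: ping still succeeds because ICMP is accepted, but SSH hangs.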

#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.349 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.651 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.500/0.651/0.151 ms
[root@centos-4 15:48:55 /usr/local/share/perl5/MHA]
#ssh centos-2
^C

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

Conclusion: failover is not triggered

Fri Oct  9 15:48:03 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 15:48:05 2020 - [info] GTID failover mode = 1
Fri Oct  9 15:48:05 2020 - [info] Dead Servers:
Fri Oct  9 15:48:05 2020 - [info] Alive Servers:
Fri Oct  9 15:48:05 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:48:05 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 15:48:05 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:48:05 2020 - [info] Alive Slaves:
Fri Oct  9 15:48:05 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:48:05 2020 - [info]     GTID ON
Fri Oct  9 15:48:05 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:48:05 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:48:05 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:48:05 2020 - [info]     GTID ON
Fri Oct  9 15:48:05 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:48:05 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:48:05 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:48:05 2020 - [info] Checking slave configurations..
Fri Oct  9 15:48:05 2020 - [info] Checking replication filtering settings..
Fri Oct  9 15:48:05 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 15:48:05 2020 - [info]  Replication filtering check ok.
Fri Oct  9 15:48:05 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 15:48:05 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 15:48:05 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 15:48:05 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:48:05 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 15:48:05 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 15:48:05 2020 - [info]  OK.
Fri Oct  9 15:48:05 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 15:48:05 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 15:48:05 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 15:48:05 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 15:48:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:50:40 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:40 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:50:40 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:50:44 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:44 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:50:45 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 15:50:45 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 15:50:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:47 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:50:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:50 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:50:50 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:50:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:53 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:50:53 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:50:53 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:50:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:56 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:50:58 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 15:50:58 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 15:50:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:50:59 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:51:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:51:02 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:51:02 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:51:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:51:05 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:51:05 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 15:51:05 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:51:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:51:05 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 15:51:09 2020 - [warning] Got timeout on Secondary Check child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
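The three outcomes seen in these tests (master reachable from a monitoring server, a monitoring server not reachable over SSH, master unreachable from every monitoring server) can be summarized as a small decision function. This is a model inferred from the log messages and the TL;DR table, not the actual `masterha_secondary_check` code. The function names and verdict strings are hypothetical, and checking the servers in the order given by `-s` and stopping at the first decisive result is an assumption.

```python
from typing import Callable, Iterable


def secondary_check(monitors: Iterable[str],
                    ssh_ok: Callable[[str], bool],
                    master_reachable_from: Callable[[str], bool]) -> str:
    """Model of the secondary check verdict as observed in the logs."""
    for host in monitors:
        if not ssh_ok(host):
            # Logged as "Monitoring server <host> is NOT reachable!"
            return "monitor_unreachable"
        if master_reachable_from(host):
            # Logged as "Master is reachable from <host>!"
            return "master_alive"
    return "master_dead"


def should_failover(verdict: str) -> bool:
    """Only a master that no monitoring server can reach leads to failover."""
    return verdict == "master_dead"


monitors = ["172.16.120.11", "172.16.120.12"]

# This test case: the manager cannot SSH to slave-1, slave-2 is fine.
verdict = secondary_check(monitors,
                          ssh_ok=lambda h: h != "172.16.120.11",
                          master_reachable_from=lambda h: True)
print(verdict, should_failover(verdict))
```

In this model an unreachable monitoring server is treated as a network problem on the manager side, so failover is vetoed even though the manager itself cannot reach the master.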

ping_type=INSERT

slave-1

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

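At this point the manager can no longer open connections to slave-1: ping still succeeds because ICMP is accepted, but SSH hangs.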

#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.349 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.651 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.500/0.651/0.151 ms
[root@centos-4 15:48:55 /usr/local/share/perl5/MHA]
#ssh centos-2
^C

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

Because ping_type=INSERT uses a persistent connection, nothing abnormal is observed at this point.

Kill the connection

root@localhost 15:39:45 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time  | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |     3 |                                                               | NULL             |         1 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |     1 |                                                               | NULL             |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |     8 |                                                               | NULL             |         1 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |    11 |                                                               | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |     8 |                                                               | NULL             |         0 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |    89 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |    26 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     3 |                                                               | NULL             |         0 |             0 |
| 2409 | repler   | 172.16.120.11:32918 | NULL               | Binlog Dump GTID | 11837 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2411 | repler   | 172.16.120.12:58084 | NULL               | Binlog Dump GTID | 11812 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2627 | root     | localhost           | dbms_monitor       | Query            |     0 | starting                                                      | show processlist |         0 |             0 |
| 2953 | mha      | 172.16.120.13:36174 | NULL               | Sleep            |     0 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)
root@localhost 15:55:18 [dbms_monitor]> kill 2953;
Query OK, 0 rows affected (0.00 sec)

Conclusion: failover is not triggered

Fri Oct  9 15:52:43 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 15:52:44 2020 - [info] GTID failover mode = 1
Fri Oct  9 15:52:44 2020 - [info] Dead Servers:
Fri Oct  9 15:52:44 2020 - [info] Alive Servers:
Fri Oct  9 15:52:44 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:52:44 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 15:52:44 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:52:44 2020 - [info] Alive Slaves:
Fri Oct  9 15:52:44 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:52:44 2020 - [info]     GTID ON
Fri Oct  9 15:52:44 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:52:44 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:52:44 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 15:52:44 2020 - [info]     GTID ON
Fri Oct  9 15:52:44 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:52:44 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 15:52:44 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 15:52:44 2020 - [info] Checking slave configurations..
Fri Oct  9 15:52:44 2020 - [info] Checking replication filtering settings..
Fri Oct  9 15:52:44 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 15:52:44 2020 - [info]  Replication filtering check ok.
Fri Oct  9 15:52:44 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 15:52:44 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 15:52:45 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 15:52:45 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 15:52:45 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 15:52:45 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 15:52:45 2020 - [info]  OK.
Fri Oct  9 15:52:45 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 15:52:45 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 15:52:45 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 15:52:45 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 15:52:45 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:55:24 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 15:55:24 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 15:55:24 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:55:29 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 15:55:29 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 15:55:30 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:30 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:55:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:33 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:55:36 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:36 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:55:36 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:55:39 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:39 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:55:39 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 15:55:39 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:55:42 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:42 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:55:44 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 15:55:44 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 15:55:45 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:45 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:55:48 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:48 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:55:48 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:55:51 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:51 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:55:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 15:55:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:55:54 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:54 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 15:55:56 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 15:55:56 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 15:55:57 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:55:57 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 15:56:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:56:00 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 15:56:00 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 15:56:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 15:56:03 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 15:56:03 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 15:56:03 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 15:56:03 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 15:56:03 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct  9 15:56:03 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.

[Test case] Network failure between master and MHA manager (3)

Manager <-- unreachable --> Master
Manager <-- unreachable --> S1 <-- OK --> Master
Manager <-- unreachable --> S2 <-- OK --> Master

ping_type=CONNECT

slave-1, slave-2

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.442 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.441 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.441/0.441/0.442/0.021 ms
[root@centos-4 16:44:27 /usr/local/share/perl5/MHA]
#ssh centos-2
^C
[root@centos-4 16:44:30 /usr/local/share/perl5/MHA]
#ping centos-3
PING centos-3 (172.16.120.12) 56(84) bytes of data.
64 bytes from centos-3 (172.16.120.12): icmp_seq=1 ttl=64 time=0.335 ms
64 bytes from centos-3 (172.16.120.12): icmp_seq=2 ttl=64 time=0.575 ms
^C
--- centos-3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.335/0.455/0.575/0.120 ms
[root@centos-4 16:44:34 /usr/local/share/perl5/MHA]
#ssh centos-3
^C

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
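
The rule set is the same on every host; only the list of allowed source IPs changes from one scenario to the next. A minimal sketch of a helper that prints the commands for a given allow-list (dry run only, nothing is applied). print_partition_rules is a hypothetical name introduced here, not part of MHA or of the original test scripts.

```shell
#!/bin/sh
# Dry-run sketch: print the iptables commands for a partition test.
# print_partition_rules is a hypothetical helper, not part of MHA.
# Arguments: source IPs whose TCP traffic stays allowed.
print_partition_rules() {
    ipt="/sbin/iptables"
    echo "$ipt -F"                                          # start from an empty rule set
    echo "$ipt -A INPUT -p icmp --icmp-type any -j ACCEPT"  # keep ping working
    for ip in "$@"; do
        echo "$ipt -A INPUT -p tcp -s $ip -j ACCEPT"        # peers that may still connect
    done
    echo "$ipt -A INPUT -p tcp --syn -j DROP"               # drop new TCP connections from everyone else
}

# Rule set used on master, slave-1 and slave-2 in this scenario:
print_partition_rules 172.16.120.10 172.16.120.11 172.16.120.12
```

Because the last rule matches only SYN packets, connections that were already established before the rules were loaded keep working.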

Conclusion: failover is not triggered.

Fri Oct  9 16:43:25 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 16:43:26 2020 - [info] GTID failover mode = 1
Fri Oct  9 16:43:26 2020 - [info] Dead Servers:
Fri Oct  9 16:43:26 2020 - [info] Alive Servers:
Fri Oct  9 16:43:26 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:43:26 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 16:43:26 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 16:43:26 2020 - [info] Alive Slaves:
Fri Oct  9 16:43:26 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:43:26 2020 - [info]     GTID ON
Fri Oct  9 16:43:26 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:43:26 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:43:26 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:43:26 2020 - [info]     GTID ON
Fri Oct  9 16:43:26 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:43:26 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:43:26 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:43:26 2020 - [info] Checking slave configurations..
Fri Oct  9 16:43:26 2020 - [info] Checking replication filtering settings..
Fri Oct  9 16:43:26 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 16:43:26 2020 - [info]  Replication filtering check ok.
Fri Oct  9 16:43:26 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 16:43:26 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 16:43:26 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 16:43:26 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct  9 16:43:26 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 16:43:26 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 16:43:26 2020 - [info]  OK.
Fri Oct  9 16:43:26 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 16:43:26 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 16:43:26 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 16:43:26 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 16:43:26 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 16:45:55 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:45:55 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:45:55 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 16:45:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:45:59 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:46:00 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 16:46:00 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 16:46:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:46:02 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:46:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:46:05 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 16:46:05 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 16:46:08 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:46:08 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 16:46:08 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:46:08 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 16:46:11 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:46:11 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:46:13 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 16:46:13 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 16:46:14 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:46:14 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:46:15 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
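
The messages printed by masterha_secondary_check suggest how it reaches a verdict: it walks the monitoring servers in the order of the -s options and stops at the first one that is either unreachable over SSH or can still reach the master. The sketch below models that behaviour. It is inferred from the log output in this post and is not a copy of the MHA script; secondary_check_verdict and the state names are hypothetical, and the return codes 0, 2 and 3 are an assumption.

```shell
#!/bin/sh
# Sketch of the verdict logic suggested by the logs.
# secondary_check_verdict is a hypothetical name; return codes are assumed.
# Each argument describes one monitoring server, in -s order:
#   ssh_fail    - the manager cannot SSH to the monitoring server
#   master_up   - the monitoring server can reach the master
#   master_down - the monitoring server cannot reach the master
secondary_check_verdict() {
    verdict=1   # nothing checked yet
    for state in "$@"; do
        case "$state" in
            master_down) verdict=0 ;;   # keep asking the next server
            ssh_fail)    echo "monitoring server unreachable: no failover"; return 2 ;;
            master_up)   echo "master reachable from a monitoring server: no failover"; return 3 ;;
            *)           echo "unknown state: $state" >&2; return 1 ;;
        esac
    done
    [ "$verdict" -eq 0 ] && echo "master unreachable from every monitoring server: failover may start"
    return "$verdict"
}

# This scenario: the manager cannot SSH to either slave.
secondary_check_verdict ssh_fail ssh_fail || echo "exit code $?"
```

Stopping at the first decisive server would explain why only 172.16.120.11 shows up as NOT reachable in the log, although 172.16.120.12 is blocked as well.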

ping_type=INSERT

slave-1,slave-2

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.352 ms
^C
--- centos-2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.352/0.352/0.352/0.000 ms
[root@centos-4 17:52:38 ~]
#ping centos-3
PING centos-3 (172.16.120.12) 56(84) bytes of data.
64 bytes from centos-3 (172.16.120.12): icmp_seq=1 ttl=64 time=0.221 ms
^C
--- centos-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.221/0.221/0.221/0.000 ms
[root@centos-4 17:52:41 ~]
#ssh centos-2
^C
[root@centos-4 17:52:44 ~]
#ssh centos-3

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

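With ping_type=INSERT the manager reuses one persistent connection, and the --syn DROP rule only blocks new TCP connections, so the health check reports no error at this point.

Kill the manager's connection on the master (the mha session from 172.16.120.13):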

root@localhost 17:48:11 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time  | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |    14 |                                                               | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |     4 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |    32 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     4 |                                                               | NULL             |         1 |             0 |
| 2409 | repler   | 172.16.120.11:32918 | NULL               | Binlog Dump GTID | 18936 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2411 | repler   | 172.16.120.12:58084 | NULL               | Binlog Dump GTID | 18911 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 3192 | proxysql | 172.16.120.11:33786 | NULL               | Sleep            |     4 |                                                               | NULL             |         0 |             0 |
| 3238 | proxysql | 172.16.120.12:59006 | NULL               | Sleep            |     4 |                                                               | NULL             |         1 |             0 |
| 3245 | proxysql | 172.16.120.11:33888 | NULL               | Sleep            |    14 |                                                               | NULL             |         0 |             0 |
| 3262 | root     | localhost           | dbms_monitor       | Query            |     0 | starting                                                      | show processlist |         0 |             0 |
| 3268 | mha      | 172.16.120.13:36868 | NULL               | Sleep            |     2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)

root@localhost 17:53:37 [dbms_monitor]> kill 3268;
Query OK, 0 rows affected (0.00 sec)
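
Instead of reading the connection id off the processlist by eye, the manager's session can be selected by user and source IP. A minimal sketch, assuming the MHA user (mha) and manager address (172.16.120.13) of this test setup; build_kill_query is a hypothetical helper that only prints the SQL.

```shell
#!/bin/sh
# Sketch: build a query that emits one KILL statement per connection opened
# by the MHA user from the manager host. build_kill_query is hypothetical.
# $1 = MHA MySQL user, $2 = manager IP
build_kill_query() {
    printf "SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE user = '%s' AND host LIKE '%s:%%';\n" "$1" "$2"
}

build_kill_query mha 172.16.120.13
# On the master, the generated statements could be reviewed first and then
# run through the client, for example:
#   build_kill_query mha 172.16.120.13 | mysql -N -uroot -p
# (socket or port options depend on the local setup)
```

The host filter matters because the slaves also hold mha sessions on the master, as the processlist above shows.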

Conclusion: failover is not triggered.

Fri Oct  9 17:50:48 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 17:50:49 2020 - [info] GTID failover mode = 1
Fri Oct  9 17:50:49 2020 - [info] Dead Servers:
Fri Oct  9 17:50:49 2020 - [info] Alive Servers:
Fri Oct  9 17:50:49 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 17:50:49 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 17:50:49 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 17:50:49 2020 - [info] Alive Slaves:
Fri Oct  9 17:50:49 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 17:50:49 2020 - [info]     GTID ON
Fri Oct  9 17:50:49 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 17:50:49 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 17:50:49 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 17:50:49 2020 - [info]     GTID ON
Fri Oct  9 17:50:49 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 17:50:49 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 17:50:49 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 17:50:49 2020 - [info] Checking slave configurations..
Fri Oct  9 17:50:49 2020 - [info] Checking replication filtering settings..
Fri Oct  9 17:50:49 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 17:50:49 2020 - [info]  Replication filtering check ok.
Fri Oct  9 17:50:49 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 17:50:49 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 17:50:49 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 17:50:49 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct  9 17:50:49 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 17:50:49 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 17:50:49 2020 - [info]  OK.
Fri Oct  9 17:50:49 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 17:50:49 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 17:50:49 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 17:50:49 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 17:50:49 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 17:53:43 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 17:53:43 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 17:53:43 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 17:53:48 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 17:53:48 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 17:53:49 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:53:49 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 17:53:52 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:53:52 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 17:53:55 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:53:55 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 17:53:55 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 17:53:58 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:53:58 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 17:53:58 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 17:53:58 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 17:54:01 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:01 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 17:54:03 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 17:54:03 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 17:54:04 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:04 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 17:54:07 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:07 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 17:54:07 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 17:54:10 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:10 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 17:54:10 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 17:54:10 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 17:54:13 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:13 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 17:54:15 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct  9 17:54:15 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct  9 17:54:16 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 17:54:16 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 17:54:16 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
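
The outcome of a run can be read off a few recurring messages in the manager log. A small sketch that counts them; summarize_manager_log is a hypothetical helper, and the patterns are the literal message texts that appear in the logs of this post.

```shell
#!/bin/sh
# Sketch: summarise a manager log by counting recurring messages.
# summarize_manager_log is a hypothetical helper, not part of MHA.
summarize_manager_log() {
    log="$1"
    # grep -c prints the number of matching lines (0 when none match)
    printf 'ping errors: %s\n'            "$(grep -c 'Got error on MySQL' "$log")"
    printf 'secondary check vetoes: %s\n' "$(grep -c 'Failover should not happen' "$log")"
    printf 'successful pings logged: %s\n' "$(grep -c 'Ping([A-Z]*) succeeded' "$log")"
}

# Example, using the manager_log path from the MHA configuration:
# summarize_manager_log /masterha/cls_new/manager.log
```

A run with a non-zero veto count means the secondary check stopped the failover at least once during that run.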

[Test case] Network failure between master and MHA manager (4)

Manager <-- unreachable --> Master
Manager <-- OK --> S1 <-- unreachable --> Master
Manager <-- OK --> S2 <-- OK --> Master

ping_type=CONNECT

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

Conclusion: failover is not triggered.

Fri Oct  9 16:05:55 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 16:05:56 2020 - [info] GTID failover mode = 1
Fri Oct  9 16:05:56 2020 - [info] Dead Servers:
Fri Oct  9 16:05:56 2020 - [info] Alive Servers:
Fri Oct  9 16:05:56 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:05:56 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 16:05:56 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 16:05:56 2020 - [info] Alive Slaves:
Fri Oct  9 16:05:56 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:05:56 2020 - [info]     GTID ON
Fri Oct  9 16:05:56 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:05:56 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:05:56 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:05:56 2020 - [info]     GTID ON
Fri Oct  9 16:05:56 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:05:56 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:05:56 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:05:56 2020 - [info] Checking slave configurations..
Fri Oct  9 16:05:56 2020 - [info] Checking replication filtering settings..
Fri Oct  9 16:05:56 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 16:05:56 2020 - [info]  Replication filtering check ok.
Fri Oct  9 16:05:56 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 16:05:56 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 16:05:56 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 16:05:56 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct  9 16:05:56 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 16:05:56 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 16:05:56 2020 - [info]  OK.
Fri Oct  9 16:05:56 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 16:05:56 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 16:05:56 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 16:05:56 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 16:05:56 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 16:06:43 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:43 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:06:43 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 16:06:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:47 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:06:48 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct  9 16:06:48 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 16:06:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:50 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:06:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:53 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 16:06:53 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 16:06:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:56 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 16:06:56 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 16:06:56 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:06:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:06:59 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:07:01 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct  9 16:07:01 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 16:07:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:07:02 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:07:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:07:05 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 16:07:05 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 16:07:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

ping_type=INSERT

master

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

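With ping_type=INSERT the manager reuses one persistent connection, and the --syn DROP rule only blocks new TCP connections, so the health check reports no error at this point.

Kill the manager's connection on the master (the mha session from 172.16.120.13):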

root@localhost 15:55:23 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time  | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |    19 |                                                               | NULL             |         1 |             0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL               | Sleep            |     7 |                                                               | NULL             |         0 |             0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL               | Sleep            |     4 |                                                               | NULL             |         0 |             0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL               | Sleep            |    27 |                                                               | NULL             |         1 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |    14 |                                                               | NULL             |         1 |             0 |
| 1341 | mha      | 172.16.120.12:56698 | information_schema | Sleep            |   114 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |    51 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |     9 |                                                               | NULL             |         0 |             0 |
| 2409 | repler   | 172.16.120.11:32918 | NULL               | Binlog Dump GTID | 12703 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2411 | repler   | 172.16.120.12:58084 | NULL               | Binlog Dump GTID | 12678 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 2627 | root     | localhost           | dbms_monitor       | Query            |     0 | starting                                                      | show processlist |         0 |             0 |
| 3022 | mha      | 172.16.120.13:36466 | NULL               | Sleep            |     2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

root@localhost 16:09:44 [dbms_monitor]> kill 3022;
Query OK, 0 rows affected (0.00 sec)

Conclusion: failover is not triggered.

Fri Oct  9 16:08:29 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 16:08:30 2020 - [info] GTID failover mode = 1
Fri Oct  9 16:08:30 2020 - [info] Dead Servers:
Fri Oct  9 16:08:30 2020 - [info] Alive Servers:
Fri Oct  9 16:08:30 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:08:30 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 16:08:30 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 16:08:30 2020 - [info] Alive Slaves:
Fri Oct  9 16:08:30 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:08:30 2020 - [info]     GTID ON
Fri Oct  9 16:08:30 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:08:30 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:08:30 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 16:08:30 2020 - [info]     GTID ON
Fri Oct  9 16:08:30 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:08:30 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 16:08:30 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 16:08:30 2020 - [info] Checking slave configurations..
Fri Oct  9 16:08:30 2020 - [info] Checking replication filtering settings..
Fri Oct  9 16:08:30 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 16:08:30 2020 - [info]  Replication filtering check ok.
Fri Oct  9 16:08:30 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 16:08:30 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 16:08:30 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 16:08:30 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct  9 16:08:30 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 16:08:30 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 16:08:30 2020 - [info]  OK.
Fri Oct  9 16:08:30 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 16:08:30 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 16:08:30 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 16:08:30 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 16:08:30 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 16:09:51 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 16:09:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 16:09:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:09:56 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct  9 16:09:57 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 16:09:57 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:09:57 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:10:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:00 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:10:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:03 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 16:10:03 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 16:10:06 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:06 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 16:10:06 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 16:10:06 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:10:09 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:09 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:10:11 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct  9 16:10:12 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct  9 16:10:12 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:12 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 16:10:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:15 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 16:10:15 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct  9 16:10:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:18 2020 - [warning] Connection failed 1 time(s)..
Fri Oct  9 16:10:18 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 16:10:18 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 16:10:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 16:10:21 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 16:10:21 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 16:10:21 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct  9 16:10:21 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.

[Test case] Network failure between master and MHA manager, case 5

Manager <-- unreachable --> Master
Manager <-- OK --> S1 <-- unreachable --> Master
Manager <-- OK --> S2 <-- unreachable --> Master

ping_type=CONNECT

master

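IPTABLES="/sbin/iptables"
# Flush all existing rules in the filter table
$IPTABLES -F
# Keep ICMP working, so the master still answers ping
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
# Allow TCP traffic that originates from the master itself
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
# Drop new TCP connection attempts (SYN) from every other host; established connections keep working
$IPTABLES -A INPUT -p tcp --syn -j DROP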

Conclusion: failover is triggered.
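The outcome follows from how the secondary check verdicts combine in these tests. The manager logs in this document print "Master is reachable from at least one of other monitoring servers. Failover should not happen." when any monitoring server can still reach the master, and "Master is not reachable from all other monitoring servers. Failover should start." when none can. A minimal Python sketch of that rule as observed here is shown below. The names `Verdict` and `should_start_failover` are hypothetical and this is not MHA's actual implementation.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome of one secondary check through a monitoring server (hypothetical names)."""
    MASTER_REACHABLE = "master is reachable from the monitoring server"
    MASTER_UNREACHABLE = "master is not reachable from the monitoring server"
    MONITOR_UNREACHABLE = "manager cannot reach the monitoring server"


def should_start_failover(manager_ping_ok, monitor_verdicts):
    """Return True only when neither the manager nor any monitoring server reaches the master."""
    if manager_ping_ok:
        # The manager's own ping still works, so nothing is wrong.
        return False
    # Every monitoring server must positively confirm that the master is unreachable.
    return all(v is Verdict.MASTER_UNREACHABLE for v in monitor_verdicts)


# This test case: the manager, S1 and S2 all fail to reach the master.
case_5 = should_start_failover(
    False, [Verdict.MASTER_UNREACHABLE, Verdict.MASTER_UNREACHABLE]
)
# Earlier log in this document: S1 cannot reach the master, but S2 still can.
one_monitor_ok = should_start_failover(
    False, [Verdict.MASTER_UNREACHABLE, Verdict.MASTER_REACHABLE]
)
print(case_5, one_monitor_ok)
```

Treating an unreachable monitoring server the same as "master reachable" matches the TL;DR table, where losing SSH access to a slave never leads to failover.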

Fri Oct  9 18:21:13 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 18:21:14 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:21:14 2020 - [info] Dead Servers:
Fri Oct  9 18:21:14 2020 - [info] Alive Servers:
Fri Oct  9 18:21:14 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:21:14 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:21:14 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:21:14 2020 - [info] Alive Slaves:
Fri Oct  9 18:21:14 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:21:14 2020 - [info]     GTID ON
Fri Oct  9 18:21:14 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:21:14 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:21:14 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:21:14 2020 - [info]     GTID ON
Fri Oct  9 18:21:14 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:21:14 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:21:14 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:21:14 2020 - [info] Checking slave configurations..
Fri Oct  9 18:21:14 2020 - [info] Checking replication filtering settings..
Fri Oct  9 18:21:14 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 18:21:14 2020 - [info]  Replication filtering check ok.
Fri Oct  9 18:21:14 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 18:21:14 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 18:21:14 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 18:21:14 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:21:14 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 18:21:14 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 18:21:14 2020 - [info]  OK.
Fri Oct  9 18:21:14 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 18:21:14 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 18:21:14 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 18:21:14 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 18:21:14 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 18:22:07 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:22:07 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 18:22:07 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct  9 18:22:11 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:22:11 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 18:22:12 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Fri Oct  9 18:22:14 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:22:14 2020 - [warning] Connection failed 3 time(s)..
Fri Oct  9 18:22:17 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:22:17 2020 - [warning] Connection failed 4 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Fri Oct  9 18:22:18 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct  9 18:22:18 2020 - [warning] Master is not reachable from health checker!
Fri Oct  9 18:22:18 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct  9 18:22:18 2020 - [warning] SSH is NOT reachable.
Fri Oct  9 18:22:18 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct  9 18:22:18 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct  9 18:22:18 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 18:22:18 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 18:22:19 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:22:19 2020 - [info] Dead Servers:
Fri Oct  9 18:22:19 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:19 2020 - [info] Alive Servers:
Fri Oct  9 18:22:19 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:22:19 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:22:19 2020 - [info] Alive Slaves:
Fri Oct  9 18:22:19 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:19 2020 - [info]     GTID ON
Fri Oct  9 18:22:19 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:19 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:19 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:19 2020 - [info]     GTID ON
Fri Oct  9 18:22:19 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:19 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:19 2020 - [info] Checking slave configurations..
Fri Oct  9 18:22:19 2020 - [info] Checking replication filtering settings..
Fri Oct  9 18:22:19 2020 - [info]  Replication filtering check ok.
Fri Oct  9 18:22:19 2020 - [info] Master is down!
Fri Oct  9 18:22:19 2020 - [info] Terminating monitoring script.
Fri Oct  9 18:22:19 2020 - [info] Got exit code 20 (Master dead).
Fri Oct  9 18:22:19 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct  9 18:22:19 2020 - [info] Starting master failover.
Fri Oct  9 18:22:19 2020 - [info]
Fri Oct  9 18:22:19 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct  9 18:22:19 2020 - [info]
Fri Oct  9 18:22:20 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:22:20 2020 - [info] Dead Servers:
Fri Oct  9 18:22:20 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:20 2020 - [info] Checking master reachability via MySQL(double check)...
Fri Oct  9 18:22:21 2020 - [info]  ok.
Fri Oct  9 18:22:21 2020 - [info] Alive Servers:
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:22:21 2020 - [info] Alive Slaves:
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info] Starting GTID based failover.
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct  9 18:22:21 2020 - [info] Executing master IP deactivation script:
Fri Oct  9 18:22:21 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stop
Disabling the VIP on old master: 172.16.120.10
Fake!!! Old master rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct  9 18:22:21 2020 - [info]  done.
Fri Oct  9 18:22:21 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct  9 18:22:21 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:3084318
Fri Oct  9 18:22:21 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:10391-19970
Fri Oct  9 18:22:21 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:3084318
Fri Oct  9 18:22:21 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:10391-19970
Fri Oct  9 18:22:21 2020 - [info] Oldest slaves:
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] Searching new master from slaves..
Fri Oct  9 18:22:21 2020 - [info]  Candidate masters from the configuration file:
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:22:21 2020 - [info]     GTID ON
Fri Oct  9 18:22:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:22:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:22:21 2020 - [info]  Non-candidate masters:
Fri Oct  9 18:22:21 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct  9 18:22:21 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:22:21 2020 - [info] Starting master failover..
Fri Oct  9 18:22:21 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info]  Waiting all logs to be applied..
Fri Oct  9 18:22:21 2020 - [info]   done.
Fri Oct  9 18:22:21 2020 - [info] Getting new master's binlog name and position..
Fri Oct  9 18:22:21 2020 - [info]  mysql-bin.000008:3052407
Fri Oct  9 18:22:21 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct  9 18:22:21 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 3052407, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19970,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct  9 18:22:21 2020 - [info] Executing master IP activate script:
Fri Oct  9 18:22:21 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! New master rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct  9 18:22:21 2020 - [info]  OK.
Fri Oct  9 18:22:21 2020 - [info] ** Finished master recovery successfully.
Fri Oct  9 18:22:21 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct  9 18:22:21 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 68999. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009182219.log if it takes time..
Fri Oct  9 18:22:22 2020 - [info]
Fri Oct  9 18:22:22 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct  9 18:22:22 2020 - [info]
Fri Oct  9 18:22:21 2020 - [info]  Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct  9 18:22:21 2020 - [info]  Executed CHANGE MASTER.
Fri Oct  9 18:22:21 2020 - [info]  Slave started.
Fri Oct  9 18:22:21 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19970,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct  9 18:22:22 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct  9 18:22:22 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct  9 18:22:22 2020 - [info] All new slave servers recovered successfully.
Fri Oct  9 18:22:22 2020 - [info]
Fri Oct  9 18:22:22 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct  9 18:22:22 2020 - [info]
Fri Oct  9 18:22:22 2020 - [info] Resetting slave info on the new master..
Fri Oct  9 18:22:22 2020 - [info]  172.16.120.11: Resetting slave info succeeded.
Fri Oct  9 18:22:22 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 18:22:22 2020 - [info] ----- Failover Report -----
cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded
Master 172.16.120.10(172.16.120.10:3358) is down!
Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 18:22:22 2020 - [info] Sending mail..
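The timestamps in the CONNECT log above give a rough idea of how long detection and failover took in this run. The Python sketch below computes the gaps from three log lines copied from that log (the first one is shortened to its prefix). The helper name `parse_ts` is hypothetical, and the figures describe this single test only.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading timestamp of an MHA manager log line."""
    # The timestamp is everything before " - [", e.g. "Fri Oct  9 18:22:07 2020".
    stamp = line.split(" - [", 1)[0]
    # Collapse the double space that pads single-digit days.
    stamp = " ".join(stamp.split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


# Lines taken from the CONNECT log above (first line truncated to its prefix).
first_error = "Fri Oct  9 18:22:07 2020 - [warning] Got error on MySQL connect ping"
master_down = "Fri Oct  9 18:22:19 2020 - [info] Master is down!"
completed = (
    "Fri Oct  9 18:22:22 2020 - [info] Master failover to "
    "172.16.120.11(172.16.120.11:3358) completed successfully."
)

detect_seconds = (parse_ts(master_down) - parse_ts(first_error)).total_seconds()
failover_seconds = (parse_ts(completed) - parse_ts(master_down)).total_seconds()
print(detect_seconds, failover_seconds)
```

In this run the manager needed 12 seconds from the first failed ping to declare the master down, and the failover itself finished 3 seconds later.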

ping_type=INSERT

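Because ping_type=INSERT uses a persistent connection, which was established before the iptables rules took effect, nothing abnormal is detected.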

Kill the connection

root@localhost 18:24:52 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User     | Host                | db                 | Command          | Time | State                                                         | Info             | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL               | Sleep            |    1 |                                                               | NULL             |         0 |             0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL               | Sleep            |    2 |                                                               | NULL             |         0 |             0 |
| 1343 | mha      | 172.16.120.11:59758 | information_schema | Sleep            |   74 |                                                               | NULL             |         0 |             0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL               | Sleep            |    2 |                                                               | NULL             |         1 |             0 |
| 3192 | proxysql | 172.16.120.11:33786 | NULL               | Sleep            |    3 |                                                               | NULL             |         1 |             0 |
| 3238 | proxysql | 172.16.120.12:59006 | NULL               | Sleep            |    1 |                                                               | NULL             |         1 |             0 |
| 3245 | proxysql | 172.16.120.11:33888 | NULL               | Sleep            |    2 |                                                               | NULL             |         0 |             0 |
| 3262 | root     | localhost           | dbms_monitor       | Query            |    0 | starting                                                      | show processlist |         0 |             0 |
| 3357 | repler   | 172.16.120.11:34036 | NULL               | Binlog Dump GTID |  142 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 3359 | repler   | 172.16.120.12:59166 | NULL               | Binlog Dump GTID |  123 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 3364 | mha      | 172.16.120.13:37512 | NULL               | Sleep            |    2 |                                                               | NULL             |         0 |             0 |
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)

root@localhost 18:26:25 [dbms_monitor]> kill 3364;
Query OK, 0 rows affected (0.01 sec)

Conclusion: failover is triggered only after the persistent connection is broken; otherwise no failover occurs.
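The difference between the two ping types comes down to the `--syn -j DROP` rule on the master: it only blocks segments that open a new connection. The Python sketch below mirrors the TCP rules from the iptables script in order. It is a simplified model, not real packet filtering: the name `input_chain_accepts` is hypothetical, and a default ACCEPT policy on the INPUT chain is an assumption (`iptables -F` does not change the policy).

```python
from dataclasses import dataclass

MASTER_IP = "172.16.120.10"
MANAGER_IP = "172.16.120.13"


@dataclass(frozen=True)
class TcpSegment:
    src: str
    syn: bool  # True only for the first segment of a new connection


def input_chain_accepts(segment):
    """Walk the TCP rules of the master's INPUT chain in order."""
    if segment.src == MASTER_IP:
        # -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
        return True
    if segment.syn:
        # -A INPUT -p tcp --syn -j DROP
        return False
    # No rule matched; assumed default policy is ACCEPT.
    return True


# An INSERT ping travels over the manager's already established connection.
established_ping = TcpSegment(MANAGER_IP, syn=False)
# After `kill 3364` the manager has to reconnect, which starts with a SYN.
reconnect = TcpSegment(MANAGER_IP, syn=True)
print(input_chain_accepts(established_ping), input_chain_accepts(reconnect))
```

With ping_type=CONNECT every health check opens a new connection, so it hits the DROP rule immediately. With ping_type=INSERT the check only fails once the existing connection is gone.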

Fri Oct  9 18:25:33 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct  9 18:25:34 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:25:34 2020 - [info] Dead Servers:
Fri Oct  9 18:25:34 2020 - [info] Alive Servers:
Fri Oct  9 18:25:34 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:25:34 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:25:34 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:25:34 2020 - [info] Alive Slaves:
Fri Oct  9 18:25:34 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:25:34 2020 - [info]     GTID ON
Fri Oct  9 18:25:34 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:25:34 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:25:34 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:25:34 2020 - [info]     GTID ON
Fri Oct  9 18:25:34 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:25:34 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:25:34 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:25:34 2020 - [info] Checking slave configurations..
Fri Oct  9 18:25:34 2020 - [info] Checking replication filtering settings..
Fri Oct  9 18:25:34 2020 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Oct  9 18:25:34 2020 - [info]  Replication filtering check ok.
Fri Oct  9 18:25:34 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct  9 18:25:34 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct  9 18:25:35 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct  9 18:25:35 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:25:35 2020 - [info] Checking master_ip_failover_script status:
Fri Oct  9 18:25:35 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct  9 18:25:35 2020 - [info]  OK.
Fri Oct  9 18:25:35 2020 - [warning] shutdown_script is not defined.
Fri Oct  9 18:25:35 2020 - [info] Set master ping interval 3 seconds.
Fri Oct  9 18:25:35 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct  9 18:25:35 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct  9 18:25:35 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct  9 18:26:44 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct  9 18:26:44 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct  9 18:26:44 2020 - [info] Executing SSH check script: exit 0
Fri Oct  9 18:26:49 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Fri Oct  9 18:26:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:26:50 2020 - [warning] Connection failed 2 time(s)..
Fri Oct  9 18:26:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:26:53 2020 - [warning] Connection failed 3 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Fri Oct  9 18:26:54 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct  9 18:26:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct  9 18:26:56 2020 - [warning] Connection failed 4 time(s)..
Fri Oct  9 18:26:56 2020 - [warning] Master is not reachable from health checker!
Fri Oct  9 18:26:56 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct  9 18:26:56 2020 - [warning] SSH is NOT reachable.
Fri Oct  9 18:26:56 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct  9 18:26:56 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct  9 18:26:56 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 18:26:56 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct  9 18:26:57 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:26:57 2020 - [info] Dead Servers:
Fri Oct  9 18:26:57 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:57 2020 - [info] Alive Servers:
Fri Oct  9 18:26:57 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:26:57 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:26:57 2020 - [info] Alive Slaves:
Fri Oct  9 18:26:57 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:57 2020 - [info]     GTID ON
Fri Oct  9 18:26:57 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:57 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:57 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:57 2020 - [info]     GTID ON
Fri Oct  9 18:26:57 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:57 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:57 2020 - [info] Checking slave configurations..
Fri Oct  9 18:26:57 2020 - [info] Checking replication filtering settings..
Fri Oct  9 18:26:57 2020 - [info]  Replication filtering check ok.
Fri Oct  9 18:26:57 2020 - [info] Master is down!
Fri Oct  9 18:26:57 2020 - [info] Terminating monitoring script.
Fri Oct  9 18:26:57 2020 - [info] Got exit code 20 (Master dead).
Fri Oct  9 18:26:57 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct  9 18:26:57 2020 - [info] Starting master failover.
Fri Oct  9 18:26:57 2020 - [info]
Fri Oct  9 18:26:57 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct  9 18:26:57 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] GTID failover mode = 1
Fri Oct  9 18:26:58 2020 - [info] Dead Servers:
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info] Alive Servers:
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:26:58 2020 - [info] Alive Slaves:
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info] Starting GTID based failover.
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct  9 18:26:58 2020 - [info] Executing master IP deactivation script:
Fri Oct  9 18:26:58 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stop
Disabling the VIP on old master: 172.16.120.10
Fake!!! Old master: rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct  9 18:26:58 2020 - [info]  done.
Fri Oct  9 18:26:58 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct  9 18:26:58 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:3101017
Fri Oct  9 18:26:58 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:19971-20041
Fri Oct  9 18:26:58 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:3101017
Fri Oct  9 18:26:58 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:19971-20041
Fri Oct  9 18:26:58 2020 - [info] Oldest slaves:
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] Searching new master from slaves..
Fri Oct  9 18:26:58 2020 - [info]  Candidate masters from the configuration file:
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct  9 18:26:58 2020 - [info]     GTID ON
Fri Oct  9 18:26:58 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct  9 18:26:58 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Oct  9 18:26:58 2020 - [info]  Non-candidate masters:
Fri Oct  9 18:26:58 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct  9 18:26:58 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct  9 18:26:58 2020 - [info] Starting master failover..
Fri Oct  9 18:26:58 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info]  Waiting all logs to be applied..
Fri Oct  9 18:26:58 2020 - [info]   done.
Fri Oct  9 18:26:58 2020 - [info] Getting new master's binlog name and position..
Fri Oct  9 18:26:58 2020 - [info]  mysql-bin.000008:3068991
Fri Oct  9 18:26:58 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct  9 18:26:58 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 3068991, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20041,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct  9 18:26:58 2020 - [info] Executing master IP activate script:
Fri Oct  9 18:26:58 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! New master: rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct  9 18:26:58 2020 - [info]  OK.
Fri Oct  9 18:26:58 2020 - [info] ** Finished master recovery successfully.
Fri Oct  9 18:26:58 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct  9 18:26:58 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 69850. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009182657.log if it takes time..
Fri Oct  9 18:26:59 2020 - [info]
Fri Oct  9 18:26:59 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct  9 18:26:59 2020 - [info]
Fri Oct  9 18:26:58 2020 - [info]  Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct  9 18:26:58 2020 - [info]  Executed CHANGE MASTER.
Fri Oct  9 18:26:58 2020 - [info]  Slave started.
Fri Oct  9 18:26:58 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20041,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct  9 18:26:59 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct  9 18:26:59 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct  9 18:26:59 2020 - [info] All new slave servers recovered successfully.
Fri Oct  9 18:26:59 2020 - [info]
Fri Oct  9 18:26:59 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct  9 18:26:59 2020 - [info]
Fri Oct  9 18:26:59 2020 - [info] Resetting slave info on the new master..
Fri Oct  9 18:26:59 2020 - [info]  172.16.120.11: Resetting slave info succeeded.
Fri Oct  9 18:26:59 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 18:26:59 2020 - [info]

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct  9 18:26:59 2020 - [info] Sending mail..
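
A log like the one above is easier to compare across test cases once the phase boundaries and the total failover time are pulled out. The following is only a minimal sketch: the helper names (`parse_line`, `phase_starts`, `failover_seconds`) are made up for this illustration and are not part of MHA, and the line format is assumed to match what manager.log shows here (`<weekday> <month> <day> <time> <year> - [level] message`). The sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the manager.log excerpt above:
# "Fri Oct  9 18:26:57 2020 - [info] * Phase 1: Configuration Check Phase.."
LINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}) - \[(\w+)\](.*)$")
# A phase *start* line begins with a single "* " and ends with "..".
PHASE_RE = re.compile(r"^\* (Phase [\d.]+: .+?)\.\.$")
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_line(line):
    """Return (timestamp, level, message), or None for lines without a timestamp."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    _weekday, month, day, hms, year = m.group(1).split()
    hour, minute, second = (int(x) for x in hms.split(":"))
    ts = datetime(int(year), MONTHS[month], int(day), hour, minute, second)
    return ts, m.group(2), m.group(3).strip()


def phase_starts(lines):
    """List (timestamp, phase title) for every phase start line."""
    result = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, msg = parsed
        pm = PHASE_RE.match(msg)
        if pm:
            result.append((ts, pm.group(1)))
    return result


def failover_seconds(lines):
    """Seconds from 'Starting master failover.' to '... completed successfully.'."""
    start = end = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, msg = parsed
        if start is None and msg == "Starting master failover.":
            start = ts
        elif start is not None and msg.endswith("completed successfully."):
            end = ts
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


SAMPLE = """\
Fri Oct  9 18:26:57 2020 - [info] Starting master failover.
Fri Oct  9 18:26:57 2020 - [info]
Fri Oct  9 18:26:57 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct  9 18:26:58 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct  9 18:26:58 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct  9 18:26:59 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
""".splitlines()

if __name__ == "__main__":
    for ts, title in phase_starts(SAMPLE):
        print(ts.time(), title)
    print("failover seconds:", failover_seconds(SAMPLE))
```

Lines without a timestamp prefix (script output such as `Enabling the VIP ...`, or the wrapped second half of a GTID set) are skipped rather than guessed at, so the timing only relies on lines MHA itself stamped.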