MHA Failover Testing (Part 1)
TL;DR
Case | ping_type=CONNECT | ping_type=INSERT |
---|---|---|
too many connections on the master | no failover | no failover |
master hang | no failover | failover triggered and succeeded |
only the manager cannot reach the master | no failover | no failover |
manager cannot reach the master, and cannot SSH to slave1 | no failover | no failover |
manager cannot reach the master, and cannot SSH to slave1 or slave2 | no failover | no failover |
manager cannot reach the master; SSH to slave1 works, but slave1 cannot reach the master | no failover | no failover |
manager cannot reach the master; SSH to slave1 and slave2 works, but neither can reach the master | failover triggered and succeeded | failover triggered and succeeded (only after the persistent connection is dropped) |
slave1 went down before the master did | failover triggered, but failed | failover triggered, but failed |
master down; slave1's io_thread had been stopped beforehand | failover succeeded | failover succeeded |
master down; slave1's io_thread had hit an error beforehand | failover succeeded | failover succeeded |
master down; slave1's sql_thread had been stopped beforehand | failover succeeded | failover succeeded |
master down; slave1's sql_thread had hit an error beforehand | failover triggered, but failed | failover triggered, but failed |
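The network-partition rows of the table follow one pattern in the logs. The manager only moves towards failover when two things hold: its own ping fails, and every host listed in secondary_check_script is reachable over SSH and also cannot reach the master. Below is a minimal Python sketch of that rule. It is read from the logs, not from MHA's source. The `Secondary` enum, `should_start_failover` and the `CASES` mapping are hypothetical names. The sketch assumes that slave2 can still reach the master in the single-slave rows.

```python
from __future__ import annotations

from enum import Enum


class Secondary(Enum):
    """What one secondary-check host reports about the master (simplified)."""
    MASTER_REACHABLE = "master reachable from this host"
    MASTER_UNREACHABLE = "host reachable over SSH, master is not"
    HOST_UNREACHABLE = "manager cannot SSH to this host"


def should_start_failover(manager_ping_ok: bool, secondary: list[Secondary]) -> bool:
    """Simplified model of the decision visible in the manager logs.

    Assumes secondary_check_script is configured, as in this setup. Failover is
    only considered when the manager's own ping fails AND every secondary host
    is reachable and reports that it cannot reach the master either.
    """
    if not secondary:
        raise ValueError("this sketch assumes at least one secondary check host")
    if manager_ping_ok:
        return False
    return all(r is Secondary.MASTER_UNREACHABLE for r in secondary)


# Network-partition rows of the TL;DR table, as (slave1, slave2) reports.
# Assumption: slave2 can still reach the master unless the row says otherwise.
CASES = {
    "only the manager cannot reach the master":
        [Secondary.MASTER_REACHABLE, Secondary.MASTER_REACHABLE],
    "cannot SSH to slave1":
        [Secondary.HOST_UNREACHABLE, Secondary.MASTER_REACHABLE],
    "cannot SSH to slave1 or slave2":
        [Secondary.HOST_UNREACHABLE, Secondary.HOST_UNREACHABLE],
    "slave1 cannot reach the master":
        [Secondary.MASTER_UNREACHABLE, Secondary.MASTER_REACHABLE],
    "neither slave can reach the master":
        [Secondary.MASTER_UNREACHABLE, Secondary.MASTER_UNREACHABLE],
}

for name, reports in CASES.items():
    print(f"{name}: failover={should_start_failover(False, reports)}")
```

The too-many-connections case shows this is only the first gate. Even after the secondary check says failover should start, the manager can still decide to keep monitoring.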
Environment
master: 172.16.120.10 centos-1, master + proxysql
slave1: 172.16.120.11 centos-2, slave + proxysql
slave2: 172.16.120.12 centos-3, slave + proxysql
manager: 172.16.120.13 centos-4, MHA manager
MHA configuration
# cat /etc/masterha/conf/masterha_default.cnf
[server default]
# MySQL user and password; do not put quotes around the password here
user=mha
password=xxxx
# replication user
repl_user=repler
repl_password=xxxx
# check the master every 3 seconds
ping_interval=3
# CONNECT checks over a new short-lived connection each time; the default, like INSERT, uses a persistent connection
ping_type=INSERT
#ping_type=CONNECT
# both types are tested below
# ssh user
ssh_user=root
# script that sends notification emails
report_script=/etc/masterha/scripts/send_report
# working directory on each node
remote_workdir=/masterha/

# cat /etc/masterha/conf/cls_new.cnf
[server default]
# workdir on the management server
manager_workdir=/masterha/cls_new/
manager_log=/masterha/cls_new/manager.log
# binlog directory on the master
master_binlog_dir=/data/mysql_3358/data/
# script called to move the VIP on automatic failover
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128
# script called to move the VIP on manual (online) switchover
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change_vip --vip=172.16.120.128
# check master availability from the other hosts
secondary_check_script=masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12

[server1]
hostname=172.16.120.10
port=3358
candidate_master=1

[server2]
hostname=172.16.120.11
port=3358
candidate_master=1

[server3]
hostname=172.16.120.12
port=3358
candidate_master=1
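MHA reads masterha_default.cnf first and cls_new.cnf second, as shown by the "Reading default configuration ..." lines in the logs. Below is a sketch that uses Python's configparser to inspect the effective settings. It assumes that values in the application file override the global defaults for the same key. `merge_mha_config` and `list_servers` are hypothetical helpers, not part of MHA.

```python
from __future__ import annotations

import configparser


def merge_mha_config(default_text: str, app_text: str) -> configparser.ConfigParser:
    """Merge the global default file and the application file.

    Assumption: the application file is read second, so its values override
    the global defaults for the same section and key.
    """
    # interpolation=None: take values literally (no %(name)s expansion)
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(default_text, source="masterha_default.cnf")
    cfg.read_string(app_text, source="cls_new.cnf")  # same sections are merged
    return cfg


def list_servers(cfg: configparser.ConfigParser) -> list[tuple[str, int]]:
    """Return (hostname, port) for every [serverN] section, in file order."""
    return [
        (cfg[name]["hostname"], cfg[name].getint("port"))
        for name in cfg.sections()
        if name != "server default"
    ]
```

On the manager host, for example: `cfg = merge_mha_config(Path("/etc/masterha/conf/masterha_default.cnf").read_text(), Path("/etc/masterha/conf/cls_new.cnf").read_text())` (with `from pathlib import Path`). After that, `cfg["server default"]["ping_type"]` shows which ping type is in effect.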
[Test case] too many connections on the master
ping_type=CONNECT
root@localhost 11:43:29 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 7 | | NULL | 0 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 2 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 14 | | NULL | 0 | 0 |
| 1256 | repler | 172.16.120.11:59594 | NULL | Binlog Dump GTID | 952922 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1257 | repler | 172.16.120.12:56540 | NULL | Binlog Dump GTID | 952902 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 2 | | NULL | 1 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 120 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 58 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 17 | | NULL | 1 | 0 |
| 1943 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)
root@localhost 11:43:30 [dbms_monitor]> show global variables like '%max_connec%';
+-----------------------+---------+
| Variable_name | Value |
+-----------------------+---------+
| extra_max_connections | 1 |
| max_connect_errors | 1000000 |
| max_connections | 1024 |
+-----------------------+---------+
3 rows in set (0.01 sec)
root@localhost 11:49:34 [dbms_monitor]> set global max_connections=5;
Query OK, 0 rows affected (0.01 sec)
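Conclusion: no failover. The CONNECT ping fails with error 1040. However, the secondary check still reports the master as reachable from 172.16.120.11, and the later 1040 errors are logged as "not a MySQL crash", so MHA keeps health-checking: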
Fri Oct 9 11:42:57 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 11:42:57 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 11:42:57 2020 - [info] OK.
Fri Oct 9 11:42:57 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 11:42:57 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 11:42:57 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 11:42:57 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 11:42:57 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 11:49:51 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Too many connections at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
1040 (Too many connections)
Fri Oct 9 11:49:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 11:49:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 11:49:51 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct 9 11:49:52 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 11:49:54 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:49:54 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:49:57 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:49:57 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:50:00 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:50:00 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
ping_type=INSERT
root@localhost 11:55:13 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 1 | | NULL | 0 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 18 | | NULL | 1 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 16 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 8 | | NULL | 0 | 0 |
| 1256 | repler | 172.16.120.11:59594 | NULL | Binlog Dump GTID | 953626 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1257 | repler | 172.16.120.12:56540 | NULL | Binlog Dump GTID | 953606 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 6 | | NULL | 1 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 103 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 41 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 1 | | NULL | 1 | 0 |
| 1943 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2160 | mha | 172.16.120.13:34660 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)
root@localhost 11:55:14 [dbms_monitor]> show global variables like '%max_connec%';
+-----------------------+---------+
| Variable_name | Value |
+-----------------------+---------+
| extra_max_connections | 1 |
| max_connect_errors | 1000000 |
| max_connections | 1024 |
+-----------------------+---------+
3 rows in set (0.04 sec)
root@localhost 11:55:19 [dbms_monitor]> set global max_connections=5;
Query OK, 0 rows affected (0.00 sec)
root@localhost 11:55:25 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 6 | | NULL | 0 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 3 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 31 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 3 | | NULL | 1 | 0 |
| 1256 | repler | 172.16.120.11:59594 | NULL | Binlog Dump GTID | 953641 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1257 | repler | 172.16.120.12:56540 | NULL | Binlog Dump GTID | 953621 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 0 | | NULL | 0 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 118 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 56 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 6 | | NULL | 1 | 0 |
| 1943 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2160 | mha | 172.16.120.13:34660 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)
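With ping_type=INSERT the check runs over a persistent connection, so it does not notice "too many connections". Its existing connection (Id 2160 above) keeps working.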
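A toy model shows why the two ping types react differently. MySQL refuses new sessions once max_connections is reached, but sessions that are already open keep working. The classes below are hypothetical stand-ins, not a MySQL client. The statement text passed to `execute` is a placeholder.

```python
from __future__ import annotations


class TooManyConnections(Exception):
    """Stands in for MySQL error 1040 (Too many connections)."""


class ServerGoneAway(Exception):
    """Stands in for client error 2006 (MySQL server has gone away)."""


class FakeServer:
    """Toy MySQL server that only tracks how many sessions are open."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open = 0

    def connect(self) -> FakeConnection:
        # New sessions are refused at the limit; lowering the limit does not
        # affect sessions that are already open.
        if self.open >= self.max_connections:
            raise TooManyConnections("1040 (Too many connections)")
        self.open += 1
        return FakeConnection(self)


class FakeConnection:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.closed = False

    def close(self) -> None:
        """Ends the session (client disconnect or a server-side KILL)."""
        if not self.closed:
            self.closed = True
            self.server.open -= 1

    def execute(self, sql: str) -> None:
        if self.closed:
            raise ServerGoneAway("2006 (MySQL server has gone away)")


def ping_connect(server: FakeServer) -> bool:
    """CONNECT-style check: open a fresh session for every ping."""
    try:
        conn = server.connect()
    except TooManyConnections:
        return False
    conn.close()
    return True


class PersistentPinger:
    """INSERT-style check: keep one session open and run a statement on it."""

    def __init__(self, server: FakeServer) -> None:
        self.conn = server.connect()

    def ping(self) -> bool:
        try:
            self.conn.execute("INSERT ...")  # placeholder statement
        except ServerGoneAway:
            return False
        return True
```

This mirrors the test. The persistent pinger survives `max_connections=5`, but once its session is killed it fails, and it cannot get a new session either.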
Kill the MHA connection manually:
root@localhost 11:55:29 [dbms_monitor]> kill 2160;
Query OK, 0 rows affected (0.01 sec)
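Conclusion: no failover. After the kill, the INSERT ping fails with 2006 (MySQL server has gone away). Both secondary-check hosts get error 1040 themselves and report the master as not writable, so the secondary check concludes that failover should start. However, when the manager reconnects it also gets 1040, which MHA treats as "not a MySQL crash", so it keeps health-checking: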
Fri Oct 9 11:54:48 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 11:54:48 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 11:54:48 2020 - [info] OK.
Fri Oct 9 11:54:48 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 11:54:48 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 11:54:48 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 11:54:48 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 11:54:48 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 11:56:42 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 11:56:42 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 11:56:42 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 11:56:43 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
ERROR 1040 (HY000): Too many connections
Monitoring server 172.16.120.11 is reachable, Master is not writable from 172.16.120.11. OK.
ERROR 1040 (HY000): Too many connections
Monitoring server 172.16.120.12 is reachable, Master is not writable from 172.16.120.12. OK.
Fri Oct 9 11:56:43 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct 9 11:56:45 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:56:45 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:56:48 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:56:48 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:56:51 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:56:51 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:56:54 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:56:54 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:56:57 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
Fri Oct 9 11:56:57 2020 - [info] Got MySQL error 1040, but this is not a MySQL crash. Continue health check..
Fri Oct 9 11:57:00 2020 - [warning] Got error on MySQL connect: 1040 (Too many connections)
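Read together, the two too-many-connections logs suggest a further check after the secondary check. The manager connects to the master again, logs error 1040 as "not a MySQL crash", and continues monitoring. Below is a simplified sketch of that decision. Only 1040 is confirmed by these logs; MHA's actual list of such error codes lives in its source and is not reproduced here. `NOT_A_CRASH` and `next_action` are hypothetical names.

```python
from __future__ import annotations

# Error codes treated as "server alive but refusing us" rather than a crash.
# Only 1040 is confirmed by the logs in this test.
NOT_A_CRASH = {1040}


def next_action(secondary_says_failover: bool, reconnect_error: int | None) -> str:
    """Decide what the monitor does after the secondary check has run.

    reconnect_error is the MySQL error code from the manager's own attempt to
    connect to the master again, or None if that connection succeeded.
    """
    if not secondary_says_failover:
        return "continue health check"   # another host can still reach the master
    if reconnect_error is None:
        return "continue health check"   # the master answered again
    if reconnect_error in NOT_A_CRASH:
        return "continue health check"   # e.g. 1040 Too many connections
    return "start failover"
```

The CONNECT case stops at the first branch, because 172.16.120.11 could still reach the master. The INSERT case passes the secondary check but stops at the 1040 branch.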
[Test case] master hang
ping_type=CONNECT
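A master hang is hard to reproduce directly, so it is simulated indirectly. First, change the SELECT 1 executed by ping_select (in /usr/local/share/perl5/MHA/HealthCheck.pm) into a SELECT against an InnoDB table: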
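sub ping_select($) {
  my $self = shift;
  my $log  = $self->{logger};
  my $dbh  = $self->{dbh};
  my ( $query, $sth, $href );
  eval {
    $dbh->{RaiseError} = 1;
    # original statement:
    #$sth = $dbh->prepare("SELECT 1 As Value");
    # test version: read an InnoDB table so the ping can be blocked
    $sth = $dbh->prepare("SELECT 1 As Value from infra.chk_masterha limit 1");
    # ... rest of ping_select unchanged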
Then lower innodb_thread_concurrency to 1:
root@localhost 12:25:34 [dbms_monitor]> set global innodb_thread_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
Run a query against the InnoDB table by hand, so that MHA's SELECT gets blocked:
root@localhost 12:25:45 [dbms_monitor]> select sleep(600) from infra.chk_masterha limit 1;
root@localhost 12:29:09 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 16 | | NULL | 1 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 1 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 4 | | NULL | 1 | 0 |
| 1256 | repler | 172.16.120.11:59594 | NULL | Binlog Dump GTID | 955662 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1257 | repler | 172.16.120.12:56540 | NULL | Binlog Dump GTID | 955642 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 11 | | NULL | 1 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 96 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 34 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 6 | | NULL | 0 | 0 |
| 1943 | root | localhost | dbms_monitor | Query | 21 | User sleep | select sleep(600) from infra.chk_masterha limit 1 | 0 | 0 |
| 2260 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2303 | mha | 172.16.120.13:34982 | NULL | Query | 20 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2305 | mha | 172.16.120.13:34988 | NULL | Query | 17 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2308 | mha | 172.16.120.13:34994 | NULL | Query | 14 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2310 | mha | 172.16.120.13:34998 | NULL | Query | 11 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2312 | mha | 172.16.120.13:35002 | NULL | Query | 8 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2314 | mha | 172.16.120.13:35006 | NULL | Query | 5 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
| 2317 | mha | 172.16.120.13:35010 | NULL | Query | 2 | Sending data | SELECT 1 As Value from infra.chk_masterha limit 1 | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------+-----------+---------------+
19 rows in set (0.00 sec)
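The processlist above shows MHA's pings piling up in "Sending data" behind the sleep(600) query once innodb_thread_concurrency is 1. A toy model with a one-slot semaphore reproduces that queueing effect. It deliberately ignores real InnoDB details such as concurrency tickets. The class and method names are hypothetical.

```python
import threading
import time


class ThreadConcurrencyModel:
    """Toy model of innodb_thread_concurrency: at most `limit` queries inside InnoDB."""

    def __init__(self, limit: int) -> None:
        self.slots = threading.BoundedSemaphore(limit)

    def long_query(self, hold_seconds: float, entered: threading.Event) -> None:
        """Stands in for `select sleep(600) from infra.chk_masterha limit 1`."""
        with self.slots:
            entered.set()              # this query now occupies a slot
            time.sleep(hold_seconds)

    def ping_query(self, timeout: float) -> bool:
        """Stands in for MHA's ping; False if no slot frees up within `timeout`."""
        if not self.slots.acquire(timeout=timeout):
            return False               # still queued, like the pings in "Sending data"
        self.slots.release()
        return True
```

With `limit=1`, a ping issued while `long_query` holds the slot returns False. The same ping succeeds once the long query finishes.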
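Conclusion: no failover, and the MHA manager may exit with an error. Each CONNECT ping child process times out and is killed, but the secondary check still reaches the master from 172.16.120.11, so failover never starts: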
Fri Oct 9 12:28:44 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 12:28:44 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 12:28:44 2020 - [info] OK.
Fri Oct 9 12:28:44 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 12:28:44 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 12:28:44 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 12:28:44 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 12:28:44 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 12:28:53 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:28:53 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 12:28:53 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 12:28:53 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 12:28:53 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct 9 12:28:53 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 12:28:56 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:28:56 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 12:28:59 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:28:59 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
...
Fri Oct 9 12:30:47 2020 - [warning] Got timeout on MySQL Ping(CONNECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:30:47 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
After the select sleep(600) from infra.chk_masterha limit 1 query was stopped manually with Ctrl+C, the MHA manager exited with an error:
Fri Oct 9 12:30:49 2020 - [warning] Got error when monitoring master: at /usr/local/share/perl5/MHA/MasterMonitor.pm line 489.
Fri Oct 9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln491] Target master's advisory lock is already held by someone. Please check whether you monitor the same master from multiple monitoring processes.
Fri Oct 9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln511] Error happened on health checking. at /usr/local/bin/masterha_manager line 50.
Fri Oct 9 12:30:49 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Oct 9 12:30:49 2020 - [info] Got exit code 1 (Not master dead).
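The repeated "Got timeout on MySQL Ping(CONNECT) child process and killed it!" lines indicate that each ping runs in a child process, which is killed when it takes too long. Below is a sketch of that pattern in Python. The child runs arbitrary Python code standing in for the MySQL ping (an assumption; MHA's own implementation is Perl). `ping_in_child` is a hypothetical name.

```python
import subprocess
import sys


def ping_in_child(ping_code: str, timeout: float) -> str:
    """Run a ping in a child process and kill it if it does not finish in time.

    ping_code is Python source executed by the child; it stands in for the
    real MySQL ping, which this sketch does not implement.
    """
    try:
        result = subprocess.run([sys.executable, "-c", ping_code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising
        return "timeout (child killed)"
    return "ok" if result.returncode == 0 else "error"
```

Running each ping in its own process means a ping stuck inside the server cannot block the monitor loop itself. The cost is that a stuck server-side session may outlive the killed child.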
ping_type=INSERT
Again the hang is simulated indirectly, by lowering innodb_thread_concurrency to 1:
root@localhost 12:25:34 [dbms_monitor]> set global innodb_thread_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
Run a query against the InnoDB table by hand, so that MHA's INSERT gets blocked:
root@localhost 12:35:21 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 3 | | NULL | 1 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 1 | | NULL | 1 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 8 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 1 | | NULL | 0 | 0 |
| 1256 | repler | 172.16.120.11:59594 | NULL | Binlog Dump GTID | 956039 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1257 | repler | 172.16.120.12:56540 | NULL | Binlog Dump GTID | 956019 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 28 | | NULL | 1 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 113 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 51 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 13 | | NULL | 0 | 0 |
| 1943 | root | localhost | dbms_monitor | Query | 15 | User sleep | select sleep(600) from infra.chk_masterha limit 1 | 0 | 0 |
| 2260 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2395 | mha | 172.16.120.13:35206 | NULL | Query | 13 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
| 2398 | mha | 172.16.120.13:35208 | NULL | Query | 11 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
| 2400 | mha | 172.16.120.11:32908 | NULL | Query | 10 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
| 2401 | mha | 172.16.120.13:35216 | NULL | Query | 8 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
| 2403 | mha | 172.16.120.12:58066 | NULL | Query | 7 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
| 2404 | mha | 172.16.120.13:35222 | NULL | Query | 5 | update | INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestam | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
18 rows in set (0.00 sec)
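Conclusion: failover happens. The INSERT ping times out on consecutive checks, and both secondary-check hosts report the master as not writable. After the fourth failed check, MHA declares the master dead (exit code 20) and starts failover: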
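In this run, "Connection failed 2/3/4 time(s)" counts consecutive failed pings, and the master is declared unreachable at the fourth one. Below is a sketch of such a consecutive-failure counter. The threshold of 4 is what this log shows, not a documented MHA parameter. `first_dead_check` is a hypothetical name.

```python
from __future__ import annotations

from typing import Iterable


def first_dead_check(results: Iterable[bool], threshold: int = 4) -> int | None:
    """Return the 1-based index of the check at which the master is declared
    unreachable (the `threshold`-th consecutive failed ping), or None.

    threshold=4 mirrors "Connection failed 4 time(s)" in this log (assumption).
    """
    failures = 0
    for i, ok in enumerate(results, start=1):
        failures = 0 if ok else failures + 1   # any successful ping resets the count
        if failures >= threshold:
            return i
    return None
```

With ping_interval=3, the four failures here span 12:35:16 to 12:35:25. That is 9 seconds, or three intervals after the first timeout.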
Fri Oct 9 12:35:00 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 12:35:01 2020 - [info] GTID failover mode = 1
Fri Oct 9 12:35:01 2020 - [info] Dead Servers:
Fri Oct 9 12:35:01 2020 - [info] Alive Servers:
Fri Oct 9 12:35:01 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:01 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 12:35:01 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 12:35:01 2020 - [info] Alive Slaves:
Fri Oct 9 12:35:01 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:01 2020 - [info] GTID ON
Fri Oct 9 12:35:01 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:01 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:01 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:01 2020 - [info] GTID ON
Fri Oct 9 12:35:01 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:01 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:01 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:01 2020 - [info] Checking slave configurations..
Fri Oct 9 12:35:01 2020 - [info] Checking replication filtering settings..
Fri Oct 9 12:35:01 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 12:35:01 2020 - [info] Replication filtering check ok.
Fri Oct 9 12:35:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 12:35:01 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 12:35:01 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 12:35:01 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 12:35:01 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 12:35:01 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 12:35:01 2020 - [info] OK.
Fri Oct 9 12:35:01 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 12:35:01 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 12:35:01 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 12:35:01 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 12:35:01 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 12:35:16 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:35:16 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 12:35:16 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 12:35:17 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 12:35:19 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:35:19 2020 - [warning] Connection failed 2 time(s)..
Monitoring server 172.16.120.11 is reachable, Master is not writable from 172.16.120.11. OK.
Fri Oct 9 12:35:22 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:35:22 2020 - [warning] Connection failed 3 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not writable from 172.16.120.12. OK.
Fri Oct 9 12:35:23 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct 9 12:35:25 2020 - [warning] Got timeout on MySQL Ping(INSERT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
Fri Oct 9 12:35:25 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 12:35:25 2020 - [warning] Master is not reachable from health checker!
Fri Oct 9 12:35:25 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct 9 12:35:25 2020 - [warning] SSH is reachable.
Fri Oct 9 12:35:25 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct 9 12:35:25 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct 9 12:35:25 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 12:35:25 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 12:35:27 2020 - [info] GTID failover mode = 1
Fri Oct 9 12:35:27 2020 - [info] Dead Servers:
Fri Oct 9 12:35:27 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:27 2020 - [info] Alive Servers:
Fri Oct 9 12:35:27 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 12:35:27 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 12:35:27 2020 - [info] Alive Slaves:
Fri Oct 9 12:35:27 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:27 2020 - [info] GTID ON
Fri Oct 9 12:35:27 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:27 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:27 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:27 2020 - [info] GTID ON
Fri Oct 9 12:35:27 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:27 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:27 2020 - [info] Checking slave configurations..
Fri Oct 9 12:35:27 2020 - [info] Checking replication filtering settings..
Fri Oct 9 12:35:27 2020 - [info] Replication filtering check ok.
Fri Oct 9 12:35:27 2020 - [info] Master is down!
Fri Oct 9 12:35:27 2020 - [info] Terminating monitoring script.
Fri Oct 9 12:35:27 2020 - [info] Got exit code 20 (Master dead).
Fri Oct 9 12:35:27 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct 9 12:35:27 2020 - [info] Starting master failover.
Fri Oct 9 12:35:27 2020 - [info]
Fri Oct 9 12:35:27 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct 9 12:35:27 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] GTID failover mode = 1
Fri Oct 9 12:35:28 2020 - [info] Dead Servers:
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Alive Servers:
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 12:35:28 2020 - [info] Alive Slaves:
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] Starting GTID based failover.
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct 9 12:35:28 2020 - [info] Executing master IP deactivation script:
Fri Oct 9 12:35:28 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 172.16.120.10
start down vip
RTNETLINK answers: Cannot assign requested address
Fake!!! old master: rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct 9 12:35:28 2020 - [info] done.
Fri Oct 9 12:35:28 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct 9 12:35:28 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:827239
Fri Oct 9 12:35:28 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:8822-10390
Fri Oct 9 12:35:28 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:827239
Fri Oct 9 12:35:28 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:8822-10390
Fri Oct 9 12:35:28 2020 - [info] Oldest slaves:
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] Searching new master from slaves..
Fri Oct 9 12:35:28 2020 - [info] Candidate masters from the configuration file:
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 12:35:28 2020 - [info] GTID ON
Fri Oct 9 12:35:28 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 12:35:28 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 12:35:28 2020 - [info] Non-candidate masters:
Fri Oct 9 12:35:28 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct 9 12:35:28 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 12:35:28 2020 - [info] Starting master failover..
Fri Oct 9 12:35:28 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] Waiting all logs to be applied..
Fri Oct 9 12:35:28 2020 - [info] done.
Fri Oct 9 12:35:28 2020 - [info] Getting new master's binlog name and position..
Fri Oct 9 12:35:28 2020 - [info] mysql-bin.000008:811243
Fri Oct 9 12:35:28 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct 9 12:35:28 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 811243, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-10390,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct 9 12:35:28 2020 - [info] Executing master IP activate script:
Fri Oct 9 12:35:28 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha' --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! new master: rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct 9 12:35:28 2020 - [info] OK.
Fri Oct 9 12:35:28 2020 - [info] ** Finished master recovery successfully.
Fri Oct 9 12:35:28 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct 9 12:35:28 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 44798. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009123527.log if it takes time..
Fri Oct 9 12:35:29 2020 - [info]
Fri Oct 9 12:35:29 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct 9 12:35:29 2020 - [info]
Fri Oct 9 12:35:28 2020 - [info] Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct 9 12:35:28 2020 - [info] Executed CHANGE MASTER.
Fri Oct 9 12:35:28 2020 - [info] Slave started.
Fri Oct 9 12:35:28 2020 - [info] gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-10390,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct 9 12:35:29 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct 9 12:35:29 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct 9 12:35:29 2020 - [info] All new slave servers recovered successfully.
Fri Oct 9 12:35:29 2020 - [info]
Fri Oct 9 12:35:29 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct 9 12:35:29 2020 - [info]
Fri Oct 9 12:35:29 2020 - [info] Resetting slave info on the new master..
Fri Oct 9 12:35:29 2020 - [info] 172.16.120.11: Resetting slave info succeeded.
Fri Oct 9 12:35:29 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 12:35:29 2020 - [info] ----- Failover Report -----
cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded
Master 172.16.120.10(172.16.120.10:3358) is down!
Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 12:35:29 2020 - [info] Sending mail..
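In the log above, both slaves had received up to Retrieved Gtid Set 44a4ea53-fcad-11ea-bd16-0050563b7b42:8822-10390, the new master reports its Exec_Gtid_Set, and gtid_wait on 172.16.120.12 completed after executing 0 events, presumably because that slave had already applied every transaction in the set. Deciding this is a GTID-set containment test (MySQL exposes the same idea as the SQL function GTID_SUBSET()). Below is a minimal Python sketch of that test; the helper names are ours, not MHA's or MySQL's, and it assumes plain `uuid:interval[:interval]` text as printed in the log (no tagged GTIDs).

```python
from __future__ import annotations


def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort and merge overlapping or adjacent inclusive intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-10,uuid2:1-27' (line breaks allowed) into {uuid: [(start, end), ...]}."""
    raw: dict[str, list[tuple[int, int]]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for rng in ranges:
            start, _, end = rng.partition("-")
            # a single number such as '5' means the interval 5-5
            raw.setdefault(uuid.lower(), []).append((int(start), int(end or start)))
    return {uuid: _merge(ivs) for uuid, ivs in raw.items()}


def gtid_contains(superset: str, subset: str) -> bool:
    """True if every transaction in `subset` is also in `superset`."""
    have = parse_gtid_set(superset)
    for uuid, intervals in parse_gtid_set(subset).items():
        for start, end in intervals:
            # after merging, a covered interval must lie inside one superset interval
            if not any(s <= start and end <= e for s, e in have.get(uuid, [])):
                return False
    return True
```

For example, `gtid_contains(exec_gtid_set, retrieved_gtid_set)` with the two sets from the log returns True, which matches the "Executed 0 events" outcome on 172.16.120.12.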
None of the following errors triggers a failover, not even a manual failover run with --master_state=dead:
our @ALIVE_ERROR_CODES = (
  1040,    # ER_CON_COUNT_ERROR -- Too many connections
  1042,    # ER_BAD_HOST_ERROR -- Can't get hostname for your address
  1043,    # ER_HANDSHAKE_ERROR -- Bad handshake
  1044,    # ER_DBACCESS_DENIED_ERROR -- Access denied for user '%s'@'%s' to database '%s'
  1045,    # ER_ACCESS_DENIED_ERROR -- Access denied for user '%s'@'%s' (using password: %s)
  1129,    # ER_HOST_IS_BLOCKED -- Host '%s' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
  1130,    # ER_HOST_NOT_PRIVILEGED -- Host '%s' is not allowed to connect to this MySQL server
  1203,    # ER_TOO_MANY_USER_CONNECTIONS -- User %s already has more than 'max_user_connections' active connections
  1226,    # ER_USER_LIMIT_REACHED -- User '%s' has exceeded the '%s' resource (current value: %ld)
  1251,    # ER_NOT_SUPPORTED_AUTH_MODE -- Client does not support authentication protocol requested by server; consider upgrading MySQL client
  1275,    # ER_SERVER_IS_IN_SECURE_AUTH_MODE -- Server is running in --secure-auth mode, but '%s'@'%s' has a password in the old format; please change the password to the new format
);
For details, see "MHA - Why doesn't too many connections trigger a failover?"
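The reasoning behind this list: every one of these codes is an error sent back by a running mysqld, so receiving one proves the master is alive, just busy or refusing this user. Client-side errors such as 2003 (Can't connect, seen in the logs below) or 2006 (server has gone away) carry no such proof. A minimal Python sketch of that classification; the two labels and the function name are illustrative, not MHA's Perl code.

```python
# Same codes as @ALIVE_ERROR_CODES above, mirrored here for illustration only.
ALIVE_ERROR_CODES = frozenset(
    {1040, 1042, 1043, 1044, 1045, 1129, 1130, 1203, 1226, 1251, 1275}
)


def classify_connect_error(errno: int) -> str:
    """Return 'alive' if the error proves mysqld answered, else 'maybe_dead'."""
    if errno in ALIVE_ERROR_CODES:
        # the server itself produced this error, so it is up and responding
        return "alive"
    # e.g. 2003/2006: no server reply, so the master may be down or unreachable
    return "maybe_dead"
```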
[Test case] Network failure between master and MHA manager (1)
Manager <-- blocked --> Master
Manager <-- OK --> S1 <-- OK --> master
Manager <-- OK --> S2 <-- OK --> master
ping_type=CONNECT
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
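These rules flush INPUT, accept all ICMP, accept all TCP from the three database hosts, and then drop TCP SYN packets from everyone else. Because `--syn` only matches the packet that opens a new connection, the manager (172.16.120.13) can still ping the master and keep an already-established session, but any new MySQL or SSH connection times out. A small Python sketch of iptables' first-match evaluation for this rule set; it assumes the INPUT policy is left at the default ACCEPT (the script sets no `-P`) and ignores connection tracking.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    proto: str          # "tcp" or "icmp"
    syn: bool = False   # True only for the SYN that opens a new TCP connection


@dataclass(frozen=True)
class Rule:
    target: str                  # "ACCEPT" or "DROP"
    proto: Optional[str] = None  # None matches any protocol
    src: Optional[str] = None    # None matches any source
    syn_only: bool = False       # models the `--syn` match

    def matches(self, p: Packet) -> bool:
        if self.proto is not None and self.proto != p.proto:
            return False
        if self.src is not None and self.src != p.src:
            return False
        return not self.syn_only or p.syn


# The INPUT chain from the test, in order, after `-F`.
RULES = [
    Rule("ACCEPT", proto="icmp"),
    Rule("ACCEPT", proto="tcp", src="172.16.120.10"),
    Rule("ACCEPT", proto="tcp", src="172.16.120.11"),
    Rule("ACCEPT", proto="tcp", src="172.16.120.12"),
    Rule("DROP", proto="tcp", syn_only=True),
]


def verdict(p: Packet, rules: list[Rule] = RULES, policy: str = "ACCEPT") -> str:
    for rule in rules:  # walk the chain top to bottom; first match wins
        if rule.matches(p):
            return rule.target
    return policy  # assumption: chain policy is the default ACCEPT
```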
Conclusion: no failover is triggered.
Fri Oct 9 15:29:50 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 15:29:51 2020 - [info] GTID failover mode = 1
Fri Oct 9 15:29:51 2020 - [info] Dead Servers:
Fri Oct 9 15:29:51 2020 - [info] Alive Servers:
Fri Oct 9 15:29:51 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:29:51 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 15:29:51 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:29:51 2020 - [info] Alive Slaves:
Fri Oct 9 15:29:51 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:29:51 2020 - [info] GTID ON
Fri Oct 9 15:29:51 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:29:51 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:29:51 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:29:51 2020 - [info] GTID ON
Fri Oct 9 15:29:51 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:29:51 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:29:51 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:29:51 2020 - [info] Checking slave configurations..
Fri Oct 9 15:29:51 2020 - [info] Checking replication filtering settings..
Fri Oct 9 15:29:51 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 15:29:51 2020 - [info] Replication filtering check ok.
Fri Oct 9 15:29:51 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 15:29:51 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 15:29:51 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 15:29:51 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:29:51 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 15:29:51 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 15:29:51 2020 - [info] OK.
Fri Oct 9 15:29:51 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 15:29:51 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 15:29:51 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 15:29:51 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 15:29:51 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:32:56 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:32:56 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:32:56 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct 9 15:32:56 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:33:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:00 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:33:01 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct 9 15:33:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:03 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:33:06 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:06 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:33:06 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:33:09 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:09 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:33:09 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:33:09 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct 9 15:33:09 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:33:12 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:12 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:33:14 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct 9 15:33:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:15 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:33:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:18 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:33:18 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:33:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:21 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:33:21 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:33:21 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct 9 15:33:21 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:33:24 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:24 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:33:26 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct 9 15:33:27 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:27 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:33:30 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:30 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:33:30 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:33:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:33:33 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:33:33 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:33:33 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct 9 15:33:34 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
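In this log, each time the manager's own ping fails it runs masterha_secondary_check through 172.16.120.11 and 172.16.120.12; since 172.16.120.11 can still reach the master, failover is vetoed and the cycle repeats every few seconds. The TL;DR table is consistent with a simple rule: failover proceeds only when every monitoring server is reachable over SSH and none of them can reach the master. A simplified Python model of that rule; the enum and function names are ours, and the real script's exit codes, retries and timing are not modeled.

```python
from __future__ import annotations

from enum import Enum


class Probe(Enum):
    """Outcome of checking the master from one monitoring server (one -s host)."""
    MASTER_UNREACHABLE = "master_unreachable"  # SSH to the server worked; master not reachable from it
    MASTER_REACHABLE = "master_reachable"      # SSH worked and the master answered from there
    SSH_FAILED = "ssh_failed"                  # the monitoring server itself could not be reached


def secondary_check(probes: list[Probe]) -> str:
    """Only a unanimous 'master unreachable' confirms that the master is dead."""
    for probe in probes:  # servers are checked in the order given with -s
        if probe is Probe.MASTER_REACHABLE:
            return "master_alive"  # another server still sees the master: veto
        if probe is Probe.SSH_FAILED:
            return "check_error"   # the check cannot be trusted: no failover either
    return "master_dead"


def should_failover(manager_ping_ok: bool, probes: list[Probe]) -> bool:
    # the secondary check only matters once the manager's own ping has failed
    return not manager_ping_ok and secondary_check(probes) == "master_dead"
```

With this model, `[MASTER_REACHABLE, ...]` (this test case) and `[SSH_FAILED, ...]` (the next test case) both block failover, while `[MASTER_UNREACHABLE, MASTER_UNREACHABLE]` allows it.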
ping_type=INSERT
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
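At this point the manager keeps working normally: with ping_type=INSERT it checks the master over a persistent connection, and the `--syn` DROP rule only blocks new connections, so the already-established session is not affected.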
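The difference between the two ping types under this firewall can be captured in a toy model: CONNECT opens a fresh TCP connection on every check, so it fails on the very next interval, while INSERT keeps reusing its established session until that session is killed, after which reconnecting is blocked as well. This is a simplified Python sketch with a hypothetical `Pinger` class, not MHA's actual HealthCheck code.

```python
class Pinger:
    """Toy model of the two ping styles against a firewall that drops new TCP connections."""

    def __init__(self, ping_type: str) -> None:
        if ping_type not in ("CONNECT", "INSERT"):
            raise ValueError(ping_type)
        self.ping_type = ping_type
        self.connected = False  # is there an established session to the master?

    def ping(self, new_connections_blocked: bool) -> bool:
        if self.ping_type == "CONNECT":
            self.connected = False  # short connection: a fresh connect on every check
        if self.connected:
            return True  # INSERT reuses its session; the --syn rule never sees it
        # a new TCP handshake is needed, which fails while SYN packets are dropped
        self.connected = not new_connections_blocked
        return self.connected
```

Setting `connected = False` on an INSERT pinger corresponds to the `kill` below: the next ping then has to reconnect and fails, matching the 2006 followed by 2003 errors in the log.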
Kill the manager's connection on the master (Id 2836, user mha from 172.16.120.13):
root@localhost 15:39:31 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 22 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 9 | | NULL | 1 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 2 | | NULL | 1 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 19 | | NULL | 0 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 111 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 49 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 4 | | NULL | 1 | 0 |
| 2409 | repler | 172.16.120.11:32918 | NULL | Binlog Dump GTID | 10898 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2411 | repler | 172.16.120.12:58084 | NULL | Binlog Dump GTID | 10873 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2627 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2836 | mha | 172.16.120.13:35810 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)
root@localhost 15:39:39 [dbms_monitor]> kill 2836;
Query OK, 0 rows affected (0.00 sec)
Conclusion: no failover is triggered.
Fri Oct 9 15:37:54 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 15:37:55 2020 - [info] GTID failover mode = 1
Fri Oct 9 15:37:55 2020 - [info] Dead Servers:
Fri Oct 9 15:37:55 2020 - [info] Alive Servers:
Fri Oct 9 15:37:55 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:37:55 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 15:37:55 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:37:55 2020 - [info] Alive Slaves:
Fri Oct 9 15:37:55 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:37:55 2020 - [info] GTID ON
Fri Oct 9 15:37:55 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:37:55 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:37:55 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:37:55 2020 - [info] GTID ON
Fri Oct 9 15:37:55 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:37:55 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:37:55 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:37:55 2020 - [info] Checking slave configurations..
Fri Oct 9 15:37:55 2020 - [info] Checking replication filtering settings..
Fri Oct 9 15:37:55 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 15:37:55 2020 - [info] Replication filtering check ok.
Fri Oct 9 15:37:55 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 15:37:55 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 15:37:55 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 15:37:55 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:37:55 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 15:37:55 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 15:37:55 2020 - [info] OK.
Fri Oct 9 15:37:55 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 15:37:55 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 15:37:55 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 15:37:55 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 15:37:55 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:39:46 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 15:39:46 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:39:46 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Master is reachable from 172.16.120.11!
Fri Oct 9 15:39:47 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:39:51 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct 9 15:39:52 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:39:52 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:39:55 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:39:55 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:39:58 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:39:58 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:39:58 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:40:01 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:40:01 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:40:01 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 15:40:01 2020 - [info] Executing SSH check script: exit 0
Master is reachable from 172.16.120.11!
Fri Oct 9 15:40:01 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:40:04 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:40:04 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:40:06 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Fri Oct 9 15:40:07 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:40:07 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:40:10 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:40:10 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:40:10 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:40:13 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:40:13 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:40:13 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:40:13 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Master is reachable from 172.16.120.11!
Fri Oct 9 15:40:13 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 15:40:14 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:40:14 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
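Logs like this one repeat the same veto every cycle, so it can help to count the verdicts in manager.log instead of reading them line by line. A small Python sketch; the matched strings are copied from the log above, the labels are our own, and lines without the `Day Mon D HH:MM:SS YYYY - [level]` prefix (output of external scripts) are skipped.

```python
import re
from collections import Counter

# e.g. "Fri Oct 9 15:39:47 2020 - [warning] Master is reachable ..."
LINE = re.compile(r"^\w{3} \w{3} +\d+ [\d:]+ \d{4} - \[(?P<level>\w+)\] (?P<msg>.*)$")

VERDICTS = {
    "Master is reachable from at least one of other monitoring servers": "vetoed_master_reachable",
    "Secondary network check script returned errors": "secondary_check_error",
    "Master is down!": "master_down",
}


def summarize(lines) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # lines printed by external scripts carry no timestamp
        for needle, label in VERDICTS.items():
            if needle in m.group("msg"):
                counts[label] += 1
    return counts
```

Usage: `summarize(open("/masterha/cls_new/manager.log"))`; a nonzero `master_down` count is the line that precedes an actual failover.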
[Test case] Network failure between master and MHA manager (2)
Manager <-- blocked --> Master
Manager <-- blocked --> S1 <-- OK --> master
Manager <-- OK --> S2 <-- OK --> master
ping_type=CONNECT
On slave-1:
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
At this point the manager can no longer open connections to slave-1: ICMP ping still succeeds, but ssh hangs.
#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.349 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.651 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.500/0.651/0.151 ms
[root@centos-4 15:48:55 /usr/local/share/perl5/MHA]
#ssh centos-2
^C
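The ping/ssh contrast above is expected with these rules: ICMP is explicitly accepted, while the SYN to port 22 is dropped, so ICMP ping says nothing about whether TCP services are reachable. A TCP-level probe shows what SSH and MHA actually experience. A minimal Python sketch using only the standard library; the function name and default timeout are our own choices.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a new TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, unreachable and timeouts
        return False
```

Run from the manager, `tcp_reachable("172.16.120.11", 22)` should return False after the timeout under these rules, even though ICMP ping to centos-2 works.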
On the master:
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
Conclusion: no failover is triggered.
Fri Oct 9 15:48:03 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 15:48:05 2020 - [info] GTID failover mode = 1
Fri Oct 9 15:48:05 2020 - [info] Dead Servers:
Fri Oct 9 15:48:05 2020 - [info] Alive Servers:
Fri Oct 9 15:48:05 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:48:05 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 15:48:05 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:48:05 2020 - [info] Alive Slaves:
Fri Oct 9 15:48:05 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:48:05 2020 - [info] GTID ON
Fri Oct 9 15:48:05 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:48:05 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:48:05 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:48:05 2020 - [info] GTID ON
Fri Oct 9 15:48:05 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:48:05 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:48:05 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:48:05 2020 - [info] Checking slave configurations..
Fri Oct 9 15:48:05 2020 - [info] Checking replication filtering settings..
Fri Oct 9 15:48:05 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 15:48:05 2020 - [info] Replication filtering check ok.
Fri Oct 9 15:48:05 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 15:48:05 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 15:48:05 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 15:48:05 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct 9 15:48:05 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 15:48:05 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 15:48:05 2020 - [info] OK.
Fri Oct 9 15:48:05 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 15:48:05 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 15:48:05 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 15:48:05 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 15:48:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:50:40 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:40 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:50:40 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:50:44 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:44 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:50:45 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 15:50:45 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 15:50:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:47 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:50:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:50 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:50:50 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:50:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:53 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:50:53 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:50:53 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:50:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:56 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:50:58 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 15:50:58 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 15:50:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:50:59 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:51:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:51:02 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:51:02 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:51:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:51:05 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:51:05 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 15:51:05 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:51:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:51:05 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 15:51:09 2020 - [warning] Got timeout on Secondary Check child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 435.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
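Whether failover happens hinges on the secondary check verdicts in the log. With ping_interval=3 these repeat every few seconds, so counting them is quicker than reading the log by eye. The sketch below greps for the exact warning strings MHA 0.58 printed in the logs above. The function name `summarize_mha_log` is hypothetical.

```shell
# Hypothetical helper: count secondary-check runs and the two "Failover should
# not happen" verdicts seen in these tests. The strings are copied from the logs.
summarize_mha_log() {
  local log="$1" runs unreachable reachable
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that 0.
  runs=$(grep -c 'Executing secondary network check script' "$log" || true)
  unreachable=$(grep -c 'At least one of monitoring servers is not reachable' "$log" || true)
  reachable=$(grep -c 'Master is reachable from at least one of other monitoring servers' "$log" || true)
  echo "secondary_check_runs=$runs"
  echo "veto_monitoring_server_unreachable=$unreachable"
  echo "veto_master_reachable_elsewhere=$reachable"
}
```

Usage, with the manager_log path from the configuration above: `summarize_mha_log /masterha/cls_new/manager.log`. That file accumulates all runs, so the counts cover every test that wrote to it, not just this one.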
ping_type=INSERT
slave-1
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
At this point the manager can no longer open TCP connections to slave-1: ping still gets replies, but ssh hangs.
#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.349 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.651 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.500/0.651/0.151 ms
[root@centos-4 15:48:55 /usr/local/share/perl5/MHA]
#ssh centos-2
^C
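A working ping alongside a hanging ssh is exactly what `--syn -j DROP` with ICMP allowed should produce. The SYN is silently dropped, so the client waits instead of getting "connection refused". The sketch below uses bash's `/dev/tcp` redirection and coreutils `timeout` to tell the two cases apart. The name `probe_tcp` is hypothetical, and it assumes GNU `timeout`, which exits with 124 when the time limit is hit.

```shell
# Hypothetical helper: probe HOST:PORT with a TCP handshake.
#   open    -> handshake completed
#   timeout -> no answer within SECS (consistent with a dropped SYN)
#   refused -> immediate failure (RST, unreachable, etc.)
probe_tcp() {
  local host="$1" port="$2" secs="${3:-3}" rc=0
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo open ;;
    124) echo timeout ;;
    *)   echo refused ;;
  esac
}
```

Under the slave-1 rules above, running `probe_tcp 172.16.120.11 22` from centos-4 would be expected to report `timeout`. This matches the `ssh: connect to host 172.16.120.11 port 22: Connection timed out` lines in the manager log.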
master
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
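Because ping_type=INSERT keeps a persistent connection, the manager's existing session is not affected by the new rules, so nothing abnormal happens yet.
Kill the manager's connection (user mha from 172.16.120.13):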
root@localhost 15:39:45 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 3 | | NULL | 1 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 1 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 8 | | NULL | 1 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 11 | | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 8 | | NULL | 0 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 89 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 26 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 3 | | NULL | 0 | 0 |
| 2409 | repler | 172.16.120.11:32918 | NULL | Binlog Dump GTID | 11837 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2411 | repler | 172.16.120.12:58084 | NULL | Binlog Dump GTID | 11812 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2627 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 2953 | mha | 172.16.120.13:36174 | NULL | Sleep | 0 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

root@localhost 15:55:18 [dbms_monitor]> kill 2953;
Query OK, 0 rows affected (0.00 sec)
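The connection to kill is the row whose User is `mha` and whose Host starts with the manager's address, 172.16.120.13. The sketch below picks those Ids out of the table the mysql client prints. The name `find_conn_ids` is hypothetical.

```shell
# Hypothetical helper: read "show processlist" table output on stdin and print
# the Id of each row whose User is $1 and whose Host begins with "$2:".
find_conn_ids() {
  awk -F'|' -v u="$1" -v ip="$2" '
    NF > 4 {
      id = $2; usr = $3; host = $4
      gsub(/ /, "", id); gsub(/ /, "", usr); gsub(/ /, "", host)
      if (usr == u && index(host, ip ":") == 1) print id   # skips header/border rows
    }'
}
```

Example: `mysql -t -uroot -p -e 'show processlist' | find_conn_ids mha 172.16.120.13`. The `-t` flag forces table output in batch mode. In MySQL 5.7 the same lookup can also be done in SQL with `SELECT id FROM information_schema.processlist WHERE user='mha' AND host LIKE '172.16.120.13:%';`.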
Conclusion: failover is not triggered.
Fri Oct 9 15:52:43 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 15:52:44 2020 - [info] GTID failover mode = 1
Fri Oct 9 15:52:44 2020 - [info] Dead Servers:
Fri Oct 9 15:52:44 2020 - [info] Alive Servers:
Fri Oct 9 15:52:44 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:52:44 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 15:52:44 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 15:52:44 2020 - [info] Alive Slaves:
Fri Oct 9 15:52:44 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:52:44 2020 - [info] GTID ON
Fri Oct 9 15:52:44 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:52:44 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:52:44 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 15:52:44 2020 - [info] GTID ON
Fri Oct 9 15:52:44 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:52:44 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 15:52:44 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 15:52:44 2020 - [info] Checking slave configurations..
Fri Oct 9 15:52:44 2020 - [info] Checking replication filtering settings..
Fri Oct 9 15:52:44 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 15:52:44 2020 - [info] Replication filtering check ok.
Fri Oct 9 15:52:44 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 15:52:44 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 15:52:45 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 15:52:45 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct 9 15:52:45 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 15:52:45 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 15:52:45 2020 - [info] OK.
Fri Oct 9 15:52:45 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 15:52:45 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 15:52:45 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 15:52:45 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 15:52:45 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:55:24 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 15:55:24 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 15:55:24 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:55:29 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 15:55:29 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 15:55:30 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:30 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:55:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:33 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:55:36 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:36 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:55:36 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:55:39 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:39 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:55:39 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 15:55:39 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:55:42 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:42 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:55:44 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 15:55:44 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 15:55:45 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:45 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:55:48 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:48 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:55:48 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:55:51 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:51 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:55:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 15:55:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:55:54 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:54 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 15:55:56 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 15:55:56 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 15:55:57 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:55:57 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 15:56:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:56:00 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 15:56:00 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 15:56:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 15:56:03 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 15:56:03 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 15:56:03 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 15:56:03 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 15:56:03 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct 9 15:56:03 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
[Test case] Network failure between master and MHA manager #3
Manager <-- blocked --> Master
Manager <-- blocked --> S1 <-- OK --> master
Manager <-- blocked --> S2 <-- OK --> master
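The network cases differ only in two things: which monitoring servers the manager can SSH into, and whether those servers can still reach the master. The function below is a simplified model of the secondary check verdict. It is inferred from the log messages in these tests, not taken from MHA's source. The check order and the stop-at-first-problem behaviour are assumptions, consistent with the logs only ever reporting 172.16.120.11. `secondary_check_verdict` is a hypothetical name.

```shell
# Hypothetical model (inferred from the logs, not MHA source code).
# Each argument is "host,ssh,master":
#   ssh    = ok|fail  -> can the manager SSH into this monitoring server?
#   master = up|down  -> can this monitoring server reach the master's MySQL?
secondary_check_verdict() {
  local spec host rest ssh master
  for spec in "$@"; do
    host=${spec%%,*}; rest=${spec#*,}
    ssh=${rest%%,*};  master=${rest#*,}
    if [ "$ssh" != ok ]; then
      echo "no failover: monitoring server $host is NOT reachable"
      return 1
    fi
    if [ "$master" = up ]; then
      echo "no failover: master is reachable from $host"
      return 1
    fi
  done
  echo "failover: master is unreachable from all monitoring servers"
  return 0
}
```

For this case, `secondary_check_verdict 172.16.120.11,fail,up 172.16.120.12,fail,up` prints `no failover: monitoring server 172.16.120.11 is NOT reachable`. The model only reaches the failover branch when every monitoring server is reachable over SSH and none of them can reach the master.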
ping_type=CONNECT
slave-1, slave-2
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.442 ms
64 bytes from centos-2 (172.16.120.11): icmp_seq=2 ttl=64 time=0.441 ms
^C
--- centos-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.441/0.441/0.442/0.021 ms
[root@centos-4 16:44:27 /usr/local/share/perl5/MHA]
#ssh centos-2
^C
[root@centos-4 16:44:30 /usr/local/share/perl5/MHA]
#ping centos-3
PING centos-3 (172.16.120.12) 56(84) bytes of data.
64 bytes from centos-3 (172.16.120.12): icmp_seq=1 ttl=64 time=0.335 ms
64 bytes from centos-3 (172.16.120.12): icmp_seq=2 ttl=64 time=0.575 ms
^C
--- centos-3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.335/0.455/0.575/0.120 ms
[root@centos-4 16:44:34 /usr/local/share/perl5/MHA]
#ssh centos-3
^C
master
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
Conclusion: failover is not triggered.
Fri Oct 9 16:43:25 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 16:43:26 2020 - [info] GTID failover mode = 1
Fri Oct 9 16:43:26 2020 - [info] Dead Servers:
Fri Oct 9 16:43:26 2020 - [info] Alive Servers:
Fri Oct 9 16:43:26 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:43:26 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 16:43:26 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 16:43:26 2020 - [info] Alive Slaves:
Fri Oct 9 16:43:26 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:43:26 2020 - [info] GTID ON
Fri Oct 9 16:43:26 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:43:26 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:43:26 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:43:26 2020 - [info] GTID ON
Fri Oct 9 16:43:26 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:43:26 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:43:26 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:43:26 2020 - [info] Checking slave configurations..
Fri Oct 9 16:43:26 2020 - [info] Checking replication filtering settings..
Fri Oct 9 16:43:26 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 16:43:26 2020 - [info] Replication filtering check ok.
Fri Oct 9 16:43:26 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 16:43:26 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 16:43:26 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 16:43:26 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct 9 16:43:26 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 16:43:26 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 16:43:26 2020 - [info] OK.
Fri Oct 9 16:43:26 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 16:43:26 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 16:43:26 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 16:43:26 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 16:43:26 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 16:45:55 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:45:55 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:45:55 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 16:45:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:45:59 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:46:00 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 16:46:00 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 16:46:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:46:02 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:46:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:46:05 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 16:46:05 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 16:46:08 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:46:08 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 16:46:08 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:46:08 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 16:46:11 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:46:11 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:46:13 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 16:46:13 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 16:46:14 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:46:14 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:46:15 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
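Each run above follows the same cycle. A ping error starts a round, the "Connection failed N time(s)" counter climbs to 4, and the secondary check vetoes failover. The counter then restarts at 1, until a later `Ping(...) succeeded` line ends the episode. The sketch below measures how long such an episode lasted, from the first "Got error on MySQL" warning to the next successful ping. `outage_seconds` is a hypothetical name. It reads only the HH:MM:SS field, so it assumes both lines fall on the same day.

```shell
# Hypothetical helper: seconds from the first "Got error on MySQL" line to the
# next "Ping(...) succeeded" line. Field 4 is HH:MM:SS in MHA's log format.
outage_seconds() {
  awk '
    function secs(t,  a) { split(t, a, ":"); return a[1]*3600 + a[2]*60 + a[3] }
    /Got error on MySQL/ && start == "" { start = secs($4) }
    /Ping\(.*\) succeeded/ && start != "" { print secs($4) - start; exit }
  ' "$1"
}
```

The startup `Ping(CONNECT) succeeded` line is ignored because no error has been seen yet at that point. For the first CONNECT run in this section, the window runs from 15:50:40 to 15:51:05, i.e. 25 seconds.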
ping_type=INSERT
slave-1,slave-2
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
#ping centos-2
PING centos-2 (172.16.120.11) 56(84) bytes of data.
64 bytes from centos-2 (172.16.120.11): icmp_seq=1 ttl=64 time=0.352 ms
^C
--- centos-2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.352/0.352/0.352/0.000 ms
[root@centos-4 17:52:38 ~]
#ping centos-3
PING centos-3 (172.16.120.12) 56(84) bytes of data.
64 bytes from centos-3 (172.16.120.12): icmp_seq=1 ttl=64 time=0.221 ms
^C
--- centos-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.221/0.221/0.221/0.000 ms
[root@centos-4 17:52:41 ~]
#ssh centos-2
^C
[root@centos-4 17:52:44 ~]
#ssh centos-3
master
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.11 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
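Since ping_type=INSERT uses a persistent connection, the manager's existing session is not affected, so nothing abnormal happens yet.
Kill the manager's connection (user mha from 172.16.120.13):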
root@localhost 17:48:11 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 14 | | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 32 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 4 | | NULL | 1 | 0 |
| 2409 | repler | 172.16.120.11:32918 | NULL | Binlog Dump GTID | 18936 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2411 | repler | 172.16.120.12:58084 | NULL | Binlog Dump GTID | 18911 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 3192 | proxysql | 172.16.120.11:33786 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 3238 | proxysql | 172.16.120.12:59006 | NULL | Sleep | 4 | | NULL | 1 | 0 |
| 3245 | proxysql | 172.16.120.11:33888 | NULL | Sleep | 14 | | NULL | 0 | 0 |
| 3262 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 3268 | mha | 172.16.120.13:36868 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)

root@localhost 17:53:37 [dbms_monitor]> kill 3268;
Query OK, 0 rows affected (0.00 sec)
Conclusion: failover is not triggered.
Fri Oct 9 17:50:48 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 17:50:49 2020 - [info] GTID failover mode = 1
Fri Oct 9 17:50:49 2020 - [info] Dead Servers:
Fri Oct 9 17:50:49 2020 - [info] Alive Servers:
Fri Oct 9 17:50:49 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 17:50:49 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 17:50:49 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 17:50:49 2020 - [info] Alive Slaves:
Fri Oct 9 17:50:49 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 17:50:49 2020 - [info] GTID ON
Fri Oct 9 17:50:49 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 17:50:49 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 17:50:49 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 17:50:49 2020 - [info] GTID ON
Fri Oct 9 17:50:49 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 17:50:49 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 17:50:49 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 17:50:49 2020 - [info] Checking slave configurations..
Fri Oct 9 17:50:49 2020 - [info] Checking replication filtering settings..
Fri Oct 9 17:50:49 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 17:50:49 2020 - [info] Replication filtering check ok.
Fri Oct 9 17:50:49 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 17:50:49 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 17:50:49 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 17:50:49 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct 9 17:50:49 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 17:50:49 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 17:50:49 2020 - [info] OK.
Fri Oct 9 17:50:49 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 17:50:49 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 17:50:49 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 17:50:49 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 17:50:49 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 17:53:43 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 17:53:43 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 17:53:43 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 17:53:48 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 17:53:48 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 17:53:49 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:53:49 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 17:53:52 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:53:52 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 17:53:55 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:53:55 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 17:53:55 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 17:53:58 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:53:58 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 17:53:58 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 17:53:58 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 17:54:01 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:01 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 17:54:03 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 17:54:03 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 17:54:04 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:04 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 17:54:07 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:07 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 17:54:07 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 17:54:10 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:10 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 17:54:10 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 17:54:10 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 17:54:13 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:13 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 17:54:15 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
ssh: connect to host 172.16.120.11 port 22: Connection timed out
Monitoring server 172.16.120.11 is NOT reachable!
Fri Oct 9 17:54:15 2020 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Oct 9 17:54:16 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 17:54:16 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 17:54:16 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
[Test case] Network failure between master and MHA manager, #4
Manager <-- unreachable --> Master
Manager <-- OK --> S1 <-- unreachable --> master
Manager <-- OK --> S2 <-- OK --> master
ping_type=CONNECT
On master:
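IPTABLES="/sbin/iptables"
$IPTABLES -F                                          # flush existing rules
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT  # allow ICMP (ping)
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT  # allow TCP from the master itself
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT  # allow TCP from slave2
$IPTABLES -A INPUT -p tcp --syn -j DROP               # drop new TCP connections from all other hosts (manager, slave1)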
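Cases 4 and 5 differ only in which source IPs are still allowed to open TCP connections to the master. A minimal sketch that parameterizes the rule set above, assuming a hypothetical helper `isolate_master` and a hypothetical `DRY_RUN=1` switch that only prints the commands:

```shell
#!/usr/bin/env bash
# Sketch: rebuild the master's INPUT chain so that only the listed source IPs
# may open NEW TCP connections. isolate_master and DRY_RUN are hypothetical
# helpers for these test cases, not part of MHA or iptables.
set -euo pipefail

IPTABLES="/sbin/iptables"

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: isolate_master ALLOWED_IP...
isolate_master() {
  run "$IPTABLES" -F                                         # flush existing rules
  run "$IPTABLES" -A INPUT -p icmp --icmp-type any -j ACCEPT # keep ICMP working
  local ip
  for ip in "$@"; do
    run "$IPTABLES" -A INPUT -p tcp -s "$ip" -j ACCEPT       # whitelisted sources
  done
  # --syn only matches connection-opening packets, so TCP sessions that are
  # already established (e.g. a persistent MySQL connection) keep working.
  run "$IPTABLES" -A INPUT -p tcp --syn -j DROP
}

# Case 4: isolate_master 172.16.120.10 172.16.120.12
# Case 5: isolate_master 172.16.120.10
```

Keeping the `--syn` DROP as the last rule is what makes the CONNECT and INSERT ping types behave differently in these cases: only new connections are refused.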
Conclusion: no failover.
Fri Oct 9 16:05:55 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 16:05:56 2020 - [info] GTID failover mode = 1
Fri Oct 9 16:05:56 2020 - [info] Dead Servers:
Fri Oct 9 16:05:56 2020 - [info] Alive Servers:
Fri Oct 9 16:05:56 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:05:56 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 16:05:56 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 16:05:56 2020 - [info] Alive Slaves:
Fri Oct 9 16:05:56 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:05:56 2020 - [info] GTID ON
Fri Oct 9 16:05:56 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:05:56 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:05:56 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:05:56 2020 - [info] GTID ON
Fri Oct 9 16:05:56 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:05:56 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:05:56 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:05:56 2020 - [info] Checking slave configurations..
Fri Oct 9 16:05:56 2020 - [info] Checking replication filtering settings..
Fri Oct 9 16:05:56 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 16:05:56 2020 - [info] Replication filtering check ok.
Fri Oct 9 16:05:56 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 16:05:56 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 16:05:56 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 16:05:56 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 16:05:56 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 16:05:56 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 16:05:56 2020 - [info] OK.
Fri Oct 9 16:05:56 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 16:05:56 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 16:05:56 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 16:05:56 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 16:05:56 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 16:06:43 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:43 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:06:43 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 16:06:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:47 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:06:48 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct 9 16:06:48 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 16:06:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:50 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:06:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:53 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 16:06:53 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 16:06:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:56 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 16:06:56 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 16:06:56 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:06:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:06:59 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:07:01 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct 9 16:07:01 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 16:07:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:07:02 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:07:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:07:05 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 16:07:05 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 16:07:05 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
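The secondary check output above ("Master is not reachable from 172.16.120.11", then "Master is reachable from 172.16.120.12") shows why no failover happens here. A simplified model of that decision as it appears in these logs, assuming a hypothetical `ip:monitor_ok:master_ok` encoding per monitoring server (1 = reachable, 0 = not); this is not MHA's Perl code:

```shell
#!/usr/bin/env bash
# Simplified model of the secondary-check verdicts seen in the logs.
# Each argument is "ip:monitor_ok:master_ok" (hypothetical encoding).
# Monitors are checked in order and the first blocking condition wins.
set -euo pipefail

decide_failover() {
  local entry ip monitor_ok master_ok
  for entry in "$@"; do
    IFS=: read -r ip monitor_ok master_ok <<< "$entry"
    if [ "$monitor_ok" != "1" ]; then
      echo "no-failover: monitor $ip unreachable"   # likely a network problem
      return 1
    fi
    if [ "$master_ok" = "1" ]; then
      echo "no-failover: master reachable from $ip"
      return 1
    fi
  done
  echo "failover"                                   # unreachable from all monitors
  return 0
}

# Case 4: decide_failover 172.16.120.11:1:0 172.16.120.12:1:1
# Case 5: decide_failover 172.16.120.11:1:0 172.16.120.12:1:0
```

Only when every monitoring server is reachable and none of them can reach the master does the model return 0, which matches the case 5 log line "Master is not reachable from all other monitoring servers. Failover should start."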
ping_type=INSERT
On master:
IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP
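Because ping_type=INSERT keeps a persistent connection, and the `--syn` rule only drops new connections, the manager's existing connection is unaffected and nothing abnormal is detected.
Kill the manager's connection (Id 3022, from 172.16.120.13) on the master: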
root@localhost 15:55:23 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 19 | | NULL | 1 | 0 |
| 1249 | proxysql | 172.16.120.11:59582 | NULL | Sleep | 7 | | NULL | 0 | 0 |
| 1250 | proxysql | 172.16.120.12:56530 | NULL | Sleep | 4 | | NULL | 0 | 0 |
| 1254 | proxysql | 172.16.120.11:59592 | NULL | Sleep | 27 | | NULL | 1 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 14 | | NULL | 1 | 0 |
| 1341 | mha | 172.16.120.12:56698 | information_schema | Sleep | 114 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 51 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 9 | | NULL | 0 | 0 |
| 2409 | repler | 172.16.120.11:32918 | NULL | Binlog Dump GTID | 12703 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2411 | repler | 172.16.120.12:58084 | NULL | Binlog Dump GTID | 12678 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 2627 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 3022 | mha | 172.16.120.13:36466 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+-------+---------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

root@localhost 16:09:44 [dbms_monitor]> kill 3022;
Query OK, 0 rows affected (0.00 sec)
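Instead of reading the Id off the processlist by hand, the manager's connections can be selected by user and source host. A sketch, assuming the manager connects as `mha` from 172.16.120.13 (both from the setup above); the helper name `manager_conn_ids` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the processlist Ids that belong to the MHA manager.
# Input: tab-separated "Id<TAB>User<TAB>Host" rows on stdin.
# manager_conn_ids is a hypothetical helper name.
set -euo pipefail

manager_conn_ids() {
  local user="${1:-mha}" host="${2:-172.16.120.13}"
  # Match the host followed by ":" so 172.16.120.130 is not picked up.
  awk -F'\t' -v u="$user" -v h="$host" \
    '$2 == u && index($3, h ":") == 1 { print $1 }'
}

# Example (run on the master; not executed here):
#   mysql -N -B -e "SELECT id, user, host FROM information_schema.PROCESSLIST" \
#     | manager_conn_ids \
#     | while read -r id; do mysql -e "KILL $id"; done
```

Killing the session forces the manager to reconnect, and that new connection is what the `--syn` DROP rule blocks.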
Conclusion: no failover.
Fri Oct 9 16:08:29 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 16:08:30 2020 - [info] GTID failover mode = 1
Fri Oct 9 16:08:30 2020 - [info] Dead Servers:
Fri Oct 9 16:08:30 2020 - [info] Alive Servers:
Fri Oct 9 16:08:30 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:08:30 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 16:08:30 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 16:08:30 2020 - [info] Alive Slaves:
Fri Oct 9 16:08:30 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:08:30 2020 - [info] GTID ON
Fri Oct 9 16:08:30 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:08:30 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:08:30 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 16:08:30 2020 - [info] GTID ON
Fri Oct 9 16:08:30 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:08:30 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 16:08:30 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 16:08:30 2020 - [info] Checking slave configurations..
Fri Oct 9 16:08:30 2020 - [info] Checking replication filtering settings..
Fri Oct 9 16:08:30 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 16:08:30 2020 - [info] Replication filtering check ok.
Fri Oct 9 16:08:30 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 16:08:30 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 16:08:30 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 16:08:30 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 16:08:30 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 16:08:30 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 16:08:30 2020 - [info] OK.
Fri Oct 9 16:08:30 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 16:08:30 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 16:08:30 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 16:08:30 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 16:08:30 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 16:09:51 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 16:09:51 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 16:09:51 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:09:56 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct 9 16:09:57 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 16:09:57 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:09:57 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:10:00 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:00 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:10:03 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:03 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 16:10:03 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 16:10:06 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:06 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 16:10:06 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 16:10:06 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:10:09 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:09 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:10:11 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Master is reachable from 172.16.120.12!
Fri Oct 9 16:10:12 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Oct 9 16:10:12 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:12 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 16:10:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:15 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 16:10:15 2020 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Oct 9 16:10:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:18 2020 - [warning] Connection failed 1 time(s)..
Fri Oct 9 16:10:18 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 16:10:18 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 16:10:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 16:10:21 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 16:10:21 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 16:10:21 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Master is reachable from 172.16.120.11!
Fri Oct 9 16:10:21 2020 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
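The logs above repeat the same cycle: the secondary check starts at the first failed ping, the counter climbs to "Connection failed 4 time(s)", and when the secondary check has returned errors the counter restarts at 1. A simplified model of that cycle, inferred from the timestamps rather than taken from MHA's source; `ping_master` and `secondary_check` are hypothetical stubs the caller must define, and the 3-second `ping_interval` sleep is omitted:

```shell
#!/usr/bin/env bash
# Simplified model of the retry pattern in the manager log (not MHA code).
# The caller defines ping_master and secondary_check (hypothetical stubs).
set -euo pipefail

MAX_FAILS=4   # the log acts on the verdict after "Connection failed 4 time(s)"

# Runs up to $1 ping rounds. Returns 0 if failover would start,
# 1 if the master answers again or the rounds run out.
monitor() {
  local rounds="$1" fails=0 secondary_ok=0 i
  for ((i = 1; i <= rounds; i++)); do
    if ping_master; then
      echo "ping ok"
      return 1
    fi
    fails=$((fails + 1))
    echo "failed $fails time(s)"
    if [ "$fails" -eq 1 ]; then
      # The secondary check is (re)started on the first consecutive failure.
      if secondary_check; then secondary_ok=1; else secondary_ok=0; fi
    fi
    if [ "$fails" -ge "$MAX_FAILS" ]; then
      if [ "$secondary_ok" -eq 1 ]; then
        echo "start failover"
        return 0
      fi
      echo "secondary check failed, start over"
      fails=0
    fi
  done
  return 1
}

# Example stubs: ping_master() { return 1; }; secondary_check() { return 0; }
```

In this model a failover needs both conditions at once: four consecutive failed pings and a secondary check that could not reach the master from any monitor.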
[Test case] Network failure between master and MHA manager, #5
Manager <-- unreachable --> Master
Manager <-- OK --> S1 <-- unreachable --> master
Manager <-- OK --> S2 <-- unreachable --> master
ping_type=CONNECT
On master:
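IPTABLES="/sbin/iptables"
$IPTABLES -F                                          # flush existing rules
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT  # allow ICMP (ping)
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT  # allow TCP from the master itself
$IPTABLES -A INPUT -p tcp --syn -j DROP               # drop new TCP connections from all other hosts (manager, slave1, slave2)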
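Case 5 only triggers a failover if neither slave can reach the master, so it is worth confirming the isolation from each host before reading the MHA log. A sketch using bash's `/dev/tcp` redirection and coreutils `timeout`; the helper name `can_connect` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: check whether a new TCP connection to host:port succeeds within
# 2 seconds. Run it on the manager, slave1 and slave2.
# can_connect is a hypothetical helper name.
set -euo pipefail

can_connect() {
  local host="$1" port="$2"
  # A dropped SYN makes the connect hang, so timeout turns it into a failure.
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo reachable
  else
    echo unreachable
  fi
}

# Example: can_connect 172.16.120.10 3358   # expected "unreachable" in case 5
```

Because the check opens a new connection, it tests exactly what the `--syn` DROP rule blocks; an already-established session (such as the INSERT ping's) would not be detected this way.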
Conclusion: failover is triggered.
Fri Oct 9 18:21:13 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 18:21:14 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:21:14 2020 - [info] Dead Servers:
Fri Oct 9 18:21:14 2020 - [info] Alive Servers:
Fri Oct 9 18:21:14 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:21:14 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:21:14 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:21:14 2020 - [info] Alive Slaves:
Fri Oct 9 18:21:14 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:21:14 2020 - [info] GTID ON
Fri Oct 9 18:21:14 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:21:14 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:21:14 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:21:14 2020 - [info] GTID ON
Fri Oct 9 18:21:14 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:21:14 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:21:14 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:21:14 2020 - [info] Checking slave configurations..
Fri Oct 9 18:21:14 2020 - [info] Checking replication filtering settings..
Fri Oct 9 18:21:14 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 18:21:14 2020 - [info] Replication filtering check ok.
Fri Oct 9 18:21:14 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 18:21:14 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 18:21:14 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 18:21:14 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:21:14 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 18:21:14 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 18:21:14 2020 - [info] OK.
Fri Oct 9 18:21:14 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 18:21:14 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 18:21:14 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 18:21:14 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 18:21:14 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 18:22:07 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (4) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:22:07 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 18:22:07 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Fri Oct 9 18:22:11 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:22:11 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 18:22:12 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Fri Oct 9 18:22:14 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:22:14 2020 - [warning] Connection failed 3 time(s)..
Fri Oct 9 18:22:17 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:22:17 2020 - [warning] Connection failed 4 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Fri Oct 9 18:22:18 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct 9 18:22:18 2020 - [warning] Master is not reachable from health checker!
Fri Oct 9 18:22:18 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct 9 18:22:18 2020 - [warning] SSH is NOT reachable.
Fri Oct 9 18:22:18 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct 9 18:22:18 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct 9 18:22:18 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 18:22:18 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 18:22:19 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:22:19 2020 - [info] Dead Servers:
Fri Oct 9 18:22:19 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:19 2020 - [info] Alive Servers:
Fri Oct 9 18:22:19 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:22:19 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:22:19 2020 - [info] Alive Slaves:
Fri Oct 9 18:22:19 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:19 2020 - [info] GTID ON
Fri Oct 9 18:22:19 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:19 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:19 2020 - [info] GTID ON
Fri Oct 9 18:22:19 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:19 2020 - [info] Checking slave configurations..
Fri Oct 9 18:22:19 2020 - [info] Checking replication filtering settings..
Fri Oct 9 18:22:19 2020 - [info] Replication filtering check ok.
Fri Oct 9 18:22:19 2020 - [info] Master is down!
Fri Oct 9 18:22:19 2020 - [info] Terminating monitoring script.
Fri Oct 9 18:22:19 2020 - [info] Got exit code 20 (Master dead).
Fri Oct 9 18:22:19 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct 9 18:22:19 2020 - [info] Starting master failover.
Fri Oct 9 18:22:19 2020 - [info]
Fri Oct 9 18:22:19 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct 9 18:22:19 2020 - [info]
Fri Oct 9 18:22:20 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:22:20 2020 - [info] Dead Servers:
Fri Oct 9 18:22:20 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:20 2020 - [info] Checking master reachability via MySQL(double check)...
Fri Oct 9 18:22:21 2020 - [info] ok.
Fri Oct 9 18:22:21 2020 - [info] Alive Servers:
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:22:21 2020 - [info] Alive Slaves:
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] Starting GTID based failover.
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct 9 18:22:21 2020 - [info] Executing master IP deactivation script:
Fri Oct 9 18:22:21 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stop
Disabling the VIP on old master: 172.16.120.10
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct 9 18:22:21 2020 - [info] done.
Fri Oct 9 18:22:21 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct 9 18:22:21 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:3084318
Fri Oct 9 18:22:21 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:10391-19970
Fri Oct 9 18:22:21 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:3084318
Fri Oct 9 18:22:21 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:10391-19970
Fri Oct 9 18:22:21 2020 - [info] Oldest slaves:
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] Searching new master from slaves..
Fri Oct 9 18:22:21 2020 - [info] Candidate masters from the configuration file:
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:22:21 2020 - [info] GTID ON
Fri Oct 9 18:22:21 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:22:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:22:21 2020 - [info] Non-candidate masters:
Fri Oct 9 18:22:21 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct 9 18:22:21 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:22:21 2020 - [info] Starting master failover..
Fri Oct 9 18:22:21 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] Waiting all logs to be applied..
Fri Oct 9 18:22:21 2020 - [info] done.
Fri Oct 9 18:22:21 2020 - [info] Getting new master's binlog name and position..
Fri Oct 9 18:22:21 2020 - [info] mysql-bin.000008:3052407
Fri Oct 9 18:22:21 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct 9 18:22:21 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 3052407, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19970,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct 9 18:22:21 2020 - [info] Executing master IP activate script:
Fri Oct 9 18:22:21 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha' --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! new master rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct 9 18:22:21 2020 - [info] OK.
Fri Oct 9 18:22:21 2020 - [info] ** Finished master recovery successfully.
Fri Oct 9 18:22:21 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct 9 18:22:21 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 68999. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009182219.log if it takes time..
Fri Oct 9 18:22:22 2020 - [info]
Fri Oct 9 18:22:22 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct 9 18:22:22 2020 - [info]
Fri Oct 9 18:22:21 2020 - [info] Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct 9 18:22:21 2020 - [info] Executed CHANGE MASTER.
Fri Oct 9 18:22:21 2020 - [info] Slave started.
Fri Oct 9 18:22:21 2020 - [info] gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19970,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct 9 18:22:22 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct 9 18:22:22 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct 9 18:22:22 2020 - [info] All new slave servers recovered successfully.
Fri Oct 9 18:22:22 2020 - [info]
Fri Oct 9 18:22:22 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct 9 18:22:22 2020 - [info]
Fri Oct 9 18:22:22 2020 - [info] Resetting slave info on the new master..
Fri Oct 9 18:22:22 2020 - [info] 172.16.120.11: Resetting slave info succeeded.
Fri Oct 9 18:22:22 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 18:22:22 2020 - [info] ----- Failover Report -----
cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded
Master 172.16.120.10(172.16.120.10:3358) is down!
Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 18:22:22 2020 - [info] Sending mail..
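In Phase 4 above, gtid_wait(...) on 172.16.120.12 finished with "Executed 0 events". That suggests the slave had already applied everything in the new master's Exec_Gtid_Set.

The Python sketch below illustrates that containment idea. It parses MySQL's `uuid:interval[:interval],...` GTID-set notation and checks whether one set is contained in another. Some assumptions to keep in mind:

- `parse_gtid_set` and `is_subset` are hypothetical helpers, not part of MHA or MySQL.
- The lagging-slave set is an invented example.
- MHA's real check is done by MySQL itself, not by code like this.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid:1-5:7,uuid2:1-27' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: Dict[str, List[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing comma/newline seen in the MHA log
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for rng in ranges:
            start, _, end = rng.partition("-")
            intervals.append((int(start), int(end or start)))
    return result


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or adjacent intervals, e.g. 1-5 and 6-9 -> 1-9."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def is_subset(sub: str, sup: str) -> bool:
    """True if every GTID in `sub` is also contained in `sup`."""
    sup_sets = {uuid: _merge(iv) for uuid, iv in parse_gtid_set(sup).items()}
    for uuid, intervals in parse_gtid_set(sub).items():
        covered = sup_sets.get(uuid, [])
        for start, end in intervals:
            # After merging, a contiguous range is covered only if it lies
            # entirely inside a single merged interval.
            if not any(lo <= start and end <= hi for lo, hi in covered):
                return False
    return True


# Exec_Gtid_Set of the new master, copied from the Phase 3 log above.
NEW_MASTER_EXEC_GTID_SET = (
    "44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19970,\n"
    "45d1f02a-fcad-11ea-8a44-0050562f2198:1-27"
)

# Hypothetical slave that is missing the last transaction from the old master.
LAGGING_SLAVE_SET = (
    "44a4ea53-fcad-11ea-bd16-0050563b7b42:1-19969,"
    "45d1f02a-fcad-11ea-8a44-0050562f2198:1-27"
)

print(is_subset(NEW_MASTER_EXEC_GTID_SET, NEW_MASTER_EXEC_GTID_SET))  # True
print(is_subset(NEW_MASTER_EXEC_GTID_SET, LAGGING_SLAVE_SET))         # False
```

Merging intervals first keeps the check linear in the number of ranges rather than the number of transactions.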
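ping_type=INSERT
Because ping_type=INSERT keeps a persistent (long-lived) connection to the master, the manager saw nothing abnormal and no failover was triggered.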
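Here is a toy Python model (not MHA code) of why the two ping types react differently. It assumes the blocking in this test only stopped new connections from the manager to the master, while the already-established connection kept working; the "no anomaly" result above suggests this.

- With CONNECT, every check opens a fresh connection, so the outage shows up immediately.
- With INSERT, the persistent connection is reused, so the outage only shows up once that connection is gone.

`FakeNetwork` and its fields are hypothetical stand-ins for the network conditions.

```python
from dataclasses import dataclass


@dataclass
class FakeNetwork:
    """Toy model of the test network, from the MHA manager's point of view."""
    new_connections_allowed: bool = True  # can the manager open NEW connections?
    persistent_conn_alive: bool = True    # is the existing long-lived connection up?


def ping(net: FakeNetwork, ping_type: str) -> bool:
    """Return True if one health-check ping succeeds under the toy model."""
    if ping_type == "CONNECT":
        # Short connection: every check opens (and closes) a new connection.
        return net.new_connections_allowed
    if ping_type == "INSERT":
        # Persistent connection: reuse it while alive; reconnect only once it is gone.
        if net.persistent_conn_alive:
            return True
        return net.new_connections_allowed
    raise ValueError(f"unsupported ping_type: {ping_type}")


net = FakeNetwork(new_connections_allowed=False)  # manager blocked from the master
print(ping(net, "CONNECT"))  # False: outage detected immediately
print(ping(net, "INSERT"))   # True: the old connection still works
net.persistent_conn_alive = False  # e.g. after killing the manager's session
print(ping(net, "INSERT"))   # False: now the outage is detected
```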
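Kill the MHA manager's persistent connection on the master (user mha, from the manager host 172.16.120.13):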
root@localhost 18:24:52 [dbms_monitor]> show processlist;
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| 1248 | proxysql | 172.16.120.10:58672 | NULL | Sleep | 1 | | NULL | 0 | 0 |
| 1264 | proxysql | 172.16.120.12:56552 | NULL | Sleep | 2 | | NULL | 0 | 0 |
| 1343 | mha | 172.16.120.11:59758 | information_schema | Sleep | 74 | | NULL | 0 | 0 |
| 1452 | proxysql | 172.16.120.10:59046 | NULL | Sleep | 2 | | NULL | 1 | 0 |
| 3192 | proxysql | 172.16.120.11:33786 | NULL | Sleep | 3 | | NULL | 1 | 0 |
| 3238 | proxysql | 172.16.120.12:59006 | NULL | Sleep | 1 | | NULL | 1 | 0 |
| 3245 | proxysql | 172.16.120.11:33888 | NULL | Sleep | 2 | | NULL | 0 | 0 |
| 3262 | root | localhost | dbms_monitor | Query | 0 | starting | show processlist | 0 | 0 |
| 3357 | repler | 172.16.120.11:34036 | NULL | Binlog Dump GTID | 142 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 3359 | repler | 172.16.120.12:59166 | NULL | Binlog Dump GTID | 123 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 3364 | mha | 172.16.120.13:37512 | NULL | Sleep | 2 | | NULL | 0 | 0 |
+------+----------+---------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
11 rows in set (0.00 sec)

root@localhost 18:26:25 [dbms_monitor]> kill 3364;
Query OK, 0 rows affected (0.01 sec)
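Picking the right thread amounts to filtering the processlist for user `mha` connecting from the manager host 172.16.120.13 (centos-4). The `mha` session from 172.16.120.11 is a different client and is left alone.

Below is a small Python sketch of that filter, run over rows copied from the output above. `ProcRow` and `manager_thread_ids` are hypothetical helpers. On a live server the same filter could instead be written as a query against `information_schema.PROCESSLIST`.

```python
from typing import Iterable, List, NamedTuple


class ProcRow(NamedTuple):
    thread_id: int
    user: str
    host: str  # "ip:port" (or "localhost") as shown by SHOW PROCESSLIST
    command: str


def manager_thread_ids(rows: Iterable[ProcRow], user: str, manager_ip: str) -> List[int]:
    """IDs of sessions opened by `user` from `manager_ip` (client port ignored)."""
    return [r.thread_id for r in rows
            if r.user == user and r.host.rsplit(":", 1)[0] == manager_ip]


# A subset of the SHOW PROCESSLIST rows captured above.
ROWS = [
    ProcRow(1343, "mha", "172.16.120.11:59758", "Sleep"),
    ProcRow(3262, "root", "localhost", "Query"),
    ProcRow(3357, "repler", "172.16.120.11:34036", "Binlog Dump GTID"),
    ProcRow(3364, "mha", "172.16.120.13:37512", "Sleep"),
]

for thread_id in manager_thread_ids(ROWS, "mha", "172.16.120.13"):
    print(f"KILL {thread_id};")  # prints: KILL 3364;
```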
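Conclusion: with ping_type=INSERT, failover happens only after the persistent connection is broken; as long as it stays up, no failover occurs.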
Fri Oct 9 18:25:33 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Oct 9 18:25:34 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:25:34 2020 - [info] Dead Servers:
Fri Oct 9 18:25:34 2020 - [info] Alive Servers:
Fri Oct 9 18:25:34 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:25:34 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:25:34 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:25:34 2020 - [info] Alive Slaves:
Fri Oct 9 18:25:34 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:25:34 2020 - [info] GTID ON
Fri Oct 9 18:25:34 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:25:34 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:25:34 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:25:34 2020 - [info] GTID ON
Fri Oct 9 18:25:34 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:25:34 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:25:34 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:25:34 2020 - [info] Checking slave configurations..
Fri Oct 9 18:25:34 2020 - [info] Checking replication filtering settings..
Fri Oct 9 18:25:34 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Oct 9 18:25:34 2020 - [info] Replication filtering check ok.
Fri Oct 9 18:25:34 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Oct 9 18:25:34 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Oct 9 18:25:35 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Fri Oct 9 18:25:35 2020 - [info]
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Fri Oct 9 18:25:35 2020 - [info] Checking master_ip_failover_script status:
Fri Oct 9 18:25:35 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358
Fri Oct 9 18:25:35 2020 - [info] OK.
Fri Oct 9 18:25:35 2020 - [warning] shutdown_script is not defined.
Fri Oct 9 18:25:35 2020 - [info] Set master ping interval 3 seconds.
Fri Oct 9 18:25:35 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Fri Oct 9 18:25:35 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Fri Oct 9 18:25:35 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Oct 9 18:26:44 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Oct 9 18:26:44 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12 --user=root --master_host=172.16.120.10 --master_ip=172.16.120.10 --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Fri Oct 9 18:26:44 2020 - [info] Executing SSH check script: exit 0
Fri Oct 9 18:26:49 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.16.120.10! at /usr/local/share/perl5/MHA/HealthCheck.pm line 344.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Fri Oct 9 18:26:50 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:26:50 2020 - [warning] Connection failed 2 time(s)..
Fri Oct 9 18:26:53 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:26:53 2020 - [warning] Connection failed 3 time(s)..
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Fri Oct 9 18:26:54 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Oct 9 18:26:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (4))
Fri Oct 9 18:26:56 2020 - [warning] Connection failed 4 time(s)..
Fri Oct 9 18:26:56 2020 - [warning] Master is not reachable from health checker!
Fri Oct 9 18:26:56 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Fri Oct 9 18:26:56 2020 - [warning] SSH is NOT reachable.
Fri Oct 9 18:26:56 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Fri Oct 9 18:26:56 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Oct 9 18:26:56 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 18:26:56 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Fri Oct 9 18:26:57 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:26:57 2020 - [info] Dead Servers:
Fri Oct 9 18:26:57 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:57 2020 - [info] Alive Servers:
Fri Oct 9 18:26:57 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:26:57 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:26:57 2020 - [info] Alive Slaves:
Fri Oct 9 18:26:57 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:57 2020 - [info] GTID ON
Fri Oct 9 18:26:57 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:57 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:57 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:57 2020 - [info] GTID ON
Fri Oct 9 18:26:57 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:57 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:57 2020 - [info] Checking slave configurations..
Fri Oct 9 18:26:57 2020 - [info] Checking replication filtering settings..
Fri Oct 9 18:26:57 2020 - [info] Replication filtering check ok.
Fri Oct 9 18:26:57 2020 - [info] Master is down!
Fri Oct 9 18:26:57 2020 - [info] Terminating monitoring script.
Fri Oct 9 18:26:57 2020 - [info] Got exit code 20 (Master dead).
Fri Oct 9 18:26:57 2020 - [info] MHA::MasterFailover version 0.58.
Fri Oct 9 18:26:57 2020 - [info] Starting master failover.
Fri Oct 9 18:26:57 2020 - [info]
Fri Oct 9 18:26:57 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Oct 9 18:26:57 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] GTID failover mode = 1
Fri Oct 9 18:26:58 2020 - [info] Dead Servers:
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Alive Servers:
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:26:58 2020 - [info] Alive Slaves:
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] Starting GTID based failover.
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct 9 18:26:58 2020 - [info] Executing master IP deactivation script:
Fri Oct 9 18:26:58 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stop
Disabling the VIP on old master: 172.16.120.10
Fake!!! old master rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1
Fri Oct 9 18:26:58 2020 - [info] done.
Fri Oct 9 18:26:58 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct 9 18:26:58 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:3101017
Fri Oct 9 18:26:58 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:19971-20041
Fri Oct 9 18:26:58 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:3101017
Fri Oct 9 18:26:58 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:19971-20041
Fri Oct 9 18:26:58 2020 - [info] Oldest slaves:
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] Searching new master from slaves..
Fri Oct 9 18:26:58 2020 - [info] Candidate masters from the configuration file:
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.11(172.16.120.11:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] 172.16.120.12(172.16.120.12:3358) Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Fri Oct 9 18:26:58 2020 - [info] GTID ON
Fri Oct 9 18:26:58 2020 - [info] Replicating from 172.16.120.10(172.16.120.10:3358)
Fri Oct 9 18:26:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Oct 9 18:26:58 2020 - [info] Non-candidate masters:
Fri Oct 9 18:26:58 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Oct 9 18:26:58 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Fri Oct 9 18:26:58 2020 - [info] Starting master failover..
Fri Oct 9 18:26:58 2020 - [info]
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] Waiting all logs to be applied..
Fri Oct 9 18:26:58 2020 - [info] done.
Fri Oct 9 18:26:58 2020 - [info] Getting new master's binlog name and position..
Fri Oct 9 18:26:58 2020 - [info] mysql-bin.000008:3068991
Fri Oct 9 18:26:58 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Oct 9 18:26:58 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 3068991, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20041,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Fri Oct 9 18:26:58 2020 - [info] Executing master IP activate script:
Fri Oct 9 18:26:58 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha' --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11
Fake!!! new master rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Oct 9 18:26:58 2020 - [info] OK.
Fri Oct 9 18:26:58 2020 - [info] ** Finished master recovery successfully.
Fri Oct 9 18:26:58 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct 9 18:26:58 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 69850. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201009182657.log if it takes time..
Fri Oct 9 18:26:59 2020 - [info]
Fri Oct 9 18:26:59 2020 - [info] Log messages from 172.16.120.12 ...
Fri Oct 9 18:26:59 2020 - [info]
Fri Oct 9 18:26:58 2020 - [info] Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Fri Oct 9 18:26:58 2020 - [info] Executed CHANGE MASTER.
Fri Oct 9 18:26:58 2020 - [info] Slave started.
Fri Oct 9 18:26:58 2020 - [info] gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20041,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Fri Oct 9 18:26:59 2020 - [info] End of log messages from 172.16.120.12.
Fri Oct 9 18:26:59 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Fri Oct 9 18:26:59 2020 - [info] All new slave servers recovered successfully.
Fri Oct 9 18:26:59 2020 - [info]
Fri Oct 9 18:26:59 2020 - [info] * Phase 5: New master cleanup phase..
Fri Oct 9 18:26:59 2020 - [info]
Fri Oct 9 18:26:59 2020 - [info] Resetting slave info on the new master..
Fri Oct 9 18:26:59 2020 - [info] 172.16.120.11: Resetting slave info succeeded.
Fri Oct 9 18:26:59 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 18:26:59 2020 - [info] ----- Failover Report -----
cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded
Master 172.16.120.10(172.16.120.10:3358) is down!
Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Fri Oct 9 18:26:59 2020 - [info] Sending mail..
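In the run above, the manager went through these steps before starting the failover:

1. The INSERT ping failed with error 2006.
2. masterha_secondary_check reported the master unreachable from both 172.16.120.11 and 172.16.120.12.
3. The health checker gave up after "Connection failed 4 time(s)".

Below is a toy Python restatement of that decision rule, modelled on this log rather than on MHA's source:

- The threshold of 4 is what this log shows, not a documented MHA constant.
- The state strings are hypothetical.
- The rule only reproduces the network-related rows of the TL;DR table, not the replication-state rows.

```python
from typing import Dict

# Observed in this log (not taken from MHA documentation): the health checker
# declared the master unreachable after the 4th consecutive failed attempt.
FAILURE_THRESHOLD = 4


def should_failover(consecutive_failures: int, secondary_results: Dict[str, str]) -> bool:
    """Toy decision rule modelled on the log above.

    secondary_results maps each monitoring server to one of:
      "master_unreachable" - SSH to the server worked, master not reachable from it
      "master_reachable"   - SSH worked and the master answered from there
      "ssh_failed"         - could not SSH to the monitoring server at all
    """
    if consecutive_failures < FAILURE_THRESHOLD:
        return False
    # Fail over only if every monitoring server confirms the master is down.
    return bool(secondary_results) and all(
        state == "master_unreachable" for state in secondary_results.values()
    )


both_down = {"172.16.120.11": "master_unreachable", "172.16.120.12": "master_unreachable"}
only_slave1 = {"172.16.120.11": "master_unreachable", "172.16.120.12": "master_reachable"}
print(should_failover(4, both_down))    # True  (the run logged above)
print(should_failover(4, only_slave1))  # False (TL;DR: only slave1 cannot reach the master)
```

Requiring unanimity from the monitoring servers is what keeps a manager-only network problem from causing a false failover. That is consistent with the "manager cannot reach master" rows in the TL;DR table.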