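MySQL MHA Cluster Deployment (binlog Replication)
There are plenty of tutorials online covering the theory behind MHA, so it is not explained here; only a recommended blog link is given.
MHA theory: http://www.ywnds.com/?p=8094
The MHA packages have to be downloaded from Google, or paid for on CSDN.
Detailed steps for building MHA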
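# The four servers are allocated as follows
10.0.102.214  test3  MHA manager node
10.0.102.204  test2  master node
10.0.102.179  test1  slave node (standby master)
10.0.102.221  mgt01  slave node
# This is a one-master, two-slave architecture based on binlog replication, so the one-master, two-slave replication has to be in place first.
# Note that the slave acting as standby master must enable the binary log and set log_slave_updates.
# The MySQL binlog-based replication process is described at: https://www.cnblogs.com/wxzhe/p/10051114.html
# Setting up MySQL master-slave replication itself is not covered in this walkthrough.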
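Step 1: Set up replication, i.e. one master with two slaves. [Official MHA does not support one master with a single slave; Alibaba is rumored to have modified the source so that it does. The official topology is used here.]
Note that the following settings must be added on the server acting as standby master: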
log-bin=             # enable the binary log
log_slave_updates    # write the changes applied by the SQL thread to the binary log
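One way to confirm that both settings are active is to inspect the server variables. The sketch below is only an illustration: `check_candidate_vars` is a made-up helper name, and it assumes the variables are fed in as tab-separated `name<TAB>value` lines, which is the shape `mysql -N -B` produces.

```shell
#!/bin/bash
# Hypothetical helper (not part of MHA or MySQL): reads tab-separated
# "name<TAB>value" lines on stdin and succeeds only if both log_bin and
# log_slave_updates are reported as ON.
check_candidate_vars() {
    awk -F'\t' '
        $1 == "log_bin"           && $2 == "ON" { bin = 1 }
        $1 == "log_slave_updates" && $2 == "ON" { upd = 1 }
        END { exit !(bin && upd) }
    '
}

# Example usage on the standby master (credentials are placeholders):
#   mysql -uroot -p -N -B -e \
#     "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin','log_slave_updates')" \
#     | check_candidate_vars && echo "standby master settings OK"
```

Keeping the check as a filter on text means it can be tried against saved output before pointing it at a live server.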
Step 2: Install MHA
The MHA node package has to be installed on every server in the MHA cluster:
[root@mgt01 ~]# yum install epel-release perl-DBD-MySQL perl-CPAN -y    # install dependencies
[root@mgt01 src]# ls
mha4mysql-node-0.56.tar.gz
[root@mgt01 src]# tar zxvf mha4mysql-node-0.56.tar.gz -C ../    # extract
[root@mgt01 src]# cd ../
[root@mgt01 local]# cd mha4mysql-node-0.56/
[root@mgt01 mha4mysql-node-0.56]# ls
AUTHORS  bin  COPYING  debian  inc  lib  Makefile.PL  MANIFEST  META.yml  README  rpm  t
[root@mgt01 mha4mysql-node-0.56]# perl Makefile.PL    # generate the Makefile
[root@mgt01 mha4mysql-node-0.56]# make && make install    # build, then install only if the build succeeded
[root@mgt01 ~]# cd /usr/local/bin    # after installation the following files appear under /usr/local/bin
[root@mgt01 bin]# ls
apply_diff_relay_logs  filter_mysqlbinlog  purge_relay_logs  save_binary_logs
[root@mgt01 bin]# ll
total 44
-r-xr-xr-x 1 root root 16367 Dec 8 10:29 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Dec 8 10:29 filter_mysqlbinlog
-r-xr-xr-x 1 root root 8261 Dec 8 10:29 purge_relay_logs
-r-xr-xr-x 1 root root 7525 Dec 8 10:29 save_binary_logs
Note: the steps above have to be run on every node of the MHA cluster.
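Because the same installation is repeated on four hosts, it can be driven from one place. The following is a rough sketch, not the author's procedure: it assumes root SSH access to every host and that the tarball has already been copied to /usr/local/src on each of them (that path is an assumption); `print_node_install_cmds` is a made-up name. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# The four hosts from the server list at the top of the article.
NODES="10.0.102.214 10.0.102.204 10.0.102.179 10.0.102.221"

# Print (do not run) one ssh command per host that builds and installs
# the MHA node package.
# Assumption: mha4mysql-node-0.56.tar.gz already sits in /usr/local/src.
print_node_install_cmds() {
    local host
    for host in $NODES; do
        printf 'ssh root@%s "cd /usr/local/src && tar zxf mha4mysql-node-0.56.tar.gz && cd mha4mysql-node-0.56 && perl Makefile.PL && make && make install"\n' "$host"
    done
}

# Review the output, then pipe it to sh to actually execute it:
#   print_node_install_cmds | sh
```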
Install MHA manager, i.e. the management node of the MHA cluster.
# First install the packages MHA manager depends on
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
# Install MHA manager
tar zxvf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56/
perl Makefile.PL
make && make install
cp -frp samples/scripts/* /usr/local/bin    # copy the sample scripts to /usr/local/bin so that PATH does not need changing
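Script descriptions:
master_ip_failover: manages the VIP during an automatic failover; optional. With keepalived you can write your own script to manage the VIP, for example monitor MySQL and stop keepalived when MySQL is unhealthy, so that the VIP floats to another host automatically.
master_ip_online_change: manages the VIP during an online switchover; optional, a simple shell script of your own works as well.
power_manager: shuts down the host after a failure; optional.
send_report: sends an alert after a failover; optional, a simple shell script of your own works as well.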
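The keepalived approach mentioned for master_ip_failover could look roughly like the sketch below. It assumes keepalived runs as a service on the master and that the function is called periodically (for example from cron). `guard_vip` is a made-up name, and the commands in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: run a health-check command; if it fails, run a
# stop command so that keepalived releases the VIP.
#   $1: check command (exit status 0 means MySQL is healthy)
#   $2: command that stops keepalived
guard_vip() {
    local check_cmd="$1" stop_cmd="$2"
    if ! $check_cmd >/dev/null 2>&1; then
        $stop_cmd
        return 1
    fi
}

# Example usage (mysqladmin ping exits non-zero when the server is not running):
#   guard_vip "mysqladmin -uroot -p123456 ping" "service keepalived stop"
```

Passing both commands as arguments keeps the logic testable with harmless stand-ins such as `true`, `false` and `echo` before it is wired to real services.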
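Step 3: Configure MHA
Configuring MHA mainly means writing the MHA configuration files and then creating the directories they refer to.
Under samples/ there is another directory, conf, which contains two configuration file templates: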
[root@test3 conf]# ls
app1.cnf  masterha_default.cnf
[root@test3 conf]#
Copy the templates to a directory under /etc:
mkdir /etc/masterha -p    # create the directory for the MHA configuration files under /etc [the name is arbitrary; ideally it indicates what the directory holds]
cp * /etc/masterha/
First edit masterha_default.cnf:
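[root@mha ~]# cat /etc/masterha_default.cnf
[server default]
# monitoring user mha; it needs the appropriate grants
user=mha
# password of that MySQL user, i.e. the password given when the monitoring user was created
password=123456
# replication user of the replication setup
repl_user=repl
# password of the replication user
repl_password=123456
# SSH login user
ssh_user=root
# SSH port (defaults to 22 if omitted)
ssh_port=22
# interval between pings sent to the monitored master; the default is 3 seconds. After three attempts without a response, failover starts automatically
ping_interval=3
# directory where the MySQL master keeps its binlogs, so that MHA can find the master's binary logs
master_binlog_dir= /data/mysql/
# directory on the MySQL master where binlogs are saved when a switchover happens; create it on the MySQL master (defaults to /var/tmp if omitted)
remote_workdir=/data/log/masterha
# if monitoring between MHA and mysql01 runs into a problem, MHA Manager tries to log in to mysql01 from mysql02 and mysql03 (in this cluster: reach test2 via test1 and mgt01)
secondary_check_script= masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
# switch script used during automatic failover (the sample script is flawed and has to be adapted)
#master_ip_failover_script=/usr/local/bin/master_ip_failover
# switch script used during a manual switchover (the sample script is flawed and has to be adapted)
#master_ip_online_change_script=/usr/local/bin/master_ip_online_change
# script that sends an alert after a switchover (can be self-written)
#report_script=/usr/local/bin/send_report
# script that shuts down the failed host (its main purpose is to power off the host to prevent split-brain; not used here)
#shutdown_script=""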
The listing above describes every parameter of masterha_default.cnf; below is my own configuration:
[server default]
user=root
password=123456
ssh_user=root
ssh_port=22
ping_interval=3
repl_user=repl
repl_password=123456
master_binlog_dir= /data/mysql/
remote_workdir=/data/log/masterha
secondary_check_script= masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
master_ip_failover_script= /usr/local/bin/master_ip_failover
# shutdown_script= /script/masterha/power_manager
report_script= /usr/local/bin/send_report
# master_ip_online_change_script= /script/masterha/master_ip_online_change
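Then edit app1.cnf
app1.cnf applies to a single application only, but its parameters take precedence over those in masterha_default.cnf, and in practice app1.cnf usually contains all the parameters of masterha_default.cnf. MHA can monitor several replication clusters, with each cluster's configuration file distinguished by name. Since there is only one cluster here, app1.cnf is the only file.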
[root@test3 masterha]# cat app1.cnf
[server default]
manager_log=/data/log/app1/manager.log
manager_workdir=/data/log/app1
master_binlog_dir=/data/mysql
password=123456
ping_interval=3
remote_workdir=/data/log/masterha
repl_password=123456
repl_user=repl
report_script=/usr/local/bin/send_report
secondary_check_script=masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306
ssh_port=22
ssh_user=root
user=root

[server1]
hostname=10.0.102.204
port=3306
candidate_master=1

[server2]
candidate_master=1
hostname=10.0.102.179
port=3306

[server3]
hostname=10.0.102.221
no_master=1
port=3306
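The parameters in this file are mostly self-explanatory. Note that every directory named in the configuration file has to be created separately.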
mkdir -p /data/log/masterha
mkdir /data/log/app1
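Rather than collecting the directories by eye, they can be read from the configuration file. The sketch below assumes the plain `key=value` layout of app1.cnf shown above; `list_mha_dirs` is a hypothetical helper, not an MHA tool.

```shell
#!/bin/bash
# Hypothetical helper: print the directories an MHA config file refers to.
# manager_workdir and remote_workdir are directories already; manager_log
# is a file, so its parent directory is printed instead.
list_mha_dirs() {
    awk -F= '
        { gsub(/[ \t]/, "", $1); sub(/^[ \t]+/, "", $2) }
        $1 == "manager_workdir" || $1 == "remote_workdir" { print $2 }
        $1 == "manager_log" { sub(/\/[^\/]*$/, "", $2); print $2 }
    ' "$1" | sort -u
}

# Example usage:
#   list_mha_dirs /etc/masterha/app1.cnf | xargs mkdir -p
```

Note that manager_log and manager_workdir live on the manager node while remote_workdir lives on the MySQL nodes, so the resulting list still has to be applied on the right hosts.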
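Setting candidate_master to 1 marks a server as a candidate master: after a failover this slave is promoted to master even if it is not the slave holding the latest events in the cluster. By default, MHA does not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time. With check_repl_delay=0, MHA ignores replication delay when it picks the new master. This is very useful on a host with candidate_master=1, because it ensures that the candidate really becomes the new master during the switchover.
Likewise, a slave configured as candidate master must have the binary log and log_slave_updates enabled.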
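To see which hosts need the binary log and log_slave_updates, the candidate masters can be read from the configuration file. This sketch assumes the simple section and `key=value` layout of app1.cnf above; `candidate_hosts` is a made-up helper, not part of MHA.

```shell
#!/bin/bash
# Hypothetical helper: print the hostname of every [serverN] section in an
# MHA config file that sets candidate_master=1.
candidate_hosts() {
    awk -F= '
        # Print the host of the section just finished, if it was a candidate.
        function emit() { if (cand && host != "") print host }
        /^[ \t]*\[/ { emit(); host = ""; cand = 0; next }
        {
            key = $1; gsub(/[ \t]/, "", key)
            val = $2; gsub(/[ \t]/, "", val)
        }
        key == "hostname"                       { host = val }
        key == "candidate_master" && val == "1" { cand = 1 }
        END { emit() }
    ' "$1"
}

# Example usage:
#   candidate_hosts /etc/masterha/app1.cnf
```

The section is only evaluated once it ends, so the order of hostname and candidate_master inside a section does not matter.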
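Configure how relay logs are purged (on every slave node)
Add relay_log_purge=0 to the configuration file. The file change only takes effect after a restart (the variable can also be changed on the running server with SET GLOBAL relay_log_purge=0).
Note: during a switchover MHA relies on relay log information to recover the slaves, so automatic relay log purging has to be turned OFF and the relay logs purged manually. By default, relay logs on a slave are deleted automatically once the SQL thread has executed them. In an MHA environment, however, these relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging has to take replication delay into account: on an ext3 file system, deleting a large file takes a while and can cause serious replication delay. To avoid this, a hard link to the relay log is created temporarily, because on Linux deleting a large file through a hard link is fast. (The same hard-link approach is commonly used when dropping large tables in MySQL.)
The MHA node package includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread switches to a new relay log, and then executes SET GLOBAL relay_log_purge=0.
The purge_relay_logs script takes the following parameters: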
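--user                       # MySQL user name;
--password                   # MySQL password;
--port                       # port number;
--workdir                    # where the hard links to the relay logs are created; the default is /var/tmp. Creating hard links across partitions fails, so the location has to be given explicitly. Once the script has run successfully, the hard-linked relay log files are deleted;
--disable_relay_log_purge    # by default, if relay_log_purge=1, the script purges nothing and exits. With this option, if relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and leaves the parameter OFF afterwards;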
Set up a script that purges the relay logs periodically
[root@mgt01 ~]# cat purge_relay.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'

# create the log directory on first run
if [ ! -d $log_dir ]; then
    mkdir $log_dir -p
fi

# purge the relay logs and append the tool's output to a log file
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
Add the script to the scheduled tasks:
[root@mgt01 log]# crontab -l
0 4 * * * sh /root/purge_relay.sh
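Deleting relay logs with the purge_relay_logs script does not block the SQL thread.
Step 4: Set up passwordless SSH authentication
The MHA manager node must be able to access the other nodes of the cluster without a password.
The MySQL nodes must also be able to access one another without a password.
The procedure for setting up passwordless SSH is not written out here.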
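For readers who have not done this before, a minimal sketch of one common way follows. It assumes root logins and RSA keys in the default location, neither of which the article specifies; `print_ssh_setup_cmds` is a made-up name. The function only prints the commands, which then have to be run on each node in turn.

```shell
#!/bin/bash
# The four hosts from the server list at the top of the article.
HOSTS="10.0.102.214 10.0.102.204 10.0.102.179 10.0.102.221"

# Print the commands one host needs in order to reach every host (itself
# included) without a password. Nothing is executed here.
print_ssh_setup_cmds() {
    local host
    # Create a key pair with an empty passphrase unless one already exists.
    echo "[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    for host in $HOSTS; do
        # ssh-copy-id asks for the remote password once per host.
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host"
    done
}

# Run on each node after reviewing the output:
#   print_ssh_setup_cmds | sh
```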
Use MHA to check that SSH works:
[root@test3 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
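If it succeeds, move on to the next step and check replication.
[Some blogs suggest temporarily commenting out the master_ip_failover_script= /usr/local/bin/master_ip_failover option in the configuration file, because otherwise this check fails. I did not comment it out in my test, and the check still passed. Note, however, that the output below reports master_ip_failover_script as not defined: the option is absent from app1.cnf and the global file /etc/masterha_default.cnf was not found, which may be why the check passed.]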
[root@test3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
[root@test3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Dec 8 17:03:38 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Dec 8 17:03:38 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Dec 8 17:03:38 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Dec 8 17:03:38 2018 - [info] MHA::MasterMonitor version 0.56.
Sat Dec 8 17:03:38 2018 - [info] GTID failover mode = 0
Sat Dec 8 17:03:38 2018 - [info] Dead Servers:
Sat Dec 8 17:03:38 2018 - [info] Alive Servers:
Sat Dec 8 17:03:38 2018 - [info] 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:03:38 2018 - [info] 10.0.102.179(10.0.102.179:3306)
Sat Dec 8 17:03:38 2018 - [info] 10.0.102.221(10.0.102.221:3306)
Sat Dec 8 17:03:38 2018 - [info] Alive Slaves:
Sat Dec 8 17:03:38 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:03:38 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:03:38 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:03:38 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:03:38 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:03:38 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:03:38 2018 - [info] Current Alive Master: 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:03:38 2018 - [info] Checking slave configurations..
Sat Dec 8 17:03:38 2018 - [info] read_only=1 is not set on slave 10.0.102.179(10.0.102.179:3306).
Sat Dec 8 17:03:38 2018 - [info] read_only=1 is not set on slave 10.0.102.221(10.0.102.221:3306).
Sat Dec 8 17:03:38 2018 - [warning] log-bin is not set on slave 10.0.102.221(10.0.102.221:3306). This host cannot be a master.
Sat Dec 8 17:03:38 2018 - [info] Checking replication filtering settings..
Sat Dec 8 17:03:38 2018 - [info] binlog_do_db= , binlog_ignore_db=
Sat Dec 8 17:03:38 2018 - [info] Replication filtering check ok.
Sat Dec 8 17:03:38 2018 - [info] GTID (with auto-pos) is not supported
Sat Dec 8 17:03:38 2018 - [info] Starting SSH connection tests..
Sat Dec 8 17:03:40 2018 - [info] All SSH connection tests passed successfully.
Sat Dec 8 17:03:40 2018 - [info] Checking MHA Node version..
Sat Dec 8 17:03:40 2018 - [info] Version check ok.
Sat Dec 8 17:03:40 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sat Dec 8 17:03:40 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
Sat Dec 8 17:03:41 2018 - [info] Master MHA Node version is 0.56.
Sat Dec 8 17:03:41 2018 - [info] Checking recovery script configurations on 10.0.102.204(10.0.102.204:3306)..
Sat Dec 8 17:03:41 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/log/masterha/save_binary_logs_test --manager_version=0.56 --start_file=test2-bin.000007
Sat Dec 8 17:03:41 2018 - [info] Connecting to root@10.0.102.204(10.0.102.204:22)..
Creating /data/log/masterha if not exists.. ok.
Checking output directory is accessible or not..ok.
Binlog found at /data/mysql, up to test2-bin.000007
Sat Dec 8 17:03:41 2018 - [info] Binlog setting check done.
Sat Dec 8 17:03:41 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Dec 8 17:03:41 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.102.179 --slave_ip=10.0.102.179 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.7.22-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Sat Dec 8 17:03:41 2018 - [info] Connecting to root@10.0.102.179(10.0.102.179:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to test1-relay-bin.000002
Temporary relay log file is /data/mysql/test1-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Dec 8 17:03:41 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.102.221 --slave_ip=10.0.102.221 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.7.22 --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Sat Dec 8 17:03:41 2018 - [info] Connecting to root@10.0.102.221(10.0.102.221:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to mgt01-relay-bin.000002
Temporary relay log file is /data/mysql/mgt01-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Dec 8 17:03:41 2018 - [info] Slaves settings check done.
Sat Dec 8 17:03:41 2018 - [info]
10.0.102.204(10.0.102.204:3306) (current master)
 +--10.0.102.179(10.0.102.179:3306)
 +--10.0.102.221(10.0.102.221:3306)
Sat Dec 8 17:03:41 2018 - [info] Checking replication health on 10.0.102.179..
Sat Dec 8 17:03:41 2018 - [info] ok.
Sat Dec 8 17:03:41 2018 - [info] Checking replication health on 10.0.102.221..
Sat Dec 8 17:03:41 2018 - [info] ok.
Sat Dec 8 17:03:41 2018 - [warning] master_ip_failover_script is not defined.
Sat Dec 8 17:03:41 2018 - [warning] shutdown_script is not defined.
Sat Dec 8 17:03:41 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
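I once ran into a case where the replication check kept listing one server under Dead Servers, although the cluster itself and everything else was fine. It turned out to be a local name resolution error. [Check the /etc/hosts file and the known_hosts file under the ssh directory; freshly built servers usually do not have this problem.]
Check the status of MHA manager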
[root@test3 masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[root@test3 masterha]#
Start MHA manager
[root@test3 masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover &
# Parameter notes
# remove_dead_master_conf: with this option, once an MHA failover has finished, MHA Manager automatically removes the dead master's entry from the configuration file. Without it the dead master's entry stays in the file, and restarting MHA Manager after the failover reports an error (there is a dead slave previous dead master).
# ignore_last_failover: by default, MHA Manager does not start a failover if the previous one failed or finished too recently (within 8 hours by default). With this option the status of the last failover is ignored and the failover proceeds.
Check the status after starting:
[root@test3 masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:18866) is running(0:PING_OK), master:10.0.102.204
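For monitoring scripts it can be handy to extract the current master from that status line. The sketch below relies only on the output format shown above; `current_master` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read masterha_check_status output on stdin and print
# the master address. Prints nothing unless the line reports
# "is running(0:PING_OK)".
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}

# Example usage:
#   masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
```

An empty result then doubles as a signal that the manager is stopped or not in the PING_OK state.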
If it starts without errors, the MHA cluster has been set up successfully.
MHA manager can be stopped with the following command:
masterha_stop --conf=/etc/masterha/app1.cnf
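Finally, let's run a failover test.
Stop the master of the MySQL replication cluster and see whether MHA switches over to a slave automatically. Before the test it is best to write some data; here I loaded some with tpcc.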
./tpcc_load -h 10.0.102.204 -P 3306 -d tpcc_test -u root -p 123456 -w 3
How to use tpcc for testing: https://www.cnblogs.com/wxzhe/p/10027474.html
Stop the current master server.
[root@test2 ~]# service mysqld stop
Shutting down MySQL............ SUCCESS!
Then look at the MHA manager log:
Sat Dec 8 17:21:50 2018 - [info] Executing secondary network check script: masterha_secondary_check -s test1 -s mgt01 --user=root --port=22 --master_host=test2 --master_port=3306 --user=root --master_host=10.0.102.204 --master_ip=10.0.102.204 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
Sat Dec 8 17:21:50 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/log/masterha/save_binary_logs_test --manager_version=0.56 --binlog_prefix=test2-bin
Monitoring server test1 is reachable, Master is not reachable from test1. OK.
Sat Dec 8 17:21:50 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
Monitoring server mgt01 is reachable, Master is not reachable from mgt01. OK.
Sat Dec 8 17:21:50 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Dec 8 17:21:53 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Dec 8 17:21:53 2018 - [warning] Connection failed 2 time(s)..
Sat Dec 8 17:21:56 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Dec 8 17:21:56 2018 - [warning] Connection failed 3 time(s)..
Sat Dec 8 17:21:59 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Dec 8 17:21:59 2018 - [warning] Connection failed 4 time(s)..
Sat Dec 8 17:21:59 2018 - [warning] Master is not reachable from health checker!
Sat Dec 8 17:21:59 2018 - [warning] Master 10.0.102.204(10.0.102.204:3306) is not reachable!
Sat Dec 8 17:21:59 2018 - [warning] SSH is reachable.
Sat Dec 8 17:21:59 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Sat Dec 8 17:21:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Dec 8 17:21:59 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Dec 8 17:21:59 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Dec 8 17:21:59 2018 - [info] GTID failover mode = 0
Sat Dec 8 17:21:59 2018 - [info] Dead Servers:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Alive Servers:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306)
Sat Dec 8 17:21:59 2018 - [info] Alive Slaves:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:21:59 2018 - [info] Checking slave configurations..
Sat Dec 8 17:21:59 2018 - [info] read_only=1 is not set on slave 10.0.102.179(10.0.102.179:3306).
Sat Dec 8 17:21:59 2018 - [info] read_only=1 is not set on slave 10.0.102.221(10.0.102.221:3306).
Sat Dec 8 17:21:59 2018 - [warning] log-bin is not set on slave 10.0.102.221(10.0.102.221:3306). This host cannot be a master.
Sat Dec 8 17:21:59 2018 - [info] Checking replication filtering settings..
Sat Dec 8 17:21:59 2018 - [info] Replication filtering check ok.
Sat Dec 8 17:21:59 2018 - [info] Master is down!
Sat Dec 8 17:21:59 2018 - [info] Terminating monitoring script.
Sat Dec 8 17:21:59 2018 - [info] Got exit code 20 (Master dead).
Sat Dec 8 17:21:59 2018 - [info] MHA::MasterFailover version 0.56.
Sat Dec 8 17:21:59 2018 - [info] Starting master failover.
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] * Phase 1: Configuration Check Phase..
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] GTID failover mode = 0
Sat Dec 8 17:21:59 2018 - [info] Dead Servers:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Checking master reachability via MySQL(double check)...
Sat Dec 8 17:21:59 2018 - [info] ok.
Sat Dec 8 17:21:59 2018 - [info] Alive Servers:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306)
Sat Dec 8 17:21:59 2018 - [info] Alive Slaves:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:21:59 2018 - [info] Starting Non-GTID based failover.
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Dec 8 17:21:59 2018 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sat Dec 8 17:21:59 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Dec 8 17:21:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] * Phase 3: Master Recovery Phase..
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] The latest binary log file/position on all slaves is test2-bin.000007:154
Sat Dec 8 17:21:59 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:21:59 2018 - [info] The oldest binary log file/position on all slaves is test2-bin.000007:154
Sat Dec 8 17:21:59 2018 - [info] Oldest slaves:
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:21:59 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:21:59 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:21:59 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat Dec 8 17:21:59 2018 - [info]
Sat Dec 8 17:21:59 2018 - [info] Fetching dead master's binary logs..
Sat Dec 8 17:21:59 2018 - [info] Executing command on the dead master 10.0.102.204(10.0.102.204:3306): save_binary_logs --command=save --start_file=test2-bin.000007 --start_pos=154 --binlog_dir=/data/mysql --output_file=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /data/log/masterha if not exists.. ok.
Concat binary/relay logs from test2-bin.000007 pos 154 to test2-bin.000007 EOF into /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /data/mysql/test2-bin.000007 position 154 to tail(177).. ok.
Binlog Checksum enabled
Concat succeeded.
Sat Dec 8 17:22:00 2018 - [info] scp from root@10.0.102.204:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
Sat Dec 8 17:22:00 2018 - [info] HealthCheck: SSH to 10.0.102.179 is reachable.
Sat Dec 8 17:22:00 2018 - [info] HealthCheck: SSH to 10.0.102.221 is reachable.
Sat Dec 8 17:22:00 2018 - [info]
Sat Dec 8 17:22:00 2018 - [info] * Phase 3.3: Determining New Master Phase..
Sat Dec 8 17:22:00 2018 - [info]
Sat Dec 8 17:22:00 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat Dec 8 17:22:00 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sat Dec 8 17:22:00 2018 - [info] Searching new master from slaves..
Sat Dec 8 17:22:00 2018 - [info] Candidate masters from the configuration file:
Sat Dec 8 17:22:00 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sat Dec 8 17:22:00 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:22:00 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Dec 8 17:22:00 2018 - [info] Non-candidate masters:
Sat Dec 8 17:22:00 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22 (oldest major version between slaves) log-bin:disabled
Sat Dec 8 17:22:00 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sat Dec 8 17:22:00 2018 - [info] Not candidate for the new Master (no_master is set)
Sat Dec 8 17:22:00 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sat Dec 8 17:22:00 2018 - [info] New master is 10.0.102.179(10.0.102.179:3306)
Sat Dec 8 17:22:00 2018 - [info] Starting master failover..
Sat Dec 8 17:22:00 2018 - [info]
From:
10.0.102.204(10.0.102.204:3306) (current master)
 +--10.0.102.179(10.0.102.179:3306)
 +--10.0.102.221(10.0.102.221:3306)
To:
10.0.102.179(10.0.102.179:3306) (new master)
 +--10.0.102.221(10.0.102.221:3306)
Sat Dec 8 17:22:00 2018 - [info]
Sat Dec 8 17:22:00 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sat Dec 8 17:22:00 2018 - [info]
Sat Dec 8 17:22:00 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat Dec 8 17:22:00 2018 - [info] Sending binlog..
Sat Dec 8 17:22:01 2018 - [info] scp from local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to root@10.0.102.179:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat Dec 8 17:22:01 2018 - [info] Starting recovery on 10.0.102.179(10.0.102.179:3306)..
Sat Dec 8 17:22:01 2018 - [info] Generating diffs succeeded.
Sat Dec 8 17:22:01 2018 - [info] Waiting until all relay logs are applied.
Sat Dec 8 17:22:01 2018 - [info] done.
Sat Dec 8 17:22:01 2018 - [info] Getting slave status..
Sat Dec 8 17:22:01 2018 - [info] This slave(10.0.102.179)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(test2-bin.000007:154). No need to recover from Exec_Master_Log_Pos.
Sat Dec 8 17:22:01 2018 - [info] Connecting to the target slave host 10.0.102.179, running recover script..
Sat Dec 8 17:22:01 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.0.102.179 --slave_ip=10.0.102.179 --slave_port=3306 --apply_files=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --workdir=/data/log/masterha --target_version=5.7.22-log --timestamp=20181208172159 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Sat Dec 8 17:22:01 2018 - [info] MySQL client version is 5.7.22. Using --binary-mode.
Applying differential binary/relay log files /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog on 10.0.102.179:3306. This may take long time...
Applying log files succeeded.
Sat Dec 8 17:22:01 2018 - [info] All relay logs were successfully applied.
Sat Dec 8 17:22:01 2018 - [info] Getting new master's binlog name and position..
Sat Dec 8 17:22:01 2018 - [info] test1-bin.000001:154
Sat Dec 8 17:22:01 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.102.179', MASTER_PORT=3306, MASTER_LOG_FILE='test1-bin.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Dec 8 17:22:01 2018 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sat Dec 8 17:22:01 2018 - [info] ** Finished master recovery successfully.
Sat Dec 8 17:22:01 2018 - [info] * Phase 3: Master Recovery Phase completed.
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] * Phase 4: Slaves Recovery Phase..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] -- Slave diff file generation on host 10.0.102.221(10.0.102.221:3306) started, pid: 19729. Check tmp log /data/log/app1/10.0.102.221_3306_20181208172159.log if it takes time..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] Log messages from 10.0.102.221 ...
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat Dec 8 17:22:01 2018 - [info] End of log messages from 10.0.102.221.
Sat Dec 8 17:22:01 2018 - [info] -- 10.0.102.221(10.0.102.221:3306) has the latest relay log events.
Sat Dec 8 17:22:01 2018 - [info] Generating relay diff files from the latest slave succeeded.
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) started, pid: 19731. Check tmp log /data/log/app1/10.0.102.221_3306_20181208172159.log if it takes time..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] Log messages from 10.0.102.221 ...
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] Sending binlog..
Sat Dec 8 17:22:01 2018 - [info] scp from local:/data/log/app1/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog to root@10.0.102.221:/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog succeeded.
Sat Dec 8 17:22:01 2018 - [info] Starting recovery on 10.0.102.221(10.0.102.221:3306)..
Sat Dec 8 17:22:01 2018 - [info] Generating diffs succeeded.
Sat Dec 8 17:22:01 2018 - [info] Waiting until all relay logs are applied.
Sat Dec 8 17:22:01 2018 - [info] done.
Sat Dec 8 17:22:01 2018 - [info] Getting slave status..
Sat Dec 8 17:22:01 2018 - [info] This slave(10.0.102.221)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(test2-bin.000007:154). No need to recover from Exec_Master_Log_Pos.
Sat Dec 8 17:22:01 2018 - [info] Connecting to the target slave host 10.0.102.221, running recover script..
Sat Dec 8 17:22:01 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.0.102.221 --slave_ip=10.0.102.221 --slave_port=3306 --apply_files=/data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog --workdir=/data/log/masterha --target_version=5.7.22 --timestamp=20181208172159 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Sat Dec 8 17:22:01 2018 - [info] MySQL client version is 5.7.22. Using --binary-mode.
Applying differential binary/relay log files /data/log/masterha/saved_master_binlog_from_10.0.102.204_3306_20181208172159.binlog on 10.0.102.221:3306. This may take long time...
Applying log files succeeded.
Sat Dec 8 17:22:01 2018 - [info] All relay logs were successfully applied.
Sat Dec 8 17:22:01 2018 - [info] Resetting slave 10.0.102.221(10.0.102.221:3306) and starting replication from the new master 10.0.102.179(10.0.102.179:3306)..
Sat Dec 8 17:22:01 2018 - [info] Executed CHANGE MASTER.
Sat Dec 8 17:22:01 2018 - [info] Slave started.
Sat Dec 8 17:22:01 2018 - [info] End of log messages from 10.0.102.221.
Sat Dec 8 17:22:01 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) succeeded.
Sat Dec 8 17:22:01 2018 - [info] All new slave servers recovered successfully.
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] * Phase 5: New master cleanup phase..
Sat Dec 8 17:22:01 2018 - [info]
Sat Dec 8 17:22:01 2018 - [info] Resetting slave info on the new master..
Sat Dec 8 17:22:02 2018 - [info] 10.0.102.179: Resetting slave info succeeded.
Sat Dec 8 17:22:02 2018 - [info] Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sat Dec 8 17:22:02 2018 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Sat Dec 8 17:22:02 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306) succeeded
Master 10.0.102.204(10.0.102.204:3306) is down!
Check MHA Manager logs at test3:/data/log/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.0.102.179(10.0.102.179:3306) has all relay logs for recovery.
Selected 10.0.102.179(10.0.102.179:3306) as a new master.
10.0.102.179(10.0.102.179:3306): OK: Applying all logs succeeded.
10.0.102.221(10.0.102.221:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.0.102.221(10.0.102.221:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.0.102.179(10.0.102.179:3306)
10.0.102.179(10.0.102.179:3306): Resetting slave info succeeded.
Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sat Dec 8 17:22:02 2018 - [info] Sending mail..
Unknown option: conf
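The MHA failover log shows that the whole failover completed. Simplified, the core logic of each phase is as follows:
Phase 1: Configuration check
Phase 2: Shut down the dead master
Phase 3: Master recovery
Phase 3.1: Identify the slave with the least lag behind the master (the latest slave)
Phase 3.2: Generate the differential binlog between the master and the latest slave, and save it on the manager node
Phase 3.3: Determine the new master; if the new master is not the latest slave, generate the differential relay log between the two
Phase 3.4: The new master applies the differential relay log, then obtains the master binlog position, then applies the differential binlog
Phase 4: Slave recovery
Phase 4.1: In parallel, generate the differential relay logs between the latest slave and each of the other slaves
Phase 4.2: In parallel, apply the differential relay logs on the slaves, then CHANGE MASTER to the new master
Phase 5: The new master clears its slave information, and the old master's entry (server1 in the log above) is deleted from the MHA configuration file to prevent accidental operations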
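The decisions in Phases 3.1, 3.3 and 4.1 can be illustrated with a small model. This is a simplified Python sketch and not MHA's actual implementation: the SlaveState class, the preferred flag (a stand-in for whatever rule picks the new master) and the plan_failover function are all hypothetical, and the positions in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SlaveState:
    host: str
    master_log_file: str      # Master_Log_File from SHOW SLAVE STATUS
    read_master_log_pos: int  # Read_Master_Log_Pos from SHOW SLAVE STATUS
    preferred: bool = False   # hypothetical flag: should this host become the new master?


def received_position(slave: SlaveState) -> Tuple[int, int]:
    """Order slaves by how much of the master's binlog they have received."""
    # Assumption: binlog names end in a numeric sequence, e.g. test2-bin.000007.
    sequence = int(slave.master_log_file.rsplit(".", 1)[1])
    return sequence, slave.read_master_log_pos


def plan_failover(slaves: List[SlaveState]) -> Tuple[str, str, List[str]]:
    """Return (latest slave, new master, hosts that need a differential relay log)."""
    # Phase 3.1: the latest slave is the one that received the most binlog events.
    latest = max(slaves, key=received_position)
    # Phase 3.3: use a preferred host if one exists, otherwise the latest slave.
    preferred = [s for s in slaves if s.preferred]
    new_master = preferred[0] if preferred else latest
    # Phases 3.3 and 4.1: every host behind the latest slave needs the difference.
    behind = [
        s.host for s in slaves
        if received_position(s) < received_position(latest)
    ]
    return latest.host, new_master.host, behind


# Made-up positions: mgt01 is preferred but slightly behind test1.
slaves = [
    SlaveState("test1", "test2-bin.000007", 154),
    SlaveState("mgt01", "test2-bin.000007", 120, preferred=True),
]
print(plan_failover(slaves))
```

With these made-up positions the preferred host is not the latest slave, so it appears in the list of hosts that need a differential relay log before it can take over, which is the case described in Phase 3.3.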
Reposted from: https://www.cnblogs.com/wxzhe/p/10088627.html