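MooseFS Disaster Recovery Drill: A Field Record
Last night I went to the data center to expand the disks on our database servers, and while I was there I ran a disaster drill on MooseFS, the storage system we currently run in production. Today I am writing up the details of that drill to share with everyone. If you are running MooseFS yourself, it may serve as a useful reference.
MooseFS is a distributed file system. I won't go into detail about it here; see the three posts I wrote earlier:
Distributed File System MooseFS ---- Introduction
Distributed File System MooseFS ---- Deployment
Distributed File System MooseFS ---- Management and Tuning
First, a brief overview of the architecture of our current storage setup: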
CPU:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
CPU:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
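The MooseFS deployment consists of two mfsmaster servers, one active and one standby, with heartbeat + DRBD between them providing high availability for the master service. Behind them sit three storage nodes that provide the actual data storage.
By recording the logs of the relevant services at the time of a failure, we can analyze the decisions and actions the high-availability software takes when the failure occurs.
By recording how the service behaves on the client before and after a failure, we can judge how much the failure affects the client.
Before the tests, I start a continuous-write script on the mfs client. Once per second it appends a line of text to a file in the mounted mfs directory, which lets me measure how long mfs takes to recover.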
[root@web-phy13-rj ~]# for i in {1..20000};do echo `date` $i >> /mfsdata/testxxxx;sleep 1;done
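The recovery time can be read off the probe file by looking for jumps between consecutive timestamps. The following is a minimal sketch, not part of the original drill: `find_gaps` is a hypothetical helper name, and it assumes every line has the shape produced by the loop above (`date` output followed by the counter) and that all lines fall on the same calendar day.

```shell
#!/usr/bin/env bash
# find_gaps FILE [MAX_STEP]
# Print every place where two consecutive probe lines are more than
# MAX_STEP seconds apart (default: 1). Expected line shape:
#   Thu Jun 25 18:52:34 CST 2015 89
# Assumption: all lines are from the same day, so only the HH:MM:SS
# field (4th column) is compared.
find_gaps() {
    local file=$1 max_step=${2:-1}
    awk -v max="$max_step" '
        NF < 7 { next }    # skip blank, marker, or malformed lines
        {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max)
                printf "gap of %d s between #%s (%s) and #%s (%s)\n",
                       now - prev, prev_seq, prev_time, $NF, $4
            seen = 1; prev = now; prev_time = $4; prev_seq = $NF
        }
    ' "$file"
}

# Example call (path taken from the loop above):
#   find_gaps /mfsdata/testxxxx
```

A jump from 18:52:34 to 18:52:47, for instance, would be reported as a 13-second gap. Passing a larger second argument ignores small jitter.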
[root@kvm-phy11-rj ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 48931 1 0 20:03 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10824 nr:3912 dw:12716 dr:16539 al:11 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:3912 nr:10896 dw:106643992 dr:16616 al:10 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
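The three manual checks shown for each master (is mfsmaster running, is the virtual IP on em4, what is the DRBD role) can be bundled into one helper so the state of both nodes is captured the same way before and after each fault. This is only a sketch: the function names are hypothetical, and the defaults (VIP 10.1.1.26, interface em4) are taken from the output in this post.

```shell
#!/usr/bin/env bash
# drbd_local_role [FILE]
# Print the local DRBD role, i.e. the part before the slash in
# "ro:Primary/Secondary", read from a /proc/drbd-style file.
drbd_local_role() {
    local src=${1:-/proc/drbd}
    awk 'match($0, "ro:[A-Za-z]+/") {
             print substr($0, RSTART + 3, RLENGTH - 4)
             exit
         }' "$src"
}

# has_vip VIP DEV
# Succeed if address VIP is configured on interface DEV.
has_vip() {
    local vip=$1 dev=$2
    ip -o -4 addr show dev "$dev" | grep -qF " $vip/"
}

# node_summary [VIP] [DEV]
# One-line summary of the three things checked by hand in this post.
node_summary() {
    local vip=${1:-10.1.1.26} dev=${2:-em4} master="stopped" holder="no"
    pgrep -x mfsmaster >/dev/null && master="running"
    has_vip "$vip" "$dev" && holder="yes"
    echo "mfsmaster=$master vip=$holder drbd=$(drbd_local_role)"
}

# Example call on either master:
#   node_summary
```

On the active node all three values should agree (running, yes, Primary); any other combination is worth a closer look at the heartbeat log.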
Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48425]: info: Heartbeat shutdown in progress. (48425)
Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48980]: info: Giving up all HA resources.
ResourceManager(default)[48993]: 2015/06/25_20:05:29 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48993]: 2015/06/25_20:05:29 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[49048]: 2015/06/25_20:05:31 INFO: Success
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[401]: 2015/06/25_20:05:31 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[373]: 2015/06/25_20:05:31 INFO: Success
Jun 25 20:05:31 mfs-master01-rj.btr heartbeat: [48980]: info: All HA resources relinquished.
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2673:2675]
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: killing /usr/lib64/heartbeat/ipfail process group 48439 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBFIFO process 48428 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48429 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48430 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48431 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48432 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48433 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48434 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48435 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48436 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48430 exited. 9 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48433 exited. 8 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48431 exited. 7 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48428 exited. 6 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48432 exited. 5 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48434 exited. 4 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48435 exited. 3 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48436 exited. 2 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48429 exited. 1 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: mfs-master01-rj.btr Heartbeat shutdown complete.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Received shutdown notice from 'mfs-master01-rj.btr'.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Resources being acquired from mfs-master01-rj.btr.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: acquire local HA resources (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47669]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mfs-master02-rj.btr] to acquire.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: local HA resource acquisition completed (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
harc(default)[47694]: 2015/06/25_20:05:31 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[47711]: 2015/06/25_20:05:31 info: Taking over resource group IPaddr::10.1.1.26/24/em4
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47766]: 2015/06/25_20:05:31 INFO: Resource is stopped
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47863]: 2015/06/25_20:05:31 INFO: Success
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48017]: 2015/06/25_20:05:32 INFO: Resource is stopped
ResourceManager(default)[47738]: 2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[48103]: 2015/06/25_20:05:32 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48095]: 2015/06/25_20:05:32 INFO: Success
ResourceManager(default)[47738]: 2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/mfsmaster start
mach_down(default)[47711]: 2015/06/25_20:05:32 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[47711]: 2015/06/25_20:05:32 info: mach_down takeover complete for node mfs-master01-rj.btr.
Jun 25 20:05:32 mfs-master02-rj.btr heartbeat: [44613]: info: mach_down takeover complete.
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: WARN: node mfs-master01-rj.btr: is dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Dead node mfs-master01-rj.btr gave up resources.
Jun 25 20:05:42 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 dead.
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: NS: We are still alive!
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status dead
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Checking remote count of ping nodes.
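Most of the heartbeat log is routine detail; the decisions that matter for a failover are the release and acquisition of the resource group and the takeover messages. A small filter like the one below makes them easier to line up across the two masters. It is a sketch only: `ha_key_events` is a hypothetical name, the patterns are simply the message texts that appear in the logs in this post, and the log path in the example comment is an assumption that depends on the `logfile` setting in your ha.cf.

```shell
#!/usr/bin/env bash
# ha_key_events [FILE...]
# Keep only the log lines that mark a resource hand-over.
# Reads the given files, or standard input when no file is given.
ha_key_events() {
    grep -E 'Releasing resource group|Acquiring resource group|All HA resources relinquished|takeover complete|wants to go standby|is dead' "$@"
}

# Example call (log path is an assumption):
#   ha_key_events /var/log/ha-log
```

Comparing the timestamp of "Releasing resource group" on the old primary with "takeover complete" on the new one gives the server-side switchover time, which can then be set against the gap seen by the client.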
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:11876 nr:4536 dw:14392 dr:16551 al:12 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 48197 1 0 20:05 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4604 nr:11864 dw:106645652 dr:18533 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Thu Jun 25 18:52:31 CST 2015 86
Thu Jun 25 18:52:32 CST 2015 87
Thu Jun 25 18:52:33 CST 2015 88
Thu Jun 25 18:52:34 CST 2015 89
########
Thu Jun 25 18:52:47 CST 2015 90
########
Thu Jun 25 18:52:48 CST 2015 91
Thu Jun 25 18:52:49 CST 2015 92
Recovering from fault 1: after the heartbeat service on the mfsmaster is restored
[root@mfs-master01-rj ~]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO: Resource is stopped
Done.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Pacemaker support: false
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: **************************
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Configuration validated. Starting heartbeat 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: heartbeat: version 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Heartbeat generation: 1435221812
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: UDP multicast heartbeat started for group 225.0.0.192 port 694 interface em2 (ttl=1 loop=0)
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'up'
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.27: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.27:10.1.1.27 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.28: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.28:10.1.1.28 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link mfs-master02-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node mfs-master02-rj.btr: status active
harc(default)[667]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.1:10.1.1.1 up.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.1: status ping
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Comm_now_up(): updating status to active
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'active'
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (497,496)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [685]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 497 gid 496 (pid 685)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local Resource acquisition completed. (none)
Jun 25 20:09:31 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [654]: info: standby: acquire [foreign] resources from mfs-master02-rj.btr
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [688]: info: acquire local HA resources (standby).
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[729]: 2015/06/25_20:09:34 INFO: Resource is stopped
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[828]: 2015/06/25_20:09:34 INFO: Success
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[985]: 2015/06/25_20:09:34 INFO: Resource is stopped
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[1071]: 2015/06/25_20:09:34 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[1063]: 2015/06/25_20:09:34 INFO: Success
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/mfsmaster start
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [688]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Initial resource acquisition complete (auto_failback)
Jun 25 20:09:35 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Ping node count is balanced.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Giving up foreign resources (auto_failback).
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Delayed giveup in 4 seconds.
Jun 25 20:09:46 mfs-master01-rj.btr ipfail: [685]: info: giveup() called (timeout worked)
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: standby: mfs-master02-rj.btr can take our foreign resources
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [1166]: info: give up foreign HA resources (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [1166]: info: foreign HA resource release completed (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Local standby process completed [foreign].
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2816:2818]
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Other node completed standby takeover of foreign resources.
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Heartbeat restart on node mfs-master01-rj.btr
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status init
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status init
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status up
harc(default)[48262]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
harc(default)[48279]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status active
Jun 25 20:09:30 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status active
harc(default)[48296]: 2015/06/25_20:09:30 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [44613]: info: standby: mfs-master01-rj.btr can take our foreign resources
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [48313]: info: give up foreign HA resources (standby).
ResourceManager(default)[48326]: 2015/06/25_20:09:31 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48326]: 2015/06/25_20:09:31 info: Running /etc/ha.d/resource.d/mfsmaster stop
Jun 25 20:09:32 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48381]: 2015/06/25_20:09:33 INFO: Success
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[48545]: 2015/06/25_20:09:33 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[48519]: 2015/06/25_20:09:33 INFO: Success
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [48313]: info: foreign HA resource release completed (standby).
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [44613]: info: Local standby process completed [foreign].
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: WARN: 1 lost packet(s) for [mfs-master01-rj.btr] [13:15]
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: No pkts missing from mfs-master01-rj.btr!
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: Other node completed standby takeover of foreign resources.
Jun 25 20:09:42 mfs-master02-rj.btr ipfail: [44630]: info: No giveup timer to abort.
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: standby: acquire [foreign] resources from mfs-master01-rj.btr
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: acquire local HA resources (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:48 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 1165 1 0 20:09 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:12288 nr:5600 dw:15868 dr:18468 al:13 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:5600 nr:12324 dw:106647108 dr:18541 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Thu Jun 25 18:56:33 CST 2015 314
Thu Jun 25 18:56:34 CST 2015 315
Thu Jun 25 18:56:35 CST 2015 316
Thu Jun 25 18:56:36 CST 2015 317
#######
Thu Jun 25 18:56:49 CST 2015 318
#######
Thu Jun 25 18:56:50 CST 2015 319
Thu Jun 25 18:56:51 CST 2015 320
Thu Jun 25 18:56:52 CST 2015 321
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 1165 1 0 20:09 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:12692 nr:5600 dw:16272 dr:18468 al:13 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:5600 nr:12816 dw:106647600 dr:18541 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:6040 nr:13816 dw:106649040 dr:20458 al:13 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Jun 25 21:17:19 mfs-master01-rj.btr heartbeat: [12255]: WARN: node 10.1.1.1: is dead
Jun 25 21:17:19 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node 10.1.1.1 now has status dead
Jun 25 21:17:19 mfs-master01-rj.btr heartbeat: [12255]: WARN: node 10.1.1.28: is dead
Jun 25 21:17:19 mfs-master01-rj.btr heartbeat: [12255]: info: Link 10.1.1.1:10.1.1.1 dead.
Jun 25 21:17:19 mfs-master01-rj.btr heartbeat: [12255]: info: Link 10.1.1.28:10.1.1.28 dead.
harc(default)[12798]: 2015/06/25_21:17:19 info: Running /etc/ha.d//rc.d/status status
harc(default)[12824]: 2015/06/25_21:17:19 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:17:21 mfs-master01-rj.btr ipfail: [12269]: info: NS: We are still alive!
Jun 25 21:17:21 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node 10.1.1.28 now has status dead
Jun 25 21:17:23 mfs-master01-rj.btr ipfail: [12269]: info: NS: We are still alive!
Jun 25 21:17:23 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link 10.1.1.1/10.1.1.1 now has status dead
Jun 25 21:17:25 mfs-master01-rj.btr ipfail: [12269]: info: Asking other side for ping node count.
Jun 25 21:17:25 mfs-master01-rj.btr ipfail: [12269]: info: Checking remote count of ping nodes.
Jun 25 21:17:25 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link 10.1.1.28/10.1.1.28 now has status dead
Jun 25 21:17:27 mfs-master01-rj.btr ipfail: [12269]: info: Asking other side for ping node count.
Jun 25 21:17:27 mfs-master01-rj.btr ipfail: [12269]: info: Checking remote count of ping nodes.
Jun 25 21:17:29 mfs-master01-rj.btr ipfail: [12269]: info: Giving up because we have less visible ping nodes.
Jun 25 21:17:29 mfs-master01-rj.btr ipfail: [12269]: info: Delayed giveup in 4 seconds.
Jun 25 21:17:29 mfs-master01-rj.btr ipfail: [12269]: info: Giving up because we were told that we have less ping nodes.
Jun 25 21:17:29 mfs-master01-rj.btr ipfail: [12269]: info: Delayed giveup in 4 seconds.
Jun 25 21:17:30 mfs-master01-rj.btr ipfail: [12269]: info: Giving up because we were told that we have less ping nodes.
Jun 25 21:17:30 mfs-master01-rj.btr ipfail: [12269]: info: Delayed giveup in 4 seconds.
Jun 25 21:17:34 mfs-master01-rj.btr ipfail: [12269]: info: giveup() called (timeout worked)
Jun 25 21:17:35 mfs-master01-rj.btr heartbeat: [12255]: info: mfs-master01-rj.btr wants to go standby [all]
Jun 25 21:17:35 mfs-master01-rj.btr heartbeat: [12255]: info: standby: mfs-master02-rj.btr can take our all resources
Jun 25 21:17:35 mfs-master01-rj.btr heartbeat: [12865]: info: give up all HA resources (standby).
ResourceManager(default)[12878]: 2015/06/25_21:17:35 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[12878]: 2015/06/25_21:17:35 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[12878]: 2015/06/25_21:17:37 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[12942]: 2015/06/25_21:17:37 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[12942]: 2015/06/25_21:17:37 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[12942]: 2015/06/25_21:17:37 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[12934]: 2015/06/25_21:17:37 INFO: Success
ResourceManager(default)[12878]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[12878]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[13098]: 2015/06/25_21:17:38 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[13072]: 2015/06/25_21:17:38 INFO: Success
Jun 25 21:17:38 mfs-master01-rj.btr heartbeat: [12865]: info: all HA resource release completed (standby).
Jun 25 21:17:38 mfs-master01-rj.btr heartbeat: [12255]: info: Local standby process completed [all].
Jun 25 21:17:39 mfs-master01-rj.btr heartbeat: [12255]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [248:250]
Jun 25 21:17:39 mfs-master01-rj.btr heartbeat: [12255]: info: remote resource transition completed.
Jun 25 21:17:39 mfs-master01-rj.btr heartbeat: [12255]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 21:17:39 mfs-master01-rj.btr heartbeat: [12255]: info: Other node completed standby takeover of all resources.
Jun 25 21:17:18 mfs-master02-rj.btr heartbeat: [3504]: WARN: node 10.1.1.27: is dead
Jun 25 21:17:18 mfs-master02-rj.btr ipfail: [3535]: info: Status update: Node 10.1.1.27 now has status dead
Jun 25 21:17:18 mfs-master02-rj.btr heartbeat: [3504]: info: Link 10.1.1.27:10.1.1.27 dead.
harc(default)[4836]: 2015/06/25_21:17:18 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:17:19 mfs-master02-rj.btr ipfail: [3535]: info: NS: We are still alive!
Jun 25 21:17:19 mfs-master02-rj.btr ipfail: [3535]: info: Link Status update: Link 10.1.1.27/10.1.1.27 now has status dead
Jun 25 21:17:21 mfs-master02-rj.btr ipfail: [3535]: info: Asking other side for ping node count.
Jun 25 21:17:21 mfs-master02-rj.btr ipfail: [3535]: info: Checking remote count of ping nodes.
Jun 25 21:17:27 mfs-master02-rj.btr ipfail: [3535]: info: Telling other node that we have more visible ping nodes.
Jun 25 21:17:29 mfs-master02-rj.btr ipfail: [3535]: info: Telling other node that we have more visible ping nodes.
Jun 25 21:17:35 mfs-master02-rj.btr heartbeat: [3504]: info: mfs-master01-rj.btr wants to go standby [all]
Jun 25 21:17:38 mfs-master02-rj.btr heartbeat: [3504]: info: standby: acquire [all] resources from mfs-master01-rj.btr
Jun 25 21:17:38 mfs-master02-rj.btr heartbeat: [4870]: info: acquire all HA resources (standby).
ResourceManager(default)[4883]: 2015/06/25_21:17:38 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[4911]: 2015/06/25_21:17:38 INFO: Resource is stopped
ResourceManager(default)[4883]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[5035]: 2015/06/25_21:17:38 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[5035]: 2015/06/25_21:17:38 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[5035]: 2015/06/25_21:17:38 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[5009]: 2015/06/25_21:17:38 INFO: Success
ResourceManager(default)[4883]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[5165]: 2015/06/25_21:17:38 INFO: Resource is stopped
ResourceManager(default)[4883]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[5251]: 2015/06/25_21:17:38 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[5243]: 2015/06/25_21:17:38 INFO: Success
ResourceManager(default)[4883]: 2015/06/25_21:17:38 info: Running /etc/ha.d/resource.d/mfsmaster start
Jun 25 21:17:38 mfs-master02-rj.btr heartbeat: [4870]: info: all HA resource acquisition completed (standby).
Jun 25 21:17:38 mfs-master02-rj.btr heartbeat: [3504]: info: Standby resource acquisition done [all].
Jun 25 21:17:39 mfs-master02-rj.btr heartbeat: [3504]: info: remote resource transition completed.
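In this fault the switchover is driven by ipfail rather than by a clean shutdown: mfs-master01 loses sight of its ping nodes, logs "Giving up because we have less visible ping nodes", and hands all resources to mfs-master02. The comparison behind that decision can be illustrated with a few lines of shell. This is an illustration of the idea only, not ipfail's actual implementation; the function names and the `PROBE` variable are hypothetical, and the ping flags assume Linux iputils.

```shell
#!/usr/bin/env bash
# count_visible HOST...
# Count how many of the given ping nodes answer. PROBE is the command
# used to test one host; by default it sends a single ICMP echo with a
# 1-second timeout.
count_visible() {
    local host n=0
    for host in "$@"; do
        if ${PROBE:-ping -c 1 -W 1} "$host" >/dev/null 2>&1; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# should_give_up LOCAL_COUNT REMOTE_COUNT
# Succeed when the local node sees fewer ping nodes than its peer,
# i.e. when it is the worse-connected node and should release resources.
should_give_up() {
    [ "$1" -lt "$2" ]
}

# Example on mfs-master01, using the ping nodes that appear in its log:
#   local_count=$(count_visible 10.1.1.1 10.1.1.28)
#   should_give_up "$local_count" 2 && echo "release resources"
```

When both nodes see the same number of ping nodes the counts are balanced and no hand-over is triggered by this check, which matches the "Ping node count is balanced" messages seen elsewhere in the logs.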
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:7556 nr:1264 dw:2380 dr:9530 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 5345 1 0 21:17 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1336 nr:7556 dw:106668492 dr:24076 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
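In each of these checks the part of `/proc/drbd` that matters is the connection state (`cs:`), the local/peer roles (`ro:`) and the disk states (`ds:`). A minimal sketch for printing just those three fields is shown below; `drbd_state` is a hypothetical helper name, and the optional file argument exists only so the function can be tried on a saved copy of the output.

```shell
#!/bin/sh
# drbd_state [FILE]: print the cs:, ro: and ds: fields of every DRBD device.
# FILE defaults to /proc/drbd (hypothetical helper, shown for illustration).
drbd_state() {
    awk '
        # Device lines look like " 0: cs:Connected ro:Primary/Secondary ds:..."
        / cs:/ {
            out = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(cs|ro|ds):/)
                    out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "${1:-/proc/drbd}"
}
```

On the node currently holding the resources, the `ro:` field should read `Primary/Secondary`; on the other node it should read `Secondary/Primary`.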
Thu Jun 25 19:01:50 CST 2015 616
Thu Jun 25 19:01:51 CST 2015 617
Thu Jun 25 19:01:52 CST 2015 618
Thu Jun 25 19:01:53 CST 2015 619
Thu Jun 25 19:01:54 CST 2015 620
################
Thu Jun 25 19:02:56 CST 2015 621
################
Thu Jun 25 19:02:57 CST 2015 622
Thu Jun 25 19:02:58 CST 2015 623
Thu Jun 25 19:02:59 CST 2015 624
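The recovery time as seen by the client is the longest pause between two consecutive lines of the probe file. Rather than reading it off by eye, it can be computed with a short awk filter. This is a sketch under a few assumptions: the file has the format written by the probe loop (time of day in the fourth field, counter in the last field), the run does not cross midnight, and `max_gap` is a made-up name.

```shell
#!/bin/sh
# max_gap [FILE...]: report the largest pause, in seconds, between consecutive
# probe lines such as "Thu Jun 25 19:01:54 CST 2015 620". Reads stdin when no
# file is given. Hypothetical helper; assumes all lines fall on the same day.
max_gap() {
    awk '
        # Skip marker lines ("####...") and anything without an HH:MM:SS field.
        $4 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen && now - prev > gap) { gap = now - prev; at = $NF }
            prev = now
            seen = 1
        }
        END { printf "max gap: %d s, before entry %s\n", gap, at }
    ' "$@"
}

# Usage: max_gap /mfsdata/testxxxx
```

For the excerpt above, the pause between entries 620 and 621 is 62 seconds (19:01:54 to 19:02:56).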
Jun 25 21:21:44 mfs-master01-rj.btr heartbeat: [12255]: WARN: Late heartbeat: Node 10.1.1.28: interval 275010 ms
Jun 25 21:21:44 mfs-master01-rj.btr heartbeat: [12255]: info: Status update for node 10.1.1.28: status ping
Jun 25 21:21:44 mfs-master01-rj.btr heartbeat: [12255]: info: Link 10.1.1.28:10.1.1.28 up.
Jun 25 21:21:44 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node 10.1.1.28 now has status ping
Jun 25 21:21:44 mfs-master01-rj.btr ipfail: [12269]: info: A ping node just came up.
Jun 25 21:21:45 mfs-master01-rj.btr heartbeat: [12255]: info: Link 10.1.1.1:10.1.1.1 up.
Jun 25 21:21:45 mfs-master01-rj.btr heartbeat: [12255]: WARN: Late heartbeat: Node 10.1.1.1: interval 276070 ms
Jun 25 21:21:45 mfs-master01-rj.btr heartbeat: [12255]: info: Status update for node 10.1.1.1: status ping
Jun 25 21:21:45 mfs-master01-rj.btr ipfail: [12269]: info: Asking other side for ping node count.
Jun 25 21:21:45 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link 10.1.1.28/10.1.1.28 now has status up
Jun 25 21:21:45 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link 10.1.1.1/10.1.1.1 now has status up
Jun 25 21:21:45 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node 10.1.1.1 now has status ping
Jun 25 21:21:45 mfs-master01-rj.btr ipfail: [12269]: info: A ping node just came up.
Jun 25 21:21:47 mfs-master01-rj.btr ipfail: [12269]: info: Asking other side for ping node count.
Jun 25 21:21:49 mfs-master01-rj.btr ipfail: [12269]: info: Ping node count is balanced.
Jun 25 21:21:50 mfs-master01-rj.btr ipfail: [12269]: info: Giving up foreign resources (auto_failback).
Jun 25 21:21:50 mfs-master01-rj.btr ipfail: [12269]: info: Delayed giveup in 4 seconds.
Jun 25 21:21:50 mfs-master01-rj.btr ipfail: [12269]: info: Giving up because we were told that we have less ping nodes.
Jun 25 21:21:50 mfs-master01-rj.btr ipfail: [12269]: info: Delayed giveup in 4 seconds.
Jun 25 21:21:50 mfs-master01-rj.btr ipfail: [12269]: info: Aborted delayed giveup (8)
Jun 25 21:21:44 mfs-master02-rj.btr heartbeat: [3504]: WARN: Late heartbeat: Node 10.1.1.27: interval 276050 ms
Jun 25 21:21:44 mfs-master02-rj.btr heartbeat: [3504]: info: Status update for node 10.1.1.27: status ping
Jun 25 21:21:44 mfs-master02-rj.btr heartbeat: [3504]: info: Link 10.1.1.27:10.1.1.27 up.
Jun 25 21:21:44 mfs-master02-rj.btr ipfail: [3535]: info: Status update: Node 10.1.1.27 now has status ping
Jun 25 21:21:44 mfs-master02-rj.btr ipfail: [3535]: info: A ping node just came up.
Jun 25 21:21:46 mfs-master02-rj.btr ipfail: [3535]: info: Asking other side for ping node count.
Jun 25 21:21:46 mfs-master02-rj.btr ipfail: [3535]: info: Link Status update: Link 10.1.1.27/10.1.1.27 now has status up
Jun 25 21:21:48 mfs-master02-rj.btr ipfail: [3535]: info: Telling other node that we have more visible ping nodes.
Jun 25 21:21:50 mfs-master02-rj.btr ipfail: [3535]: info: Ping node count is balanced.
Jun 25 21:21:50 mfs-master02-rj.btr ipfail: [3535]: info: Giving up foreign resources (auto_failback).
Jun 25 21:21:50 mfs-master02-rj.btr ipfail: [3535]: info: Delayed giveup in 4 seconds.
Jun 25 21:21:50 mfs-master02-rj.btr ipfail: [3535]: info: Aborted delayed giveup (4)
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:7556 nr:1264 dw:2380 dr:9530 al:6 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 5345 1 0 21:17 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1336 nr:7556 dw:106668492 dr:24076 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 14129 1 0 21:25 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:8312 nr:2556 dw:4428 dr:11447 al:8 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:2556 nr:8400 dw:106670556 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: WARN: node mfs-master02-rj.btr: is dead
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: WARN: No STONITH device configured.
Jun 25 21:30:09 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node mfs-master02-rj.btr now has status dead
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: WARN: Shared disks are not protected.
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: info: Resources being acquired from mfs-master02-rj.btr.
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: info: Link mfs-master02-rj.btr:em2 dead.
harc(default)[14821]: 2015/06/25_21:30:09 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[14857]: 2015/06/25_21:30:09 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[14857]: 2015/06/25_21:30:09 info: mach_down takeover complete for node mfs-master02-rj.btr.
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [12255]: info: mach_down takeover complete.
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[14889]: 2015/06/25_21:30:09 INFO: Running OK
Jun 25 21:30:09 mfs-master01-rj.btr heartbeat: [14822]: info: Local Resource acquisition completed.
Jun 25 21:30:10 mfs-master01-rj.btr ipfail: [12269]: info: NS: We are still alive!
Jun 25 21:30:10 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link mfs-master02-rj.btr/em2 now has status dead
Jun 25 21:30:13 mfs-master01-rj.btr ipfail: [12269]: info: Asking other side for ping node count.
Jun 25 21:30:13 mfs-master01-rj.btr ipfail: [12269]: info: Checking remote count of ping nodes.
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6134]: WARN: node mfs-master01-rj.btr: is dead
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6134]: WARN: No STONITH device configured.
Jun 25 21:30:09 mfs-master02-rj.btr ipfail: [6173]: info: Status update: Node mfs-master01-rj.btr now has status dead
Jun 25 21:30:09 mfs-master02-rj.btr ipfail: [6173]: info: Status update: Node mfs-master01-rj.btr now has status dead
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6134]: WARN: Shared disks are not protected.
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6134]: info: Resources being acquired from mfs-master01-rj.btr.
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6134]: info: Link mfs-master01-rj.btr:em2 dead.
harc(default)[6484]: 2015/06/25_21:30:09 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:30:09 mfs-master02-rj.btr heartbeat: [6485]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mfs-master02-rj.btr] to acquire.
mach_down(default)[6514]: 2015/06/25_21:30:09 info: Taking over resource group IPaddr::10.1.1.26/24/em4
ResourceManager(default)[6541]: 2015/06/25_21:30:09 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[6569]: 2015/06/25_21:30:10 INFO: Resource is stopped
ResourceManager(default)[6541]: 2015/06/25_21:30:10 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[6692]: 2015/06/25_21:30:10 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[6692]: 2015/06/25_21:30:10 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[6692]: 2015/06/25_21:30:10 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[6666]: 2015/06/25_21:30:10 INFO: Success
ResourceManager(default)[6541]: 2015/06/25_21:30:10 info: Running /etc/ha.d/resource.d/drbddisk drbd start
Jun 25 21:30:10 mfs-master02-rj.btr ipfail: [6173]: info: NS: We are still alive!
Jun 25 21:30:10 mfs-master02-rj.btr ipfail: [6173]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status dead
Jun 25 21:30:12 mfs-master02-rj.btr ipfail: [6173]: info: Asking other side for ping node count.
Jun 25 21:30:12 mfs-master02-rj.btr ipfail: [6173]: info: Checking remote count of ping nodes.
ResourceManager(default)[6541]: 2015/06/25_21:30:22 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager(default)[6541]: 2015/06/25_21:30:22 CRIT: Giving up resources due to failure of drbddisk::drbd
ResourceManager(default)[6541]: 2015/06/25_21:30:22 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[6541]: 2015/06/25_21:30:22 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[6541]: 2015/06/25_21:30:22 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[6886]: 2015/06/25_21:30:22 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[6878]: 2015/06/25_21:30:22 INFO: Success
ResourceManager(default)[6541]: 2015/06/25_21:30:22 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[6541]: 2015/06/25_21:30:22 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[7012]: 2015/06/25_21:30:22 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[6986]: 2015/06/25_21:30:22 INFO: Success
mach_down(default)[6514]: 2015/06/25_21:30:22 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[6514]: 2015/06/25_21:30:22 info: mach_down takeover complete for node mfs-master01-rj.btr.
Jun 25 21:30:22 mfs-master02-rj.btr heartbeat: [6134]: info: mach_down takeover complete.
hb_standby(default)[7111]: 2015/06/25_21:30:52 Going standby [foreign].
Jun 25 21:30:53 mfs-master02-rj.btr heartbeat: [6134]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 21:31:04 mfs-master02-rj.btr heartbeat: [6134]: WARN: No reply to standby request. Standby request cancelled.
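The standby's takeover attempt stops at `drbddisk drbd start`, which returns code 1 after about twelve seconds, and heartbeat then releases the whole group. One plausible reading, and it is an assumption rather than something the log states, is that only the heartbeat link was cut while DRBD replication stayed connected: in single-primary mode DRBD does not let a node become Primary while its connected peer still holds that role. The sketch below shows how that condition could be checked from `/proc/drbd`; `peer_is_primary` is a hypothetical helper name.

```shell
#!/bin/sh
# peer_is_primary [FILE]: succeed (exit 0) if any DRBD device listed in FILE
# reports the peer in the Primary role, i.e. "ro:<local>/Primary".
# FILE defaults to /proc/drbd. Hypothetical helper, shown for illustration.
peer_is_primary() {
    grep -Eq 'ro:[A-Za-z]+/Primary' "${1:-/proc/drbd}"
}

# Possible pre-check before a manual takeover. "drbd" is the resource name
# that appears in the log (drbddisk::drbd); adjust it for other setups.
# if peer_is_primary; then
#     echo "peer is still Primary, promotion would be refused" >&2
# else
#     drbdadm primary drbd
# fi
```

If this reading is right, the failed promotion is what kept the virtual IP and mfsmaster from running on both nodes at once during the partition.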
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 14129 1 0 21:25 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:8564 nr:2556 dw:4680 dr:11447 al:8 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:2556 nr:8564 dw:106670720 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Thu Jun 25 20:17:10 CST 2015 4015
Thu Jun 25 20:17:11 CST 2015 4016
Thu Jun 25 20:17:12 CST 2015 4017
Thu Jun 25 20:17:13 CST 2015 4018
Thu Jun 25 20:17:14 CST 2015 4019
Thu Jun 25 20:17:15 CST 2015 4020
##########
Thu Jun 25 20:18:33 CST 2015 4021
##########
Thu Jun 25 20:18:34 CST 2015 4022
Thu Jun 25 20:18:35 CST 2015 4023
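Recovering from failure 2: the heartbeat link between the mfsmaster nodes was interrupted
Restoring the heartbeat link:
Plug the em2 network cable back into the primary mfsmaster.
Log output from the primary and standby mfsmaster servers during recovery:
Primary mfsmaster: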
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: CRIT: Cluster node mfs-master02-rj.btr returning after partition.
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: info: For information on cluster partitions, See URL: http://linux-ha.org/wiki/Split_Brain
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: WARN: Deadtime value may be too small.
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: info: See FAQ for information on tuning deadtime.
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: info: URL: http://linux-ha.org/wiki/FAQ#Heavy_Load
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: info: Link mfs-master02-rj.btr:em2 up.
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: WARN: Late heartbeat: Node mfs-master02-rj.btr: interval 164510 ms
Jun 25 21:32:43 mfs-master01-rj.btr heartbeat: [12255]: info: Status update for node mfs-master02-rj.btr: status active
Jun 25 21:32:43 mfs-master01-rj.btr ipfail: [12269]: info: Link Status update: Link mfs-master02-rj.btr/em2 now has status up
Jun 25 21:32:43 mfs-master01-rj.btr ipfail: [12269]: info: Status update: Node mfs-master02-rj.btr now has status active
harc(default)[15038]: 2015/06/25_21:32:43 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:32:45 mfs-master01-rj.btr heartbeat: [12255]: WARN: Shutdown delayed until current resource activity finishes.
Jun 25 21:32:46 mfs-master01-rj.btr heartbeat: [12255]: info: Heartbeat shutdown in progress. (12255)
Jun 25 21:32:46 mfs-master01-rj.btr heartbeat: [12255]: info: Received shutdown notice from 'mfs-master02-rj.btr'.
Jun 25 21:32:46 mfs-master01-rj.btr heartbeat: [12255]: info: Resource takeover cancelled - shutdown in progress.
Jun 25 21:32:46 mfs-master01-rj.btr heartbeat: [15055]: info: Giving up all HA resources.
ResourceManager(default)[15068]: 2015/06/25_21:32:46 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[15068]: 2015/06/25_21:32:46 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[15068]: 2015/06/25_21:32:48 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[15131]: 2015/06/25_21:32:48 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[15131]: 2015/06/25_21:32:48 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[15131]: 2015/06/25_21:32:48 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[15123]: 2015/06/25_21:32:48 INFO: Success
ResourceManager(default)[15068]: 2015/06/25_21:32:48 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[15068]: 2015/06/25_21:32:48 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[15287]: 2015/06/25_21:32:48 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[15261]: 2015/06/25_21:32:48 INFO: Success
Jun 25 21:32:48 mfs-master01-rj.btr heartbeat: [15055]: info: All HA resources relinquished.
Jun 25 21:32:49 mfs-master01-rj.btr heartbeat: [12255]: info: killing /usr/lib64/heartbeat/ipfail process group 12269 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBFIFO process 12258 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBWRITE process 12259 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBREAD process 12260 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBWRITE process 12261 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBREAD process 12262 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBWRITE process 12263 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBREAD process 12264 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBWRITE process 12265 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: killing HBREAD process 12266 with signal 15
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12260 exited. 9 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12261 exited. 8 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12265 exited. 7 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12262 exited. 6 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12258 exited. 5 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12263 exited. 4 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12264 exited. 3 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12266 exited. 2 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Core process 12259 exited. 1 remaining
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: mfs-master01-rj.btr Heartbeat shutdown complete.
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Heartbeat restart triggered.
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Restarting heartbeat.
Jun 25 21:32:51 mfs-master01-rj.btr heartbeat: [12255]: info: Performing heartbeat restart exec.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [12255]: info: Pacemaker support: false
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [12255]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [12255]: info: **************************
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [12255]: info: Configuration validated. Starting heartbeat 3.0.4
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: heartbeat: version 3.0.4
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Heartbeat generation: 1435221816
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: glib: UDP multicast heartbeat started for group 225.0.0.192 port 694 interface em2 (ttl=1 loop=0)
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: glib: ping heartbeat started.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: glib: ping heartbeat started.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: glib: ping heartbeat started.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Local status now set to: 'up'
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Status update for node 10.1.1.27: status ping
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Link 10.1.1.27:10.1.1.27 up.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Link 10.1.1.1:10.1.1.1 up.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Status update for node 10.1.1.1: status ping
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Status update for node 10.1.1.28: status ping
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Link 10.1.1.28:10.1.1.28 up.
Jun 25 21:33:02 mfs-master01-rj.btr heartbeat: [15355]: info: Link mfs-master02-rj.btr:em2 up.
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15355]: info: Status update for node mfs-master02-rj.btr: status up
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15355]: info: Status update for node mfs-master02-rj.btr: status active
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15355]: info: Comm_now_up(): updating status to active
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15355]: info: Local status now set to: 'active'
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15355]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (497,496)
Jun 25 21:33:03 mfs-master01-rj.btr heartbeat: [15369]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 497 gid 496 (pid 15369)
harc(default)[15368]: 2015/06/25_21:33:03 info: Running /etc/ha.d//rc.d/status status
harc(default)[15388]: 2015/06/25_21:33:03 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:33:14 mfs-master01-rj.btr heartbeat: [15355]: info: local resource transition completed.
Jun 25 21:33:14 mfs-master01-rj.btr heartbeat: [15355]: info: Initial resource acquisition complete (T_RESOURCES(us))
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[15441]: 2015/06/25_21:33:14 INFO: Resource is stopped
Jun 25 21:33:14 mfs-master01-rj.btr heartbeat: [15405]: info: Local Resource acquisition completed.
harc(default)[15522]: 2015/06/25_21:33:14 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[15522]: 2015/06/25_21:33:14 received ip-request-resp IPaddr::10.1.1.26/24/em4 OK yes
ResourceManager(default)[15545]: 2015/06/25_21:33:14 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[15573]: 2015/06/25_21:33:14 INFO: Resource is stopped
ResourceManager(default)[15545]: 2015/06/25_21:33:14 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
Jun 25 21:33:14 mfs-master01-rj.btr heartbeat: [15355]: info: remote resource transition completed.
IPaddr(IPaddr_10.1.1.26)[15696]: 2015/06/25_21:33:14 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[15696]: 2015/06/25_21:33:14 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[15696]: 2015/06/25_21:33:15 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[15670]: 2015/06/25_21:33:15 INFO: Success
ResourceManager(default)[15545]: 2015/06/25_21:33:15 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[15824]: 2015/06/25_21:33:15 INFO: Resource is stopped
ResourceManager(default)[15545]: 2015/06/25_21:33:15 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[15910]: 2015/06/25_21:33:15 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[15902]: 2015/06/25_21:33:15 INFO: Success
ResourceManager(default)[15545]: 2015/06/25_21:33:15 info: Running /etc/ha.d/resource.d/mfsmaster start
Jun 25 21:33:17 mfs-master01-rj.btr ipfail: [15369]: info: Ping node count is balanced.
Jun 25 21:33:18 mfs-master01-rj.btr ipfail: [15369]: info: Giving up foreign resources (auto_failback).
Jun 25 21:33:18 mfs-master01-rj.btr ipfail: [15369]: info: Delayed giveup in 4 seconds.
Jun 25 21:33:22 mfs-master01-rj.btr ipfail: [15369]: info: giveup() called (timeout worked)
Jun 25 21:33:22 mfs-master01-rj.btr heartbeat: [15355]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: info: standby: mfs-master02-rj.btr can take our foreign resources
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [16016]: info: give up foreign HA resources (standby).
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [16016]: info: foreign HA resource release completed (standby).
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: info: Local standby process completed [foreign].
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [27:29]
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: info: remote resource transition completed.
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 21:33:23 mfs-master01-rj.btr heartbeat: [15355]: info: Other node completed standby takeover of foreign resources.
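Most of this restart log is routine `info` output; the entries that deserve attention are the few marked WARN, ERROR or CRIT, such as the "returning after partition" notice. A minimal filter for pulling those out is sketched below. `ha_alerts` is a made-up helper name, and the filter assumes the severity tags are written exactly as in the logs shown here.

```shell
#!/bin/sh
# ha_alerts [FILE...]: print only the WARN, ERROR and CRIT entries of a
# heartbeat log. Reads stdin when no file is given. Hypothetical helper.
ha_alerts() {
    grep -E '(WARN|ERROR|CRIT):' "$@"
}

# Usage: ha_alerts <path to the heartbeat log file>
```

The exit status follows `grep`: it is non-zero when no such entries are found, which makes the helper usable in a simple post-drill check.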
Standby mfsmaster:
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: CRIT: Cluster node mfs-master01-rj.btr returning after partition.
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: info: For information on cluster partitions, See URL: http://linux-ha.org/wiki/Split_Brain
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: WARN: Deadtime value may be too small.
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: info: See FAQ for information on tuning deadtime.
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: info: URL: http://linux-ha.org/wiki/FAQ#Heavy_Load
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: info: Link mfs-master01-rj.btr:em2 up.
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: WARN: Late heartbeat: Node mfs-master01-rj.btr: interval 164060 ms
Jun 25 21:32:43 mfs-master02-rj.btr heartbeat: [6134]: info: Status update for node mfs-master01-rj.btr: status active
Jun 25 21:32:43 mfs-master02-rj.btr ipfail: [6173]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status up
Jun 25 21:32:43 mfs-master02-rj.btr ipfail: [6173]: info: Status update: Node mfs-master01-rj.btr now has status active
harc(default)[7153]: 2015/06/25_21:32:43 info: Running /etc/ha.d//rc.d/status status
Jun 25 21:32:45 mfs-master02-rj.btr heartbeat: [6134]: info: Heartbeat shutdown in progress. (6134)
Jun 25 21:32:45 mfs-master02-rj.btr heartbeat: [7170]: info: Giving up all HA resources.
ResourceManager(default)[7183]: 2015/06/25_21:32:45 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[7183]: 2015/06/25_21:32:45 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[7183]: 2015/06/25_21:32:45 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[7246]: 2015/06/25_21:32:45 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[7238]: 2015/06/25_21:32:45 INFO: Success
ResourceManager(default)[7183]: 2015/06/25_21:32:45 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[7183]: 2015/06/25_21:32:45 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[7372]: 2015/06/25_21:32:45 INFO: IP status = no, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[7346]: 2015/06/25_21:32:45 INFO: Success
Jun 25 21:32:45 mfs-master02-rj.btr heartbeat: [7170]: info: All HA resources relinquished.
Jun 25 21:32:46 mfs-master02-rj.btr heartbeat: [6134]: info: killing /usr/lib64/heartbeat/ipfail process group 6173 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBWRITE process 6138 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBREAD process 6139 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBWRITE process 6140 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBREAD process 6141 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBWRITE process 6142 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBREAD process 6143 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBWRITE process 6144 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBREAD process 6145 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: killing HBFIFO process 6137 with signal 15
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6139 exited. 9 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6143 exited. 8 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6144 exited. 7 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6141 exited. 6 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6137 exited. 5 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6145 exited. 4 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6140 exited. 3 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6142 exited. 2 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Core process 6138 exited. 1 remaining
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: mfs-master02-rj.btr Heartbeat shutdown complete.
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Heartbeat restart triggered.
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Restarting heartbeat.
Jun 25 21:32:47 mfs-master02-rj.btr heartbeat: [6134]: info: Performing heartbeat restart exec.
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [6134]: info: Pacemaker support: false
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [6134]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [6134]: info: **************************
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [6134]: info: Configuration validated. Starting heartbeat 3.0.4
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: heartbeat: version 3.0.4
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Heartbeat generation: 1435221806
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: glib: UDP multicast heartbeat started for group 225.0.0.192 port 694 interface em2 (ttl=1 loop=0)
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: glib: ping heartbeat started.
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: glib: ping heartbeat started.
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: glib: ping heartbeat started.
Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: G_main_add_TriggerHandler: Added signal manual handler Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: G_main_add_TriggerHandler: Added signal manual handler Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Local status now set to: 'up' Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Status update for node 10.1.1.28: status ping Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Link 10.1.1.28:10.1.1.28 up. Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Status update for node 10.1.1.27: status ping Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Link 10.1.1.27:10.1.1.27 up. Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Link 10.1.1.1:10.1.1.1 up. Jun 25 21:32:58 mfs-master02-rj.btr heartbeat: [7431]: info: Status update for node 10.1.1.1: status ping Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7431]: info: Link mfs-master01-rj.btr:em2 up. 
Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7431]: info: Status update for node mfs-master01-rj.btr: status up harc(default)[7443]: 2015/06/25_21:33:02 info: Running /etc/ha.d//rc.d/status status Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7431]: info: Comm_now_up(): updating status to active Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7431]: info: Local status now set to: 'active' Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7431]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (497,496) Jun 25 21:33:02 mfs-master02-rj.btr heartbeat: [7461]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 497 gid 496 (pid 7461) Jun 25 21:33:03 mfs-master02-rj.btr heartbeat: [7431]: info: Status update for node mfs-master01-rj.btr: status active harc(default)[7464]: 2015/06/25_21:33:03 info: Running /etc/ha.d//rc.d/status status Jun 25 21:33:10 mfs-master02-rj.btr ipfail: [7461]: info: Status update: Node mfs-master01-rj.btr now has status active Jun 25 21:33:14 mfs-master02-rj.btr heartbeat: [7431]: info: remote resource transition completed. Jun 25 21:33:14 mfs-master02-rj.btr heartbeat: [7431]: info: remote resource transition completed. Jun 25 21:33:14 mfs-master02-rj.btr heartbeat: [7431]: info: Initial resource acquisition complete (T_RESOURCES(us)) Jun 25 21:33:14 mfs-master02-rj.btr heartbeat: [7481]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mfs-master02-rj.btr] to acquire. Jun 25 21:33:14 mfs-master02-rj.btr ipfail: [7461]: info: Asking other side for ping node count. Jun 25 21:33:18 mfs-master02-rj.btr ipfail: [7461]: info: No giveup timer to abort. Jun 25 21:33:22 mfs-master02-rj.btr heartbeat: [7431]: info: mfs-master01-rj.btr wants to go standby [foreign] Jun 25 21:33:23 mfs-master02-rj.btr heartbeat: [7431]: info: standby: acquire [foreign] resources from mfs-master01-rj.btr Jun 25 21:33:23 mfs-master02-rj.btr heartbeat: [7494]: info: acquire local HA resources (standby). 
Jun 25 21:33:23 mfs-master02-rj.btr heartbeat: [7494]: info: local HA resource acquisition completed (standby). Jun 25 21:33:23 mfs-master02-rj.btr heartbeat: [7431]: info: Standby resource acquisition done [foreign]. Jun 25 21:33:24 mfs-master02-rj.btr heartbeat: [7431]: info: remote resource transition completed.
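Most of the heartbeat log above is process housekeeping. Only a handful of lines record an actual decision or state change. The sketch below filters for those lines. The function name `ha_key_events` is made up for this example, the patterns are copied from the log excerpt above rather than from any official list, and the log path is passed in as an argument because it depends on how logging is set up in your ha.cf.

```shell
# Print only the heartbeat log lines that mark a decision or a state change.
# The patterns are taken from the log excerpt in this post, not from an
# official list; extend them to match what your own log contains.
# $1: path to the heartbeat log file (site-specific, so it is not hard-coded).
ha_key_events() {
    grep -E 'Heartbeat shutdown complete|Local status now set to|wants to go standby|standby: acquire|resource acquisition (complete|done)|resource transition completed' "$1"
}
```

Calling `ha_key_events /path/to/ha-log` on each node during a drill yields a short timeline that is easier to line up against the client-side timestamps.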
State of the primary and standby mfsmaster servers after recovery:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 16004 1 0 21:33 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:9932 nr:2556 dw:6048 dr:13372 al:9 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:2556 nr:10040 dw:106672196 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Data recovery as seen by the mfs client:
Thu Jun 25 20:19:48 CST 2015 4095
Thu Jun 25 20:19:49 CST 2015 4096
Thu Jun 25 20:19:50 CST 2015 4097
Thu Jun 25 20:19:51 CST 2015 4098
###############
Thu Jun 25 20:20:30 CST 2015 4099
###############
Thu Jun 25 20:20:31 CST 2015 4100
Thu Jun 25 20:20:32 CST 2015 4101
Thu Jun 25 20:20:33 CST 2015 4102
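In the client output above, the timestamps jump from 20:19:51 to 20:20:30, so the writer was blocked for about 39 seconds. Spotting such jumps by eye in a 20000-line file is tedious, so here is a minimal sketch that finds them automatically. It assumes each data line has the seven fields produced by the test loop (weekday, month, day, HH:MM:SS, time zone, year, sequence number) and that the whole test runs within one calendar day. The function name `find_write_gaps` and the default threshold of 2 seconds are choices made for this example.

```shell
# Report pauses in the file written by the one-second test loop.
# $1: path to the test file, $2: smallest pause (in seconds) worth reporting.
# Lines that do not have exactly 7 fields (such as "#####" markers) are skipped.
# Assumption: all entries fall on the same calendar day, so only HH:MM:SS is compared.
find_write_gaps() {
    awk -v max="${2:-2}" '
        NF == 7 && $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev >= max)
                printf "gap of %d s between seq %s (%s) and seq %s (%s)\n",
                       now - prev, prev_seq, prev_time, $7, $4
            prev = now; prev_seq = $7; prev_time = $4; seen = 1
        }' "$1"
}
```

For example, `find_write_gaps /mfsdata/testxxxx 5` lists every pause of five seconds or more, which gives the recovery time of each drill directly.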
Fault 3: the DRBD replication network of the primary mfsmaster is disconnected
Simulated fault:
Unplug em3, the NIC that the primary mfsmaster uses for DRBD replication.
State of the primary and standby mfsmaster servers before the fault:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 16004 1 0 21:33 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10348 nr:2556 dw:6464 dr:13372 al:9 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:2556 nr:10376 dw:106672532 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Log entries on the primary and standby mfsmaster servers when the fault occurred:
mfsmaster primary:
No log entries
mfsmaster standby:
No log entries
State of the primary and standby mfsmaster servers after the fault:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 16004 1 0 21:33 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:10508 nr:2556 dw:7004 dr:13372 al:9 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:308
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:2556 nr:10500 dw:106672656 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
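The fields that matter in the /proc/drbd output above are cs (connection state), ro (roles), ds (disk states) and oos (out-of-sync data). After the cable is pulled, cs changes from Connected to WFConnection and oos on the primary starts to grow. The sketch below pulls just those four fields out so they can be compared quickly before and after a fault. The function names `drbd_summary` and `drbd_is_healthy` are invented for this example, and "healthy" is defined here simply as cs being Connected with oos at 0. That is a simplification for illustration, not a complete DRBD health check.

```shell
# Print one "cs=... ro=... ds=... oos=..." line per DRBD resource.
# $1: file to read, defaults to /proc/drbd (a saved copy works too).
drbd_summary() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ro:/) ro = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
                if ($i ~ /^oos:/)
                    printf "cs=%s ro=%s ds=%s oos=%s\n", cs, ro, ds, substr($i, 5)
            }
        }' "${1:-/proc/drbd}"
}

# Exit status 0 only if at least one resource is listed and every resource is
# Connected with nothing out of sync (a simplified definition for this example).
drbd_is_healthy() {
    drbd_summary "$1" | awk '
        $1 != "cs=Connected" || $NF != "oos=0" { bad = 1 }
        END { exit (bad || NR == 0) }'
}
```

Running `drbd_summary` on both nodes at each step of the drill gives a compact record, and `drbd_is_healthy || echo "DRBD needs attention"` can serve as a quick check before starting the next test.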
Data recovery as seen by the mfs client:
Thu Jun 25 20:24:15 CST 2015 4321
Thu Jun 25 20:24:16 CST 2015 4322
Thu Jun 25 20:24:17 CST 2015 4323
######
Thu Jun 25 20:24:23 CST 2015 4324
######
Thu Jun 25 20:24:24 CST 2015 4325
Thu Jun 25 20:24:25 CST 2015 4326
Thu Jun 25 20:24:26 CST 2015 4327
Note:
After the DRBD replication network of the mfsmaster is disconnected, the following message is reported on the system terminal:
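Recovering from fault 3: the DRBD replication network of the primary mfsmaster is disconnected
Recovery action:
Plug em3, the NIC that the primary mfsmaster uses for DRBD replication, back in.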
Log entries on the primary and standby mfsmaster servers during recovery:
mfsmaster primary:
No log entries
mfsmaster standby:
No log entries
State of the primary and standby mfsmaster servers after recovery:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 16004 1 0 21:33 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:9932 nr:2556 dw:6048 dr:13372 al:9 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:2556 nr:10040 dw:106672196 dr:24084 al:16 bm:6421 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
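Data recovery as seen by the mfs client:
The client was not affected in any way.
This drill shows that our production MooseFS deployment runs stably: the disaster tests succeeded and the high availability of the service was verified.
OK! That is all for this post. I hope it is useful to fellow 51CTO bloggers!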
Reposted from: https://blog.51cto.com/nolinux/1665967