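Monitoring Server Hardware Health with Zabbix IPMI
The company has several branch offices whose server rooms have no dedicated on-duty staff and do not meet a high facility standard. We still wanted to monitor the server-room environment in real time, so we turned to IPMI. Because a Zabbix monitoring system was already deployed, this article uses Zabbix's built-in IPMI support to monitor server temperature, fan speed, and similar readings.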
1. Environment
Monitored server model: Dell PowerEdge R510
Planned IPMI address: 10.103.1.100
2. Zabbix monitoring platform
Zabbix version: 3.2.1, built without --with-openipmi
The Zabbix server's network interface can reach 10.103.1.100
3. Prerequisite reading
Wikipedia, IPMI: http://zh.wikipedia.org/wiki/IPMI
IBM developerWorks, "Managing servers over IPMI on Linux with ipmitool": http://www.ibm.com/developerworks/cn/linux/l-ipmi/
Dell, "Managing Dell PowerEdge Servers Using IPMItool": http://www.dell.com/downloads/global/power/ps4q04-20040204-Murphy.pdf
Zabbix IPMI checks: https://www.zabbix.com/documentation/3.2/manual/config/items/itemtypes/ipmi
"Terminal redirection with IPMITOOL" (optional reading): http://docs.linuxtone.org/ebooks/Dell/ipmitool.pdf
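4. Configuring IPMI
4.1. Configure the IPMI address
Following "Managing Dell PowerEdge Servers Using IPMItool" from the reading list above, configure the IPMI address while the server boots and enable IPMI Over LAN.
Alternatively, enable IPMI through Dell's iDRAC; see the references at the end of this article.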
4.2. Read the sensor data
Log in to the Zabbix server and query the Dell server's sensors remotely with ipmitool (the package is installed in 4.3):
# ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor list
# ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor get "FAN MOD 1B RPM"
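Since sensor list prints one pipe-separated row per sensor, a single reading can be picked out with a small filter. The following is a minimal sketch, assuming the usual ipmitool layout in which the sensor name is the first column and the reading the second; the helper name sensor_reading is made up for illustration and is not part of ipmitool.

```shell
#!/bin/bash
# sensor_reading NAME
# Hypothetical helper: read "ipmitool ... sensor list" output on stdin and
# print the reading (second column) of the sensor whose name (first column)
# equals NAME exactly.
sensor_reading() {
    awk -F'|' -v want="$1" '
        {
            name = $1
            value = $2
            # Strip the padding around each column
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == want) {
                print value
                exit
            }
        }'
}

# Example (needs a reachable BMC, so it is left commented out):
# ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor list \
#     | sensor_reading "FAN MOD 1B RPM"
```

Matching on the whole name column avoids picking up other sensors whose names merely contain the same text.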
4.3. Install the IPMI packages
# yum -y install OpenIPMI OpenIPMI-devel ipmitool freeipmi
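4.4. Configure Zabbix
Note: IPMI support requires building the Zabbix server/proxy with the --with-openipmi option. The installation described in section 2 was built without it, so it has to be rebuilt with this option before the IPMI pollers can be used.
On the server side, enable the Zabbix IPMI pollers in zabbix_server.conf (or zabbix_proxy.conf):
# sed -i '/# StartIPMIPollers=0/aStartIPMIPollers=5' zabbix_server.conf
# service zabbix-server restart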
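The sed command above appends a new line every time it runs, so running it twice leaves two StartIPMIPollers entries. A small sketch of a repeatable variant follows; the function name set_ipmi_pollers is hypothetical, and GNU sed is assumed for the -i option.

```shell
#!/bin/bash
# set_ipmi_pollers CONF COUNT
# Hypothetical helper: set StartIPMIPollers to COUNT in CONF without adding
# a duplicate line when an active entry already exists.
set_ipmi_pollers() {
    local conf="$1" count="$2"
    if grep -q '^StartIPMIPollers=' "$conf"; then
        # An active entry exists: update it in place
        sed -i "s/^StartIPMIPollers=.*/StartIPMIPollers=${count}/" "$conf"
    else
        # No active entry yet: append one at the end of the file
        printf 'StartIPMIPollers=%s\n' "$count" >> "$conf"
    fi
}

# Example:
# set_ipmi_pollers zabbix_server.conf 5
```

The server still has to be restarted afterwards for the new value to take effect.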
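4.5. Import the monitoring templates
IPMI templates for two Dell models are provided:
template-ipmi-dell-poweredge-r510
template-ipmi-dell-poweredge-2950
Add the host to be monitored and link it to the template. On the host's IPMI tab, set Authentication algorithm to Default, Privilege level to User, Username to sensor, and Password to sensor_pass, then save.
In practice this method performed very poorly: almost no data came back.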
5. Custom IPMI monitoring with Zabbix External checks
The initial choice was the Nagios IPMI plugin check_ipmi_sensor, distributed as check_ipmi_sensor_v3-v3.9.tar.gz.
For usage details see: http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin
5.1. Install the perl-IPC-Run module
# yum -y install perl-IPC-Run perl-Getopt-Long
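The plugin is called with -f ipmi.cfg in the next step, but the contents of that file are not shown in this article. As a sketch under that caveat, a FreeIPMI-style config file holding the BMC credentials could be created as follows; the helper name write_ipmi_cfg and the credentials in the example are placeholders.

```shell
#!/bin/bash
# write_ipmi_cfg PATH USER PASSWORD
# Hypothetical helper: write a FreeIPMI-style config file with the BMC
# username, password and privilege level, readable only by its owner.
write_ipmi_cfg() {
    local path="$1" user="$2" pass="$3"
    # Create the file under a restrictive umask, then enforce mode 600 in
    # case the file already existed with wider permissions
    ( umask 077
      printf 'username %s\npassword %s\nprivilege-level user\n' \
          "$user" "$pass" > "$path" ) && chmod 600 "$path"
}

# Example with placeholder credentials:
# write_ipmi_cfg ./ipmi.cfg sensor sensor_pass
```

The file should belong to whichever account runs the check, for example the zabbix user once the script is called from Zabbix.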
5.2. Try check_ipmi_sensor
Running it, however, produced errors:
# ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
------------- debug output for sel (-vvv is set): ------------
/usr/sbin/ipmi-sel was executed with the following parameters:
/usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
output of FreeIPMI:
ID | Date | Time | Name | Type | State | Event
1 | Apr-08-2011 | 06:42:13 | System Board SEL | Event Logging Disabled | Nominal | Log Area Reset/Cleared
2 | Jan-01-1970 | 08:00:31 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
3 | Jan-01-1970 | 08:00:36 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
4 | Aug-15-2011 | 23:09:53 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
5 | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
6 | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
7 | Aug-16-2011 | 11:38:55 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
8 | Jun-10-2012 | 22:41:13 | System Board Ambient Temp | Temperature | Warning | Upper Non-critical - going high ; Sensor Reading = 45.00 C ; Threshold = 45.00 C
9 | Jun-11-2012 | 02:53:53 | System Board Ambient Temp | Temperature | Nominal | Upper Non-critical - going high ; Sensor Reading = 43.00 C ; Threshold = 45.00 C
10 | Nov-05-2012 | 21:56:42 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
11 | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
12 | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
13 | Nov-14-2012 | 21:54:19 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
14 | Nov-15-2012 | 16:12:03 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
15 | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
16 | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2 | Drive Slot | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
17 | Nov-17-2012 | 17:15:40 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
18 | Nov-19-2012 | 20:47:57 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
19 | Nov-19-2012 | 20:50:04 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
20 | Jan-01-1970 | 08:00:33 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
21 | Jan-01-1970 | 08:00:38 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
22 | Jun-27-2014 | 17:27:38 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
23 | Jun-27-2014 | 17:27:53 | Disk Drive Bay 1 Drive 2 | Drive Slot | Nominal | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
24 | Jan-01-1970 | 08:00:31 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
25 | Jan-01-1970 | 08:00:36 | System Board Intrusion | Physical Security | Critical | General Chassis Intrusion ; Intrusion while system Off
26 | Oct-31-2016 | 05:48:35 | System Board Ambient Temp | Temperature | Warning | Lower Non-critical - going low ; Sensor Reading = 8.00 C ; Threshold = 8.00 C
27 | Oct-31-2016 | 09:00:38 | System Board Ambient Temp | Temperature | Nominal | Lower Non-critical - going low ; Sensor Reading = 10.00 C ; Threshold = 8.00 C
------------- debug output for sensors (-vvv is set): ------------
script was executed with the following parameters:
./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
check_ipmi_sensor version:
3.9
FreeIPMI version:
ipmi-sensors - 1.2.9
FreeIPMI was executed with the following parameters:
/usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds
FreeIPMI return code: 0
output of FreeIPMI:
Record ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading
5 | Ambient Temp | Temperature | Nominal | C | 28.000000
7 | CMOS Battery | Battery | Nominal | N/A | 'OK'
8 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
9 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
10 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
11 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
12 | CPU VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
13 | 1.5V PG | Voltage | Nominal | N/A | 'State Deasserted'
14 | 1.8V PG | Voltage | Nominal | N/A | 'State Deasserted'
15 | 5V PG | Voltage | Nominal | N/A | 'State Deasserted'
16 | MEM CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
17 | 5V Riser PG | Voltage | Nominal | N/A | 'State Deasserted'
18 | MEM CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
19 | VTT CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
20 | VTT CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
21 | 0.9V PG | Voltage | Nominal | N/A | 'State Deasserted'
22 | CPU2 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
23 | CPU1 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
24 | 1.1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
25 | 1.0 LOM FAIL | Voltage | Nominal | N/A | 'State Deasserted'
26 | 1.0 AUX FAIL | Voltage | Nominal | N/A | 'State Deasserted'
27 | Heatsink Pres | Entity Presence | Nominal | N/A | 'Entity Present'
28 | iDRAC6 Ent Pres | Entity Presence | Critical | N/A | 'Entity Absent'
29 | USB Cable Pres | Entity Presence | Nominal | N/A | 'Entity Present'
31 | Riser Presence | Entity Presence | Nominal | N/A | 'Entity Present'
32 | FAN MOD 1A RPM | Fan | Nominal | RPM | 3480.000000
34 | FAN MOD 2A RPM | Fan | Nominal | RPM | 3480.000000
36 | FAN MOD 3A RPM | Fan | Nominal | RPM | 3480.000000
39 | FAN MOD 4A RPM | Fan | Nominal | RPM | 3480.000000
40 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
41 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
42 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
43 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
44 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
45 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
46 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
47 | Status | Power Supply | Nominal | N/A | 'Presence detected'
48 | Current | Current | Nominal | A | 0.400000
49 | Current | Current | Nominal | A | 0.400000
50 | Voltage | Voltage | Nominal | V | 218.000000
51 | Voltage | Voltage | Nominal | V | 218.000000
52 | Status | Power Supply | Nominal | N/A | 'Presence detected'
53 | Status | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
54 | OS Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
56 | Intrusion | Physical Security | Nominal | N/A | 'OK'
57 | PS Redundancy | Power Supply | Nominal | N/A | 'Fully Redundant'
58 | Fan Redundancy | Fan | Nominal | N/A | 'Fully Redundant'
60 | System Level | Current | Nominal | W | 168.000000
61 | Power Optimized | OEM Reserved | Nominal | N/A | 'Good'
62 | Drive | Drive Slot | Nominal | N/A | 'Drive Presence'
65 | Cable SAS A | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
66 | Cable SAS B | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
67 | DKM Status | OEM Reserved | N/A | N/A | 'OEM Event = 0000h'
119 | FAN MOD 5A RPM | Fan | Nominal | RPM | 3480.000000
--------------------- end of debug output ---------------------
IPMI Status:
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
(The nine warnings above are printed 11 times in total; only the first group is shown.)
Critical [iDRAC6 Ent Pres = Critical ('Entity Absent'), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Ambient Temp = Warning (Temperature), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Ambient Temp = Warning (Temperature)] | 'Ambient Temp'=28.000000;:;: 'FAN MOD 1A RPM'=3480.000000;:;: 'FAN MOD 2A RPM'=3480.000000;:;: 'FAN MOD 3A RPM'=3480.000000;:;: 'FAN MOD 4A RPM'=3480.000000;:;: 'Current'=0.400000;:;: 'Current'=0.400000;:;: 'Voltage'=218.000000;:;: 'Voltage'=218.000000;:;: 'System Level'=168.000000;:;: 'FAN MOD 5A RPM'=3480.000000;:;:
Ambient Temp = 28.000000 (Status: Nominal)
CMOS Battery = 'OK' (Status: Nominal)
VCORE PG = 'State Deasserted' (Status: Nominal)
VCORE PG = 'State Deasserted' (Status: Nominal)
0.75 VTT PG = 'State Deasserted' (Status: Nominal)
0.75 VTT PG = 'State Deasserted' (Status: Nominal)
CPU VTT PG = 'State Deasserted' (Status: Nominal)
1.5V PG = 'State Deasserted' (Status: Nominal)
1.8V PG = 'State Deasserted' (Status: Nominal)
5V PG = 'State Deasserted' (Status: Nominal)
MEM CPU2 FAIL = 'State Deasserted' (Status: Nominal)
5V Riser PG = 'State Deasserted' (Status: Nominal)
MEM CPU1 FAIL = 'State Deasserted' (Status: Nominal)
VTT CPU2 FAIL = 'State Deasserted' (Status: Nominal)
VTT CPU1 FAIL = 'State Deasserted' (Status: Nominal)
0.9V PG = 'State Deasserted' (Status: Nominal)
CPU2 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
CPU1 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
1.1 FAIL = 'State Deasserted' (Status: Nominal)
1.0 LOM FAIL = 'State Deasserted' (Status: Nominal)
1.0 AUX FAIL = 'State Deasserted' (Status: Nominal)
Heatsink Pres = 'Entity Present' (Status: Nominal)
iDRAC6 Ent Pres = 'Entity Absent' (Status: Critical)
USB Cable Pres = 'Entity Present' (Status: Nominal)
Riser Presence = 'Entity Present' (Status: Nominal)
FAN MOD 1A RPM = 3480.000000 (Status: Nominal)
FAN MOD 2A RPM = 3480.000000 (Status: Nominal)
FAN MOD 3A RPM = 3480.000000 (Status: Nominal)
FAN MOD 4A RPM = 3480.000000 (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Status = 'Processor Presence detected' (Status: Nominal)
Status = 'Processor Presence detected' (Status: Nominal)
Status = 'Presence detected' (Status: Nominal)
Current = 0.400000 (Status: Nominal)
Current = 0.400000 (Status: Nominal)
Voltage = 218.000000 (Status: Nominal)
Voltage = 218.000000 (Status: Nominal)
Status = 'Presence detected' (Status: Nominal)
Status = 'Cable/Interconnect is connected' (Status: Nominal)
OS Watchdog = 'OK' (Status: Nominal)
Intrusion = 'OK' (Status: Nominal)
PS Redundancy = 'Fully Redundant' (Status: Nominal)
Fan Redundancy = 'Fully Redundant' (Status: Nominal)
System Level = 168.000000 (Status: Nominal)
Power Optimized = 'Good' (Status: Nominal)
Drive = 'Drive Presence' (Status: Nominal)
Cable SAS A = 'Cable/Interconnect is connected' (Status: Nominal)
Cable SAS B = 'Cable/Interconnect is connected' (Status: Nominal)
FAN MOD 5A RPM = 3480.000000 (Status: Nominal)
Going by the debug output, though (the plugin itself calls the same command), ipmi-sel can be run directly:
# /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
The result is the same table of 27 SEL entries (ID 1 to 27) that appears in the "debug output for sel" part above.
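In the plugin summary above, all but one of the Critical and Warning entries come from SEL records, some dating back to 2011, rather than from current sensor readings. To look at just those records, the ipmi-sel table can be filtered on its State column. This is a minimal sketch, assuming the seven-column layout shown above; the helper name sel_with_state is hypothetical.

```shell
#!/bin/bash
# sel_with_state STATE
# Hypothetical helper: read "ipmi-sel --output-event-state" output on stdin
# and print only the rows whose State column (the sixth) equals STATE.
sel_with_state() {
    awk -F'|' -v want="$1" '
        {
            state = $6
            # Strip the padding around the column before comparing
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state == want) print
        }'
}

# Example (needs a reachable BMC, so it is left commented out):
# /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg \
#     --driver-type=LAN_2_0 --output-event-state | sel_with_state Critical
```

Piping the result through wc -l gives a quick count of how many stale events are inflating the reported status.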
5.3. Write the Zabbix external check script
# pwd
/usr/local/zabbix/share/zabbix/externalscripts
# cat check_ipmi
The script contents:
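#!/bin/bash
# Queries IPMI sensor information for a Zabbix external check
# Created on 2016-11-18
# @author: Chinge_Yang

args="$*"
# Log every invocation for debugging
echo "$(date +%F-%T) $args" >> /tmp/check_ipmi.debug

check_ipmi_dir=/usr/local/zabbix/shell/check_ipmi_sensor
check_ipmi_bin=$check_ipmi_dir/check_ipmi_sensor
ipmi_sensors=/usr/sbin/ipmi-sensors
ipmi_cfg=$check_ipmi_dir/ipmi.cfg

#$check_ipmi_bin -f $ipmi_cfg -v $args
#${ipmi_sel} $args --config-file $ipmi_cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
options="--quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds"

function usage(){
    echo "Usage: $(basename "$0") options (-h HOST|-n NAME)"
}

function check(){
    # $options is left unquoted on purpose so that it splits into separate flags.
    # grep -F treats the sensor name as a fixed string; head keeps only the first
    # matching row; awk splits on "|" and prints the trimmed last column (the reading).
    result=$($ipmi_sensors -h "$host" --config-file "$ipmi_cfg" $options | grep -F -- "$name" | head -n 1 | awk -F'|' '{gsub(/^ +| +$/, "", $NF); print $NF}')
    printf "%.4f\n" "$result"
}

if [ $# -lt 4 ]
then
    usage
    exit 55
fi

# Usage: scriptname -options
# Note: options must be prefixed with a dash (-)
# A colon after an option letter means the option requires a value
while getopts ":h:n:" Option; do
    case $Option in
        h) host=$OPTARG ;;
        n) name=$OPTARG ;;
        *) usage; exit 55 ;;   # default case
    esac
done
shift $(($OPTIND - 1))
# (Translator's note: shift accepts an argument, the number of positions to shift.)
# Move the argument pointer past the parsed options so that
# $1 now refers to the first non-option argument on the command line,
#+ if there is one.

check
exit 0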
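The check function above selects rows with grep, which matches the name anywhere in the row, so a name such as Voltage or Current would match several rows of the ipmi-sensors output from section 5.2. As an illustration only, the parsing step could instead compare the Sensor Name column exactly. The sketch assumes the six-column layout shown in section 5.2, with the reading in the last column; reading_for is a hypothetical helper name.

```shell
#!/bin/bash
# reading_for NAME
# Hypothetical helper: read ipmi-sensors output on stdin and print, with four
# decimals, the last column (the reading) of the first row whose Sensor Name
# column (the second) equals NAME exactly.
reading_for() {
    awk -F'|' -v want="$1" '
        {
            name = $2
            value = $NF
            # Strip the padding around both columns
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == want) {
                printf "%.4f\n", value
                exit
            }
        }'
}

# Example (needs a reachable BMC, so it is left commented out):
# /usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg \
#     --driver-type=LAN_2_0 | reading_for "FAN MOD 1A RPM"
```

On the Zabbix side, an external check item would call the script with a key of the form check_ipmi["-h","{HOST.CONN}","-n","FAN MOD 1A RPM"]; the exact keys used in the author's template are not shown, so this key is an assumption.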
Add execute permission:
# chmod a+x check_ipmi
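5.4. Create a custom template
I won't go through the template in detail; it is essentially a modified version of the template from section 4.5. The original post shows its contents in a single screenshot, followed by two screenshots of the result.
In the end, even with the custom script, collecting data was still a struggle: the IPMI commands run by the script kept timing out.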
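Zabbix applies the Timeout setting from zabbix_server.conf to external checks (3 seconds by default, 30 at most), and the script recreates the SDR cache on every call through --sdr-cache-recreate, which may well take longer than that. One way to make such timeouts visible when testing by hand is to bound the command explicitly. The wrapper below is a sketch that relies on the coreutils timeout command; the name run_bounded is hypothetical.

```shell
#!/bin/bash
# run_bounded SECONDS COMMAND [ARGS...]
# Hypothetical wrapper: run COMMAND under coreutils timeout and report on
# stderr when the limit was hit. Returns the exit status reported by timeout,
# which is 124 when the command was stopped for running too long.
run_bounded() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (needs a reachable BMC, so it is left commented out):
# run_bounded 25 /usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg \
#     --driver-type=LAN_2_0
```

Running the same command once with and once without --sdr-cache-recreate under this wrapper would show whether the cache rebuild accounts for the delay.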
References:
http://pengyao.org/zabbix-monitor-ipmi-1.html
http://zh.community.dell.com/techcenter/w/techcenter_wiki/189.idrac-7
http://www.weibo.com/p/1001603921723593500304
http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin
Reposted from: https://blog.51cto.com/ygqygq2/1874277