1. Detect server restarts

  - alert: CentosServiceRestart
    expr: time() - node_boot_time_seconds < 180
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Instance restarted"
      description: "Instance was restarted, uptime < 3min"
  - alert: WindowsServiceRestart
    expr: time() - windows_system_system_up_time < 180
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Instance restarted"
      description: "Instance was restarted, uptime < 3min"
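The restart rules combine a 180-second uptime window with `for: 2m`, so the alert can only reach the firing state during a short slice of that window. The sketch below is a simplified model (an assumption, not Prometheus source code) of how the Linux expression behaves at a fixed evaluation interval: the alert becomes pending at the first evaluation where the expression is true and fires once it has stayed true for the `for` duration. The helper name and its parameters are hypothetical.

```python
def restart_alert_states(boot_time, first_eval, last_eval,
                         interval=15, threshold=180, for_seconds=120):
    """Simplified model of `time() - node_boot_time_seconds < 180` with `for: 2m`.

    Hypothetical helper: returns (eval_time, uptime, state) tuples, where state is
    "inactive", "pending" or "firing". `first_eval` is the first evaluation at
    which node_boot_time_seconds is available again after the reboot.
    """
    states = []
    active_since = None
    t = first_eval
    while t <= last_eval:
        uptime = t - boot_time
        if uptime < threshold:
            if active_since is None:
                active_since = t  # expression just became true: alert is pending
            held = t - active_since
            state = "firing" if held >= for_seconds else "pending"
        else:
            active_since = None  # expression false again: alert resolves
            state = "inactive"
        states.append((t, uptime, state))
        t += interval
    return states


# Exporter data available 30 s after boot.
early = restart_alert_states(boot_time=1000, first_eval=1030, last_eval=1300)
print([u for _, u, s in early if s == "firing"])   # → [150, 165]

# Data only available 70 s after boot: the 2 m hold cannot complete before 180 s.
late = restart_alert_states(boot_time=1000, first_eval=1070, last_eval=1300)
print([u for _, u, s in late if s == "firing"])    # → []
```

Under this model the alert only fires if post-reboot data reaches Prometheus within roughly the first minute of uptime; a larger uptime threshold or a shorter `for` widens the window. Verify this against your own scrape and evaluation intervals before changing the rule.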

2. High memory usage

  - alert: InstanceMemUsageHigh
    expr: 100 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 98
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Memory usage high"
      description: "Memory usage above 98% (current usage: {{ $value }}%)"
  - alert: WinInstanceMemUsageHigh
    expr: 100 - (windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100 > 98
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "Instance memory usage high"
      description: "Instance memory usage above 98% (current usage: {{ $value }}%)"
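PromQL, like Python, evaluates `/` before `-`, so the usage percentage has to be derived from the ratio MemAvailable/MemTotal as a whole. The sketch contrasts that intended formula with a grouping where `MemTotal - MemAvailable / MemTotal` is computed first; the misgrouped value is hugely negative, so a `> 98` comparison can never be true. The helper names and the sample host are hypothetical. The Windows rule has the same shape, using `windows_os_physical_memory_free_bytes` and `windows_cs_physical_memory_bytes`.

```python
def mem_usage_pct(total_bytes, available_bytes):
    """Intended reading: 100 - (MemAvailable / MemTotal) * 100."""
    return 100 - (available_bytes / total_bytes) * 100


def mem_usage_pct_misgrouped(total_bytes, available_bytes):
    """Same operands grouped as 100 - (MemTotal - MemAvailable / MemTotal) * 100.

    '/' binds tighter than '-', so this subtracts a small ratio from a byte
    count and the result is a huge negative number.
    """
    return 100 - (total_bytes - available_bytes / total_bytes) * 100


total = 16 * 1024**3          # made-up host: 16 GiB RAM
available = 160 * 1024**2     # 160 MiB still available
print(mem_usage_pct(total, available))                  # → 99.0234375
print(mem_usage_pct_misgrouped(total, available) > 98)  # → False
```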

3. High CPU usage

  - alert: CPUUsageHigh
    expr: 100 - (avg by (instance, region) (irate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 90
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage high"
      description: "CPU usage above 90% (current usage: {{ $value }}%)"
  - alert: WinCpuUsage
    expr: 100 - (avg by (instance, region) (irate(windows_cpu_time_total{mode="idle"}[2m])) * 100) > 90
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "Instance CPU usage high"
      description: "Instance CPU usage above 90% (current usage: {{ $value }}%)"
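The CPU rules turn the per-CPU idle-seconds counter into a rate, average it over the CPUs of an instance, and subtract the idle share from 100. This is why the `{mode="idle"}` selector matters: the per-second rates of all modes of one CPU (idle, user, system, iowait and so on) add up to about 1, so averaging over every mode would hover around a constant. The sketch mirrors the arithmetic for one instance from two samples per CPU; the function names and sample data are hypothetical, and `irate` here is a plain-Python imitation that keeps PromQL's counter-reset rule.

```python
def irate(prev, last):
    """Per-second rate from the last two (timestamp, value) samples of a counter.

    Like PromQL irate(), a drop in value is treated as a counter reset, in which
    case the new value itself is taken as the increase.
    """
    (t1, v1), (t2, v2) = prev, last
    increase = v2 - v1 if v2 >= v1 else v2
    return increase / (t2 - t1)


def cpu_usage_pct(idle_samples):
    """100 - avg(irate(node_cpu_seconds_total{mode="idle"}[2m])) * 100 for one instance.

    idle_samples maps a CPU id to its last two (timestamp, idle_seconds) samples.
    """
    idle_rates = [irate(prev, last) for prev, last in idle_samples.values()]
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Made-up samples 15 s apart for a 2-CPU host.
samples = {
    "0": ((0, 5000.0), (15, 5001.5)),  # idle 1.5 s of 15 s -> 10 % idle
    "1": ((0, 4200.0), (15, 4200.0)),  # never idle -> 0 % idle
}
print(round(cpu_usage_pct(samples), 1))  # → 95.0
```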

4. High disk usage

  - alert: DiskUsageHigh
    expr: 100 - (node_filesystem_avail_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"}) * 100 > 95
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Disk usage high"
      description: "Disk {{ $labels.mountpoint }} usage above 95% (current usage: {{ $value }}%)"
  - alert: WinDiskUsageHigh
    expr: 100 - (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) * 100 > 95
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance disk usage high"
      description: "Instance disk {{ $labels.volume }} usage above 95% (current usage: {{ $value }}%)"
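The Linux disk rule keeps only ext4 and xfs filesystems through the `fstype=~"ext4|xfs"` matcher, then compares the used share of each mountpoint with 95%. PromQL regex matchers are anchored at both ends, which the sketch reproduces with `re.fullmatch`; an unanchored search would also accept a type such as `ext4dev`. The function name and the filesystem list are hypothetical.

```python
import re


def disk_usage_alerts(filesystems, fstype_regex="ext4|xfs", threshold=95.0):
    """Mountpoints whose usage exceeds `threshold` percent.

    Mirrors 100 - (avail / size) * 100 > 95 with fstype=~"ext4|xfs".
    PromQL regex matchers are fully anchored, hence re.fullmatch.
    """
    pattern = re.compile(fstype_regex)
    alerts = {}
    for fs in filesystems:
        if not pattern.fullmatch(fs["fstype"]):
            continue  # e.g. tmpfs or ext4dev is not selected
        usage = 100 - (fs["avail_bytes"] / fs["size_bytes"]) * 100
        if usage > threshold:
            alerts[fs["mountpoint"]] = usage
    return alerts


# Made-up filesystems (sizes in arbitrary units; only the ratio matters).
fss = [
    {"mountpoint": "/", "fstype": "ext4", "size_bytes": 100, "avail_bytes": 4},
    {"mountpoint": "/data", "fstype": "xfs", "size_bytes": 100, "avail_bytes": 10},
    {"mountpoint": "/run", "fstype": "tmpfs", "size_bytes": 100, "avail_bytes": 1},
    {"mountpoint": "/mnt/old", "fstype": "ext4dev", "size_bytes": 100, "avail_bytes": 1},
]
print(disk_usage_alerts(fss))  # only "/" (96 % used) is reported
```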

5. Network throughput

  - alert: HostUnusualNetworkThroughputIn
    expr: sum by (instance, device, region) (irate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual network throughput in"
      description: "Host network interfaces are receiving too much data (> 30 MB/s, current speed: {{ $value }} MB/s)"
  - alert: WinHostUnusualNetworkThroughputIn
    expr: sum by (instance, nic, region) (irate(windows_net_bytes_received_total{nic=~".*VirtIO.*"}[2m])) / 1024 / 1024 > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual network throughput in"
      description: "Host network interfaces are probably receiving too much data (> 30 MB/s, current speed: {{ $value }} MB/s)"
  - alert: HostUnusualNetworkThroughputOut
    expr: sum by (instance, device, region) (irate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual network throughput out"
      description: "Host network interfaces are sending too much data (> 30 MB/s, current speed: {{ $value }} MB/s)"
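The throughput rules divide a bytes-per-second rate by 1024 twice, so the "30 MB/s" threshold is really 30 MiB/s, which is about 252 Mbit/s in the units NIC speeds are usually quoted in. The sketch shows both conversions; the helper names and the sample counter values are hypothetical.

```python
def throughput_mib_s(prev, last):
    """irate(<bytes counter>[2m]) / 1024 / 1024 from the last two (ts, bytes) samples."""
    (t1, v1), (t2, v2) = prev, last
    increase = v2 - v1 if v2 >= v1 else v2  # counter reset handling, as in irate()
    return increase / (t2 - t1) / 1024 / 1024


def mib_s_to_mbit_s(mib_s):
    """Convert MiB/s to the Mbit/s figure usually quoted for NICs."""
    return mib_s * 1024 * 1024 * 8 / 1_000_000


# 310 MiB received over 10 s on one made-up interface.
rx = throughput_mib_s((0, 0), (10, 310 * 1024 * 1024))
print(rx, rx > 30)           # → 31.0 True
print(mib_s_to_mbit_s(30))   # → 251.65824
```

On a 1 Gbit/s link, the 30 MiB/s threshold therefore corresponds to roughly a quarter of line rate; whether that is "unusual" depends on the workload.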

6. TCP connections

  - alert: TCPEstablishedNum
    expr: node_netstat_Tcp_CurrEstab > 2000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Too many established TCP connections"
      description: "Established TCP connection count exceeds 2000 (current count: {{ $value }})"
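On Linux, node_exporter's netstat collector exposes `node_netstat_Tcp_CurrEstab` from the `CurrEstab` field of the two `Tcp:` lines in `/proc/net/snmp` (a header line of field names followed by a line of values). It is a current count rather than a counter, which is why the rule compares the raw value without `rate()`. The sketch parses that layout from a made-up excerpt; the function name is hypothetical, and on a real host you could pass it the contents of `/proc/net/snmp`.

```python
def parse_tcp_snmp(text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into a {field: int} dict.

    The first 'Tcp:' line holds field names, the second the matching values.
    """
    tcp_lines = [line.split()[1:] for line in text.splitlines() if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(header, map(int, values)))


# Made-up excerpt in the /proc/net/snmp layout.
sample = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 51234 8765 120 310 2150 9876543 8765432 4321 0 987 0
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 1000 5 0 1200 0 0 0 0
"""
tcp = parse_tcp_snmp(sample)
print(tcp["CurrEstab"], tcp["CurrEstab"] > 2000)  # → 2150 True
```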

7. Host network transmit errors

  - alert: HostNetworkTransmitErrors
    expr: increase(node_network_transmit_errs_total[5m]) > 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host Network Transmit Errors"
      #description: "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%v" $value }} transmit errors in the last five minutes.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      description: "Interface {{ $labels.device }} has had transmit errors in the last five minutes (current error packets: {{ $value }})"
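`increase()` is meant for counters and treats any drop in value as a counter reset (for example after a reboot or driver reload), adding the new value instead of a negative delta. The sketch applies that reset rule to made-up samples inside one 5-minute window. It is a simplification and the helper name is hypothetical: PromQL's `increase()` also extrapolates to the edges of the range, so for integer counters it can still return fractional values such as 2.4.

```python
def simple_increase(values):
    """Counter increase across a list of samples, treating any drop as a reset.

    Simplification: PromQL increase() additionally extrapolates to the edges of
    the [5m] window, so its result can be fractional even for integer counters.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Made-up node_network_transmit_errs_total samples within 5 minutes;
# the drop from 13 to 1 is a counter reset.
errs = [10, 11, 11, 13, 1, 2]
print(simple_increase(errs), simple_increase(errs) > 2)  # → 5 True
```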

8. Disk read/write latency

  - alert: HostUnusualDiskReadLatency
    expr: rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m]) * 1000 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual disk read latency"
      description: "Disk read latency is growing (read operations > 100ms, current latency: {{ $value }}ms)"
  - alert: HostUnusualDiskWriteLatency
    expr: rate(node_disk_write_time_seconds_total[1m]) / rate(node_disk_writes_completed_total[1m]) * 1000 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host unusual disk write latency"
      description: "Disk write latency is growing (write operations > 100ms, current latency: {{ $value }}ms)"
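Dividing the rate of time spent on reads (or writes) by the rate of completed operations gives the average time per operation over the window, and `* 1000` converts seconds to milliseconds. When a disk is idle both rates are 0, PromQL yields NaN, and a comparison against NaN is false, so idle disks never trigger these rules. The sketch reproduces that arithmetic, including the IEEE 754 division-by-zero results; the function name and samples are hypothetical.

```python
import math


def avg_latency_ms(prev, last):
    """rate(read_time_seconds[1m]) / rate(reads_completed[1m]) * 1000.

    prev/last are (timestamp, time_seconds_counter, ops_counter) samples.
    """
    (t1, secs1, ops1), (t2, secs2, ops2) = prev, last
    window = t2 - t1
    secs_rate = (secs2 - secs1) / window
    ops_rate = (ops2 - ops1) / window
    if ops_rate == 0:
        # PromQL follows IEEE 754: 0/0 is NaN, x/0 is +Inf for x > 0.
        return math.nan if secs_rate == 0 else math.inf
    return secs_rate / ops_rate * 1000


# Made-up samples 60 s apart: 40 reads took 6 s in total, i.e. 150 ms each.
busy = avg_latency_ms((0, 100.0, 5000), (60, 106.0, 5040))
print(round(busy, 1), busy > 100)   # → 150.0 True

idle = avg_latency_ms((0, 100.0, 5000), (60, 100.0, 5000))
print(idle, idle > 100)             # → nan False
```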

9. High disk I/O

  - alert: DiskIOTimePerSec
    expr: irate(node_disk_io_time_seconds_total[1m]) * 100 > 60
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host disk I/O time high"
      description: "Disk {{ $labels.device }} I/O time above 60% (current: {{ $value }}%)"
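`node_disk_io_time_seconds_total` grows by up to one second per second while the device has I/O in flight, so its per-second rate times 100 is the share of wall-clock time the disk was busy, similar to the `%util` column of `iostat`. The sketch computes that figure from two made-up samples; the function name is hypothetical. As with `%util`, devices that serve requests in parallel (SSDs, NVMe, RAID arrays) can show a high busy share well before they are saturated, so treat the 60% threshold as a hint rather than a capacity limit.

```python
def io_busy_pct(prev, last):
    """irate(node_disk_io_time_seconds_total[1m]) * 100 from two (ts, seconds) samples."""
    (t1, v1), (t2, v2) = prev, last
    increase = v2 - v1 if v2 >= v1 else v2  # counter reset handling, as in irate()
    return increase / (t2 - t1) * 100


# Made-up samples 15 s apart: the disk was busy for 9.75 s of them.
print(round(io_busy_pct((0, 100.0), (15, 109.75)), 1))  # → 65.0
```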
