目录

一、Promethues. 1

二、Grafana. 2

三、环境准备... 2

四、安装Prometheus与Grafana. 3

五、配置Prometheus与Grafana. 4

1、配置prometheus. 4

2、配置grafana. 6

六、监控机器硬件资源... 7

七、监控基础服务... 9

1、监控NGINX. 9

2、监控MYSQL. 11

3、监控 REDIS. 16

八、监控应用程序... 18

九、监控业务接口数据... 19

十、告警设置... 21

1、设置告警方式... 21

2、设置告警规则... 24

①/data/prometheus/rules/node.yml 24

② /data/prometheus/rules/redis.yml 26

③/data/prometheus/rules/mysql.yml 27

④/data/prometheus/rules/nginx.yml 29

使用Prometheus+Grafana搭建监控系统,主要监控内容包括机器硬件资源,基础服务,应用程序,业务接口数据。

一、Promethues

Prometheus - Monitoring system & time series database

Prometheus是一个开源的服务监控系统和时间序列数据库。Prometheus生态系统由多个组件组成,包括负责数据采集和存储并提供PromQL查询语言支持的Prometheus Server,提供多语言的客户端SDK,支持临时性Job主动推送指标的中间网关Push Gateway,数据采集组件Exporter,它负责从目标处搜集数据,并将其转化为Prometheus支持的格式以及提供告警功能的Alertmanager。

Promethues Exporter组件与传统的数据采集组件不同的是它并不向中央服务器发送数据,而是等待中央服务器主动前来抓取,prometheus提供多种类型的exporter用于采集各种不同服务的运行状态。

Promethues官网提供的生态系统架构图:

二、Grafana

Grafana: The open observability platform | Grafana Labs

Grafana是一个跨平台的开源的度量分析和可视化工具,支持从多种数据源(如prometheus)获取数据进行可视化数据展示。

三、环境准备

&&&监控服务器的硬件资源:cpu 8核、内存32G、磁盘250G、网卡。

Prometheus及其它监控服务部署安排如下:

1.查看操作系统版本

[root@node64 ~]# cat /etc/redhat-release

CentOS Linux release 7.1.1503 (Core)

[root@node64 ~]# getconf LONG_BIT

64

2.查看网卡IP及配置

[root@node64 ~]# ip a

eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000

link/ether fa:16:3e:a3:89:25 brd ff:ff:ff:ff:ff:ff

inet 192.168.0.91/24 brd 192.168.0.255 scope global eth1

valid_lft forever preferred_lft forever

inet6 fe80::f816:3eff:fea3:8925/64 scope link

valid_lft forever preferred_lft forever

[root@node64 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1

NAME=eth1

TYPE=Ethernet

BOOTPROTO=dhcp

DEVICE=eth1

ONBOOT=yes

IPV4_ROUTE_METRIC=100

四、安装Prometheus与Grafana

1.创建安装目录   mkir /data/prometheus

groupadd prometheus

useradd -g prometheus -s /sbin/nologin prometheus

chown -R prometheus:prometheus prometheus

2.下载prometheus,grafana安装包及prometheus插件包,

包括node_exporter、mysqld_exporter、nginx-vts-exporter、redis_exporter、alertmanager

Prometheus安装包参考如下:

wget https://github.com/prometheus/prometheus/releases/download/v2.20.0/prometheus-2.20.0.linux-amd64.tar.gz

Grafana安装包参考如下:

wget https://dl.grafana.com/oss/release/grafana-7.1.1.linux-amd64.tar.gz

3.解压安装

五、配置Prometheus与Grafana

1、配置prometheus

a.修改配置文件prometheus.yml

vi /data/prometheus/prometheus/prometheus.yml

scrape_configs:

metrics_path: /prometheus/metrics

static_configs:

- targets: ['192.168.0.91:9090']

b.检查配置文件

/data/prometheus/prometheus

[root@node64 prometheus]# ./promtool check config prometheus.yml

注:使prometheus配置生效 pgrep -fl  prometheus

c.将prometheus注册为系统服务

[root@node64 prometheus]# cat /usr/lib/systemd/system/prometheus.service

[Unit]

Description=prometheus

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/prometheus/prometheus --web.external-url=prometheus   --web.enable-admin-api --config.file=/data/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus/prometheus/data --storage.tsdb.retention=15d --log.level=info --web.enable-lifecycle

Restart=on-failure

[Install]

WantedBy=multi-user.target

d.启动并查看prometheus服务

systemctl enable prometheus

systemctl start prometheus

systemctl status prometheus

[root@node64 prometheus]# netstat -anp | grep 9090

e. nginx转发prometheus服务

Prometheus 的 Node Exporter 并没有提供任何认证支持。不过,借助 Nginx 作为反向代理服务器,我们可以很容易地为 Node Exporter 添加 HTTP Basic Auth 功能。

yum -y install httpd

[root@node23 conf]# htpasswd -c .htpasswd_prometheus Prometheus

Iampwd

location /prometheus/ {

proxy_set_header X-Real-IP $remote_addr;

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

proxy_set_header Host $http_host;

proxy_set_header X-Nginx-Proxy true;

proxy_pass http://192.168.0.91:9090;

proxy_redirect off;

proxy_buffering off;

proxy_read_timeout 90;

proxy_send_timeout 90;

auth_basic "Prometheus";

auth_basic_user_file ".htpasswd";

}

f. web访问prometheus

https://***.com:9091/prometheus/

prometheus/Iampwd

2、配置grafana

a.修改配置文件default.ini

vi /data/prometheus/grafana/conf/defaults.ini

http_port = 3000

root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana/

b.将grafana注册为系统服务

[root@node64 conf]# cat /usr/lib/systemd/system/grafana-server.service

[Unit]

Description=Grafana

After=network.target

[Service]

Type=notify

ExecStart=/data/prometheus/grafana/bin/grafana-server -homepath /data/prometheus/grafana

Restart=on-failure

[Install]

WantedBy=multi-user.target

c.启动并查看grafana服务

systemctl enable grafana-server

systemctl start grafana-server

systemctl status grafana-server

d. nginx转发grafana服务

location /grafana/ {

proxy_set_header X-Real-IP $remote_addr;

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

proxy_set_header Host $http_host;

proxy_set_header X-Nginx-Proxy true;

proxy_pass http://192.168.0.91:3000/;

proxy_redirect off;

proxy_buffering off;

proxy_read_timeout 90;

proxy_send_timeout 90;

}

e. web访问grafana并配置数据源

https://***.com:9091/grafana/

admin/Iampwd

数据源URL: http://192.168.0.91:9090/prometheus

六、监控机器硬件资源

Prometheus使用node_exporter插件监控机器硬件资源,prometheus主动从安装了node_exporter服务的网络互通的机器上抓取所需的数据

1.在需要监控的机器上安装node_exporter

2.将node_exporter注册为系统服务

[root@node64 conf]# cat /usr/lib/systemd/system/node_exporter.service

[Unit]

Description=node_exporter

Documentation=https://prometheus.io/

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/node_exporter/node_exporter

Restart=on-failure

[Install]

WantedBy=multi-user.target

3.启动并查看node_exporter服务

systemctl enable node_exporter

systemctl start node_exporter

systemctl status node_exporter

4.修改prometheus.yml并重启prometheus服务

scrape_configs:

- job_name: 'node_exporter'

static_configs:

-targets:['192.168.0.91:9100','192.168.0.92:9100'…]

5.访问prometheus查看监控状态

6.引入grafana监控面板

Dashboards | Grafana Labs

node_exporter 8919

mysql_exporter 11323

mysql overview 7362

nginxvts 2949

redis 11835

七、监控基础服务

1、监控NGINX

Nginx通过nginx-module-vts模块获取nginx的某些指标数据, Prometheus再通过nginx-vts-exporter组件采集nginx信息。

a、在nginx服务器安装nginx-module-vts模块

  • https://github.com/vozlt/nginx-module-vts  ——(选择zip包下载)
  • 编译安装nginx

./configure --prefix= /data/nginx --with-http_gzip_static_module --with-http_stub_status_module --with-http_ssl_module --with-pcre --with-file-aio --with-http_realip_module --add-module=/data/nginx-module-vts

make && make install

  • 修改nginx.conf文件

http{

vhost_traffic_status_zone;

vhost_traffic_status_filter_by_host on;

location /status {

vhost_traffic_status_display;

vhost_traffic_status_display_format html;

}

}

b、在nginx服务器与prometheus服务器都下载安装nginx-vts-exporter插件

wget

https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz

c、在nginx服务器(192.168.0.71)将nginx-vts-exporter注册为系统服务

cat /etc/systemd/system/nginx-vts-exporter.service

[Unit]

Description=nginx_exporter

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/nginx-vts-exporter/nginx-vts-exporter -nginx.scrape_uri=https://公网IP:9091/status/format/json

Restart=on-failure

[Install]

WantedBy=multi-user.target

d、在prometheus服务器(192.168.0.91)将nginx-vts-exporter注册为系统服务

cat /etc/systemd/system/nginx-vts-exporter.service

[Unit]

Description=nginx_exporter

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/nginx-vts-exporter/nginx-vts-exporter -nginx.scrape_uri=https://192.168.0.71:9091/status/format/json

Restart=on-failure

[Install]

WantedBy=multi-user.target

e、在nginx服务器与prometheus服务器启动并查看nginx-vts-exporter服务

systemctl enable nginx-vts-exporter

systemctl start nginx-vts-exporter

systemctl status nginx-vts-exporter

f、在prometheus服务器上修改配置文件prometheus.yml并重启prometheus服务

- job_name: 'nginx'

static_configs:

- targets: ['192.168.0.91:9913']

https://公网IP:9091/statusg、查看nginx页面监控

https://公网IP:9091/status

在grafana上导入nginx监控面板 nginx-vts-exporter 2949并查看面板

https://***.com:9091/grafana/d/5-RKCVxGk/nginx-vts-stats?orgId=1

2、监控MYSQL

Prometheus通过mysqld_exporter组件采集MySQL主从服务器的相关数据。

1)使用自动化脚本安装mysql后将mysql服务加入系统服务并设置开机自启

[root@centos7-min4 nginx]# cp /opt/mysql57/support-files/mysql.server /etc/rc.d/init.d/mysqld

chmod +x /etc/init.d/mysqld

chkconfig --add mysqld

chkconfig --list

# systemctl start mysqld

# systemctl status mysqld

[mysql@centos7-min4 nginx]$ mysql -uroot -p —— 123456

mysql> select version();

+------------+

| version() |

+------------+

| 5.7.24-log |

2)在prometheus服务器上安装mysqld_exporter组件

prometheus监控mysql主从服务器

  • 登录mysql为exporter创建账号并授权

create user 'exporter'@'192.168.0.%' identified by 'Abc123';

grant process,replication client,select on *.* to 'exporter'@'192.168.0.%';

flush privileges;

  • 在Prometheus服务器上安装mysqld_exporter服务,同时监控mysql主从服务

ls -al /data/prometheus/mysqld_exporter/

.my-master.cnf

.my-slave.cnf

root@node64 mysqld_exporter]# cat .my-master.cnf

[client]

user=exporter

password=Abc123

host=192.168.0.92

port=3306

[root@node64 mysqld_exporter]# cat .my-slave.cnf

[client]

user=exporter

password=Abc123

host=192.168.0.93

port=3306

  • 启动mysqld_exporter服务

为mysql主从服务分别启动一个服务:

Mysql主服务启动

/data/prometheus/mysqld_exporter/mysqld_exporter --web.listen-address=192.168.0.91:9104 --config.my-cnf=/data/prometheus/mysqld_exporter/.my-master.cnf --collect.auto_increment.columns --collect.binlog_size --collect.global_status --collect.engine_innodb_status --collect.global_variables --collect.info_schema.innodb_metrics --collect.info_schema.innodb_tablespaces --collect.info_schema.innodb_cmp --collect.info_schema.innodb_cmpmem --collect.info_schema.processlist --collect.info_schema.query_response_time --collect.info_schema.tables --collect.info_schema.tablestats --collect.info_schema.userstats --collect.perf_schema.eventswaits --collect.perf_schema.file_events --collect.perf_schema.indexiowaits --collect.perf_schema.tableiowaits --collect.perf_schema.tablelocks

Mysql从服务启动

/data/prometheus/mysqld_exporter/mysqld_exporter --web.listen-address=192.168.0.91:9105 --config.my-cnf=/data/prometheus/mysqld_exporter/.my-slave.cnf

(注:其余参数请与上面Mysql主服务启动保持一致)

3)修改prometheus配置文件信息并重启prometheus

prometheus.yml

- job_name: 'mysql_exporter'

static_configs:

#  - targets: ['192.168.0.92:9104','192.168.0.93:9104']

- labels:

instance: master:3306 # grafana显示的实例的别名

- targets:

- 192.168.0.91:9104 # mysqld_exporter暴露的端口

- labels:

instance: slave:3306 # grafana显示的实例的别名

- targets:

- 192.168.0.91:9105 # mysqld_exporter暴露的端口

4)查看prometheus与grafana面板的mysql数据,导入mysql监控面板

mysql_exporter 11323

mysql overview 7362

PS: mysql同步故障处理: Slave_SQL_Running: No

分析:造成mysql数据同步失败的原因

  1. 程序可能在slave上进行了写操作
  2. 可能是slave机器重启后,事务回滚造成的

处理办法:首先停掉slave服务,查看主服务器上主机状态,并根据File和Position对应的值同步到从服务器,最后启动slave服务,查看同步状态

主服务器:

mysql> show master status;

从服务器:

mysql> stop slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>

mysql> change master to master_host='192.168.0.92',

-> master_user='repl',

-> master_password='123456',

-> master_log_file='mysql-bin-T-prod-3306.000005',

-> master_log_pos=653020;

Query OK, 0 rows affected, 2 warnings (0.05 sec)

mysql> start slave;

3、监控 REDIS

使用redis_exporter组件监控三主三从的redis集群。

1)使用自动化脚本安装redis三主三从集群

2)在prometheus服务器下载并安装redis_exporter服务

wget https://github.com/oliver006/redis_exporter/releases/download/v1.3.5/redis_exporter-v1.3.5.linux-amd64.tar.gz

3)监控redis集群中其中一台服务器即可监控整个已创建好的集群

cd /data/prometheus/redis_exporter

./redis_exporter -redis.addr 192.168.0.93:7000 -redis.password 'zxcvb123' &

4)修改prometheus配置文件信息并重启prometheus

- job_name: 'redis_exporter_targets'

static_configs:

- targets:

- redis://192.168.0.3:7000

- redis://192.168.0.2:7003

- redis://192.168.0.72:7002

- redis://192.168.0.35:7001

- redis://192.168.0.14:7004

- redis://192.168.0.13:7005

metrics_path: /scrape

relabel_configs:

- source_labels: [__address__]

target_label: __param_target

- source_labels: [__param_target]

target_label: instance

- target_label: __address__

replacement: 192.168.0.91:9121

- job_name: 'redis_exporter'

static_configs:

- targets:

- 192.168.0.91:9121

5)查看prometheus与grafana面板数据,导入面板redis 11835

八、监控应用程序

Prometheus使用node_exporter组件监控应用程序

1)在需要监控的应用程序服务器上安装process-exporter组件

wget

https://github.com/ncabatoff/process-exporter/releases/download/v0.5.0/process-exporter-0.5.0.linux-amd64.tar.gz

2)配置应用程序监控信息

process-conf.yml

3)启动应用程序监控服务并指定配置文件

./process-exporter -config.path  process-conf.yml &

4)修改prometheus配置信息并重启prometheus

[root@node64 prometheus]# vi prometheus.yml

- job_name: process

static_configs:

- targets: ['192.168.0.35:9256','192.168.0.13:9256'…]

5)查看prometheus与grafana面板数据

process-exporter对应的dashboard为:Named processes | Grafana Labs

九、监控业务接口数据

根据开发提供的业务监控接口数据配置grafana数据展示。

https://***.com:9092/api

  • 注意:

1.Grafana不保存普罗米修斯的数据。它查询Prometheus并显示UI。在这种情况下,您必须查看清除普罗米修斯数据。

Prometheus默认具有15天的保留期。但这可以通过-storage.local.retention标志进行调整以满足您的需求

2.prometheus会把过期数据置NaN,sum()求和函数不支持NaN,需要调整metric:

sum(st_invoke_count{app_id=~'$appid',road_type='1'}>0)

3.Grafana全局变量

4.Grafana菜单级联

十、告警设置

1、设置告警方式

> Prometheus

Prometheus通过组件alertmanager实现告警,Alertmanager接收prometheus发送的告警并对告警进行一系列的处理后发送给指定的用户。

prometheus--->触发阈值--->超出持续时间--->alertmanager--->分组|抑制|静默--->媒体类型--->邮件、钉钉、微信等。

1)安装alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.16.2/alertmanager-0.16.2.linux-amd64.tar.gz

2)修改alertmanager配置文件信息alertmanager.yml

3)开启smtp服务

4)配置告警通知模板

[root@centos7-min4 alertmanager-0.21.0]# cat template/test.tmpl

{{ define "test.html" }}

<table border="5">

<tr>

<td>报警项</td>

<td>事项</td>

<td>报警阀值</td>

<td>开始时间</td>

</tr>

{{ range $i, $alert := .Alerts }}

<tr>

<td>{{ index $alert.Labels "alertname" }}</td>

<td>{{ index $alert.Labels "instance" }}</td>

<td>{{ index $alert.Annotations "value" }}</td>

<td>{{ $alert.StartsAt }}</td>

</tr>

{{ end }}

</table>

{{ end }}

5)启动alertmanager服务

(1)指定配置文件启动

[root@centos7-min4 alertmanager-0.21.0]# ./alertmanager --config.file=alertmanager.yml &

(2)配置成系统服务启动

[root@centos7-min4 alertmanager-0.21.0]# cat /usr/lib/systemd/system/alertmanager.service

[Unit]

Description=https://prometheus.io

[Service]

Restart=on-failure

ExecStart=/opt/alertmanager-0.21.0/alertmanager --config.file=/opt/alertmanager-0.21.0/alertmanager.yml

[Install]

WantedBy=multi-user.target

systemctl enable alertmanager

systemctl start alertmanager

systemctl status alertmanager

6)修改prometheus配置文件并重启prometheus

# Alertmanager configuration

alerting:

alertmanagers:

- static_configs:

- targets:

- 192.168.0.91:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

rule_files:

- "/data/prometheus/rules/node.yml"

- "/data/prometheus/rules/redis.yml"

- "/data/prometheus/rules/mysql.yml"

- "/data/prometheus/rules/nginx.yml"

- "/data/prometheus/rules/service-api.yml"

> Grafana

2、设置告警规则

①/data/prometheus/rules/node.yml

groups:

- name: NodeProcess

rules:

- alert: NodeStatus

expr: up == 0

for: 1m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}:服务器宕机"

description: "{{$labels.instance}}:服务器延时超过5分钟"

- alert: NodeFilesystemUsage

expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80

for: 2m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: {{$labels.mountpoint }} 分区使用过高"

description: "{{$labels.instance}}: {{$labels.mountpoint }} 分区使用大于 80% (当前值: {{ $value }})"

- alert: NodeMemoryUsage

expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80

for: 2m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: 内存使用过高"

description: "{{$labels.instance}}: 内存使用大于 80% (当前值: {{ $value }})"

- alert: NodeCPUUsage

expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80

for: 2m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: CPU使用过高"

description: "{{$labels.instance}}: CPU使用大于 80% (当前值: {{ $value }})"

- alert: LoadCPU

expr: node_load5 > 5

for: 2m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: 负载过高"

description: "{{$labels.instance}}: 5分钟内平均负载超过5 (当前值: {{ $value }})"

- alert: DiskIORead

expr: irate(node_disk_read_bytes_total{device="sda"}[1m]) > 30000000

for: 1m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: I/O读负载过高"

description: "{{$labels.instance}}: I/O 每分钟读已超过 30MB/s (当前值: {{ $value }})"

- alert: DiskIOWrite

expr: irate(node_disk_written_bytes_total{device="sda"}[1m]) > 30000000

for: 1m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: I/O写负载过高"

description: "{{$labels.instance}}: I/O 每分钟写已超过 30MB/s (当前值: {{ $value }})"

- alert: 流入网络带宽

expr: ((sum(rate (node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 18432

for: 1m

labels:

status: warning

annotations:

summary: "{{$labels.mountpoint}} 流入网络带宽过高!"

description: "{{$labels.mountpoint }}流入网络带宽持续5分钟高于18M. RX带宽使用率{{$value}}"

- alert: 流出网络带宽

expr: ((sum(rate (node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 18432

for: 1m

labels:

status: warning

annotations:

summary: "{{$labels.mountpoint}} 流出网络带宽过高!"

description: "{{$labels.mountpoint }}流出网络带宽持续5分钟高于18M. RX带宽使用率{{$value}}"

- alert: 网络连接数

expr: node_sockstat_TCP_inuse > 240

for: 1m

labels:

status: warning

annotations:

summary: "{{$labels.mountpoint}} 连接数过高!"

description: "{{$labels.mountpoint }}当前连接数{{$value}}"

② /data/prometheus/rules/redis.yml

groups:

- name:  Redis

rules:

- alert: RedisDown

expr: redis_up  == 0

for: 5m

labels:

severity: warning

annotations:

summary: "Redis down (instance {{ $labels.instance }})"

description: "Redis集群节点故障\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

- alert: OutOfMemory

expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90

for: 5m

labels:

severity: warning

annotations:

summary: "Out of memory (instance {{ $labels.instance }})"

description: "Redis is running out of memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

- alert: ReplicationBroken

expr: delta(redis_connected_slaves[1m]) < 0

for: 5m

labels:

severity: warning

annotations:

summary: "Replication broken (instance {{ $labels.instance }})"

description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

- alert: TooManyConnections

expr: redis_connected_clients > 1000

for: 5m

labels:

severity: warning

annotations:

summary: "Too many connections (instance {{ $labels.instance }})"

description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

- alert: RejectedConnections

expr: increase(redis_rejected_connections_total[1m]) > 0

for: 5m

labels:

severity: warning

annotations:

summary: "Rejected connections (instance {{ $labels.instance }})"

description: "Some connections to Redis has been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

- alert: AofSaveStatus

expr: redis_aof_last_bgrewrite_status < 1

for: 5m

lables:

serverity: warning

annotations:

summary: "Missing backup (instance {{ $labels.instance }})"

description: "Redis AOF persistence failed\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

③/data/prometheus/rules/mysql.yml

groups:

- name: MySQL

rules:

- alert: MySQL Status

expr: mysql_up == 0

for: 5s

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: MySQL has stop !!!"

description: "检测MySQL数据库运行状态"

- alert: MySQL Slave IO Thread Status

expr: mysql_slave_status_slave_io_running != 1

for: 5s

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: MySQL Slave IO Thread has stop !!!"

description: "检测MySQL主从IO线程运行状态"

- alert: MySQL Slave SQL Thread Status

expr: mysql_slave_status_slave_sql_running != 1

for: 5s

labels:

severity: warning

annotations:

summary: '{{$labels.instance}}: MySQL Slave SQL Thread has stop !!!'

description: "检测MySQL主从SQL线程运行状态"

- alert: MySQL Slave Delay Status

expr: mysql_slave_status_sql_delay == 30

for: 5s

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: MySQL Slave Delay has more than 30s !!!"

description: "检测MySQL主从延时状态"

- alert: Mysql_Too_Many_Connections

expr: rate(mysql_global_status_threads_connected[5m]) > 200

for: 2m

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: 连接数过多"

description: "{{$labels.instance}}: 连接数过多,请处理 ,(current value is: {{ $value }})"

④/data/prometheus/rules/nginx.yml

groups:

- name: nginx

rules:

- alert: Nginx Status

expr: up{instance="192.168.0.91:9913",job="nginx"} == 0

for: 5s

labels:

severity: warning

annotations:

summary: "{{$labels.instance}}: Nginx has stop !!!"

description: "检测Nginx运行状态异常"

Prometheus+Grafana监控系统相关推荐

  1. Grafana监控系统之Prometheus+Grafana监控系统搭建

    Grafana监控系统之Prometheus+Grafana监控系统搭建 本文章内容较长,可通过右上角点击目录快速定位想看的内容 => => 一. 概述 1.1 Grafana介绍 Gra ...

  2. 搭建prometheus+grafana监控系统

    prometheus简介 Prometheus是最初在SoundCloud上构建的开源系统监视和警报工具包 .自2012年成立以来,许多公司和组织都采用了Prometheus,该项目拥有非常活跃的开发 ...

  3. 万字好文!Docker环境部署Prometheus+Grafana监控系统

    点击上方 "编程技术圈"关注, 星标或置顶一起成长 后台回复"大礼包"有惊喜礼包! 每日英文 It doesn't matter how slow you ar ...

  4. k8s prometheus/grafana 监控系统建设

    全栈工程师开发手册 (作者:栾鹏) 架构系列文章 prometheus架构 其中 1.pushgateway是用来接收业务推送的数据形成metrics接口. 2.exporter是用来监控组件(三方中 ...

  5. Prometheus 搭建监控系统

    Prometheus 是一款基于时序数据库的开源监控告警系统,说起 Prometheus 则不得不提 SoundCloud,这是一个在线音乐分享的平台,类似于做视频分享的 YouTube,由于他们在微 ...

  6. 基于 Prometheus 的监控系统实践

    监控作为底层基础设施的一环,是保障生产环境服务稳定性不可或缺的一部分,线上问题从发现到定位再到解决,通过监控和告警手段可以有效地覆盖了「发现」和「定位」,甚至可以通过故障自愈等手段实现解决,服务开发和 ...

  7. Prometheus+Grafana 监控 MySQL

    Prometheus 获取 MySQL 的监控数据,并通过 Grafana 展示的过程.首先来看整体架构图: 1 架构图 如上图,通过 mysql_exporter 获取 MySQL 的监控数据,通过 ...

  8. 十一、Grafana监控系统

    @Author : By Runsen @Date : 2020/6/19 作者介绍:Runsen目前大三下学期,专业化学工程与工艺,大学沉迷日语,Python, Java和一系列数据分析软件.导致翘 ...

  9. 使用Prometheus+Grafana 监控MySQL/MONGODB

    使用Prometheus+Grafana 监控MySQL/MONGODB 之前就久仰 Prometheus 大名,因为有用zabbix 进行监控,就没去安装它.现在正好用上监控MONGO+MYSQL. ...

  10. 基于Prometheus+Grafana监控SQL Server数据库

    墨墨导读:本文整理了基于Prometheus+Grafana监控SQL Server数据库的全过程,分享至此,希望对大家有帮助. 搭建SQL Server环境 使用容器建立SQL Server环境非常 ...

最新文章

  1. 模板 - Min_25 筛法求素数和
  2. Tungsten Fabric SDN — Orchestrator 集成部署模式 — with Kubernetes
  3. java JFrame 设置背景颜色无效的理解
  4. 在mysql-workbench的存储过程中使用循环while,repeat,loop
  5. 【C++深度剖析教程4】C++的二阶构造模式
  6. 为什么DevOps的必然趋势是BizDevOps
  7. Flash of Unstyled Content (FOUC)
  8. python接口自动化(二十三)--unittest断言——上(详解)
  9. delphi FastReport 安装方法
  10. Codeforces1005E1 - Median on Segments (Permutations Edition)(中位数计数)
  11. 移动硬盘打开文件突然变得很卡
  12. 嵌入式就业去哪?中国10大军工央企,1000+单位!史上最全!
  13. OSChina 周四乱弹 ——你想用小鱼干拐走本公主么
  14. C#中Action和=>用法(入门)
  15. java入门之 ArrayList类(动态数组)
  16. java处理保留小数点后几位
  17. U8 存货期初余额 货位
  18. 论文笔记-DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification
  19. CANOE学习(3)
  20. STM32入门之点亮LED灯

热门文章

  1. 4 Three.js一个案例详解
  2. php设置图片文字布局,Word如何排版图片和文字
  3. jQuery设计思想
  4. pm2 start 带参数_pm2 start命令进阶详解
  5. 大事务的处理方式对比
  6. 【数据分析】快速获取微博用户数据,图片,视频
  7. 从单体式架构迁移到微服务架构,简述java程序的运行原理
  8. [1] Instances as Queries
  9. Android开发基础之控件ImageView及第三方库Glide的使用
  10. hb100 微波雷达arduino_【Arduino】108种传感器系列实验(03)-微波雷达感应开关...-Arduino中文社区 - Powered by Discuz!...