1 Synchronize time

Before setting up alerting, synchronize the clocks on all machines and verify them once more.

ntpdate cn.ntp.org.cn

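2 Linux deployment

Step 1: Download the installation package

Package: alertmanager-0.16.2.linux-amd64.tar.gz
Link: https://pan.baidu.com/s/1kRDIZ8zPByhjs11JP30e5A
Extraction code: l3i1

Step 2: Upload the archive and extract it into the target directory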

[root@localhost ~]# mv alertmanager-0.16.2.linux-amd64.tar.gz /opt/prometheus/
[root@localhost ~]# cd /opt/prometheus/
[root@localhost prometheus]# ls
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64
grafana-5.3.4-1.x86_64.rpm              prometheus-2.6.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar -zxvf alertmanager-0.16.2.linux-amd64.tar.gz
alertmanager-0.16.2.linux-amd64/
alertmanager-0.16.2.linux-amd64/LICENSE
alertmanager-0.16.2.linux-amd64/alertmanager.yml
alertmanager-0.16.2.linux-amd64/alertmanager
alertmanager-0.16.2.linux-amd64/amtool
alertmanager-0.16.2.linux-amd64/NOTICE
[root@localhost prometheus]#
[root@localhost prometheus]# ls
alertmanager-0.16.2.linux-amd64         prometheus-2.6.1.linux-amd64
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64.tar.gz
grafana-5.3.4-1.x86_64.rpm
[root@localhost prometheus]# mv alertmanager-0.16.2.linux-amd64 alertmanager
[root@localhost prometheus]# ls
alertmanager                            prometheus-2.6.1.linux-amd64
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64.tar.gz
grafana-5.3.4-1.x86_64.rpm
[root@localhost prometheus]#

Check that the installation succeeded

[root@localhost alertmanager]# ./alertmanager --version
alertmanager, version 0.16.2 (branch: HEAD, revision: 308b7620642dc147794e6686a3f94d1b6fc8ef4d)
  build user:       root@1e9a48272b38
  build date:       20190405-12:27:40
  go version:       go1.11.6
[root@localhost alertmanager]#

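Step 3: Start Alertmanager

Start Alertmanager so that it can receive the alerts sent by Prometheus and deliver notifications through the configured channels.

Run the following in the Alertmanager installation directory: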

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager.yml

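Alertmanager listens on port 9093 by default. Once it has started, open http://<IP>:9093 in a browser to see the built-in UI. Because no alerting rules have been configured to trigger alerts yet, no alerts are shown at this point.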
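
The same check can be done without a browser. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it assumes that the installed Alertmanager version exposes the /-/healthy endpoint.

```python
import urllib.error
import urllib.request


def health_url(host: str, port: int = 9093) -> str:
    """Build the URL of Alertmanager's health endpoint (9093 is the default port)."""
    return f"http://{host}:{port}/-/healthy"


def is_healthy(host: str, port: int = 9093, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling is_healthy("<IP>") with the address of the Alertmanager host should return True once the process from the previous step is running.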

3 Configure alerting

Inspect the directory layout

[root@localhost prometheus]# cd alertmanager/
[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE
[root@localhost alertmanager]# ll
total 38964
-rwxr-xr-x. 1 3434 3434 23072841 Apr  5  2019 alertmanager
-rw-r--r--. 1 3434 3434      380 Apr  5  2019 alertmanager.yml
-rwxr-xr-x. 1 3434 3434 16801752 Apr  5  2019 amtool
-rw-r--r--. 1 3434 3434    11357 Apr  5  2019 LICENSE
-rw-r--r--. 1 3434 3434      457 Apr  5  2019 NOTICE
[root@localhost alertmanager]#

3.1 View the default configuration

[root@localhost alertmanager]# cat alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

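3.2 What the main configuration sections do

global: global settings, including the resolve timeout (how long Alertmanager waits without updates before it treats an alert as resolved), the SMTP settings, and the API addresses of the various notification channels.

route: defines how alerts are dispatched. It is a tree structure that is matched depth-first, from left to right.

receivers: defines who receives the alert notifications, for example through the commonly used email, wechat, slack, and webhook channels.

inhibit_rules: inhibition rules. While alerts matching one set of matchers (the source) exist, an inhibition rule mutes alerts matching another set of matchers (the target).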
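
The effect of an inhibition rule can be illustrated with a small simulation. This is a simplified sketch of the idea, not Alertmanager's actual implementation: alerts are reduced to their label sets, the rule is the one from the default configuration, and the 10.0.0.9 instance is a made-up example value.

```python
from typing import Dict, List

Alert = Dict[str, str]  # an alert is represented only by its label set here


def matches(alert: Alert, matchers: Dict[str, str]) -> bool:
    """True if the alert carries every label/value pair in matchers."""
    return all(alert.get(k) == v for k, v in matchers.items())


def is_inhibited(target: Alert, firing: List[Alert], rule: dict) -> bool:
    """Mute `target` if a firing source alert matches source_match and
    agrees with the target on every label listed in `equal`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # A label missing on both sides counts as equal (both are empty).
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

critical = {"alertname": "node-up", "instance": "192.168.156.133:9100", "severity": "critical"}
warning = {"alertname": "node-up", "instance": "192.168.156.133:9100", "severity": "warning"}
other = {"alertname": "node-up", "instance": "10.0.0.9:9100", "severity": "warning"}  # hypothetical host

print(is_inhibited(warning, [critical, warning, other], RULE))  # True
print(is_inhibited(other, [critical, warning, other], RULE))    # False
```

The warning for the same instance is muted because a critical alert with identical alertname, dev, and instance values is firing; the warning for the other instance is still delivered.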

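3.3 Email alert configuration

Alert configuration in detail:

global:
  resolve_timeout: 5m   # resolve timeout, 5 minutes by default
  # QQ Mail SMTP server. The official address is smtp.qq.com on port 465 or 587;
  # the POP3/SMTP service must also be enabled for the mailbox.
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'xxx@qq.com'
  smtp_auth_username: 'xxx@qq.com'
  smtp_auth_password: 'xxxxxx'   # the mailbox authorization code, not the login password
  smtp_require_tls: false   # whether to use TLS; enable or disable it depending on the environment
  # If you get "email.loginAuth failed: 530 Must issue a STARTTLS command first", set it to true.
  # If TLS is enabled and you get "starttls failed: x509: certificate signed by unknown authority",
  # set insecure_skip_verify: true (in the tls_config block under email_configs) to skip TLS verification.
  smtp_hello: 'qq.com'
route:   # alert dispatch policy
  group_by: ['alertname']   # label used to group alerts
  # How long to wait before the first notification of a new group, so that
  # alerts of the same group arriving within 5s are sent together.
  group_wait: 5s
  group_interval: 5s    # how long to wait before notifying about new alerts in an already notified group
  repeat_interval: 5m   # interval between repeated notifications; raise it to reduce duplicate emails
  receiver: 'email'     # default receiver
receivers:   # who receives the alert notifications
- name: 'email'   # receiver name
  email_configs:
  # Address that receives the alerts (a literal address here; section 5 references a template instead).
  - to: 'xxxxxxxx@qq.com'
    send_resolved: true   # also notify when the alert is resolved
# Inhibition rules: while alerts matching the source matchers exist,
# alerts matching the target matchers are muted.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']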
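
Before relying on Alertmanager to deliver mail, it can help to verify the SMTP credentials independently. The sketch below is an assumption-laden helper, not part of Alertmanager: it uses Python's smtplib over implicit TLS (matching port 465 above), and the function names are illustrative.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Compose a short plain-text test mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_mail(host: str, port: int, user: str, password: str, recipient: str) -> None:
    """Log in over implicit TLS (port 465) and send one test message."""
    with smtplib.SMTP_SSL(host, port, timeout=10) as smtp:
        smtp.login(user, password)  # password is the authorization code, not the login password
        smtp.send_message(build_test_message(user, recipient))


# Example call with the placeholder values from the configuration above:
# send_test_mail("smtp.qq.com", 465, "xxx@qq.com", "xxxxxx", "xxxxxxxx@qq.com")
```

If the login step fails here, the same credentials will most likely fail in Alertmanager as well, which narrows the problem down to the mailbox settings rather than the alerting pipeline.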

3.4 Applying the alert configuration

[root@localhost alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '15***775@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '154***75@qq.com'
  smtp_auth_password: 'y***bhjhi'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '154***5@qq.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
~
"alertmanager.yml" 25L, 566C written
[root@localhost alertmanager]#

3.5 Check the configuration with amtool

After editing the configuration file, check it with amtool:

[root@localhost alertmanager]# ./amtool check-config  alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 0 templates

[root@localhost alertmanager]#

3.6 Restart Alertmanager

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager.yml

4 Configure Alertmanager and alerting rules in Prometheus

Configure the Alertmanager address and the alerting rules in Prometheus. Create a new rule file, node-up.rules, as follows:

4.1 The node-up.rules rule file

groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"

4.2 In practice

[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool
[root@localhost prometheus-2.6.1.linux-amd64]# mkdir rules
[root@localhost prometheus-2.6.1.linux-amd64]# cd rules/
[root@localhost rules]# vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="agent1"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
~
~
"node-up.rules" [New] 11L, 237C written
[root@localhost rules]#

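The purpose of this rule is to monitor whether the node is alive.

expr: a PromQL expression that checks whether the targets of job="agent1" are up.
for: once the expression becomes true the alert is Pending; if it stays true for 15s the alert becomes Firing and is sent to Alertmanager.
labels and annotations: attach additional identifying and descriptive information to the alert. All labels and annotations added here, together with any labels already attached to the job in prometheus.yml, are automatically included in the email body.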
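
The rule expression can also be evaluated by hand through the Prometheus HTTP API, which is a convenient way to see what the rule will match. The sketch below assumes the standard /api/v1/query endpoint; the function names are illustrative and the base URL has to be adapted to the environment.

```python
import json
import urllib.parse
import urllib.request


def query_url(base: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_instances(response: dict) -> list:
    """Extract the instance label of every series returned by the query."""
    return [r["metric"].get("instance", "") for r in response["data"]["result"]]


def fetch(base: str, expr: str) -> dict:
    """Run the query against a live Prometheus server and decode the JSON reply."""
    with urllib.request.urlopen(query_url(base, expr), timeout=5) as resp:
        return json.load(resp)


# Example against a running server (not executed here):
# print(down_instances(fetch("http://192.168.156.133:9090", 'up{job="agent1"} == 0')))
```

An empty list means that every target of the job is currently up, so the rule would stay Inactive.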

4.3 Edit prometheus.yml to add the rule files

[root@localhost ~]# cd /opt/prometheus/prometheus-2.6.1.linux-amd64/
[root@localhost prometheus-2.6.1.linux-amd64]# vim prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.156.133:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/prometheus-2.6.1.linux-amd64/rules/*.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'agent1'
    static_configs:
    - targets: ['192.168.156.133:9100']
"prometheus.yml" 32L, 1074C written

4.4 Restart Prometheus

[root@localhost prometheus-2.6.1.linux-amd64]# pkill prometheus
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
[root@localhost prometheus-2.6.1.linux-amd64]# ./prometheus --config.file=prometheus.yml &

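4.5 Check that the configuration took effect

In the Prometheus web UI, the Alerts page now lists the node-up rule (screenshot omitted).

This shows that the configuration was applied successfully.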

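4.6 The three alert states

A Prometheus alert is in one of three states: Inactive, Pending, or Firing.

Inactive: the rule is being evaluated, but its expression is not currently true, so no alert has been triggered.
Pending: the expression is true, but has not yet stayed true for the duration given by for. If it remains true until that duration has elapsed, the alert moves to Firing. Grouping, inhibition, and silencing are applied afterwards by Alertmanager, not in this state.
Firing: the alert is sent to Alertmanager, which delivers it to all receivers according to its configuration. Once the condition clears, the state returns to Inactive, and the cycle repeats.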
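
The transitions between the three states can be sketched as a small simulation. This is an illustrative model of the behaviour, not Prometheus code: the function name and the fixed evaluation interval are assumptions chosen to match the 15s values used in this article.

```python
def alert_states(samples, for_seconds, interval):
    """Return the alert state after each rule evaluation.

    samples:     list of booleans, True when the rule expression matched
    for_seconds: the rule's `for` duration in seconds
    interval:    seconds between two evaluations
    """
    states = []
    active_for = None  # how long the expression has been continuously true
    for matched in samples:
        if not matched:
            active_for = None
            states.append("Inactive")
            continue
        active_for = 0 if active_for is None else active_for + interval
        states.append("Firing" if active_for >= for_seconds else "Pending")
    return states


# The expression is false twice, true for three evaluations, then false again.
print(alert_states([False, False, True, True, True, False], for_seconds=15, interval=15))
# ['Inactive', 'Inactive', 'Pending', 'Firing', 'Firing', 'Inactive']
```

With for: 15s and a 15s evaluation interval, the alert spends exactly one evaluation in Pending before it fires.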

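5 Trigger an alert

The rule defined above checks whether the job="agent1" node is alive, so stopping the node-exporter service simulates a node going down, which satisfies the alert condition and triggers the rule.

Check the configuration to confirm the port and other details of the monitored node, then stop the corresponding process.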

[root@localhost prometheus-2.6.1.linux-amd64]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/prometheus-2.6.1.linux-amd64/rules/*.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'agent1'
    static_configs:
    - targets: ['192.168.156.133:9100']

Find the process listening on that port


[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9100
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
node_expo 67601 root    3u  IPv6 629243      0t0  TCP *:jetdirect (LISTEN)
node_expo 67601 root    5u  IPv6 783547      0t0  TCP localhost.localdomain:jetdirect->localhost.localdomain:47248 (ESTABLISHED)
prometheu 76836 root   15u  IPv4 783055      0t0  TCP localhost.localdomain:47248->localhost.localdomain:jetdirect (ESTABLISHED)

Stop the process of the agent1 node

[root@localhost prometheus-2.6.1.linux-amd64]# kill 67601
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9100
[root@localhost prometheus-2.6.1.linux-amd64]#

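After the service has been stopped:

Within about 15s, the node-exporter target on the Prometheus Targets page is shown as unhealthy.
After a further 15s, the Alerts page changes from the green node-up (0 active) Inactive state to the yellow node-up (1 active) Pending state.
After another 15s the state turns to the red Firing state and the alert is sent to Alertmanager, which then emails the recipients according to its configuration.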
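
To exercise only the Alertmanager side (routing and email delivery) without stopping node_exporter, a hand-made alert can be posted to Alertmanager's HTTP API. The sketch below is an assumption-based helper: it targets the /api/v1/alerts path available in the 0.16.x release used here (newer releases use /api/v2/alerts with the same body shape), and the function names are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert(instance: str, now: datetime) -> list:
    """Build the JSON body for one hand-made alert (a list with one element)."""
    return [{
        "labels": {"alertname": "node-up", "instance": instance,
                   "severity": "1", "team": "node"},
        "annotations": {"summary": f"{instance} has been down for more than 15s!"},
        "startsAt": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }]


def post_alert(base: str, body: list) -> int:
    """POST the alert list to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base}/api/v1/alerts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Example against a running Alertmanager (not executed here):
# post_alert("http://192.168.156.133:9093",
#            build_alert("192.168.156.133:9100", datetime.now(timezone.utc)))
```

An alert posted this way goes through the same route and receiver as one sent by Prometheus, so it should produce the same kind of email after group_wait has elapsed.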

Check the mailbox

Restart the node exporter

[root@localhost node_export]# nohup ./node_exporter &
[2] 81062
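[1]   Terminated              nohup ./node_exporter
[root@localhost node_export]# nohup: ignoring input and appending output to 'nohup.out'
[root@localhost node_export]#

Another email is then sent, this time reporting that the alert has been resolved.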

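5 Send with a custom template

5.1 Write the template file

Create a template directory inside the Alertmanager installation directory and write the template file there.
The template file is as follows: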

{{ define "email.from" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}
{{ end }}

In practice

[root@localhost alertmanager]#
[root@localhost alertmanager]# mkdir template
[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]# cd template/
[root@localhost template]# vim email1.tepl
{{ define "email.from" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}
{{ end }}
~
"email1.tepl" [New] 15L, 550C written
[root@localhost template]#

5.2 Add a new Alertmanager configuration file for testing

global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '{{ template "email.from" . }}'
  smtp_auth_password: 'ymbww****xbhjhi'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/opt/prometheus/alertmanager/template/email1.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

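email.from, email.to, and email.to.html are three named templates that can be referenced directly in alertmanager.yml.
email.to.html is the body of the email to be sent. Both HTML and plain text are supported; HTML is used here simply to make the message easier to read.
{{ range .Alerts }} is a loop that iterates over the matched alerts. The information shown is the same as in the default email above, reduced to the key values. The argument of .StartsAt.Format is a Go time layout, which has to be written in terms of Go's reference time 2006-01-02 15:04:05.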
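
For readers less familiar with Go templates, the loop in email.to.html can be mirrored in Python. This is only an illustration of what the template does, not code that Alertmanager runs; the dictionary layout of an alert is an assumption made for the example.

```python
from datetime import datetime


def render_email_html(alerts: list) -> str:
    """Produce one HTML block per alert, like {{ range .Alerts }} ... {{ end }}."""
    blocks = []
    for alert in alerts:
        labels, annotations = alert["labels"], alert["annotations"]
        # %Y-%m-%d %H:%M:%S is the strftime counterpart of Go's 2006-01-02 15:04:05 layout.
        when = alert["starts_at"].strftime("%Y-%m-%d %H:%M:%S")
        blocks.append(
            "=========start==========<br>\n"
            f"Alert type: {labels['alertname']} <br>\n"
            f"Affected host: {labels['instance']} <br>\n"
            f"Summary: {annotations.get('summary', '')} <br>\n"
            # A missing annotation renders as an empty string.
            f"Details: {annotations.get('description', '')} <br>\n"
            f"Triggered at: {when} <br>\n"
            "=========end==========<br>\n"
        )
    return "".join(blocks)


example = [{
    "labels": {"alertname": "node-up", "instance": "192.168.156.133:9100"},
    "annotations": {"summary": "192.168.156.133:9100 has been down for more than 15s!"},
    "starts_at": datetime(2022, 2, 10, 3, 24, 22),
}]
print(render_email_html(example))
```

Because the example alert carries no description annotation, the Details line is printed with an empty value.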

In practice:

[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]# vim alertmanager1.yml
global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '{{ template "email.from" . }}'
  smtp_auth_password: 'ymbww****xbhjhi'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/etc/alertmanager-tmpl/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
"alertmanager1.yml" [New] 29L, 719C written
[root@localhost alertmanager]# ls
alertmanager  alertmanager1.yml  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]#

5.3 Check that the configuration file is valid

[root@localhost alertmanager]# ./amtool check-config  alertmanager1.yml
Checking 'alertmanager1.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 1 templates
  SUCCESS

[root@localhost alertmanager]#

5.4 Start Alertmanager

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager1.yml
level=info ts=2022-02-10T03:24:22.679397949Z caller=main.go:177 msg="Starting Alertmanager" version="(version=0.16.2, branch=HEAD, revision=308b7620642dc147794e6686a3f94d1b6fc8ef4d)"
level=info ts=2022-02-10T03:24:22.679510727Z caller=main.go:178 build_context="(go=go1.11.6, user=root@1e9a48272b38, date=20190405-12:27:40)"
level=info ts=2022-02-10T03:24:22.68530334Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=192.168.156.133 port=9094
level=info ts=2022-02-10T03:24:22.689931066Z caller=cluster.go:632 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-02-10T03:24:22.703779166Z caller=main.go:334 msg="Loading configuration file" file=alertmanager1.yml
level=info ts=2022-02-10T03:24:22.707237841Z caller=main.go:428 msg=Listening address=:9093
level=info ts=2022-02-10T03:24:24.690305758Z caller=cluster.go:657 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000287352s
level=info ts=2022-02-10T03:24:32.693591832Z caller=cluster.go:649 component=cluster msg="gossip settled; proceeding" elapsed=10.003586882s

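5.5 Update node-up.rules

The template references {{ .Annotations.description }}, but the node-up.rules file created earlier does not define a description annotation, so no value would be available for it.

The rule file therefore has to be updated in the Prometheus installation directory: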

[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  rules
[root@localhost prometheus-2.6.1.linux-amd64]# cd rules/
[root@localhost rules]# ls
node-up.rules
[root@localhost rules]# vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="agent1"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
      description: "{{ $labels.instance }} stopped unexpectedly! Please investigate!"
~
"node-up.rules" 12L, 323C written
[root@localhost rules]#
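
A mismatch between the annotations a template expects and those a rule defines is easy to miss. The following is a rough sketch of a check, assuming the template only accesses annotations in the form .Annotations.<name>; the function names and the inline sample are illustrative.

```python
import re


def referenced_annotations(template_text: str) -> set:
    """Names used as {{ .Annotations.<name> }} in an Alertmanager template."""
    return set(re.findall(r"\.Annotations\.(\w+)", template_text))


def missing_annotations(template_text: str, rule_annotations: dict) -> set:
    """Annotations the template references but the rule does not define."""
    return referenced_annotations(template_text) - set(rule_annotations)


TEMPLATE = """
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
"""

print(missing_annotations(TEMPLATE, {"summary": "..."}))                         # {'description'}
print(missing_annotations(TEMPLATE, {"summary": "...", "description": "..."}))   # set()
```

The first call corresponds to the rule file before this step, the second to the rule file after the description annotation has been added.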

5.6 Restart the Prometheus service

[root@localhost rules]#
[root@localhost rules]# cd ..
[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  rules
[root@localhost prometheus-2.6.1.linux-amd64]# pkill prometheus
level=warn ts=2022-02-10T03:28:40.638273674Z caller=main.go:405 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2022-02-10T03:28:40.638327573Z caller=main.go:430 msg="Stopping scrape discovery manager..."
level=info ts=2022-02-10T03:28:40.638335586Z caller=main.go:444 msg="Stopping notify discovery manager..."
level=info ts=2022-02-10T03:28:40.63834017Z caller=main.go:466 msg="Stopping scrape manager..."
level=info ts=2022-02-10T03:28:40.638359536Z caller=main.go:426 msg="Scrape discovery manager stopped"
level=info ts=2022-02-10T03:28:40.638369808Z caller=main.go:440 msg="Notify discovery manager stopped"
level=info ts=2022-02-10T03:28:40.638431616Z caller=manager.go:664 component="rule manager" msg="Stopping rule manager..."
level=info ts=2022-02-10T03:28:40.638478552Z caller=manager.go:670 component="rule manager" msg="Rule manager stopped"
level=info ts=2022-02-10T03:28:40.638521662Z caller=main.go:460 msg="Scrape manager stopped"
[root@localhost prometheus-2.6.1.linux-amd64]# level=info ts=2022-02-10T03:28:40.640008618Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2022-02-10T03:28:40.640035125Z caller=main.go:615 msg="Notifier manager stopped"
level=info ts=2022-02-10T03:28:40.640192411Z caller=main.go:627 msg="See you next time!"

[1]+  Done                    ./prometheus --config.file=prometheus.yml
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
[root@localhost prometheus-2.6.1.linux-amd64]# ./prometheus --config.file=prometheus.yml &
[1] 81615
[root@localhost prometheus-2.6.1.linux-amd64]# level=info ts=2022-02-10T03:28:53.958420258Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.1, branch=HEAD, revision=b639fe140c1f71b2cbad3fc322b17efe60839e7e)"
level=info ts=2022-02-10T03:28:53.95851453Z caller=main.go:244 build_context="(go=go1.11.4, user=root@4c0e286fe2b3, date=20190115-19:12:04)"
level=info ts=2022-02-10T03:28:53.958534672Z caller=main.go:245 host_details="(Linux 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021 x86_64 localhost.localdomain (none))"
level=info ts=2022-02-10T03:28:53.958548683Z caller=main.go:246 fd_limits="(soft=1024, hard=4096)"
level=info ts=2022-02-10T03:28:53.95855905Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-02-10T03:28:53.959002719Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2022-02-10T03:28:53.959671934Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-02-10T03:28:53.959878293Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644301801123 maxt=1644364800000 ulid=01FVEDMKCQGGJ3F9NDEETVAZW0
level=info ts=2022-02-10T03:28:53.959919384Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644364800000 maxt=1644429600000 ulid=01FVGFR6499R9A354RPZ3BC6ET
level=info ts=2022-02-10T03:28:53.95993753Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644451200000 maxt=1644458400000 ulid=01FVGS5JZ95MA7N14KF461PTZ5
level=info ts=2022-02-10T03:28:53.959958412Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644429600000 maxt=1644451200000 ulid=01FVGS5K4K55VJQEH45W337PQK
level=warn ts=2022-02-10T03:28:54.114211565Z caller=head.go:434 component=tsdb msg="unknown series references" count=320781
level=info ts=2022-02-10T03:28:54.116993838Z caller=main.go:571 msg="TSDB started"
level=info ts=2022-02-10T03:28:54.117041776Z caller=main.go:631 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2022-02-10T03:28:54.11854499Z caller=main.go:657 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2022-02-10T03:28:54.118568236Z caller=main.go:530 msg="Server is ready to receive web requests."

[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
prometheu 81615 root    3u  IPv6 838619      0t0  TCP *:websm (LISTEN)
prometheu 81615 root    7u  IPv4 838620      0t0  TCP localhost:43908->localhost:websm (ESTABLISHED)
prometheu 81615 root    8u  IPv6 838621      0t0  TCP localhost:websm->localhost:43908 (ESTABLISHED)
[root@localhost prometheus-2.6.1.linux-amd64]#

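5.7 Test

The configuration above has a problem, and testing it produces an error (screenshot omitted).
It appears that the content defined in the template cannot be resolved. One thing worth checking is the file name: the transcript in 5.1 saves the template as template/email1.tepl, while the templates entries in 5.2 point to email1.tmpl and /etc/alertmanager-tmpl/email.tmpl, and a templates path that matches no file would leave the named templates undefined. The steps and the final result are as follows: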

[root@localhost node_export]# lsof -i:9100
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
prometheu 83165 root   23u  IPv4 870803      0t0  TCP localhost.localdomain:55416->localhost.localdomain:jetdirect (ESTABLISHED)
node_expo 84543 root    3u  IPv6 870789      0t0  TCP *:jetdirect (LISTEN)
node_expo 84543 root    5u  IPv6 870804      0t0  TCP localhost.localdomain:jetdirect->localhost.localdomain:55416 (ESTABLISHED)
[root@localhost node_export]# kill 84543
[root@localhost node_export]#

After the node exporter is restarted, another email is sent as well.

[root@localhost node_export]# nohup ./node_exporter &
[1] 84685
[root@localhost node_export]# nohup: ignoring input and appending output to 'nohup.out'
[root@localhost node_export]#
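
When custom templates do not seem to be picked up, one simple thing to rule out is a templates entry that matches no file on disk. The sketch below is a hypothetical helper based on the assumption that such entries are resolved as glob patterns; whether this was the cause of the problem in 5.7 is not confirmed by the original post.

```python
import glob
import os
import tempfile


def unmatched_template_globs(patterns: list) -> list:
    """Return the entries of the `templates:` list that match no file."""
    return [p for p in patterns if not glob.glob(p)]


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "email1.tepl"), "w").close()  # file name as saved in the 5.1 transcript
    patterns = [os.path.join(d, "email1.tmpl"), os.path.join(d, "*.tepl")]
    unmatched = [os.path.basename(p) for p in unmatched_template_globs(patterns)]
    print(unmatched)  # ['email1.tmpl']
```

On a real installation the function would be called with the paths listed under templates: in the Alertmanager configuration file.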

Prometheus (6): Configuring email alerts with Prometheus + Alertmanager and sending them with a template