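Prometheus + Grafana Monitoring and Alerting

Contents

0. Data ingestion and alerting flow
1. Prometheus
- 1.1 Prometheus server
  - 1.1.1 Edit the configuration file: prometheus.yml
  - 1.1.2 Validate the configuration, then start the service (on Windows, double-click the exe file)
  - 1.1.3 Open the web UI at `http://localhost:9090`
  - 1.1.4 PromQL queries at `http://localhost:9090/graph`
- 1.2 Exporters
  - 1.2.1 Edit prometheus.yml to add a scrape job
- 1.3 Alertmanager
  - 1.3.1 Edit the configuration file: alertmanager.yml
  - 1.3.2 Edit prometheus.yml to add alerting rules
  - 1.3.3 Validate the configuration, then start the service (on Windows, double-click the exe file)
  - 1.3.4 Open the web UI at `http://localhost:9093`
  - 1.3.5 Check the received email
2. Grafana
- 2.1 Download and install
- 2.2 Setup and usage
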
0. Data ingestion and alerting flow

1. Prometheus

Overview: https://prometheus.io/docs/introduction/overview/

Monitoring and alerting toolkit: Prometheus is an open-source systems monitoring and alerting toolkit.
Time series storage with optional labels: Prometheus collects and stores its metrics as time series data (metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels).

1.1 Prometheus server

Installation guide: https://prometheus.io/docs/prometheus/latest/installation/

1.1.1 Edit the configuration file: prometheus.yml

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    #alerting:
    #  alertmanagers:
    #  - static_configs:
    #    - targets:
    #      - "192.168.56.1:9093"

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    #rule_files:
    #- "rules/*.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ["192.168.56.1:9090"]

      - job_name: "alertmanager"
        static_configs:
          - targets: ["192.168.56.1:9093"]

      - job_name: "linux-c7"
        static_configs:
          - targets: ["192.168.56.71:9100"]

      - job_name: "redis"
        static_configs:
          - targets: ["192.168.56.71:9121"]

      - job_name: "es"
        static_configs:
          - targets: ["192.168.56.71:9114"]

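1.1.2 Validate the configuration, then start the service (on Windows, double-click the exe file)

1.1.3 Open the web UI at `http://localhost:9090`

Check the status of the exporters (the endpoints Prometheus scrapes data from): Status --> Targets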

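View the metrics an exporter exposes by opening its metrics endpoint, for example http://192.168.56.71:9100/metrics for node_exporter (which includes the disk metric node_filesystem_free_bytes). The other exporters work the same way on their own ports, such as 9114 for the "es" job.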
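To make the format of that metrics page concrete, here is a minimal Python sketch that parses a few lines of the text exposition format. The sample text and its values are made up for illustration, and the parser is deliberately simplified (it does not unescape label values or read optional timestamps), so treat it as a reading aid rather than a full implementation.

```python
import re

# Made-up sample of what a node_exporter /metrics page looks like (values are illustrative).
SAMPLE = '''\
# HELP node_filesystem_free_bytes Filesystem free space in bytes.
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 4.2e+10
node_filesystem_free_bytes{device="/dev/sda2",fstype="xfs",mountpoint="/boot"} 8.5e+08
'''

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one key="value" pair inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples; comment lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels["mountpoint"], value)
```

Each sample line is a metric name, an optional set of labels, and a value. Prometheus adds the `job` and `instance` labels itself when it scrapes the target.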

1.1.4 PromQL queries at `http://localhost:9090/graph`
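The expressions typed into the graph page can also be sent to the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URLs for a few expressions that use only metrics and job names from this article. `run_query` is a hypothetical helper that needs a running server, so it is defined here but not called.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address used in this article


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run the query and return the list under data.result (needs a running server)."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


# Example expressions built only from names that appear in this article.
EXAMPLES = [
    "up",                                          # 1 if the last scrape succeeded, else 0
    'up{job="linux-c7"}',                          # same, limited to one job
    'node_filesystem_free_bytes{mountpoint="/"}',  # free bytes on the root filesystem
]

for expr in EXAMPLES:
    print(build_query_url(expr))
```

Pasting any of the example expressions into the query box on the graph page returns the same series as the API call.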

1.2 Exporters

Reference: https://prometheus.io/docs/instrumenting/exporters/
The following uses node_exporter as an example. It listens on port 9100 by default.

    [root@c71 ~ ]# tar -xf node_exporter-1.4.0.linux-amd64.tar.gz -C /opt
    [root@c71 ~ ]# cd /opt/node_exporter-1.4.0.linux-amd64/
    [root@c71 node_exporter-1.4.0.linux-amd64]# ./node_exporter --help
      --web.listen-address=":9100"
                                 Address on which to expose metrics and web interface.
      --web.telemetry-path="/metrics"
                                 Path under which to expose metrics.
      --web.disable-exporter-metrics
                                 Exclude metrics about the exporter itself (promhttp_*, process_*, go_*).
      --web.max-requests=40      Maximum number of parallel scrape requests. Use 0 to disable.
      --collector.disable-defaults
                                 Set all collectors to disabled by default.
      --web.config=""            [EXPERIMENTAL] Path to config yaml file that can enable TLS or authentication.
      --log.level=info           Only log messages with the given severity or above. One of: [debug, info, warn, error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]
      --version                  Show application version.
    [root@c71 node_exporter-1.4.0.linux-amd64]# ./node_exporter &

1.2.1 Edit prometheus.yml to add a scrape job

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "linux-c7"
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ["192.168.56.71:9100"]

1.3 Alertmanager

1.3.1 Edit the configuration file: alertmanager.yml

    global:
      # The smarthost and SMTP sender used for mail notifications.
      resolve_timeout: 5m # resolve timeout, defaults to 5m
      smtp_smarthost: 'smtp.163.com:25' # SMTP server (host:port)
      smtp_from: 'user@163.com' # sender address
      smtp_auth_username: 'user@163.com' # SMTP login name
      smtp_auth_password: 'xxx' # mailbox authorization code (log in to 163 Mail, enable the SMTP service, and obtain the code)
      smtp_require_tls: false

    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'email'

    # Template definitions
    #templates:
    #  - 'template/*.tmpl'

    receivers:
      - name: 'email'
        email_configs: # email settings
          - to: 'user@163.com' # address that receives the alerts
            #html: '{{ template "test.html" . }}' # template for the email body
            #headers: { Subject: "[WARN] Alert email"} # subject of the email

1.3.2 Edit prometheus.yml to add alerting rules

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - "192.168.56.1:9093"

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      - "rules/*.yml"

Edit rules/node_alert.yml

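    # groups: list of alerting rule groups
    groups:
      # name: name of the rule group
      - name: Host monitoring
        # rules: the rules in this group
        rules:
          # alert: name of the alert
          - alert: Disk usage alert
            # expr: the alert condition; here, usage of the root filesystem above 10%
            expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 10
            # for: how long the condition must hold before the alert fires (here 1 minute); 0 means fire immediately
            for: 1m
            # labels: extra labels attached to the alert
            labels:
              # severity: alert level, e.g. warning or critical
              severity: warning
            # annotations: descriptive text included in the notification
            annotations:
              # templates can reference the alert's labels and current value
              summary: "Instance {{ $labels.instance }}: usage of partition {{ $labels.mountpoint }} is too high" # custom summary
              description: "{{ $labels.instance }}: {{ $labels.job }}: usage of partition {{ $labels.mountpoint }} is above 10% (current value: {{ $value }})"

      - name: Exporter status monitoring
        rules:
          - alert: Node down alert # name of the alert
            expr: up == 0 # alert condition, written as a PromQL expression
            for: 2m # how long the condition must hold before the alert is sent
            labels: # labels
              team: node
            annotations: # detailed explanation of the alert
              summary: "{{$labels.instance}}: has been down"
              description: "Job: {{ $labels.job }}, node IP: {{ $labels.instance }}, status: down" # custom description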
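To see how the disk rule behaves, here is a small Python sketch of its arithmetic and of the `for: 1m` hold time. It assumes rule evaluations arrive every 15 seconds (the `evaluation_interval` from prometheus.yml), and all sample values are made up. It only models the pending/firing transition and is not how Prometheus itself is implemented.

```python
def disk_usage_percent(free_bytes, size_bytes):
    """Same arithmetic as the rule: 100 - (free / size * 100)."""
    return 100 - (free_bytes / size_bytes * 100)


def alert_states(evaluations, threshold=10.0, for_seconds=60):
    """Model the alert state at each rule evaluation.

    evaluations: list of (timestamp_seconds, usage_percent) pairs.
    Returns one of 'inactive', 'pending' or 'firing' per evaluation.
    """
    states = []
    active_since = None  # time of the first evaluation where the condition was true
    for ts, usage in evaluations:
        if usage > threshold:
            if active_since is None:
                active_since = ts
            held_long_enough = ts - active_since >= for_seconds
            states.append("firing" if held_long_enough else "pending")
        else:
            active_since = None  # condition broken, the hold time starts over
            states.append("inactive")
    return states


GIB = 2 ** 30
usage = disk_usage_percent(free_bytes=20 * GIB, size_bytes=80 * GIB)
print(usage)

# Made-up evaluations, 15 seconds apart: low usage, then high usage, then low again.
EVALUATIONS = [(0, 5.0), (15, usage), (30, usage), (45, usage),
               (60, usage), (75, usage), (90, 5.0)]
for (ts, value), state in zip(EVALUATIONS, alert_states(EVALUATIONS)):
    print(ts, value, state)
```

In this model the alert stays pending from t=15 and turns firing at t=75, once the condition has held for a full minute. Only firing alerts are sent to Alertmanager.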

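Restart the Prometheus server and check the alerting rules it loaded (click Alerts, or Status --> Rules)

1.3.3 Validate the configuration, then start the service (on Windows, double-click the exe file)

1.3.4 Open the web UI at `http://localhost:9093`

1.3.5 Check the received email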
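If no real alert is firing yet, one way to exercise the email path is to post a test alert directly to Alertmanager's v2 API (`POST /api/v2/alerts`). The sketch below builds such a payload. The alert name, label values, and the `send_alerts` helper are hypothetical choices for illustration, and `send_alerts` is not called here because it needs a running Alertmanager.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

ALERTMANAGER_URL = "http://localhost:9093"  # address used in this article


def build_test_alert(now=None, minutes=5):
    """Build a one-element alert list that resolves itself after `minutes`."""
    now = now or datetime.now(timezone.utc)
    return [{
        # Hypothetical labels; 'alertname' is what the route groups by.
        "labels": {"alertname": "TestEmail", "severity": "warning", "instance": "manual-test"},
        "annotations": {"summary": "Manual test alert to verify email delivery"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url=ALERTMANAGER_URL, timeout=5):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


print(json.dumps(build_test_alert(), indent=2))
```

With the `group_wait: 30s` setting from alertmanager.yml, the email for a newly posted alert should be sent roughly 30 seconds later, and the alert also shows up on the Alertmanager page in the meantime.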

2. Grafana

Overview: https://grafana.com/docs/grafana/latest/introduction/

Query metrics and logs: enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored
Visualize data as charts: turn your time-series database (TSDB) data into insightful graphs and visualizations

2.1 Download and install

Download: https://grafana.com/grafana/download
On Windows, run the exe installer; the service starts automatically after installation (the default port is 3000; open ip:port in a browser and set a password).

Configuration file: D:\soft\GrafanaLabs\grafana\conf\defaults.ini

    [server]
    # Protocol (http, https, h2, socket)
    protocol = http

    # The ip address to bind to, empty will bind to all interfaces
    http_addr =

    # The http port to use
    http_port = 3000

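2.2 Setup and usage

1. Add Prometheus as a data source (it already receives the data from each exporter)
2. Create a dashboard (generated from a Grafana template)
  Search for templates: https://grafana.com/grafana/dashboards/

For example, for the host monitoring template at https://grafana.com/grafana/dashboards/1860-node-exporter-full/, copy its ID (or download the JSON and import it offline).

Dashboard --> Import --> enter the ID or upload the JSON file, then click Load

3. Adjust the displayed metrics as needed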
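Step 1 above (adding the Prometheus data source) can also be done through Grafana's HTTP API instead of the UI. The following sketch only builds the request for `POST /api/datasources`. The token value is a placeholder, the data source name is an arbitrary choice, and both URLs assume the default local ports used in this article.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"     # default http_port from defaults.ini
PROMETHEUS_URL = "http://localhost:9090"  # Prometheus address used in this article


def build_datasource_request(token, grafana_url=GRAFANA_URL, prometheus_url=PROMETHEUS_URL):
    """Build (but do not send) the request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",    # display name, an arbitrary choice
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",       # the Grafana backend talks to Prometheus
    }
    return urllib.request.Request(
        grafana_url + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


req = build_datasource_request("YOUR_TOKEN")  # placeholder, not a real token
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. That requires a running Grafana and a valid token with permission to manage data sources.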
