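Learn Prometheus in One Article: An Open-Source System Monitoring and Alerting Toolkit

Author | 阿文, Editor | 郭芮

Header image | Downloaded by CSDN from 东方IC

Produced by | CSDN (ID: CSDNnews)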

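Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted it, and today it has become the de facto monitoring stack for Kubernetes.

Like Kubernetes, Prometheus traces its lineage to Google's Borg ecosystem: it was inspired by BorgMon, an internal monitoring system that was born at almost the same time as Borg.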

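Strengths of Prometheus

Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus is designed for reliability, to be the system you turn to during an outage so that you can diagnose problems quickly. Each Prometheus server is standalone and does not depend on network storage or other remote services. You can rely on it when other parts of your infrastructure are broken, and you do not need to set up extensive infrastructure to use it.

When is Prometheus not a good fit?

Prometheus values reliability. Even under failure conditions you can always view the statistics that are available about your system. If you need 100% accuracy, however, for example for per-request billing, Prometheus is not a good choice, because the collected data may not be detailed and complete enough. In that case it is best to use another system to collect and analyze the billing data, and to use Prometheus for the rest of your monitoring.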

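Prometheus Architecture

The diagram below shows the overall architecture of Prometheus and its components:

As the diagram shows, the main components are:

Prometheus Server: collects and stores time series data. It is the core of the Prometheus components and is responsible for fetching, storing, and querying monitoring data. Prometheus Server can manage its monitoring targets through static configuration, or manage them dynamically through Service Discovery, and it pulls data from those targets. It also stores the collected data: Prometheus Server is itself a time series database that writes the collected data to local disk as time series. Finally, it provides its own query language, PromQL, for querying and analyzing the data.
Client Library: generates the metrics for a service that needs to be monitored and exposes them to the Prometheus server. When the Prometheus server pulls, the library returns the current state of the metrics directly.
Push Gateway: mainly for short-lived jobs. Because such jobs may exit before Prometheus gets a chance to pull from them, they push their metrics to the Push Gateway instead, and the Prometheus server then pulls the metrics from the gateway.
Exporters: expose the metrics of existing third-party services to Prometheus. An exporter publishes its collection endpoint as an HTTP service, and Prometheus Server fetches the monitoring data by requesting that endpoint.
Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them, routes them to the matching receiver, and sends the notification. Common receivers include email and PagerDuty.
Web UI: the Expression Browser built into Prometheus Server, where you can query and visualize data directly with PromQL.
Various other tools.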

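Features

The main features of Prometheus are:

A multi-dimensional data model (time series identified by a metric name and key-value pairs)
A flexible query language
No reliance on distributed storage
Time series collection via a pull model over HTTP
Support for pushing time series through an intermediary gateway
Targets discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support

Prometheus comprises multiple components, many of which are optional, for example:

The main Prometheus server, which collects and stores time series data
Client libraries for instrumenting application code
A push gateway for short-lived jobs
A Rails/SQL-based GUI dashboard builder (PromDash, since deprecated in favor of Grafana)
Special-purpose exporters (for HAProxy, StatsD, Ganglia, and others)
An alertmanager for handling alerts
A command-line querying tool

Most components are written in Go, which makes them easy to build and deploy.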

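Download and Run

Download the latest release directly from GitHub:

Official website: https://prometheus.io/
Download page: https://github.com/prometheus/prometheus/releases

After downloading, extract the archive, change into the extracted directory, and run: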

[root@k8s prometheus-2.15.0.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  tsdb
[root@k8s prometheus-2.15.0.linux-amd64]# ./prometheus

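After startup the program prints some log output. By default it listens on port 9090 and uses the prometheus.yml configuration file in the Prometheus directory. The log shows Prometheus itself starting first, followed by the TSDB (time series database):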

level=info ts=2019-12-24T06:34:56.601Z caller=main.go:294 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2019-12-24T06:34:56.601Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.0, branch=HEAD, revision=ec1868b0267d13cb5967286fd5ec6afff507905b)"
level=info ts=2019-12-24T06:34:56.601Z caller=main.go:331 build_context="(go=go1.13.5, user=root@240f2f89177f, date=20191223-12:03:32)"
level=info ts=2019-12-24T06:34:56.601Z caller=main.go:332 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 k8s (none))"
level=info ts=2019-12-24T06:34:56.601Z caller=main.go:333 fd_limits="(soft=1024, hard=4096)"
level=info ts=2019-12-24T06:34:56.602Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-12-24T06:34:56.604Z caller=main.go:648 msg="Starting TSDB ..."
level=info ts=2019-12-24T06:34:56.604Z caller=web.go:506 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-12-24T06:34:56.607Z caller=head.go:584 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2019-12-24T06:34:56.612Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=2
level=info ts=2019-12-24T06:34:56.616Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=1 maxSegment=2
level=info ts=2019-12-24T06:34:56.617Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=2 maxSegment=2
level=info ts=2019-12-24T06:34:56.618Z caller=main.go:663 fs_type=EXT4_SUPER_MAGIC
level=info ts=2019-12-24T06:34:56.618Z caller=main.go:664 msg="TSDB started"
level=info ts=2019-12-24T06:34:56.619Z caller=main.go:734 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2019-12-24T06:34:56.620Z caller=main.go:762 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2019-12-24T06:34:56.620Z caller=main.go:617 msg="Server is ready to receive web requests."

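Now open port 9090 of the host in a browser to see the following page, which is the Prometheus console:

Configuration File

prometheus.yml is the Prometheus configuration file. To start Prometheus with a specific configuration file, use the following command: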

prometheus --config.file=prometheus.yml

The default configuration is as follows:

# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

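It consists of:

global: global settings
alerting: the Alertmanager instances to which Prometheus sends the alerts it fires; Alertmanager then delivers the notifications through the channels configured there
rule_files: the alerting rule files to load
scrape_configs: the targets that Prometheus monitors

global holds general global settings. Only two parameters are listed here:

scrape_interval: 15s # scrape targets every 15s
evaluation_interval: 15s # evaluate alerting rules every 15s

scrape_configs specifies the targets that Prometheus monitors. Within scrape_configs each group of targets is a job, and there are many kinds of jobs. The simplest is static_configs, which lists every target statically, as in the example above:

- job_name: prometheus
  static_configs:
  - targets: ['localhost:9090']

The scrape_configs section of the default configuration file defines one job that monitors Prometheus itself. Prometheus exposes its own monitoring data at ip:9090/metrics:

Opening http://host:9090/metrics in a browser shows the metrics that one instance exposes. Apart from comment lines, every line is one metric sample, and most look like this: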

go_info{version="go1.10.3"} 1

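Here go_info is the metric name, version is a label of the metric, go1.10.3 is the value of the version label, and 1 is the currently sampled value of the metric. A metric can have zero or more labels. This is the metric data model mentioned above.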
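To make the anatomy of such a line concrete, here is a minimal Python sketch that splits a sample line into metric name, labels, and value. The function name parse_sample is hypothetical, and the sketch assumes a simplified form of the text format: it ignores comment lines, optional timestamps, and escape sequences inside label values.

```python
import re

# name, optional {labels}, then the sample value (simplified; no timestamp support)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# one label pair of the form key="value"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric_name, labels_dict, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sample line: %r" % line)
    name, raw_labels, value = match.groups()
    # A metric without braces simply has zero labels.
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


print(parse_sample('go_info{version="go1.10.3"} 1'))
print(parse_sample('go_memstats_frees_total 131961'))
```

The first call yields the name go_info, the label version with value go1.10.3, and the sample value 1.0; the second shows a metric with no labels at all.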

Some metrics look like this:

go_memstats_frees_total 131961

According to the naming conventions recommended by Prometheus, metrics with the _total suffix are generally of the counter type.

Others look like this:

go_memstats_gc_sys_bytes 213408

Metrics like this are generally of the gauge type.

Others look like this:

prometheus_http_response_size_bytes_bucket{handler="/metrics",le="100"} 0
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="1000"} 0
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="10000"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="100000"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="1e+06"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="1e+07"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="1e+08"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="1e+09"} 46
prometheus_http_response_size_bytes_bucket{handler="/metrics",le="+Inf"} 46
prometheus_http_response_size_bytes_sum{handler="/metrics"} 234233
prometheus_http_response_size_bytes_count{handler="/metrics"} 46

This is the histogram type.

And others look like this:

go_gc_duration_seconds{quantile="0"} 7.3318e-05
go_gc_duration_seconds{quantile="0.25"} 0.000118693
go_gc_duration_seconds{quantile="0.5"} 0.000236845
go_gc_duration_seconds{quantile="0.75"} 0.000337872
go_gc_duration_seconds{quantile="1"} 0.000707002
go_gc_duration_seconds_sum 0.003731953
go_gc_duration_seconds_count 14

This is the summary type.

For more on configuration, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

Prometheus Concepts

Jobs and Instances

In Prometheus terms, an endpoint that can be scraped is called an instance. A collection of instances that serve the same purpose forms a job.

For example, a job called api-server with four identical instances:

job: api-server
  instance 1: 1.2.3.4:5670
  instance 2: 1.2.3.4:5671
  instance 3: 5.6.7.8:5670
  instance 4: 5.6.7.8:5671

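Automatically generated labels and time series

When Prometheus scrapes a target, it automatically attaches two labels to the scraped time series:

job: the configured job name that the target belongs to, for example api-server.
instance: the host:port of the target that was scraped.

If either of these labels is already present in the scraped data, the resulting behavior depends on the honor_labels configuration option.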
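The label handling can be sketched in a few lines of Python. This is an illustration, not the server's code: the function name attach_target_labels is hypothetical, and the conflict rule it encodes is the one described in the Prometheus configuration documentation (with honor_labels disabled, a conflicting scraped label is renamed to exported_<label>; with it enabled, the scraped value wins).

```python
def attach_target_labels(scraped, job, instance, honor_labels=False):
    """Merge the target labels job/instance into the labels of one scraped series."""
    target = {"job": job, "instance": instance}
    result = {}
    for key, value in scraped.items():
        if key in target and not honor_labels:
            # Conflict: keep the scraped value under an exported_ prefix.
            result["exported_" + key] = value
        else:
            result[key] = value
    for key, value in target.items():
        # With honor_labels enabled, a scraped label overrides the target label.
        if not (honor_labels and key in scraped):
            result[key] = value
    return result


print(attach_target_labels({"job": "batch", "code": "200"}, "api-server", "1.2.3.4:5670"))
print(attach_target_labels({"job": "batch"}, "api-server", "1.2.3.4:5670", honor_labels=True))
```

In the first call the scraped job label survives as exported_job while job becomes api-server; in the second call the scraped value batch is kept as job.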

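For each instance that it scrapes, Prometheus also stores samples in the following time series:

up{job="[job-name]", instance="[instance-id]"}: 1 if the instance is healthy, that is, reachable; 0 if the scrape failed, for example because the network is down or the service has crashed
scrape_duration_seconds{job="[job-name]", instance="[instance-id]"}: the duration of the scrape
scrape_samples_scraped{job="[job-name]", instance="[instance-id]"}: the number of samples the target exposed

The up metric is very useful for monitoring service health.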

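Data Model

Prometheus fundamentally stores all data as time series: streams of timestamped values that belong to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus can also generate temporary derived time series as the result of queries.

Metric names and labels

Every time series is uniquely identified by its metric name and a set of key-value pairs called labels.

The metric name specifies the general feature of the system that is measured (for example, http_requests_total: the total number of HTTP requests received). It may contain ASCII letters, digits, underscores, and colons, and it must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.

Labels enable the multi-dimensional data model of Prometheus: for the same metric name, each combination of label values identifies a particular dimensional instance of that metric (for example, all HTTP requests sent with method=POST to the /api/tracks handler). The query language filters and aggregates on the basis of these metrics and labels. Changing any label value on any metric creates a new time series.

Label names may contain ASCII letters, digits, and underscores. They must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names that begin with two underscores (__) are reserved for internal use.

Label values may contain any Unicode characters.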
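As a quick illustration of these naming rules, the following Python sketch checks names against the two regular expressions quoted above. The helper names are hypothetical; the treatment of a double-underscore prefix as reserved follows the Prometheus data model documentation.

```python
import re

# The two patterns quoted in the text, compiled once.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    """True if the whole string matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    """True if the whole string matches the label name pattern (no colons)."""
    return LABEL_NAME_RE.fullmatch(name) is not None


def is_reserved_label_name(name):
    """Label names starting with two underscores are reserved for internal use."""
    return name.startswith("__")


for candidate in ("http_requests_total", "job:request_rate", "9lives"):
    print(candidate, is_valid_metric_name(candidate))
```

Note that a colon is accepted in a metric name but not in a label name, and that a name may not start with a digit.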

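Samples

Ordered samples form the actual time series data. Each sample consists of:

a 64-bit floating-point value
a timestamp with millisecond precision

A set of samples is the data collected for a given time series over a given time range.

In summary: a metric name together with specific label values identifies the data you care about. Over time its samples form a sequence of points, which can be drawn on a chart as a continuously updating line.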
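The data model can be pictured as a map from a series identity to an ordered list of samples. The Python sketch below is a toy in-memory model under that assumption, not how the Prometheus TSDB is implemented; the class name SeriesStore and its methods are hypothetical.

```python
from collections import defaultdict


class SeriesStore:
    """Toy model: (metric name, label set) -> list of (timestamp_ms, value)."""

    def __init__(self):
        self._series = defaultdict(list)

    @staticmethod
    def _key(name, labels):
        # Sort the label pairs so the same label set always yields the same key.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp_ms, value):
        # Each sample is a millisecond timestamp plus a float64 value.
        self._series[self._key(name, labels)].append((int(timestamp_ms), float(value)))

    def samples(self, name, labels):
        return list(self._series.get(self._key(name, labels), []))

    def series_count(self):
        return len(self._series)


store = SeriesStore()
store.append("api_http_requests_total", {"method": "POST"}, 1000, 1)
store.append("api_http_requests_total", {"method": "POST"}, 2000, 2)
store.append("api_http_requests_total", {"method": "GET"}, 1000, 5)
print(store.series_count())
```

Two appends with identical labels extend one series, while changing the method label value starts a second, separate series.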

Notation

A metric name together with a set of key-value labels is written using the following notation:

[metric name]{[label name]=[label value], …}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" is written as:

api_http_requests_total{method="POST", handler="/messages"}

This is the same notation that OpenTSDB uses.

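Metric Types

Prometheus offers four core metric types. These types are currently distinguished only in the client libraries and in the wire protocol. The Prometheus server does not yet make full use of the type information, which may change in the future.

Counter

A counter is a cumulative metric: a single value that can only increase (or be reset to zero when the process restarts). Counters are mainly used for the number of requests served, tasks completed, errors that occurred, and so on. As a counter-example, do not use a counter for the number of running goroutines.

Gauge

A gauge is a metric that represents a single value that can go up as well as down.

Gauges are typically used for measured values such as temperature or current memory usage, and also for counts that rise and fall while the service runs, such as the number of goroutines.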
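The difference between the two types comes down to which operations are allowed. The following Python sketch shows that contract with two minimal classes. They are hypothetical stand-ins written for this article, not the API of any official client library, and they are not thread-safe.

```python
class Counter:
    """A value that only ever goes up."""

    def __init__(self):
        self._value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            # Decreasing a counter would break its monotonic contract.
            raise ValueError("counters can only increase")
        self._value += amount

    @property
    def value(self):
        return self._value


class Gauge:
    """A value that can go up, go down, or be set directly."""

    def __init__(self):
        self._value = 0.0

    def inc(self, amount=1.0):
        self._value += amount

    def dec(self, amount=1.0):
        self._value -= amount

    def set(self, value):
        self._value = float(value)

    @property
    def value(self):
        return self._value


requests_total = Counter()
requests_total.inc()
goroutines = Gauge()
goroutines.set(10)
goroutines.dec(3)
print(requests_total.value, goroutines.value)
```

A request count fits the Counter because it never shrinks, whereas the goroutine count needs a Gauge because it moves in both directions.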

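Histogram

A histogram samples observations and, in the Prometheus system, does three things:

counts each observation into one of its buckets (bucket)
accumulates the sum of all observed values (sum)
accumulates the number of observations (count)

A histogram with the base metric name [basename] exposes the following time series for these three purposes:

[basename]_bucket{le="[upper bound]"}: the number of observations that are less than or equal to the upper bound
[basename]_sum
[basename]_count

In summary: if you define a metric of type Histogram, Prometheus automatically produces these three kinds of series for it.

Use the histogram_quantile() function to calculate quantiles from a histogram or from an aggregation of histograms. A histogram is also suitable for calculating an Apdex score. When operating on buckets, remember that the histogram is cumulative.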
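To see why the cumulative nature of the buckets matters, here is a Python sketch of quantile estimation by linear interpolation inside a bucket. It only approximates the idea behind histogram_quantile() and is not the server's implementation; the function name estimate_quantile is hypothetical, and the sketch assumes non-negative observations and a +Inf bucket.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket with a non-zero cumulative count that reaches the rank.
    for i, (upper, count) in enumerate(ordered):
        if count > 0 and count >= rank:
            break
    if math.isinf(upper):
        # The quantile lies beyond the highest finite bound; report that bound.
        return float(ordered[-2][0])
    # Counts are cumulative, so subtract the previous bucket to get this bucket's share.
    lower, below = (0.0, 0.0) if i == 0 else ordered[i - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


buckets = [(100, 0), (1000, 0), (10000, 46), (100000, 46), (math.inf, 46)]
print(estimate_quantile(0.5, buckets))
```

With bucket counts like those in the /metrics output shown earlier (all 46 observations between 1000 and 10000 bytes), the estimated median is 5500.0, the midpoint reached by interpolating halfway into that bucket.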

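Summary

Similar to a histogram, a summary computes quantile statistics over observations (typical use cases are request durations and response sizes). It also does three things:

computes quantiles over the observations (by analogy with exam results: the score that 50% of the students stay below, the score that 90% stay below, and so on)
accumulates the sum of all observed values, like the total of all students' scores (sum)
accumulates the number of observations, like the number of students who took the exam (count)

A summary with the base metric name [basename] exposes the following time series during a scrape:

streaming φ-quantiles (0 ≤ φ ≤ 1) of the observed events, exposed as [basename]{quantile="[φ]"}
[basename]_sum: the total sum of all observed values
[basename]_count: the count of events that have been observed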
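The three outputs of a summary can be illustrated with exam scores in Python. This sketch computes exact quantiles with the nearest-rank method over a full list of observations, which is an assumption made for clarity; real client libraries typically use streaming approximations instead. The function name summarize is hypothetical.

```python
import math


def summarize(observations, quantiles=(0.5, 0.9, 0.99)):
    """Return the sum, count, and nearest-rank quantiles of the observations."""
    ordered = sorted(observations)
    count = len(ordered)
    result = {"sum": float(sum(ordered)), "count": count, "quantiles": {}}
    for phi in quantiles:
        if count == 0:
            result["quantiles"][phi] = math.nan
            continue
        # Nearest rank: the smallest value with at least phi of the data at or below it.
        rank = max(1, math.ceil(phi * count))
        result["quantiles"][phi] = ordered[rank - 1]
    return result


scores = [55, 60, 70, 80, 95]
print(summarize(scores, quantiles=(0.5, 1.0)))
```

For these five scores the sum is 360, the count is 5, the 0.5-quantile is 70, and the 1.0-quantile is the maximum, 95.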

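Reporting Custom Metrics

Ready-made exporters

In the Prometheus world, roughly 70% of scenarios need no custom instrumentation code, because ready-made exporters already exist. Find a suitable exporter, start it, and it exposes a service endpoint that follows the Prometheus conventions.

For the list of exporters, see https://prometheus.io/docs/instrumenting/exporters/. The official GitHub organization (https://github.com/prometheus) also hosts several exporters.

As an example, let's run node_exporter on a host. On CentOS, install it like this: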

# curl -Lo /etc/yum.repos.d/_copr_ibotty-prometheus-exporters.repo https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/repo/epel-7/ibotty-prometheus-exporters-epel-7.repo
# yum install node_exporter

Then run:

node_exporter

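Opening http://${host_ip}:9100/metrics in a browser shows the monitoring metrics that node_exporter exposes for this host:

Then add the following section to the Prometheus configuration file:

scrape_configs:
  ......
  - job_name: 'node_monitor_demo'
    static_configs:
    - targets: ['${host_ip}:9100']

The metrics can then be queried from the Prometheus web console. On the http://${HOST}:9090/graph page, enter go_memstats_alloc_bytes{instance="${host_ip}:9100"} and click the Execute button.

Replace ${host_ip} with the IP address of your host.

In the console, switch to the Graph tab to see the corresponding chart; the legend shows the metrics for each job: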

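Writing custom instrumentation code

If your metrics are unusual and you need to write the instrumentation and reporting logic yourself, that is fairly simple too. Client libraries already exist for many languages (https://prometheus.io/docs/instrumenting/clientlibs/); just follow their examples.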
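To show what such instrumentation boils down to, the following Python sketch serves a counter on /metrics using only the standard library. It is a hand-rolled illustration, not the official client library, which you would normally use instead. The names COUNTERS, render_metrics, MetricsHandler, serve, the metric demo_requests_total, and port 8000 are all assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state: metric name -> current counter value.
COUNTERS = {"demo_requests_total": 0}


def render_metrics(counters):
    """Render counters as text: a TYPE comment followed by one sample line each."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append("# TYPE %s counter" % name)
        lines.append("%s %s" % (name, value))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Count every scrape so the exposed value visibly increases.
        COUNTERS["demo_requests_total"] += 1
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; adding the host and port as a target under scrape_configs, in the same way as for node_exporter, would let Prometheus scrape it.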

