Deploying Prometheus

We use kube-prometheus to deploy Prometheus on Kubernetes, applying the project's open-source manifest files directly. Monitoring runs in its own dedicated namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

Official references:

## Reference 1
https://github.com/prometheus-operator/kube-prometheus
https://github.com/prometheus-operator/kube-prometheus/tree/main/manifests/setup
## Reference 2
https://github.com/camilb/prometheus-kubernetes
## Alert configuration
https://www.qikqiak.com/post/prometheus-operator-custom-alert/

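Installation:

### First check which Kubernetes version you are running, then switch to the kube-prometheus branch that matches it
git checkout -b <local-branch> origin/<remote-branch>
###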
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl apply --server-side -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl apply -f manifests/
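
The `until` loop above retries forever if the ServiceMonitor CRD never becomes available. A minimal sketch of a bounded variant follows; the helper name `retry_until` and its arguments are illustrative, not part of kube-prometheus.

```shell
# retry_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
  local max_attempts interval attempt
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 0
}

# Possible usage (assumes kubectl is configured for the target cluster):
#   retry_until 120 1 kubectl get servicemonitors --all-namespaces \
#     && kubectl apply -f manifests/
```

With a limit in place, a failed CRD installation surfaces as a non-zero exit status instead of a loop that never ends.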

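Component analysis

(1) kube-state-metrics vs. metrics-server

While services are running we want to know their state: whether pods have restarted, whether scaling succeeded, what state each pod is in, and so on. That is the job of kube-state-metrics, which focuses on the state of Kubernetes objects such as Deployments, Nodes, and Pods. metrics-server, by contrast, reports resource usage (CPU and memory) for nodes and pods.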

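Troubleshooting

The kube-state-metrics image cannot be pulled

The cause is a network problem: the node cannot reach k8s.gcr.io, so the image pull times out.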

$ kubectl describe po kube-state-metrics-5fcb7d6fcb-k6sfd -n monitoring
  Type     Reason                 Age                   From               Message
  ----     ------                 ----                  ----               -------
  Normal   Scheduled              19m                   default-scheduler  Successfully assigned monitoring/kube-state-metrics-5fcb7d6fcb-k6sfd to 172.19.193.25
  Normal   SuccessfulMountVolume  19m                   kubelet            Successfully mounted volumes for pod "kube-state-metrics-5fcb7d6fcb-k6sfd_monitoring(0c0134c9-120c-4fd7-adcf-e61b2dae680a)"
  Normal   Pulling                18m                   kubelet            Pulling image "quay.io/brancz/kube-rbac-proxy:v0.11.0"
  Normal   Pulled                 18m                   kubelet            Successfully pulled image "quay.io/brancz/kube-rbac-proxy:v0.11.0" in 39.040594662s
  Normal   Pulled                 18m                   kubelet            Container image "quay.io/brancz/kube-rbac-proxy:v0.11.0" already present on machine
  Normal   Started                18m                   kubelet            Started container kube-rbac-proxy-main
  Normal   SuccessfulCreate       18m                   kubelet            Created container kube-rbac-proxy-main
  Normal   SuccessfulCreate       18m                   kubelet            Created container kube-rbac-proxy-self
  Normal   Started                18m                   kubelet            Started container kube-rbac-proxy-self
  Normal   Pulling                17m (x3 over 19m)     kubelet            Pulling image "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0"
  Warning  FailedCreate           16m (x3 over 18m)     kubelet            Error: ErrImagePull
  Warning  FailedPullImage        16m (x3 over 18m)     kubelet            Failed to pull image "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  FailedCreate           16m (x4 over 17m)     kubelet            Error: ImagePullBackOff
  Warning  BackOffPullImage       4m29s (x50 over 17m)  kubelet            Back-off pulling image "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0"

Switch to an image that can be pulled. Note that the replacement used here, v1.9.8, is an older release than the v2.3.0 the manifest was written for, so a reachable mirror of the same version is preferable if you have one.

##
vim kubeStateMetrics-deployment.yaml
##
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/part-of: kube-prometheus
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: kube-state-metrics
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 2.3.0
    spec:
      containers:
      - args:
        - --host=127.0.0.1
        - --port=8081
        - --telemetry-host=127.0.0.1
        - --telemetry-port=8082
        #image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
        image: quay.io/coreos/kube-state-metrics:v1.9.8   # changed to an image that can be pulled
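
The same image swap can be scripted rather than done by hand in vim. The following is a sketch only: `swap_image` is a hypothetical helper, and it assumes the manifest contains a plain `image: <name>` line like the one shown above.

```shell
# swap_image FILE OLD_IMAGE NEW_IMAGE
# Rewrites every "image: OLD_IMAGE" line in FILE to use NEW_IMAGE, keeping the
# original indentation, and leaves the untouched original in FILE.bak.
swap_image() {
  local file old new
  file=$1
  old=$2
  new=$3
  # Escape dots so they match literally in the sed pattern.
  old=$(printf '%s' "$old" | sed 's/[.]/\\./g')
  # '|' is the sed delimiter because image references contain '/'.
  sed -i.bak "s|^\([[:space:]]*\(- \)\{0,1\}image:[[:space:]]*\)${old}[[:space:]]*\$|\1${new}|" "$file"
}

# Possible usage, run from the manifests directory:
#   swap_image kubeStateMetrics-deployment.yaml \
#     k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0 \
#     quay.io/coreos/kube-state-metrics:v1.9.8
#   kubectl apply -f kubeStateMetrics-deployment.yaml
```

Keeping the `.bak` copy makes it easy to diff the change or restore the upstream image later.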

The prometheus-adapter image cannot be pulled

$ kubectl describe po prometheus-adapter-58668f79bc-lgj95 -n monitoring
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                 Age                  From               Message
  ----     ------                 ----                 ----               -------
  Normal   Scheduled              30m                  default-scheduler  Successfully assigned monitoring/prometheus-adapter-58668f79bc-lgj95 to 172.19.193.102
  Normal   SuccessfulMountVolume  30m                  kubelet            Successfully mounted volumes for pod "prometheus-adapter-58668f79bc-lgj95_monitoring(518e3bbd-23d5-4944-ad63-25948338122d)"
  Normal   Pulling                27m (x4 over 30m)    kubelet            Pulling image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1"
  Warning  FailedPullImage        27m (x4 over 30m)    kubelet            Failed to pull image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  FailedCreate           27m (x4 over 30m)    kubelet            Error: ErrImagePull
  Warning  FailedCreate           27m (x6 over 30m)    kubelet            Error: ImagePullBackOff
  Warning  BackOffPullImage       38s (x115 over 30m)  kubelet            Back-off pulling image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1"

Solution (see "Prometheus-Adapter Installation" in the Monitoring Center section of the Huawei Cloud Multi-Cloud Container Platform (MCP) user guide):

##
vim prometheusAdapter-deployment.yaml
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics-adapter
      app.kubernetes.io/name: prometheus-adapter
      app.kubernetes.io/part-of: kube-prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/component: metrics-adapter
        app.kubernetes.io/name: prometheus-adapter
        app.kubernetes.io/part-of: kube-prometheus
    .... omitted
        #image: k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1
        image: directxman12/k8s-prometheus-adapter-amd64:v0.7.0  # changed to this image
        name: prometheus-adapter

Finally, check that everything is up:

$ kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          38m
alertmanager-main-1                    2/2     Running   0          38m
alertmanager-main-2                    2/2     Running   0          38m
blackbox-exporter-776596fdf8-82qj7     3/3     Running   0          39m
grafana-667874d57-xvvpt                1/1     Running   0          39m
kube-state-metrics-584858f6fc-24jlx    3/3     Running   0          12m
node-exporter-hn88p                    2/2     Running   0          39m
node-exporter-jt7b8                    2/2     Running   0          39m
prometheus-adapter-544596c9f5-gsbzp    1/1     Running   0          42s
prometheus-adapter-544596c9f5-rsb7d    1/1     Running   0          42s
prometheus-k8s-0                       2/2     Running   0          38m
prometheus-k8s-1                       2/2     Running   0          38m
prometheus-operator-7ddc6877d5-d58rd   2/2     Running   0          39m
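
On a larger cluster it can be easier to list only the pods that still need attention. A minimal sketch, assuming the default `kubectl get po` column layout shown above (NAME, READY, STATUS, ...); `not_ready_pods` is an illustrative helper name.

```shell
# not_ready_pods
# Reads `kubectl get po` output on stdin and prints the name of every pod that
# is not in the Running state or whose READY count is incomplete (e.g. 1/3).
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Possible usage:
#   kubectl get po -n monitoring | not_ready_pods
```

Empty output means every pod is Running with all containers ready. Pods that finished normally (status Completed) would also be listed, which does not matter for the kube-prometheus stack shown here.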

Enabling external access

(1) Modify the Prometheus Service

# vi prometheus-service.yaml
##
[root@k8s-01 manifests]# cat prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.36.1
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30100 # external access
#  - name: reloader-web
#    port: 8080
#    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP

(2) Modify the Grafana Service

[root@k8s-01 manifests]# cat grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 8.5.5
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30200
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus

(3) Access:

## Grafana
http://xx.cn:30200
## Prometheus
http://xx.cn:30100
### Default Grafana username and password
admin/admin

Prometheus queries

# Query metrics for a specific namespace
container_cpu_usage_seconds_total{namespace="car-stg"}
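
The same query can also be sent from a terminal through the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes Prometheus is reachable on the NodePort configured earlier; `prom_query` is an illustrative helper name. Because `container_cpu_usage_seconds_total` is a counter, the usage example wraps it in `rate()` to get CPU cores in use rather than a cumulative total.

```shell
# prom_query BASE_URL PROMQL
# Sends an instant query to the Prometheus HTTP API and prints the JSON reply.
# -G turns the --data-urlencode value into a URL query parameter, so the
# braces and quotes in PromQL are encoded safely.
prom_query() {
  curl -sS -G "${1%/}/api/v1/query" --data-urlencode "query=$2"
}

# Possible usage (host and namespace taken from the examples in this post):
#   prom_query http://xx.cn:30100 \
#     'sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="car-stg"}[5m]))'
```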

Writing alert rules

The following references are useful when writing rules:

## Reference 1
https://awesome-prometheus-alerts.grep.to/rules.html
## Reference 2
https://github.com/camilb/prometheus-kubernetes/blob/master/manifests/prometheus/prometheus-k8s-rules.yaml

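How do you modify an alert rule?

#### Option 1: edit the generated rule ConfigMap
## Note: the Prometheus Operator generates this ConfigMap from PrometheusRule objects, so direct edits may be overwritten; option 2 is the more durable choice
kubectl edit cm  prometheus-k8s-rulefiles-0  -n monitoring
#### Option 2: edit the manifest file
cd /opt/proms-k8s/kube-prometheus/manifests
vim kubePrometheus-prometheusRule.yaml
###
kubectl apply -f kubePrometheus-prometheusRule.yaml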

(1)Kubernetes Node ready

  - alert: KubernetesNodeReady
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Node ready (instance {{ $labels.instance }})
      description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

(2) Kubernetes memory pressure

  - alert: KubernetesMemoryPressure
    expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes memory pressure (instance {{ $labels.instance }})
      description: "{{ $labels.node }} has MemoryPressure condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

(3)Kubernetes out of disk

  - alert: KubernetesOutOfDisk
    expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes out of disk (instance {{ $labels.instance }})
      description: "{{ $labels.node }} has OutOfDisk condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
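
The rule fragments above only take effect once they sit inside a PrometheusRule object. The sketch below writes one such manifest to a file so it can be applied separately from the stock kube-prometheus rules. The helper name, the object name `custom-node-rules`, the group name, and the two metadata labels are assumptions: check the `ruleSelector` of your Prometheus resource to see which labels, if any, it requires.

```shell
# write_node_rule FILE
# Writes a PrometheusRule manifest containing the KubernetesNodeReady alert.
# The quoted 'EOF' keeps the shell from expanding $labels and $value.
write_node_rule() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-node-rules        # hypothetical name
  namespace: monitoring
  labels:
    prometheus: k8s              # assumption: labels matched by the ruleSelector
    role: alert-rules
spec:
  groups:
  - name: custom-node.rules      # hypothetical group name
    rules:
    - alert: KubernetesNodeReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: Kubernetes Node ready (instance {{ $labels.instance }})
        description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
EOF
}

# Possible usage:
#   write_node_rule custom-node-rules.yaml
#   kubectl apply -f custom-node-rules.yaml
```

Keeping custom alerts in their own object means they survive a re-apply of the upstream manifests.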

Configuring alert delivery to notification channels

##
