Installing gpu-manager

  • Overview
  • Preparation
  • Deploying gpu-manager
  • Deploying gpu-admission
  • Checking the results
  • References

Overview

gpu-manager is an open-source vGPU solution from Tencent. Its internals are not covered here; see the article GPUManager虚拟化方案 for the underlying design.

This guide mainly follows the installation tutorial 腾讯开源vgpu方案gpu-manager安装教程, with some of the configuration changed to work around problems encountered during installation. If installing from that article fails for you, the steps here may help.

Preparation

gpu-manager does not ship an NVIDIA container runtime, so the NVIDIA driver has to be installed in advance on every node that has a GPU. If something like gpu-operator was installed in the cluster before, uninstall it first; otherwise the driver installation will error out because the X server process is still holding the GPU (via kubelet). The details are not repeated here; see the following articles (a minimal pre-check is sketched after the links):
超全超详细的安装nvidia显卡驱动教程
Ubuntu安装nvidia驱动
解决centos下安装显卡驱动出现的unable to find the kernel source tree等关于内核版本问题
如何关闭X Server,以避免在更新nVidia驱动程序时出错?
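
A minimal pre-check before installing the driver, written as a sketch: it assumes a systemd-based distro, and assumes that gpu-operator, if present, was installed with Helm into a gpu-operator namespace (release name and namespace are assumptions; adjust to your setup):

# check whether an NVIDIA kernel module is already loaded
lsmod | grep -i nvidia
# remove a previously installed gpu-operator first (assumed Helm release name/namespace)
helm uninstall gpu-operator -n gpu-operator
# stop the graphical target so the X server does not hold the GPU during the driver install
sudo systemctl isolate multi-user.target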

After the driver is installed, reboot (I have not tested whether skipping the reboot works) and run the following commands to initialize the device nodes under /dev:

nvidia-smi
nvidia-modprobe -u -c=0

After running them, entries like the following should have been created under /dev:

[root@xxxxxx dev]# ls /dev|grep nvid
nvidia0
nvidia-caps
nvidiactl
nvidia-uvm
nvidia-uvm-tools

Otherwise, container initialization will fail with an error saying some /dev/xxx device cannot be found
(see: https://blog.csdn.net/JosephThatwho/article/details/107869332).

Deploying gpu-manager

In this cluster Docker uses the systemd cgroup driver, while gpu-manager defaults to cgroupfs, so the configuration has to be changed; switching the cgroup driver is only supported in relatively recent gpu-manager releases.
In addition, older gpu-manager versions are incompatible with newer clusters (this article uses Kubernetes v1.22.10). A quick way to confirm which cgroup driver your nodes use is shown below.
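
A sketch for checking the cgroup driver on a node (assuming Docker is the container runtime and the kubelet config sits at its default path):

docker info 2>/dev/null | grep -i "cgroup driver"
grep cgroupDriver /var/lib/kubelet/config.yaml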
Create gpu-manager.yaml with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-manager
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-manager-role
subjects:
- kind: ServiceAccount
  name: gpu-manager
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-manager-daemonset
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: gpu-manager-ds
  template:
    metadata:
      # This annotation is deprecated. Kept here for backward compatibility
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: gpu-manager-ds
    spec:
      serviceAccount: gpu-manager
      tolerations:
        # This toleration is deprecated. Kept here for backward compatibility
        # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        - key: CriticalAddonsOnly
          operator: Exists
        - key: tencent.com/vcuda-core
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      # only run node has gpu device
      nodeSelector:
        nvidia-device-enable: enable
      hostPID: true
      containers:
        - image: tkestack/gpu-manager:v1.1.5
          name: gpu-manager
          securityContext:
            privileged: true
          ports:
            - containerPort: 5678
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: vdriver
              mountPath: /etc/gpu-manager/vdriver
            - name: vmdata
              mountPath: /etc/gpu-manager/vm
            - name: log
              mountPath: /var/log/gpu-manager
            - name: checkpoint
              mountPath: /etc/gpu-manager/checkpoint
            - name: run-dir
              mountPath: /var/run
            - name: cgroup
              mountPath: /sys/fs/cgroup
              readOnly: true
            - name: usr-directory
              mountPath: /usr/local/host
              readOnly: true
            - name: kube-root
              mountPath: /root/.kube
              readOnly: true
          env:
            - name: LOG_LEVEL
              value: "4"
            - name: EXTRA_FLAGS
              value: "--cgroup-driver=systemd"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: device-plugin
          hostPath:
            type: Directory
            path: /var/lib/kubelet/device-plugins
        - name: vmdata
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/vm
        - name: vdriver
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/vdriver
        - name: log
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/log
        - name: checkpoint
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/checkpoint
        # We have to mount the whole /var/run directory into container, because of bind mount docker.sock
        # inode change after host docker is restarted
        - name: run-dir
          hostPath:
            type: Directory
            path: /var/run
        - name: cgroup
          hostPath:
            type: Directory
            path: /sys/fs/cgroup
        # We have to mount /usr directory instead of specified library path, because of non-existing
        # problem for different distro
        - name: usr-directory
          hostPath:
            type: Directory
            path: /usr
        - name: kube-root
          hostPath:
            type: Directory
            path: /root/.kube

The main changes compared to the upstream manifest are:
Switched to a newer image (tkestack/gpu-manager:v1.1.5).
Removed --incluster-mode=true, because that option no longer exists in newer versions.
If --logtostderr is unset or set to true, the logs go to the container's stdout (kubectl logs); set it as you prefer.
Set --cgroup-driver=systemd (not needed if your driver is cgroupfs). A hedged sketch of how these flags are passed follows.
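
These flags reach gpu-manager through the EXTRA_FLAGS environment variable in the DaemonSet above; a minimal sketch (the exact set of supported flags depends on your gpu-manager version, and the values here are illustrative):

env:
  - name: LOG_LEVEL
    value: "4"
  - name: EXTRA_FLAGS
    value: "--cgroup-driver=systemd --logtostderr=false"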


The manifest creates a DaemonSet that only runs on nodes carrying a specific label.
So label every node whose GPUs should be schedulable, then apply the manifest:

kubectl label node <your-gpu-node-1> nvidia-device-enable=enable
kubectl label node <your-gpu-node-2> nvidia-device-enable=enable
...
kubectl apply -f gpu-manager.yaml

If everything is correct, a gpu-manager pod should be running on every labeled node.
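
A quick way to verify (the label selector matches the pod template of the DaemonSet above):

kubectl -n kube-system get pods -l name=gpu-manager-ds -o wide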

Deploying gpu-admission

Deploying gpu-admission exactly as in the tutorial above (https://www.jianshu.com/p/7d795bc226c7) works fine, but I made a few small changes.
Create gpu-admission.yaml as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-admission
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-admission-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: gpu-admission
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-admission-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: gpu-admission
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-admission-as-daemon-set-controller
subjects:
- kind: ServiceAccount
  name: gpu-admission
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:controller:daemon-set-controller
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
    app: gpu-admission
  name: gpu-admission
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: gpu-admission
      containers:
        - image: thomassong/gpu-admission:47d56ae9
          name: gpu-admission
          env:
            - name: LOG_LEVEL
              value: "4"
          ports:
            - containerPort: 3456
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      priority: 2000000000
      priorityClassName: system-cluster-critical
---
apiVersion: v1
kind: Service
metadata:
  name: gpu-admission
  namespace: kube-system
spec:
  ports:
    - port: 3456
      protocol: TCP
      targetPort: 3456
  selector:
    app: gpu-admission
  type: ClusterIP

I added a Service for this Deployment so that the later scheduler configuration does not have to reference the pod IP (see https://cloud.tencent.com/developer/article/1685122). Compared to the tutorial this means two extra steps:
Add an app: gpu-admission label to the Deployment's pod template, so the Service selector above can match it.
Create the Service.

kubectl create -f gpu-admission.yaml
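
After applying, it is worth checking that the Service actually selects the gpu-admission pod; if the endpoints list is empty, the app: gpu-admission label is missing from the pod template:

kubectl -n kube-system get pods -l app=gpu-admission -o wide
kubectl -n kube-system get endpoints gpu-admission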

Create /etc/kubernetes/scheduler-policy-config.json as follows:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"},
    {"name": "NoDiskConflict"},
    {"name": "MatchNodeSelector"},
    {"name": "HostName"}
  ],
  "priorities": [
    {"name": "BalancedResourceAllocation", "weight": 1},
    {"name": "ServiceSpreadingPriority", "weight": 1}
  ],
  "extenders": [
    {
      "urlPrefix": "http://gpu-admission.kube-system:3456/scheduler",
      "apiVersion": "v1beta1",
      "filterVerb": "predicates",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ],
  "hardPodAffinitySymmetricWeight": 10,
  "alwaysCheckAllPredicates": false
}
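
kube-scheduler will crash-loop if this file is malformed, so validating the JSON first can save a restart cycle (a sketch, assuming jq is available on the control-plane node):

jq . /etc/kubernetes/scheduler-policy-config.json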

The remaining steps are identical to the tutorial above (https://www.jianshu.com/p/7d795bc226c7).
Create /etc/kubernetes/scheduler-extender.yaml:

apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
algorithmSource:
  policy:
    file:
      path: "/etc/kubernetes/scheduler-policy-config.json"

Modify /etc/kubernetes/manifests/kube-scheduler.yaml; since it is a static pod manifest, kube-scheduler restarts automatically once the file is saved. The result looks like this:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=0.0.0.0
    - --feature-gates=TTLAfterFinished=true,ExpandCSIVolumes=true,CSIStorageCapacity=true,RotateKubeletServerCertificate=true
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0
    - --config=/etc/kubernetes/scheduler-extender.yaml
    image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler:v1.22.10
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/localtime
      name: localtime
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-extender.yaml
      name: extender
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-policy-config.json
      name: extender-policy
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/localtime
      type: File
    name: localtime
  - hostPath:
      path: /etc/kubernetes/scheduler-extender.yaml
      type: FileOrCreate
    name: extender
  - hostPath:
      path: /etc/kubernetes/scheduler-policy-config.json
      type: FileOrCreate
    name: extender-policy
status: {}

Compared to the stock manifest, three places are changed:
the startup command (the added --config=/etc/kubernetes/scheduler-extender.yaml flag);
the volumeMounts (the extender and extender-policy mounts);
the volumes (the corresponding hostPath entries).

If everything is fine, the scheduler pod is recreated automatically after the manifest is modified.

If it is not recreated, you can apply the manifest by hand, which makes the error visible.
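
A few commands that help when the scheduler does not come back up (the static pod name ends with your control-plane node name; <control-plane-node> below is a placeholder):

kubectl -n kube-system get pods -l component=kube-scheduler
kubectl -n kube-system logs kube-scheduler-<control-plane-node>
kubectl apply -f /etc/kubernetes/manifests/kube-scheduler.yaml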

Checking the results

At this point the cluster should have the following kinds of pods running normally: the gpu-manager DaemonSet pods on each GPU node, the gpu-admission pod, and the restarted kube-scheduler.
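
For example, to list them with the labels used in the manifests above:

kubectl -n kube-system get pods -l name=gpu-manager-ds
kubectl -n kube-system get pods -l app=gpu-admission
kubectl -n kube-system get pods -l component=kube-scheduler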

You can check whether a node now exposes vGPU resources:

kubectl describe node <your-gpu-node>
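
gpu-manager registers its virtual resources as tencent.com/vcuda-core and tencent.com/vcuda-memory, so they should appear under Capacity and Allocatable; a quick filter:

kubectl describe node <your-gpu-node> | grep vcuda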


You can deploy a test pod yourself (for example one running PyTorch); if it succeeds, the output should show that CUDA is available inside the container.
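
A minimal test pod sketch (the image name is a placeholder for whatever PyTorch/CUDA image you use; in gpu-manager 100 vcuda-core units correspond to one physical GPU and each vcuda-memory unit is 256MiB, so the values below request roughly half a GPU and 4GiB; keep requests equal to limits):

apiVersion: v1
kind: Pod
metadata:
  name: vcuda-test
spec:
  restartPolicy: Never
  containers:
    - name: pytorch
      image: <your-pytorch-cuda-image>   # placeholder image
      command: ["python", "-c", "import torch; print(torch.cuda.is_available())"]
      resources:
        requests:
          tencent.com/vcuda-core: 50
          tencent.com/vcuda-memory: 16
        limits:
          tencent.com/vcuda-core: 50
          tencent.com/vcuda-memory: 16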

(Figure: the amount of vGPU resources currently allocated.)

One more note: after this installation, nvidia-smi does not work inside containers. This does not seem to affect normal use, but if you need it, see https://github.com/tkestack/gpu-manager/issues/89

References

腾讯开源vgpu方案gpu-manager安装教程
GPUManager虚拟化方案
超全超详细的安装nvidia显卡驱动教程
解决centos下安装显卡驱动出现的unable to find the kernel source tree等关于内核版本问题
如何关闭X Server,以避免在更新nVidia驱动程序时出错?
https://github.com/tkestack/gpu-manager/issues/138
https://github.com/tkestack/gpu-manager/issues/151
https://github.com/tkestack/gpu-manager/issues/89
