Table of Contents

  • Preface
  • 1. Creating the k8s Cluster
  • 2. Enabling the Primary Network
  • 3. Enabling the Secondary Network
    • k8s-rdma-shared-dev-plugin
    • Multus CNI
    • Secondary CNI
    • Multi-Network CRD
  • 4. Launching Pods
  • 5. Running RoCE Traffic in the Pods
  • Summary

Preface

An introductory post written for myself. I will keep updating it with more detail on the underlying mechanisms.


1. Creating the k8s Cluster

There are several ways to create a k8s cluster; you can follow the official documentation at https://kubernetes.io/docs/setup/production-environment/tools/
For a newcomer (like me), it helps to understand the cluster creation process in terms of the k8s architecture.

(Figure: Kubernetes architecture; image source: https://www.redhat.com/en/topics/containers/kubernetes-architecture)

The following components need to be installed (a minimal install sketch follows the list):

  1. container runtime: the service that actually runs containers; it must be installed and started on every node
  2. kubectl: the user-facing CLI and the main tool for managing cluster resources, deploying containers, and debugging
  3. kubelet: the agent that runs on each node and ensures pods and containers are started and kept running (swap must be disabled); it must be installed and started on every node
  4. kubeadm: creates and manages the cluster
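
For reference, here is a minimal install sketch for a RHEL/CentOS 7 host, matching the environment used below. This is a hedged sketch: it assumes containerd as the runtime and that the official Kubernetes yum repository has already been configured; package names and versions may differ in your environment.

# swapoff -a                               ## kubelet requires swap to be off
# sed -i '/ swap / s/^/#/' /etc/fstab      ## keep swap off across reboots
# yum install -y containerd
# systemctl enable --now containerd
# yum install -y kubelet kubeadm kubectl
# systemctl enable --now kubelet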

Initialize the control-plane (master) node:

# kubeadm init

This step usually runs into a number of errors, and Google is the best place to look for solutions. On success, you will see output like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.7.157.30:6443 --token 2rg4l1.n0rhvdp0uvxdrxjv \
        --discovery-token-ca-cert-hash sha256:fd7d661ec35868d036761e844597807a3d076daf3c8b71de6e1b55ee01e66a32

At this point you will find that the following pods have been created; all are Running except coredns, which stays Pending:

# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubectl get node -o wide
NAME          STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION           CONTAINER-RUNTIME
node1         Ready    control-plane   2m50s   v1.24.0   10.7.157.30   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)   3.10.0-1062.el7.x86_64   containerd://1.6.4
# kubectl get pods --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d4b75cb6d-752q4              0/1     Pending   0          35s
kube-system   coredns-6d4b75cb6d-7h2g5              0/1     Pending   0          35s
kube-system   etcd-node1                            1/1     Running   5          47s
kube-system   kube-apiserver-node1                  1/1     Running   4          48s
kube-system   kube-controller-manager-node1         1/1     Running   1          47s
kube-system   kube-proxy-px447                      1/1     Running   0          35s
kube-system   kube-scheduler-node1                  1/1     Running   4          48s

2. Enabling the Primary Network

Let's start with the k8s network model.
The core idea of k8s networking is that every pod gets a unique IP. All containers in a pod share that IP and can communicate with other pods.
The pod subnet is usually set in kubeadm-config.yaml as a CIDR block, i.e. a range of IP addresses from which pod IPs are allocated:

#### in kubeadm-config.yaml ####
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.24.0
networking:
  podSubnet: 10.244.0.0/16
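
A config file like this is then passed to kubeadm at initialization time (--config is a standard kubeadm flag; note that 10.244.0.0/16 matches flannel's default pod CIDR):

# kubeadm init --config kubeadm-config.yaml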

Communication between pods is usually implemented by combining virtual Ethernet (veth) pairs with a Linux bridge:

cni0 is essentially a Linux bridge; it can send ARP requests and resolve ARP responses.
eno1 is the interface used for node-to-node traffic; with IP forwarding enabled, it forwards received packets to cni0 according to the route table.
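
You can see this data path on the node itself; a hedged sketch (interface names and the sample output depend on your CNI and host):

# ip link show cni0                 ## the Linux bridge created by the CNI
# ip route | grep cni0              ## pod subnet is routed to the bridge
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
# sysctl net.ipv4.ip_forward        ## must be 1 for cross-node forwarding
net.ipv4.ip_forward = 1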

To enable the k8s primary network, a primary network CNI must be installed.

There are several options, such as flannel, Calico, and WeaveNet. This example uses flannel, and the interface flannel should use needs to be configured:

# yum install -y flannel
# vi /etc/sysconfig/flanneld  ## add additional options:
FLANNEL_OPTIONS="-iface=eno1"
# cp /usr/bin/flanneld /opt/bin
# kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

coredns should now transition to the Running state.
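
To verify (pod names will differ in your cluster):

# kubectl get pods -n kube-system | grep -E 'flannel|coredns'
kube-flannel-ds-xwlr2      1/1     Running   0          64s
coredns-6d4b75cb6d-752q4   1/1     Running   0          9m
coredns-6d4b75cb6d-7h2g5   1/1     Running   0          9m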

3. Enabling the Secondary Network

The primary network handles basic communication between pods. On top of it, a secondary network is often provided to pods as a high-performance network for applications.

The following need to be deployed:

  1. k8s-rdma-shared-dev-plugin
  2. Multus CNI
  3. Secondary CNI
  4. Multi-Network CRD

Multus CNI can be seen as a meta plugin: it works together with other CNI plugins to attach multiple network interfaces to a pod.

k8s-rdma-shared-dev-plugin

Create the configmap

# cat k8s-rdma-shared-dev-plugin-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourceName": "cx5_bond_shared_devices_a",
          "rdmaHcaMax": 1000,
          "selectors": {
            "vendors": ["15b3"],
            "deviceIDs": ["1017"]
          }
        },
        {
          "resourceName": "cx6dx_shared_devices_b",
          "rdmaHcaMax": 500,
          "selectors": {
            "vendors": ["15b3"],
            "deviceIDs": ["101d"]
          }
        }
      ]
    }
# kubectl create -f k8s-rdma-shared-dev-plugin-config-map.yaml
configmap/rdma-devices created

Create the k8s-rdma-shared-dev-plugin daemonset

# kubectl create -f https://raw.githubusercontent.com/Mellanox/k8s-rdma-shared-dev-plugin/master/images/k8s-rdma-shared-dev-plugin-ds.yaml
daemonset.apps/rdma-shared-dp-ds created
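
Once the daemonset is running, the shared RDMA resources defined in the configmap should show up among the node's resources; a hedged check (output trimmed, the counts come from rdmaHcaMax above):

# kubectl describe node node1 | grep rdma
  rdma/cx5_bond_shared_devices_a:  1000
  rdma/cx6dx_shared_devices_b:     500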

Multus CNI

# kubectl create -f https://raw.githubusercontent.com/intel/multus-cni/master/images/multus-daemonset.yml
customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
clusterrole.rbac.authorization.k8s.io/multus created
clusterrolebinding.rbac.authorization.k8s.io/multus created
serviceaccount/multus created
configmap/multus-cni-config created
daemonset.apps/kube-multus-ds-amd64 created
daemonset.apps/kube-multus-ds-ppc64le created
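
After deployment, Multus installs itself as the first CNI config on each node and delegates the pod's primary interface (eth0) to the original primary CNI. The generated config looks roughly like this; a hedged illustration, since the exact file names and contents depend on the Multus version:

# ls /etc/cni/net.d
00-multus.conf  10-flannel.conflist  multus.d
# cat /etc/cni/net.d/00-multus.conf
{
  "name": "multus-cni-network",
  "type": "multus",
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    { ... }    ## the original flannel conflist, copied verbatim
  ]
}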

Secondary CNI

# mkdir -p /opt/cni/bin
# wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
# tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz

Looking at /opt/cni/bin, you can see that several CNI plugins are now in place:

# ls /opt/cni/bin
bandwidth  bridge  dhcp  firewall  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf

This example uses the macvlan CNI.
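
As background: macvlan creates lightweight virtual interfaces on top of a physical parent, each with its own MAC address, and the macvlan CNI essentially does this inside the pod's network namespace. The manual equivalent with iproute2 would look roughly like this (a sketch; bond0 is the parent interface used below, <pod-netns> is a placeholder):

# ip link add macvlan0 link bond0 type macvlan mode bridge
# ip link set macvlan0 netns <pod-netns>    ## the CNI moves the interface into the pod namespace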

Multi-Network CRD

Create two network attachments for the macvlan CNI. Note that their IP ranges must not overlap with the primary network's IP range:

# cat macvlan_cx6dx.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-cx6dx-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "ens2f0",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [{ "dst": "0.0.0.0/0" }],
      "gateway": "10.56.217.1"
    }
  }'
# cat macvlan_cx5_bond.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-cx5-bond-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "bond0",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.71",
      "rangeEnd": "10.56.217.81",
      "routes": [{ "dst": "0.0.0.0/0" }],
      "gateway": "10.56.217.1"
    }
  }'
# kubectl create -f macvlan_cx6dx.yaml
networkattachmentdefinition.k8s.cni.cncf.io/macvlan-cx6dx-conf created
# kubectl create -f macvlan_cx5_bond.yaml
networkattachmentdefinition.k8s.cni.cncf.io/macvlan-cx5-bond-conf created
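
The attachments can be listed through the CRD's short name, net-attach-def (ages will differ):

# kubectl get net-attach-def
NAME                    AGE
macvlan-cx5-bond-conf   5s
macvlan-cx6dx-conf      10s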

4. Launching Pods

This example only uses macvlan-cx5-bond-conf; to use macvlan-cx6dx-conf instead, specify the corresponding annotation and resources in the test-xxx-pod.yaml files:

# cat test-cx5-bond-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mofed-test-cx5-bond-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: default/macvlan-cx5-bond-conf
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/rping-test
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/cx5_bond_shared_devices_a: 1
      requests:
        rdma/cx5_bond_shared_devices_a: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/infiniband /sys/class/net
      sleep 1000000
# kubectl create -f test-cx5-bond-pod1.yaml
pod/mofed-test-cx5-bond-pod1 created
# cat test-cx5-bond-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mofed-test-cx5-bond-pod2
  annotations:
    k8s.v1.cni.cncf.io/networks: default/macvlan-cx5-bond-conf
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/rping-test
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/cx5_bond_shared_devices_a: 1
      requests:
        rdma/cx5_bond_shared_devices_a: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/infiniband /sys/class/net
      sleep 1000000
# kubectl create -f test-cx5-bond-pod2.yaml
pod/mofed-test-cx5-bond-pod2 created
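
Once a pod is running, Multus records the attached networks in the pod's network-status annotation, which is a quick way to confirm that net1 was created from macvlan-cx5-bond-conf (a hedged check; the dots in the annotation key must be escaped in jsonpath):

# kubectl get pod mofed-test-cx5-bond-pod1 \
    -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'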

5. Running RoCE Traffic in the Pods

RoCE traffic can now be run between the pods over the secondary network interface (net1):

# kubectl get pods -A
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
default       mofed-test-cx5-bond-pod1              1/1     Running   0          3m41s
default       mofed-test-cx5-bond-pod2              1/1     Running   0          32s
default       mofed-test-macvlan-pod                1/1     Running   0          4d9h
kube-system   coredns-6d4b75cb6d-752q4              1/1     Running   0          5d3h
kube-system   coredns-6d4b75cb6d-7h2g5              1/1     Running   0          5d3h
kube-system   etcd-node1                            1/1     Running   5          5d3h
kube-system   kube-apiserver-node1                  1/1     Running   4          5d3h
kube-system   kube-controller-manager-node1         1/1     Running   1          5d3h
kube-system   kube-flannel-ds-xwlr2                 1/1     Running   0          5d3h
kube-system   kube-multus-ds-kqhqn                  1/1     Running   0          5d2h
kube-system   kube-proxy-px447                      1/1     Running   0          5d3h
kube-system   kube-scheduler-node1                  1/1     Running   4          5d3h
kube-system   rdma-shared-dp-ds-vps6x               1/1     Running   0          21m

mofed-test-cx5-bond-pod1

# kubectl exec -it mofed-test-cx5-bond-pod1 bash
[root@mofed-test-cx5-bond-pod1 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.211  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::e45d:c4ff:fe4c:f3b3  prefixlen 64  scopeid 0x20<link>
        ether e6:5d:c4:4c:f3:b3  txqueuelen 0  (Ethernet)
        RX packets 12  bytes 1016 (1016.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 612 (612.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

net1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.56.217.71  netmask 255.255.255.0  broadcast 10.56.217.255
        ether fa:a4:6e:24:3e:ba  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@mofed-test-cx5-bond-pod1 /]# ib_write_bw -d mlx5_bond_0 -F --report_gbits
************************************
* Waiting for client to connect... *
************************************

mofed-test-cx5-bond-pod2

# kubectl exec -it mofed-test-cx5-bond-pod2 bash
[root@mofed-test-cx5-bond-pod2 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.212  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::20d6:7eff:fec0:4e39  prefixlen 64  scopeid 0x20<link>
        ether 22:d6:7e:c0:4e:39  txqueuelen 0  (Ethernet)
        RX packets 12  bytes 1016 (1016.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 612 (612.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

net1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.56.217.72  netmask 255.255.255.0  broadcast 10.56.217.255
        ether a6:46:b9:94:b0:31  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@mofed-test-cx5-bond-pod2 /]# ib_write_bw -d mlx5_bond_0 -F --report_gbits 10.56.217.71
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x117c PSN 0xbfdcaf RKey 0x00511b VAddr 0x007fdf469fd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:56:217:72
 remote address: LID 0000 QPN 0x117d PSN 0x75cbaa RKey 0x004407 VAddr 0x007f65e74dc000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:56:217:71
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      5000             82.62              82.55              0.157445
---------------------------------------------------------------------------------------

Summary

TBD
