Tungsten Fabric SDN: Integrated Deployment with Kubernetes (CN)
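Contents
- Deployment Architecture
- Software Versions
- Deploying Kubernetes & TF
- Basic Environment Setup
- Creating the Deployment Instance
- Running the Playbooks
- Environment Check
- Configuration Notes
- Control Plane
- Worker Node
- Uninstalling the Environment
- Deploying the SDNGW
- Troubleshooting
- Issue 1: K8s repo GPG keys failure
- Issue 2: CoreDNS Pods fail to start after the CNI is deployed
- Issue 3: Cross-node east-west traffic fails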
Deployment Architecture
Software Versions
- CentOS 7.9 2009: CentOS-7-x86_64-Minimal-2009.iso
- Kubernetes v1.14.8
- Tungsten Fabric R21.3
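Deploying Kubernetes & TF
Basic Environment Setup
Install the operating system from the CentOS-7-x86_64-Minimal-2009.iso image.
Configure the underlay networks:
- Mgmt Net: all nodes can reach each other
- Control & Data Net: all nodes can reach each other
Configure the operating system environment on every node.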
# CentOS 7.X (kernel >= 3.10.0-693.17.1)
$ uname -r
3.10.0-1127.el7.x86_64

# Python 2.7
$ python -V
Python 2.7.5

# Add the hosts entries on every node
$ cat >> /etc/hosts << EOF
172.27.10.72 deployer
172.27.10.73 master
172.27.10.74 worker01
EOF

# Disable the firewall on every node
$ sudo systemctl disable firewalld && sudo systemctl stop firewalld

# Disable SELinux on every node
$ sudo setenforce 0
$ sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
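The kernel requirement above (>= 3.10.0-693.17.1) can be checked mechanically before running any playbook. A minimal Python 3 sketch (not the Python 2.7 used by the deployer), assuming the release string has the `X.Y.Z-BUILD.suffix` shape that `uname -r` prints on CentOS 7; `kernel_at_least` and `parse_kernel_release` are hypothetical helper names. It compares the upstream version first and then the numeric part of the build, ignoring suffixes such as `.el7.x86_64`.

```python
import platform


def _numeric_parts(text):
    """Leading dot-separated integers of text, e.g. '1127.el7.x86_64' -> (1127,)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def parse_kernel_release(release):
    """Split '3.10.0-1127.el7.x86_64' into ((3, 10, 0), (1127,))."""
    upstream, _, build = release.partition("-")
    return _numeric_parts(upstream), _numeric_parts(build)


def kernel_at_least(release, minimum="3.10.0-693.17.1"):
    """True if `release` is not older than `minimum` (tuple comparison, upstream first)."""
    return parse_kernel_release(release) >= parse_kernel_release(minimum)


if __name__ == "__main__":
    # On Linux, platform.release() returns the same string as `uname -r`.
    current = platform.release()
    print(current, "OK" if kernel_at_least(current) else "older than 3.10.0-693.17.1")
```

With this ordering, `3.10.0-693.el7` counts as older than the minimum, while both kernels seen in this article (1127 and 1160 builds) pass.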
- Configure the Deployer node
# Install pip
$ curl "https://bootstrap.pypa.io/pip/2.7/get-pip.py" -o "get-pip.py"
$ python get-pip.py

# Install Ansible (==2.7.18)
$ pip install ansible==2.7.18

# Clone the tf-ansible-deployer repository
$ yum install git -y
$ git clone -b R21.3 https://github.com/tungstenfabric/tf-ansible-deployer

# Set up a local NTP server
$ yum -y install ntp ntpdate
$ vi /etc/ntp.conf
...
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10

$ systemctl enable ntpd && systemctl restart ntpd && systemctl status ntpd
$ ntpq -p

# The Deployer acts as the SNAT gateway for the Master and Worker nodes
# (SNAT of forwarded traffic only works when IP forwarding is enabled)
$ sysctl -w net.ipv4.ip_forward=1
$ iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o ens192 -j SNAT --to 172.27.10.72
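In the `ntpq -p` output, the peer marked `*` is the one ntpd has selected as its system peer; on this Deployer it should be the local clock configured above. A small Python 3 sketch that finds that peer in the command's text output, assuming the standard `ntpq -p` layout where the first character of each peer line is the tally code. The `SAMPLE` text is illustrative rather than captured from this environment, and `system_peer` / `check_ntp` are hypothetical helper names.

```python
import subprocess

# Illustrative `ntpq -p` output for a server that uses its local clock (127.127.1.0).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l   12   64  377    0.000    0.000   0.000
"""


def system_peer(ntpq_output):
    """Return the 'remote' name of the peer marked '*' (the system peer), or None."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            # Drop the tally code and keep the first column.
            return line[1:].split()[0]
    return None


def check_ntp():
    """Run `ntpq -p` on this host and return its system peer (ntpd must be running)."""
    out = subprocess.run(["ntpq", "-p"], capture_output=True, text=True, check=True).stdout
    return system_peer(out)
```

If `check_ntp()` returns `None` shortly after a restart, ntpd has not selected a source yet; waiting a few poll intervals is usually enough.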
Creating the Deployment Instance
$ cat tf-ansible-deployer/config/instances.yaml
global_configuration:
  CONTAINER_REGISTRY: tungstenfabric
  REGISTRY_PRIVATE_INSECURE: True
provider_config:
  bms:
    ssh_user: root
    ssh_pwd: 1qaz@WSX
    ntpserver: deployer
    domainsuffix: cluster02
instances:
  master:
    provider: bms
    ip: 172.27.10.73
    roles:
      config:
      config_database:
      control:
      webui:
      analytics:
      analytics_database:
      analytics_alarm:
      analytics_snmp:
      k8s_master:
      kubemanager:
  worker01:
    provider: bms
    ip: 172.27.10.74
    roles:
      vrouter:
        PHYSICAL_INTERFACE: ens224
        VROUTER_GATEWAY: 192.168.1.128
      k8s_node:
contrail_configuration:
  KUBERNETES_CLUSTER_PROJECT: {}
  KUBERNETES_API_NODES: 192.168.1.73
  KUBERNETES_API_SERVER: 192.168.1.73
  KUBEMANAGER_NODES: 192.168.1.73
  KUBERNETES_POD_SUBNETS: 10.0.1.0/24
  KUBERNETES_SERVICE_SUBNETS: 10.0.2.0/24
  CONTRAIL_VERSION: R21.3-latest
  CONTRAIL_CONTAINER_TAG: R21.3-latest
  CONTROLLER_NODES: 172.27.10.73
  CONTROL_NODES: 192.168.1.73
  ENCAP_PRIORITY: VXLAN,MPLSoUDP,MPLSoGRE
  CONFIG_NODEMGR__DEFAULTS__minimum_diskGB: 2
  DATABASE_NODEMGR__DEFAULTS__minimum_diskGB: 2
  CONFIG_DATABASE_NODEMGR__DEFAULTS__minimum_diskGB: 2
  ANALYTICS_DATA_TTL: 1
  ANALYTICS_STATISTICS_TTL: 1
  ANALYTICS_FLOW_TTL: 1
  DEVICE_MANAGER__DEFAULTS__push_mode: 0
  LOG_LEVEL: SYS_DEBUG
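Several values in `instances.yaml` have to agree with each other: the Pod and Service subnets must not overlap, and `CONTROL_NODES` and `VROUTER_GATEWAY` should sit in the worker's control/data network. A minimal Python 3 sketch using the standard `ipaddress` module, with the relevant values copied into a dict by hand (parsing the YAML directly would need PyYAML, which is not assumed here). The `192.168.1.0/24` control/data network is an assumption inferred from the addresses in this article, and `check_instances` is a hypothetical helper name.

```python
import ipaddress

# Values copied by hand from instances.yaml above.
CONFIG = {
    "KUBERNETES_POD_SUBNETS": "10.0.1.0/24",
    "KUBERNETES_SERVICE_SUBNETS": "10.0.2.0/24",
    "CONTROL_NODES": "192.168.1.73",
    "VROUTER_GATEWAY": "192.168.1.128",
}
# Assumption: the control/data (underlay) network that vhost0 uses.
CONTROL_DATA_NET = ipaddress.ip_network("192.168.1.0/24")


def check_instances(cfg, data_net):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    pods = ipaddress.ip_network(cfg["KUBERNETES_POD_SUBNETS"])
    services = ipaddress.ip_network(cfg["KUBERNETES_SERVICE_SUBNETS"])
    if pods.overlaps(services):
        problems.append(f"pod subnet {pods} overlaps service subnet {services}")
    for key in ("CONTROL_NODES", "VROUTER_GATEWAY"):
        # CONTROL_NODES may hold a comma-separated list of addresses.
        for addr in cfg[key].split(","):
            if ipaddress.ip_address(addr.strip()) not in data_net:
                problems.append(f"{key} {addr.strip()} is outside {data_net}")
    return problems


if __name__ == "__main__":
    print(check_instances(CONFIG, CONTROL_DATA_NET) or "instances.yaml looks consistent")
```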
Running the Playbooks
# sshpass is needed because instances.yaml uses password-based SSH (ssh_pwd)
$ yum install sshpass -y
$ cd /root/tf-ansible-deployer
$ ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/configure_instances.yml
$ ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
$ ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
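The three playbooks must run in this order, and each one should succeed before the next starts. A hedged Python 3 sketch of a wrapper that runs them from the `tf-ansible-deployer` directory with `subprocess` and stops at the first failure; the `runner` parameter is a hypothetical hook added only so the ordering logic can be exercised without Ansible installed.

```python
import subprocess

PLAYBOOKS = [
    "playbooks/configure_instances.yml",
    "playbooks/install_k8s.yml",
    "playbooks/install_contrail.yml",
]


def build_command(playbook):
    """Command line for one playbook, matching the commands shown above."""
    return ["ansible-playbook", "-e", "orchestrator=kubernetes", "-i", "inventory/", playbook]


def run_all(cwd="/root/tf-ansible-deployer", runner=subprocess.call):
    """Run the playbooks in order; return the first failing playbook, or None if all pass."""
    for playbook in PLAYBOOKS:
        rc = runner(build_command(playbook), cwd=cwd)
        if rc != 0:
            return playbook
    return None


if __name__ == "__main__":
    failed = run_all()
    print("all playbooks succeeded" if failed is None else f"failed at {failed}")
```

Stopping early matters here: `install_contrail.yml` assumes the Kubernetes cluster from `install_k8s.yml` is already up.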
Environment Check
Log in to the WebGUI (initial password: contrail123): https://172.27.10.73:8143
Check the status of the K8s cluster.
$ kubectl get node -A -o wide
NAME       STATUS     ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
master     NotReady   master   6h12m   v1.14.8   172.27.10.73   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://18.3.1
worker01   Ready      <none>   6h11m   v1.14.8   192.168.1.74   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://18.3.1

$ kubectl get namespace -A
NAME              STATUS   AGE
contrail          Active   20h
default           Active   20h
kube-node-lease   Active   20h
kube-public       Active   20h
kube-system       Active   20h

$ kubectl get all -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-6dcc67dcbc-4wklk                1/1     Running   0          20h
kube-system   pod/coredns-6dcc67dcbc-m4lc4                1/1     Running   0          20h
kube-system   pod/etcd-master                             1/1     Running   0          20h
kube-system   pod/kube-apiserver-master                   1/1     Running   0          20h
kube-system   pod/kube-controller-manager-master          1/1     Running   0          20h
kube-system   pod/kube-proxy-fg2b6                        1/1     Running   0          20h
kube-system   pod/kube-proxy-rmp7j                        1/1     Running   0          20h
kube-system   pod/kube-scheduler-master                   1/1     Running   0          20h
kube-system   pod/kubernetes-dashboard-7d7d775b7b-gp7cc   1/1     Running   0          20h

NAMESPACE     NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes             ClusterIP   10.0.2.1     <none>        443/TCP                  20h
kube-system   service/kube-dns               ClusterIP   10.0.2.10    <none>        53/UDP,53/TCP,9153/TCP   20h
kube-system   service/kubernetes-dashboard   ClusterIP   10.0.2.140   <none>        443/TCP                  20h

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/kube-proxy   2         2         2       2            2           <none>          20h

NAMESPACE     NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns                2/2     2            2           20h
kube-system   deployment.apps/kubernetes-dashboard   1/1     1            1           20h

NAMESPACE     NAME                                              DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-6dcc67dcbc                2         2         2       20h
kube-system   replicaset.apps/kubernetes-dashboard-7d7d775b7b   1         1         1       20h
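When this check is scripted, the `STATUS` column of `kubectl get node` is the one to watch. A small Python 3 sketch that parses the plain-text table into a name-to-status map, assuming the default column order (NAME first, STATUS second) shown above; `node_status`, `not_ready` and `fetch_nodes` are hypothetical helper names. In this article's output the master reports `NotReady`; since the master carries no `vrouter` role, a missing CNI configuration on that node is a plausible reason, so the sketch only reports such nodes instead of treating them as a hard failure.

```python
import subprocess

# Output captured from the cluster in this article (columns after VERSION trimmed).
SAMPLE = """\
NAME       STATUS     ROLES    AGE     VERSION
master     NotReady   master   6h12m   v1.14.8
worker01   Ready      <none>   6h11m   v1.14.8
"""


def node_status(table):
    """Map node name -> STATUS from `kubectl get node` output (header line skipped)."""
    lines = table.strip().splitlines()
    return {line.split()[0]: line.split()[1] for line in lines[1:] if line.strip()}


def not_ready(table):
    """Sorted names of nodes whose STATUS is not exactly 'Ready'."""
    return sorted(name for name, status in node_status(table).items() if status != "Ready")


def fetch_nodes():
    """Run `kubectl get node` (needs kubectl and a kubeconfig on this host)."""
    return subprocess.run(["kubectl", "get", "node"], capture_output=True, text=True, check=True).stdout
```

Statuses such as `Ready,SchedulingDisabled` are also reported, because they are not exactly `Ready`.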
Configuration Notes
Control Plane
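- kube-apiserver: the --advertise-address option sets the advertised address, i.e. the IP address of kube-apiserver that is announced to all members of the K8s cluster; it must be reachable by those members. If it is empty, --bind-address is used instead, and if --bind-address is not specified either, the IP of the host's default-route interface is used.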
kube-apiserver \
  --advertise-address=192.168.1.73 \
  --allow-privileged=true \
  --authorization-mode=Node,RBAC \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --enable-admission-plugins=NodeRestriction \
  --enable-bootstrap-token-auth=true \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
  --etcd-servers=https://127.0.0.1:2379 \
  --insecure-port=0 \
  --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \
  --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
  --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \
  --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key \
  --requestheader-allowed-names=front-proxy-client \
  --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-username-headers=X-Remote-User \
  --secure-port=6443 \
  --service-account-key-file=/etc/kubernetes/pki/sa.pub \
  --service-cluster-ip-range=10.0.2.0/24 \
  --tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
  --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- kube-controller-manager
kube-controller-manager \
  --allocate-node-cidrs=true \
  --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf \
  --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf \
  --bind-address=127.0.0.1 \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --cluster-cidr=10.0.1.0/24 \
  --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt \
  --cluster-signing-key-file=/etc/kubernetes/pki/ca.key \
  --controllers=*,bootstrapsigner,tokencleaner \
  --kubeconfig=/etc/kubernetes/controller-manager.conf \
  --leader-elect=true \
  --node-cidr-mask-size=24 \
  --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
  --root-ca-file=/etc/kubernetes/pki/ca.crt \
  --service-account-private-key-file=/etc/kubernetes/pki/sa.key \
  --use-service-account-credentials=true
- kube-manager
$ cat /etc/contrail/contrail-kubernetes.conf
[DEFAULTS]
host_ip=192.168.1.73
orchestrator=kubernetes
...

[KUBERNETES]
kubernetes_api_server=192.168.1.73
kubernetes_api_port=8080
kubernetes_api_secure_port=6443
...
- contrail config api
$ cat /etc/contrail/contrail-api-0.conf
[DEFAULTS]
listen_ip_addr=172.27.10.73
listen_port=8082
http_server_port=8084
http_server_ip=0.0.0.0
- contrail control node
$ cat /etc/contrail/contrail-control.conf
[DEFAULT]
hostip=192.168.1.73
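The control-plane settings above put kube-apiserver, kube-manager and the control node on the control/data network, while the config API listens on the management network. A minimal Python 3 sketch that cross-checks them with `configparser`, and also encodes the `--advertise-address` fallback order described for kube-apiserver. The network constants, the helper names (`parse_flags`, `effective_advertise_address`, `check_control_plane`) and the embedded snippets (trimmed copies of the command line and files shown above) are assumptions for illustration.

```python
import configparser
import ipaddress
import shlex

# Assumed networks, inferred from the addresses used in this article.
CONTROL_DATA_NET = ipaddress.ip_network("192.168.1.0/24")
MGMT_NET = ipaddress.ip_network("172.27.10.0/24")

# Trimmed copies of the command line and files shown above.
APISERVER_CMD = "kube-apiserver --advertise-address=192.168.1.73 --secure-port=6443"
KUBE_MANAGER_CONF = "[DEFAULTS]\nhost_ip=192.168.1.73\n[KUBERNETES]\nkubernetes_api_server=192.168.1.73\n"
CONTROL_CONF = "[DEFAULT]\nhostip=192.168.1.73\n"
API_CONF = "[DEFAULTS]\nlisten_ip_addr=172.27.10.73\nlisten_port=8082\n"

UNSPECIFIED = {"", "0.0.0.0", "::"}


def parse_flags(cmdline):
    """'prog --a=1 --b=2' -> {'a': '1', 'b': '2'}."""
    flags = {}
    for token in shlex.split(cmdline)[1:]:
        key, _, value = token.partition("=")
        flags[key.lstrip("-")] = value
    return flags


def effective_advertise_address(advertise, bind, default_route_ip):
    """--advertise-address, else --bind-address, else the default-route interface IP."""
    if advertise not in UNSPECIFIED:
        return advertise
    if bind not in UNSPECIFIED:
        return bind
    return default_route_ip


def load(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def check_control_plane(apiserver_cmd, default_route_ip):
    """Return a list of problems; an empty list means the addresses are consistent."""
    flags = parse_flags(apiserver_cmd)
    advertised = effective_advertise_address(
        flags.get("advertise-address", ""), flags.get("bind-address", ""), default_route_ip
    )
    kube_manager, control, api = load(KUBE_MANAGER_CONF), load(CONTROL_CONF), load(API_CONF)
    problems = []
    server = kube_manager["KUBERNETES"]["kubernetes_api_server"]
    if server != advertised:
        problems.append(f"kube-manager points at {server}, but kube-apiserver advertises {advertised}")
    for name, addr, net in (
        ("kube-manager host_ip", kube_manager["DEFAULTS"]["host_ip"], CONTROL_DATA_NET),
        ("control hostip", control["DEFAULT"]["hostip"], CONTROL_DATA_NET),
        ("config API listen_ip_addr", api["DEFAULTS"]["listen_ip_addr"], MGMT_NET),
    ):
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{name} {addr} is outside {net}")
    return problems
```

Without `--advertise-address`, the sketch falls back to the default-route IP, which is exactly the case where kube-manager would end up pointing at the wrong address.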
Worker Node
- kubelet
$ cat /var/lib/kubelet/config.yaml
address: 0.0.0.0
- cni
$ cat /etc/cni/net.d/10-contrail.conf
{
    "cniVersion": "0.3.1",
    "contrail" : {
        "cluster-name" : "k8s",
        "meta-plugin" : "multus",
        "vrouter-ip" : "127.0.0.1",
        "vrouter-port" : 9091,
        "config-dir" : "/var/lib/contrail/ports/vm",
        "poll-timeout" : 5,
        "poll-retries" : 15,
        "log-file" : "/var/log/contrail/cni/opencontrail.log",
        "log-level" : "4"
    },
    "name": "contrail-k8s-cni",
    "type": "contrail-k8s-cni"
}
- vrouter-agent
$ cat /etc/contrail/contrail-vrouter-agent.conf
[CONTROL-NODE]
servers=192.168.1.73:5269

[DEFAULT]
http_server_ip=0.0.0.0
collectors=172.27.10.73:8086
log_file=/var/log/contrail/vrouter-agent/contrail-vrouter-agent.log
physical_interface_mac=00:50:56:88:43:77
...

[NETWORKS]
control_network_ip=192.168.1.74

[DNS]
servers=192.168.1.73:53

[VIRTUAL-HOST-INTERFACE]
name=vhost0
ip=192.168.1.74/24
compute_node_address=192.168.1.74
physical_interface=ens224
gateway=192.168.1.128
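On the worker, the CNI plugin talks to the local vRouter agent at `vrouter-ip`/`vrouter-port`, and the agent's vhost0 address, gateway and control-node list must line up with `instances.yaml`. A Python 3 sketch that loads the CNI file with `json` and the agent file with `configparser`; the embedded texts are trimmed copies of the files shown above, and `cni_agent_endpoint` / `check_agent` are hypothetical helper names.

```python
import configparser
import ipaddress
import json

# Trimmed copies of the worker files shown above.
CNI_CONF = """{
  "cniVersion": "0.3.1",
  "contrail": {"vrouter-ip": "127.0.0.1", "vrouter-port": 9091},
  "name": "contrail-k8s-cni",
  "type": "contrail-k8s-cni"
}"""

AGENT_CONF = """\
[CONTROL-NODE]
servers=192.168.1.73:5269

[NETWORKS]
control_network_ip=192.168.1.74

[VIRTUAL-HOST-INTERFACE]
name=vhost0
ip=192.168.1.74/24
physical_interface=ens224
gateway=192.168.1.128
"""


def cni_agent_endpoint(text):
    """Return (ip, port) of the vRouter agent that the CNI plugin will call."""
    contrail = json.loads(text)["contrail"]
    return contrail["vrouter-ip"], int(contrail["vrouter-port"])


def check_agent(text, expected_control_nodes):
    """List problems in contrail-vrouter-agent.conf; an empty list means consistent."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    vhost = conf["VIRTUAL-HOST-INTERFACE"]
    iface = ipaddress.ip_interface(vhost["ip"])
    problems = []
    if ipaddress.ip_address(vhost["gateway"]) not in iface.network:
        problems.append(f"gateway {vhost['gateway']} is not in {iface.network}")
    if conf["NETWORKS"]["control_network_ip"] != str(iface.ip):
        problems.append("control_network_ip differs from the vhost0 address")
    # servers is a space-separated list of ip:port entries.
    servers = {entry.split(":")[0] for entry in conf["CONTROL-NODE"]["servers"].split()}
    missing = set(expected_control_nodes) - servers
    if missing:
        problems.append(f"control nodes missing from [CONTROL-NODE]: {sorted(missing)}")
    return problems
```

Passing the `CONTROL_NODES` value from `instances.yaml` as `expected_control_nodes` ties the two files together.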
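Uninstalling the Environment
# Uninstall Contrail (run from the tf-ansible-deployer directory)
$ cd /root/tf-ansible-deployer
$ ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/contrail_destroy.yml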
Deploying the SDNGW
- sdngw.conf
---
HOST:
    identifier                : sdngw
    host-management-interface : enp6s0
    routing-engine-image      : "/root/vmx/images/junos-vmx-x86-64-22.1R1.10.qcow2"
    routing-engine-hdd        : "/root/vmx/images/vmxhdd.img"
    forwarding-engine-image   : "/root/vmx/images/vFPC-20220223.img"

---
BRIDGES:
    - type  : external
      name  : br-mgmt-sdngw

---
CONTROL_PLANE:
    vcpus       : 4
    memory-mb   : 8192
    console_port: 8601

    interfaces  :
      - type      : static
        ipaddr    : 172.27.10.129
        macaddr   : "0A:00:DD:C0:DE:0E"

---
FORWARDING_PLANE:
    vcpus       : 4
    memory-mb   : 8192
    console_port: 8602
    device-type : virtio

    interfaces  :
      - type      : static
        ipaddr    : 172.27.10.130
        macaddr   : "0A:00:DD:C0:DE:10"

---
JUNOS_DEVICES:
   - interface            : ge-0/0/0
     mac-address          : "02:06:0A:0E:FF:F0"
     description          : "ge-0/0/0 interface"

   - interface            : ge-0/0/1
     mac-address          : "02:06:0A:0E:FF:F1"
     description          : "ge-0/0/1 interface"
- Run the installation:
# Once the deployment config file is ready, run the VCP/VFP installation script
$ cd vmx
$ nohup ./vmx.sh -lv --install --cfg /root/sdngw.conf &
# Follow the log
$ tail -f nohup.out
- Initial configuration:
# Right after VCP is started, wait until it has finished booting.
$ ./vmx.sh --console vcp sdngw

# Log in as root; there is no password by default.
login: root

# Enter CLI mode.
$ cli

# Check the connectivity between VCP and VFP (this can take a while; Online means they are connected).
root> show chassis fpc
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Online           Testing   8         0        2      0      0    511        31          0
...

# Enter configuration mode.
root> configure

# Disable automatic image upgrade.
root# delete chassis auto-image-upgrade

# Set the VCP host name.
root# set system host-name sdngw

# Allow root to log in to VCP over SSH and set the root password.
root# set system root-authentication plain-text-password
root# set system services ssh root-login allow

# Commit the changes.
root# commit
- sdngw-junosdev.conf
interfaces :
     - link_name  : vmx_link1
       mtu        : 1500
       endpoint_1 :
         - type        : junos_dev
           vm_name     : sdngw
           dev_name    : ge-0/0/0
       endpoint_2 :
         - type        : bridge_dev
           dev_name    : br-tenant-net

     - link_name  : vmx_link2
       mtu        : 1500
       endpoint_1 :
         - type        : junos_dev
           vm_name     : sdngw
           dev_name    : ge-0/0/1
       endpoint_2 :
         - type        : bridge_dev
           dev_name    : br-external-net
- Run the automation script
# NOTE: the script applies a temporary (non-persistent) configuration. Re-run it every time the VM instance restarts, or configure the bindings statically by hand.
$ cd vmx
$ ./vmx.sh --bind-dev --cfg /root/sdngw-junosdev.conf

# Manually attach the bridges to the physical NICs.
$ brctl addif br-tenant-net enp11s0
$ brctl addif br-external-net enp26s0f0

$ brctl show
bridge name       bridge id           STP enabled   interfaces
br-external-net   8000.90e2ba8a532c   no            enp26s0f0
                                                    ge-0.0.1-sdngw
br-int-sdngw      8000.525400571469   yes           br-int-sngw-nic
                                                    vcp-int-sdngw
                                                    vfp-int-sdngw
br-mgmt-sdngw     8000.40f2e9352cf8   yes           br-mgmt-ngw-nic
                                                    enp6s0
                                                    vcp-ext-sdngw
                                                    vfp-ext-sdngw
br-tenant-net     8000.40f2e9352cf9   no            enp11s0
                                                    ge-0.0.0-sdngw
virbr0            8000.525400addfe1   yes           virbr0-nic
- Configure the interface IP addresses.
# Log in to VCP over SSH to assign the fxp0 and ge interface addresses.
$ ssh root@172.27.10.129

# Enter CLI mode, then configuration mode.
$ cli
root> configure
root# delete interfaces fxp0 unit 0 family inet dhcp
root# set interfaces fxp0 unit 0 family inet address 172.27.10.129/24
root# set interfaces ge-0/0/0 unit 0 family inet address 192.168.1.128/24
root# set interfaces ge-0/0/1 unit 0 family inet address 172.37.10.128/24
root# commit
root# exit

# Check the interface IP addresses.
root@sdngw> show interfaces terse | grep fxp0
fxp0                    up    up
fxp0.0                  up    up   inet     172.27.10.129/24
root@sdngw> show interfaces terse | grep ge-0/0/0
ge-0/0/0                up    up
ge-0/0/0.0              up    up   inet     192.168.1.128/24
root@sdngw> show interfaces terse | grep ge-0/0/1
ge-0/0/1                up    up
ge-0/0/1.0              up    up   inet     172.37.10.128/24
- Tests:
  - The master node can ping fxp0 and ge-0/0/0.
  - The worker node can ping ge-0/0/0.
  - The external GW can ping ge-0/0/1.
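The addresses reported by `show interfaces terse` can also be compared against the plan: fxp0 on the management network, ge-0/0/0 as the worker's `VROUTER_GATEWAY`, and ge-0/0/1 towards the external network. A Python 3 sketch that parses the CLI text captured above; it assumes logical-interface lines have the `name admin link proto address` layout shown, and `EXPECTED` is a hypothetical plan table built from this article's values.

```python
import ipaddress

# Text captured from the vMX CLI above.
TERSE = """\
fxp0                    up    up
fxp0.0                  up    up   inet     172.27.10.129/24
ge-0/0/0                up    up
ge-0/0/0.0              up    up   inet     192.168.1.128/24
ge-0/0/1                up    up
ge-0/0/1.0              up    up   inet     172.37.10.128/24
"""

# Hypothetical plan; ge-0/0/0.0 must match VROUTER_GATEWAY on the worker.
EXPECTED = {
    "fxp0.0": "172.27.10.129/24",
    "ge-0/0/0.0": "192.168.1.128/24",
    "ge-0/0/1.0": "172.37.10.128/24",
}


def inet_addresses(terse):
    """Map logical interface -> IPv4Interface for lines that carry an inet address."""
    result = {}
    for line in terse.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[3] == "inet":
            result[fields[0]] = ipaddress.ip_interface(fields[4])
    return result


def mismatches(terse, expected):
    """Sorted interfaces whose address is missing or differs from the plan."""
    found = inet_addresses(terse)
    return sorted(
        name for name, addr in expected.items()
        if found.get(name) != ipaddress.ip_interface(addr)
    )
```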
Troubleshooting
Issue 1: K8s repo GPG keys failure
TASK [k8s : make cache to import gpg keys] ************************************************
[WARNING]: Consider using the yum module rather than running 'yum'. If you need to use command because yum is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this
message.
fatal: [172.27.10.74]: FAILED! => {"changed": true, "cmd": ["yum", "-q", "makecache", "-y", "--disablerepo=*", "--enablerepo=Kubernetes"], "delta": "0:00:00.943399", "end": "2022-06-13 12:14:42.859957", "msg": "non-zero return code", "rc": 1, "start": "2022-06-13 12:14:41.916558", "stderr": "https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno 14] curl#60 - \"Peer's Certificate has expired.\"\n...", "stderr_lines": [...], "stdout": "", "stdout_lines": []}

# Running the same command by hand on the node fails the same way:
$ yum -q makecache -y --disablerepo=* --enablerepo=Kubernetes
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno 14] curl#60 - "Peer's Certificate has expired."
Trying other mirror.
It was impossible to connect to the CentOS servers.
This could mean a connectivity issue in your environment, such as the requirement to configure a proxy,
or a transparent proxy that tampers with TLS security, or an incorrect system clock.
You can try to solve this issue by using the instructions on https://wiki.centos.org/yum-errors
If above article doesn't help to resolve this issue please use https://bugs.centos.org/.

 One of the configured repositories failed (k8s repo),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=Kubernetes ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable Kubernetes
        or
            subscription-manager repos --disable=Kubernetes

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=Kubernetes.skip_if_unavailable=true

failure: repodata/repomd.xml from Kubernetes: [Errno 256] No more mirrors to try.
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno 14] curl#60 - "Peer's Certificate has expired."
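Fix: disable GPG checking for the Kubernetes repo in the k8s role. After the change, playbooks/roles/k8s/tasks/RedHat.yml contains (grep run from the tf-ansible-deployer directory):
$ grep gpgcheck -inR *
playbooks/roles/k8s/tasks/RedHat.yml:8: gpgcheck: no
playbooks/roles/k8s/tasks/RedHat.yml:17: repo_gpgcheck: no
playbooks/roles/k8s/tasks/RedHat.yml:18: gpgcheck: no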
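The edit shown by the grep output can be scripted when several deployer checkouts need it. A hedged Python 3 sketch that sets every `gpgcheck:` and `repo_gpgcheck:` value in the role's task file to `no` while keeping indentation; it works on plain text with a regular expression rather than a YAML parser, so it assumes each key sits on its own line as in the grep output. `disable_gpgcheck` and `patch_role` are hypothetical helper names. These keys control GPG signature checks only, not TLS certificate validation.

```python
import re
from pathlib import Path

# One "gpgcheck: <value>" or "repo_gpgcheck: <value>" per line; [ \t] keeps each match on one line.
GPG_KEY_RE = re.compile(r"^([ \t]*(?:repo_)?gpgcheck:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def disable_gpgcheck(text):
    """Return `text` with every gpgcheck / repo_gpgcheck value set to 'no'."""
    return GPG_KEY_RE.sub(r"\1no", text)


def patch_role(path="playbooks/roles/k8s/tasks/RedHat.yml"):
    """Rewrite the role file in place; run from the tf-ansible-deployer directory."""
    role = Path(path)
    role.write_text(disable_gpgcheck(role.read_text()))
```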
Issue 2: CoreDNS Pods fail to start after the CNI is deployed
Warning  Unhealthy  47s (x2 over 57s)  kubelet  Readiness probe failed: Get "http://10.47.255.246:8181/ready": dial tcp 10.47.255.246:8181: connect: no route to host
Warning  Unhealthy  6s (x9 over 99s)   kubelet  Readiness probe failed: Get "http://10.47.255.246:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  4s (x4 over 35s)   kubelet  Liveness probe failed: Get "http://10.47.255.246:8080/health": dial tcp 10.47.255.246:8080: connect: no route to host
Cause: kubelet runs in the host OS network, so it cannot reach the IP addresses of the CoreDNS Pods, which sit in an overlay VN.
Fix: vRouter's vhost0 must be the interface of the default route. Afterwards, verify that the CoreDNS Pod IPs can be pinged from the host OS.
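Whether vhost0 carries the default route can be checked from `ip route show default` on each worker. A Python 3 sketch that parses that output, assuming the usual `default via <gw> dev <ifname> ...` format; the `SAMPLE` line is illustrative (built from this article's gateway 192.168.1.128), not captured output, and the helper names are hypothetical.

```python
import subprocess

# Illustrative output: the default route goes via the SDNGW through vhost0.
SAMPLE = "default via 192.168.1.128 dev vhost0\n"


def default_route_devices(ip_route_output):
    """Return the 'dev' of every default route in `ip route show default` output."""
    devices = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if tokens[:1] == ["default"] and "dev" in tokens:
            i = tokens.index("dev")
            if i + 1 < len(tokens):
                devices.append(tokens[i + 1])
    return devices


def vhost0_is_default(ip_route_output):
    """True if at least one default route exists and every default route uses vhost0."""
    devices = default_route_devices(ip_route_output)
    return bool(devices) and all(dev == "vhost0" for dev in devices)


def read_default_routes():
    """Read the live default routes on this host (Linux with iproute2)."""
    return subprocess.run(["ip", "route", "show", "default"], capture_output=True, text=True, check=True).stdout
```

A leftover default route on the physical NIC makes the check fail even if a vhost0 default route also exists, which is deliberate.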
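Issue 3: Cross-node east-west traffic fails
Cross-node east-west traffic may fail in the following deployment scenarios:
- Nested deployment on an ESXi hypervisor
- Nested deployment inside VMs
- The NIC does not support IP checksum calculation and produces an incorrect checksum, e.g. Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe NIC cards.
Symptoms: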
# 1. dropstats reports Checksum errors
(contrail-tools)[root@TF-node02 /]$ dropstats
Invalid IF                    10
IF Drop                       90392
Flow Action Drop              95221
Flow Unusable (Eviction)      10
Invalid NH                    9
Duplicate                     166225
Checksum errors               635

# 2. tcpdump reports "cksum ... (incorrect -> ...)"
$ tcpdump -v -nn -l -i enp2s0f1.1162 host 10.0.2.162 | grep -i incorrect
...
tcpdump: listening on enp2s0f1.1162, link-type EN10MB (Ethernet), capture size 262144 bytes
10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
seq 1901889431, ack 1081063811, win 28960, options [mss 1420,sackOK,\
TS val 456361578 ecr 41455995,nop,wscale 7], length 0
10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
seq 1901889183, ack 1081063811, win 28960, options [mss 1420,sackOK,\
TS val 456361826 ecr 41455995,nop,wscale 7], length 0
10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
seq 1901888933, ack 1081063811, win 28960, options [mss 1420,sackOK,\
TS val 456362076 ecr 41455995,nop,wscale 7], length 0

# 3. flow -l shows information about flows dropped for an unknown reason
$ flow -l
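When collecting these symptoms from many nodes, the `Checksum errors` counter is the one to extract from `dropstats`. A Python 3 sketch that turns the `name value` lines shown above into a dict; it assumes each counter line ends with an integer separated by spaces, as in the sample, and `parse_dropstats` / `checksum_errors` are hypothetical helper names.

```python
# Output captured above from `dropstats` inside contrail-tools.
DROPSTATS = """\
Invalid IF                    10
IF Drop                       90392
Flow Action Drop              95221
Flow Unusable (Eviction)      10
Invalid NH                    9
Duplicate                     166225
Checksum errors               635
"""


def parse_dropstats(text):
    """Map counter name -> value for every line whose last field is an integer."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.rstrip().rpartition(" ")
        if name and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def checksum_errors(text):
    """The 'Checksum errors' counter, or 0 if the line is absent."""
    return parse_dropstats(text).get("Checksum errors", 0)
```

Counters are cumulative, so comparing two samples taken a few seconds apart shows whether checksum drops are still increasing.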
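Fix: on every Worker Node, disable offloading on the NIC that vhost0 is attached to. Replace ens192 below with the interface vhost0 uses on your nodes (ens224 in the instances.yaml above).
$ ethtool --offload ens192 tx off rx off
$ ethtool -k ens192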
...
tx-checksumming: off
tx-checksum-ipv4: off
tx-checksum-ipv6: off
tx-checksum-sctp: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
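To confirm the change on every worker, parse `ethtool -k` and look for any checksum or segmentation feature still reported as `on`. A Python 3 sketch, assuming the `feature: on|off [annotation]` line format shown above; the watched prefixes in `WATCHED` are an assumption chosen to match the features in this output, and the helper names are hypothetical.

```python
import subprocess

# Lines captured above from `ethtool -k` after disabling offloads.
SAMPLE = """\
tx-checksumming: off
\ttx-checksum-ipv4: off
\ttx-checksum-ipv6: off
\ttx-checksum-sctp: off
tcp-segmentation-offload: off
\ttx-tcp-segmentation: off [requested on]
\ttx-tcp6-segmentation: off [requested on]
"""

WATCHED = ("tx-checksum", "rx-checksum", "tcp-segmentation", "tx-tcp")


def offload_states(text):
    """Map feature -> 'on'/'off', ignoring annotations such as '[requested on]' or '[fixed]'."""
    states = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and rest.split():
            states[name] = rest.split()[0]
    return states


def still_enabled(text, prefixes=WATCHED):
    """Watched features that are still 'on', sorted by name."""
    return sorted(
        name for name, state in offload_states(text).items()
        if state == "on" and name.startswith(prefixes)
    )


def read_features(ifname):
    """Run `ethtool -k <ifname>` on this host (requires ethtool)."""
    return subprocess.run(["ethtool", "-k", ifname], capture_output=True, text=True, check=True).stdout
```

`[requested on]` means the feature was requested but is currently off, so it is correctly not reported.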