Related posts: "kubernetes/k8s CRI Analysis: Container Runtime Interface Analysis"
"kubernetes/k8s CRI Analysis: How kubelet Creates a Pod"

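The earlier posts first introduced CRI and then analyzed the CRI-related kubelet source code in four parts: the CRI-related kubelet startup flags, the CRI-related interfaces and structs, CRI initialization, and how kubelet calls CRI to create a pod. If you have not read them yet, see the related posts listed above.

The CRI architecture diagram from the earlier posts is shown here again.

This post analyzes how kubelet calls CRI to delete a pod.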

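CRI-related source code analysis in kubelet

The analysis of kubelet's CRI source code consists of the following parts:
(1) kubelet CRI-related startup flags;
(2) kubelet CRI-related interfaces and structs;
(3) kubelet CRI initialization;
(4) how kubelet calls CRI to create a pod;
(5) how kubelet calls CRI to delete a pod.

The previous two posts covered the first four parts; this post covers the fifth.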

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

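5. How kubelet calls CRI to delete a pod

Call flow for deleting a pod through CRI

The dockershim call flow is used as the example below.

kubelet stops containers by calling dockershim; dockershim in turn calls docker to stop the containers and calls CNI to tear down the pod network.

Figure 1: kubelet dockershim pod deletion call flow

dockershim is the CRI shim built into kubelet. The pod deletion call flow of the other, remote CRI shims is essentially the same as dockershim's: they drive a different container engine, but it is still the CRI shim that calls CNI to tear down the pod network.

The detailed source code analysis follows.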

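Start with the kubeGenericRuntimeManager.KillPod method, which is where the CRI calls that delete a pod are triggered.

As the code shows, kubelet deletes a pod in two steps:
(1) stop all containers that belong to the pod;
(2) then stop the pod sandbox container.

Note: this path only stops containers; removing them is left to kubelet's garbage collection (gc).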

// pkg/kubelet/kuberuntime/kuberuntime_manager.go
// KillPod kills all the containers of a pod. Pod may be nil, running pod must not be.
// gracePeriodOverride if specified allows the caller to override the pod default grace period.
// only hard kill paths are allowed to specify a gracePeriodOverride in the kubelet in order to not corrupt user data.
// it is useful when doing SIGKILL for hard eviction scenarios, or max grace period during soft eviction scenarios.
func (m *kubeGenericRuntimeManager) KillPod(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) error {
	err := m.killPodWithSyncResult(pod, runningPod, gracePeriodOverride)
	return err.Error()
}

// killPodWithSyncResult kills a runningPod and returns SyncResult.
// Note: The pod passed in could be *nil* when kubelet restarted.
func (m *kubeGenericRuntimeManager) killPodWithSyncResult(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) (result kubecontainer.PodSyncResult) {
	killContainerResults := m.killContainersWithSyncResult(pod, runningPod, gracePeriodOverride)
	for _, containerResult := range killContainerResults {
		result.AddSyncResult(containerResult)
	}

	// stop sandbox, the sandbox will be removed in GarbageCollect
	killSandboxResult := kubecontainer.NewSyncResult(kubecontainer.KillPodSandbox, runningPod.ID)
	result.AddSyncResult(killSandboxResult)
	// Stop all sandboxes belongs to same pod
	for _, podSandbox := range runningPod.Sandboxes {
		if err := m.runtimeService.StopPodSandbox(podSandbox.ID.ID); err != nil {
			killSandboxResult.Fail(kubecontainer.ErrKillPodSandbox, err.Error())
			klog.Errorf("Failed to stop sandbox %q", podSandbox.ID)
		}
	}

	return
}
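
The ordering described above (containers first, sandboxes second, nothing removed) can be illustrated with a small standalone sketch. Everything in it is hypothetical: podRuntime, recordingRuntime and killPod are illustrative names, not kubelet types, and the containers are stopped sequentially here for readability, whereas the real code stops them concurrently.

```go
package main

import (
	"errors"
	"fmt"
)

// podRuntime is a hypothetical, trimmed-down stand-in for the CRI runtime
// service; it is not the real kubelet interface.
type podRuntime interface {
	StopContainer(id string) error
	StopPodSandbox(id string) error
}

// recordingRuntime records every call so the ordering can be inspected.
type recordingRuntime struct {
	calls       []string
	failSandbox bool
}

func (r *recordingRuntime) StopContainer(id string) error {
	r.calls = append(r.calls, "StopContainer:"+id)
	return nil
}

func (r *recordingRuntime) StopPodSandbox(id string) error {
	r.calls = append(r.calls, "StopPodSandbox:"+id)
	if r.failSandbox {
		return errors.New("sandbox stop failed")
	}
	return nil
}

// killPod stops all containers first, then every sandbox. Failures are
// collected rather than returned early, and nothing is removed here:
// removal is left to garbage collection.
func killPod(rt podRuntime, containers, sandboxes []string) []error {
	var errs []error
	for _, c := range containers {
		if err := rt.StopContainer(c); err != nil {
			errs = append(errs, err)
		}
	}
	for _, s := range sandboxes {
		if err := rt.StopPodSandbox(s); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	rt := &recordingRuntime{}
	errs := killPod(rt, []string{"app", "sidecar"}, []string{"sandbox-0"})
	fmt.Println(rt.calls, len(errs))
}
```

Collecting errors instead of returning on the first one matches the shape of killPodWithSyncResult, where a failed sandbox stop is recorded in the result and the loop continues.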

5.1 m.killContainersWithSyncResult

m.killContainersWithSyncResult stops all containers that belong to the pod.

Main logic: start one goroutine per container and call m.killContainer in each to stop that container.

// pkg/kubelet/kuberuntime/kuberuntime_container.go
// killContainersWithSyncResult kills all pod's containers with sync results.
func (m *kubeGenericRuntimeManager) killContainersWithSyncResult(pod *v1.Pod, runningPod kubecontainer.Pod, gracePeriodOverride *int64) (syncResults []*kubecontainer.SyncResult) {
	containerResults := make(chan *kubecontainer.SyncResult, len(runningPod.Containers))
	wg := sync.WaitGroup{}

	wg.Add(len(runningPod.Containers))
	for _, container := range runningPod.Containers {
		go func(container *kubecontainer.Container) {
			defer utilruntime.HandleCrash()
			defer wg.Done()

			killContainerResult := kubecontainer.NewSyncResult(kubecontainer.KillContainer, container.Name)
			if err := m.killContainer(pod, container.ID, container.Name, "", gracePeriodOverride); err != nil {
				killContainerResult.Fail(kubecontainer.ErrKillContainer, err.Error())
			}
			containerResults <- killContainerResult
		}(container)
	}
	wg.Wait()
	close(containerResults)

	for containerResult := range containerResults {
		syncResults = append(syncResults, containerResult)
	}
	return
}

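5.1.1 m.killContainer

m.killContainer mainly calls m.runtimeService.StopContainer.

runtimeService is a RemoteRuntimeService. It implements the RuntimeService interface, the client-side container runtime interface of the CRI shim, and holds the client used to talk to the CRI shim's container runtime server. Calling m.runtimeService.StopContainer therefore amounts to calling the StopContainer method of the CRI shim server to stop the container.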

// pkg/kubelet/kuberuntime/kuberuntime_container.go
// killContainer kills a container through the following steps:
// * Run the pre-stop lifecycle hooks (if applicable).
// * Stop the container.
func (m *kubeGenericRuntimeManager) killContainer(pod *v1.Pod, containerID kubecontainer.ContainerID, containerName string, message string, gracePeriodOverride *int64) error {
	...
	klog.V(2).Infof("Killing container %q with %d second grace period", containerID.String(), gracePeriod)

	err := m.runtimeService.StopContainer(containerID.ID, gracePeriod)
	if err != nil {
		klog.Errorf("Container %q termination failed with gracePeriod %d: %v", containerID.String(), gracePeriod, err)
	} else {
		klog.V(3).Infof("Container %q exited normally", containerID.String())
	}

	m.containerRefManager.ClearRef(containerID)

	return err
}
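m.runtimeService.StopContainer

m.runtimeService.StopContainer calls r.runtimeClient.StopContainer; that is, it uses the CRI shim client to ask the CRI shim server to stop the container.

This completes the kubelet side of the CRI calls. The analysis now moves into the CRI shim (using dockershim, the shim built into kubelet, as the example) to see how the container is stopped.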

// pkg/kubelet/remote/remote_runtime.go
// StopContainer stops a running container with a grace period (i.e., timeout).
func (r *RemoteRuntimeService) StopContainer(containerID string, timeout int64) error {
	// Use timeout + default timeout (2 minutes) as timeout to leave extra time
	// for SIGKILL container and request latency.
	t := r.timeout + time.Duration(timeout)*time.Second
	ctx, cancel := getContextWithTimeout(t)
	defer cancel()

	r.logReduction.ClearID(containerID)
	_, err := r.runtimeClient.StopContainer(ctx, &runtimeapi.StopContainerRequest{
		ContainerId: containerID,
		Timeout:     timeout,
	})
	if err != nil {
		klog.Errorf("StopContainer %q from runtime service failed: %v", containerID, err)
		return err
	}

	return nil
}

5.1.2 r.runtimeClient.StopContainer

Taking dockershim as the example, this part follows the stop-container operation inside the CRI shim.

The r.runtimeClient.StopContainer call made by kubelet lands in dockershim's StopContainer method.

// pkg/kubelet/dockershim/docker_container.go
// StopContainer stops a running container with a grace period (i.e., timeout).
func (ds *dockerService) StopContainer(_ context.Context, r *runtimeapi.StopContainerRequest) (*runtimeapi.StopContainerResponse, error) {
	err := ds.client.StopContainer(r.ContainerId, time.Duration(r.Timeout)*time.Second)
	if err != nil {
		return nil, err
	}
	return &runtimeapi.StopContainerResponse{}, nil
}
ds.client.StopContainer

This mainly calls d.client.ContainerStop.

// pkg/kubelet/dockershim/libdocker/kube_docker_client.go
// Stopping an already stopped container will not cause an error in dockerapi.
func (d *kubeDockerClient) StopContainer(id string, timeout time.Duration) error {
	ctx, cancel := d.getCustomTimeoutContext(timeout)
	defer cancel()
	err := d.client.ContainerStop(ctx, id, &timeout)
	if ctxErr := contextError(ctx); ctxErr != nil {
		return ctxErr
	}
	return err
}
d.client.ContainerStop

This builds the request parameters and sends an HTTP POST request to docker's /containers/<id>/stop endpoint to stop the container.

// vendor/github.com/docker/docker/client/container_stop.go
// ContainerStop stops a container. In case the container fails to stop
// gracefully within a time frame specified by the timeout argument,
// it is forcefully terminated (killed).
//
// If the timeout is nil, the container's StopTimeout value is used, if set,
// otherwise the engine default. A negative timeout value can be specified,
// meaning no timeout, i.e. no forceful termination is performed.
func (cli *Client) ContainerStop(ctx context.Context, containerID string, timeout *time.Duration) error {
	query := url.Values{}
	if timeout != nil {
		query.Set("t", timetypes.DurationToSecondsString(*timeout))
	}
	resp, err := cli.post(ctx, "/containers/"+containerID+"/stop", query, nil, nil)
	ensureReaderClosed(resp)
	return err
}

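5.2 m.runtimeService.StopPodSandbox

The runtimeService in m.runtimeService.StopPodSandbox is again the RemoteRuntimeService, which implements the RuntimeService interface (the client-side container runtime interface of the CRI shim) and holds the client used to talk to the CRI shim's container runtime server. Calling m.runtimeService.StopPodSandbox therefore amounts to calling the StopPodSandbox method of the CRI shim server to stop the pod sandbox.

This completes the kubelet side of the CRI calls. The analysis now moves into the CRI shim (using dockershim, the shim built into kubelet, as the example) to see how the pod sandbox is stopped.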

// pkg/kubelet/remote/remote_runtime.go
// StopPodSandbox stops the sandbox. If there are any running containers in the
// sandbox, they should be forced to termination.
func (r *RemoteRuntimeService) StopPodSandbox(podSandBoxID string) error {
	ctx, cancel := getContextWithTimeout(r.timeout)
	defer cancel()

	_, err := r.runtimeClient.StopPodSandbox(ctx, &runtimeapi.StopPodSandboxRequest{
		PodSandboxId: podSandBoxID,
	})
	if err != nil {
		klog.Errorf("StopPodSandbox %q from runtime service failed: %v", podSandBoxID, err)
		return err
	}

	return nil
}

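5.2.1 r.runtimeClient.StopPodSandbox

Taking dockershim as the example, this part follows the stop-pod-sandbox operation inside the CRI shim.

The r.runtimeClient.StopPodSandbox call made by kubelet lands in dockershim's StopPodSandbox method.

Stopping a pod sandbox has two main steps:
(1) call ds.network.TearDownPod to tear down the pod network;
(2) call ds.client.StopContainer to stop the pod sandbox container.

Note that the sandbox stop only counts as successful when both steps have succeeded, and that the two steps may succeed in either order.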

// pkg/kubelet/dockershim/docker_sandbox.go
// StopPodSandbox stops the sandbox. If there are any running containers in the
// sandbox, they should be force terminated.
// TODO: This function blocks sandbox teardown on networking teardown. Is it
// better to cut our losses assuming an out of band GC routine will cleanup
// after us?
func (ds *dockerService) StopPodSandbox(ctx context.Context, r *runtimeapi.StopPodSandboxRequest) (*runtimeapi.StopPodSandboxResponse, error) {
	var namespace, name string
	var hostNetwork bool

	podSandboxID := r.PodSandboxId
	resp := &runtimeapi.StopPodSandboxResponse{}

	// Try to retrieve minimal sandbox information from docker daemon or sandbox checkpoint.
	inspectResult, metadata, statusErr := ds.getPodSandboxDetails(podSandboxID)
	if statusErr == nil {
		namespace = metadata.Namespace
		name = metadata.Name
		hostNetwork = (networkNamespaceMode(inspectResult) == runtimeapi.NamespaceMode_NODE)
	} else {
		checkpoint := NewPodSandboxCheckpoint("", "", &CheckpointData{})
		checkpointErr := ds.checkpointManager.GetCheckpoint(podSandboxID, checkpoint)

		// Proceed if both sandbox container and checkpoint could not be found. This means that following
		// actions will only have sandbox ID and not have pod namespace and name information.
		// Return error if encounter any unexpected error.
		if checkpointErr != nil {
			if checkpointErr != errors.ErrCheckpointNotFound {
				err := ds.checkpointManager.RemoveCheckpoint(podSandboxID)
				if err != nil {
					klog.Errorf("Failed to delete corrupt checkpoint for sandbox %q: %v", podSandboxID, err)
				}
			}
			if libdocker.IsContainerNotFoundError(statusErr) {
				klog.Warningf("Both sandbox container and checkpoint for id %q could not be found. "+
					"Proceed without further sandbox information.", podSandboxID)
			} else {
				return nil, utilerrors.NewAggregate([]error{
					fmt.Errorf("failed to get checkpoint for sandbox %q: %v", podSandboxID, checkpointErr),
					fmt.Errorf("failed to get sandbox status: %v", statusErr)})
			}
		} else {
			_, name, namespace, _, hostNetwork = checkpoint.GetData()
		}
	}

	// WARNING: The following operations made the following assumption:
	// 1. kubelet will retry on any error returned by StopPodSandbox.
	// 2. tearing down network and stopping sandbox container can succeed in any sequence.
	// This depends on the implementation detail of network plugin and proper error handling.
	// For kubenet, if tearing down network failed and sandbox container is stopped, kubelet
	// will retry. On retry, kubenet will not be able to retrieve network namespace of the sandbox
	// since it is stopped. With empty network namespcae, CNI bridge plugin will conduct best
	// effort clean up and will not return error.
	errList := []error{}
	ready, ok := ds.getNetworkReady(podSandboxID)
	if !hostNetwork && (ready || !ok) {
		// Only tear down the pod network if we haven't done so already
		cID := kubecontainer.BuildContainerID(runtimeName, podSandboxID)
		err := ds.network.TearDownPod(namespace, name, cID)
		if err == nil {
			ds.setNetworkReady(podSandboxID, false)
		} else {
			errList = append(errList, err)
		}
	}
	if err := ds.client.StopContainer(podSandboxID, defaultSandboxGracePeriod); err != nil {
		// Do not return error if the container does not exist
		if !libdocker.IsContainerNotFoundError(err) {
			klog.Errorf("Failed to stop sandbox %q: %v", podSandboxID, err)
			errList = append(errList, err)
		} else {
			// remove the checkpoint for any sandbox that is not found in the runtime
			ds.checkpointManager.RemoveCheckpoint(podSandboxID)
		}
	}

	if len(errList) == 0 {
		return resp, nil
	}

	// TODO: Stop all running containers in the sandbox.
	return nil, utilerrors.NewAggregate(errList)
}
ds.client.StopContainer

This is the same ds.client.StopContainer analyzed in 5.1.2 (pkg/kubelet/dockershim/libdocker/kube_docker_client.go). It mainly calls d.client.ContainerStop.

d.client.ContainerStop

As in 5.1.2 (vendor/github.com/docker/docker/client/container_stop.go), this builds the request parameters and sends an HTTP POST request to docker's /containers/<id>/stop endpoint, this time to stop the pod sandbox container.

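Summary

CRI architecture diagram

Beneath CRI there are two kinds of container runtime implementations:
(1) dockershim, built into kubelet, which provides support for the Docker container engine and for CNI network plugins (including kubenet). The dockershim code lives inside kubelet and is invoked by kubelet; it starts its own server to act as the CRI shim and exposes a gRPC server to kubelet;
(2) external container runtimes, which support container engines such as rkt and containerd.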

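How kubelet calls CRI to delete a pod

kubelet deletes a pod in two steps:
(1) stop all containers that belong to the pod;
(2) then stop the pod sandbox container (which includes tearing down the pod network).

Note: this path only stops containers; removing them is left to kubelet's garbage collection (gc).

Call flow for deleting a pod through CRI

Using the dockershim call flow as the example:

kubelet stops containers by calling dockershim; dockershim in turn calls docker to stop the containers and calls CNI to tear down the pod network.

Figure 1: kubelet dockershim pod deletion call flow

dockershim is the CRI shim built into kubelet. The pod deletion call flow of the other, remote CRI shims is essentially the same as dockershim's: they drive a different container engine, but it is still the CRI shim that calls CNI to tear down the pod network.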
