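Writing a Docker-like Container Runtime by Hand

This project follows the book 《自己动手写Docker》 (Write Your Own Docker).

Book authors: 陈显鹭 (alias 遥鹭), senior R&D engineer at Alibaba Cloud, and co-authors.

Project repository: https://gitee.com/ShengHua666/can

Environment: Linux virtual machine running Ubuntu 20.04 with kernel 5.10.x, Go 1.17.1

Build and run: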
go mod init Bocker
go mod tidy
go build mydocker/main.go
mv main can
cp can /bin/
mv busybox.tar /root/
can --help
can run -ti --name c1 busybox sh

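I. Overall Architecture

Following the book 《自己动手写Docker》, I built can (a name I chose myself), a container runtime that implements container isolation, volume mounting, image packaging, and network communication. The figure shows the project's directory structure.

The Cgroups folder holds the Cgroups-related files, such as the limit settings for the cpu and memory subsystems, which cap the resources a container may use. Figure 2 shows the layout of the Cgroups files.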
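To make the idea of a subsystem limit concrete, here is a minimal sketch of how a memory limit is applied under the cgroup v1 layout: create a directory for the container, write the limit to memory.limit_in_bytes, then add the process ID to tasks. The function name applyMemoryLimit and the use of a temporary directory in place of the real memory hierarchy are assumptions for illustration; this is not the project's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// applyMemoryLimit mimics the cgroup v1 memory subsystem layout: it creates
// <root>/<name>, writes the limit to memory.limit_in_bytes, and then records
// the process ID in tasks. On a real host, root would be the mounted memory
// hierarchy (assumed here to be /sys/fs/cgroup/memory) and root privileges
// would be required. The name applyMemoryLimit is hypothetical.
func applyMemoryLimit(root, name, limit string, pid int) (string, error) {
	dir := filepath.Join(root, name)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	// Set the limit first so it is in force before the process joins.
	if err := os.WriteFile(filepath.Join(dir, "memory.limit_in_bytes"), []byte(limit), 0o644); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(dir, "tasks"), []byte(strconv.Itoa(pid)), 0o644); err != nil {
		return "", err
	}
	return dir, nil
}

// runSketch applies a limit under a temporary directory and reads it back,
// so the example runs without root and without touching real cgroups.
func runSketch() (string, error) {
	root, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	dir, err := applyMemoryLimit(root, "c1", "100m", os.Getpid())
	if err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "memory.limit_in_bytes"))
	return string(data), err
}

func main() {
	limit, err := runSketch()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("memory.limit_in_bytes =", limit)
}
```

The cpu subsystem follows the same pattern with different file names, which is why one directory-per-container helper can serve several subsystems.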

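The Command folder holds the container commands: "run" starts a container from a given image, "ps" lists all containers, "logs" prints the logs of a container, "exec" enters a container running in the background, "stop" stops a running container, "rm" removes a stopped container, "commit" packages a running container into an image, and "network" manages the networks that containers attach to.

The Container folder stores information about each container: the host PID of the container's init process, the container ID, the container name, the command run by init inside the container, the creation time, the status, the volume information, the port mappings, and so on.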
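The fields listed above map naturally onto a record that can be saved to disk and read back by commands such as ps. The sketch below shows one possible shape; the struct name, field names, and JSON tags are assumptions for illustration, and the sample values are taken from the can ps output shown later in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContainerInfo is a hypothetical record holding the fields described in the
// text. Field names and JSON tags are illustrative, not the project's own.
type ContainerInfo struct {
	Pid         string   `json:"pid"`         // host PID of the container's init process
	ID          string   `json:"id"`          // container ID
	Name        string   `json:"name"`        // container name
	Command     string   `json:"command"`     // command run by init inside the container
	CreatedTime string   `json:"createdTime"` // creation time
	Status      string   `json:"status"`      // e.g. running or stopped
	Volume      string   `json:"volume"`      // volume specification
	PortMapping []string `json:"portMapping"` // port mappings
}

// encode serializes a record to JSON text.
func encode(info ContainerInfo) (string, error) {
	data, err := json.Marshal(info)
	return string(data), err
}

// decode parses JSON text back into a record.
func decode(text string) (ContainerInfo, error) {
	var info ContainerInfo
	err := json.Unmarshal([]byte(text), &info)
	return info, err
}

// sample returns a record filled with values from the c1 demo container.
func sample() ContainerInfo {
	return ContainerInfo{
		Pid:         "29547",
		ID:          "0555186751",
		Name:        "c1",
		Command:     "top",
		CreatedTime: "2022-04-13 16:13:58",
		Status:      "running",
		Volume:      "/root/f1:/t1",
	}
}

func main() {
	text, err := encode(sample())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```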

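The Network folder is the core of container networking. It holds all the networking components: the bridge network information, Endpoint (network endpoints), NetworkDriver (network drivers), IPAM (IP address management), and related pieces.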
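Of these components, IPAM is the easiest to illustrate in isolation: it hands out free addresses from a subnet and takes them back on release. The sketch below keeps one boolean per host offset in memory. The type and method names are assumptions, the state is not persisted, and the real project may allocate in a different order (the demo containers later in this post received 192.168.10.5 and 192.168.10.7).

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// IPAM tracks which host offsets are taken in each subnet. This is a
// hypothetical in-memory sketch, not the project's allocator.
type IPAM struct {
	used map[string][]bool // key: canonical subnet, e.g. "192.168.10.0/24"
}

// NewIPAM returns an empty allocator.
func NewIPAM() *IPAM { return &IPAM{used: make(map[string][]bool)} }

// parse returns the canonical subnet key, its base address, and its size.
func parse(cidr string) (string, net.IP, int, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", nil, 0, err
	}
	base := ipnet.IP.To4()
	ones, bits := ipnet.Mask.Size()
	if base == nil || bits-ones > 16 {
		return "", nil, 0, errors.New("sketch handles IPv4 subnets of /16 or smaller only")
	}
	return ipnet.String(), base, 1 << uint(bits-ones), nil
}

// Allocate marks the first free host address as used and returns it.
func (m *IPAM) Allocate(cidr string) (net.IP, error) {
	key, base, size, err := parse(cidr)
	if err != nil {
		return nil, err
	}
	if m.used[key] == nil {
		m.used[key] = make([]bool, size)
	}
	// Offset 0 is the network address and size-1 the broadcast address.
	for off := 1; off < size-1; off++ {
		if m.used[key][off] {
			continue
		}
		m.used[key][off] = true
		// Host bits of base are zero, so OR-ing in the offset is an addition.
		return net.IPv4(base[0], base[1], base[2]|byte(off>>8), base[3]|byte(off)).To4(), nil
	}
	return nil, errors.New("no free address in " + key)
}

// Release returns an address to the pool.
func (m *IPAM) Release(cidr string, ip net.IP) error {
	key, base, size, err := parse(cidr)
	if err != nil {
		return err
	}
	ip4 := ip.To4()
	if ip4 == nil || m.used[key] == nil {
		return errors.New("address is not managed by this allocator")
	}
	off := int(ip4[2]^base[2])<<8 | int(ip4[3]^base[3])
	if ip4[0] != base[0] || ip4[1] != base[1] || off >= size {
		return errors.New("address is outside " + key)
	}
	m.used[key][off] = false
	return nil
}

func main() {
	ipam := NewIPAM()
	for i := 0; i < 3; i++ {
		ip, err := ipam.Allocate("192.168.10.1/24")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("allocated", ip)
	}
}
```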

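The Nsenter folder holds the files behind the exec command, which lets us enter a container running in the background. It contains a C program. A Go program becomes multithreaded as soon as it starts, so it cannot simply issue the system call (setns) from Go to move the current process into the target Mount Namespace. C is used to do this part.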
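Whatever language performs the call, the targets are the namespace files that Linux exposes under /proc/<pid>/ns/. The sketch below only computes those paths for a given host PID; it does not call setns. The function name and the choice and order of namespaces are assumptions. mnt is placed last here because, once the mount namespace has been joined, the host's /proc paths may no longer resolve.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// namespaces lists the namespace files a helper would open and join.
// The selection and order are assumptions made for this sketch.
var namespaces = []string{"ipc", "uts", "net", "pid", "mnt"}

// namespacePaths returns the /proc/<pid>/ns/<name> path for each namespace
// of the process with the given host PID. The name is hypothetical.
func namespacePaths(pid int) []string {
	paths := make([]string, 0, len(namespaces))
	for _, ns := range namespaces {
		paths = append(paths, path.Join("/proc", strconv.Itoa(pid), "ns", ns))
	}
	return paths
}

func main() {
	// 29547 is the host PID of container c1 in the demo later in this post.
	for _, p := range namespacePaths(29547) {
		fmt.Println(p)
	}
}
```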

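The Run folder holds the runtime logic and corresponds to the commands in the Command folder: when a command is entered, the code in Command calls the matching function in Run to carry it out. The main functions are run, ps, exec, commit, log, and stop.

main.go contains the project's main function and is the entry point of the whole project.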

II. Implemented Features

Note: before running, busybox.tar must be placed in /root/.
NAME:
   can - can is a simple container runtime implementation.
         The purpose of this project is to learn how docker works and how to write a docker by ourselves
         Enjoy it, just for fun.

USAGE:
   can [global options] command [command options] [arguments...]

COMMANDS:
   init     Init container process run user's process in container. Do not call it outside
   run      Create a container with namespace and cgroups limit ie: can run -ti [image] [command]
   ps       list all the containers
   logs     print logs of a container
   exec     exec a command into container
   stop     stop a container
   rm       remove unused containers
   commit   commit a container into image
   network  container network commands
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h  show help

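1 Introduction

can is a simple container runtime. The purpose of this project is to learn how Docker works and how to write one yourself. Enjoy it, just for fun.

2 Commands implemented so far

can run -ti/-d [image] [command]: create a container with its own namespaces and Cgroups limits
can ps: list all containers
can logs: print the logs of a container
can exec: enter a container running in the background
can stop: stop a running container
can rm: remove a stopped container
can commit: package a container into an image
can network: container network commands
help: show help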

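III. Demonstration

1 Container isolation

Run a container from a given image to get basic isolation between the container and the host, then use ps -ef to check whether the process IDs inside the container are isolated.

Note: before running, busybox.tar must be placed in /root/.

# can run -ti --name c1 busybox sh runs the busybox image as a container named c1; sh is the first process started inside the container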
root:# can run -ti --name c1 busybox sh
{"level":"info","msg":"createTty true","time":"2022-04-17T00:12:18-07:00"}
{"level":"info","msg":"init come on","time":"2022-04-17T00:12:18-07:00"}
{"level":"info","msg":"command all is sh","time":"2022-04-17T00:12:18-07:00"}
{"level":"info","msg":"Current location is /root/mnt/c1","time":"2022-04-17T00:12:18-07:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2022-04-17T00:12:18-07:00"}
/ # ps -ef
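PID   USER     TIME  COMMAND
    1 root      0:00 sh
    7 root      0:00 ps -ef
# Running ps -ef inside the container shows that sh is the container's first process, with PID 1, and that ps -ef was created by that PID 1 parent.
# ls lists all files in the current directory
/ # ls
bin   dev   etc   home  proc  root  sys   tmp   usr   var
# Run ps -ef again inside the container to view its processes
/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    9 root      0:00 ps -ef
# On the host, run pstree -pl to view the process tree
root:# pstree -pl
main(29469)───sh(29476)
  ├─{main}(29470)
  ├─{main}(29471)
  ├─{main}(29472)
  ├─{main}(29473)
  └─{main}(29474)
# The sh process is PID 29476 on the host but PID 1 inside the container. The PIDs differ, so the container is isolated from the host.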

The flow chart is shown in the figure.
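The PID isolation seen above comes from starting the container process in new Linux namespaces. In Go this is done by setting clone flags on the child process. The sketch below builds such a command but does not run it, since running it needs root; the helper names are assumptions, it compiles on Linux only, and the set of flags is the usual one for this kind of runtime rather than something confirmed from the project's source.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// isolationFlags asks the kernel for new UTS, PID, mount, network, and IPC
// namespaces when the child is created. The exact set is an assumption.
const isolationFlags = syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID |
	syscall.CLONE_NEWNS | syscall.CLONE_NEWNET | syscall.CLONE_NEWIPC

// isolatedCommand prepares a command whose process will start inside new
// namespaces. The name is hypothetical. Linux only.
func isolatedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: isolationFlags}
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

// newPIDNamespace reports whether the command requests a new PID namespace,
// which is what makes the first process appear as PID 1 inside the container.
func newPIDNamespace(cmd *exec.Cmd) bool {
	return cmd.SysProcAttr != nil && cmd.SysProcAttr.Cloneflags&syscall.CLONE_NEWPID != 0
}

func main() {
	cmd := isolatedCommand("sh")
	fmt.Printf("clone flags: %#x, new PID namespace: %v\n",
		cmd.SysProcAttr.Cloneflags, newPIDNamespace(cmd))
	// Calling cmd.Run() here would start sh in the new namespaces; it is left
	// out because it requires root privileges.
}
```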

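2 Volume mounting

To demonstrate volume mounting, start two containers, c1 and c2, from the busybox.tar image. Container c1 mounts the host directory /root/f1 at /t1 in the container. Container c2 mounts the host directory /root/f2 at /t2.

// Start c1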

root:# can run -d --name c1 -v /root/f1:/t1 busybox top
{"level":"info","msg":"createTty false","time":"2022-04-13T16:13:58+08:00"}
{"level":"info","msg":"Mkdir parent dir /root/f1 error. mkdir /root/f1: file exists","time":"2022-04-13T16:13:58+08:00"}
{"level":"info","msg":"NewWorkSpace volume urls [\"/root/f1\" \"/t1\"]","time":"2022-04-13T16:13:58+08:00"}
{"level":"info","msg":"command all is top","time":"2022-04-13T16:13:58+08:00"}

// Start c2

root:# can run -d --name c2 -v /root/f2:/t2 busybox top
{"level":"info","msg":"createTty false","time":"2022-04-13T16:14:07+08:00"}
{"level":"info","msg":"Mkdir parent dir /root/f2 error. mkdir /root/f2: file exists","time":"2022-04-13T16:14:07+08:00"}
{"level":"info","msg":"NewWorkSpace volume urls [\"/root/f2\" \"/t2\"]","time":"2022-04-13T16:14:07+08:00"}
{"level":"info","msg":"command all is top","time":"2022-04-13T16:14:08+08:00"}
{"level":"warning","msg":"remove cgroup fail unlinkat /sys/fs/cgroup/cpu,cpuacct/2251256839/cgroup.procs: operation not permitted","time":"2022-04-13T16:14:08+08:00"}

// Check that the containers are running

root@raspberrypi:~/go/src/Bocker# can ps
ID           NAME        PID         STATUS      COMMAND     CREATED
0555186751   c1          29547       running     top         2022-04-13 16:13:58
2251256839   c2          29564       running     top         2022-04-13 16:14:07

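In another session, list the host's /root directory. It now contains the two mounted directories f1 and f2, the mnt directory (the common root of all container file systems), and the writeLayer directory (the common root of all container read-write layers). Under both mnt and writeLayer, subdirectories c1 and c2 have been created. mnt/{containerName} is the container's entire file system. writeLayer/{containerName} is the container's read-write layer, and it also contains the mount-point directory for the volume mounted into the container.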

root:~# ls
busybox  busybox.tar  f1  f2  go  mnt  snap  writeLayer
root:~# ls mnt
c1  c2
root:~# ls  mnt/c1
bin  dev  etc  home  proc  root  sys  t1  tmp  usr  var
root:~# ls  mnt/c2
bin  dev  etc  home  proc  root  sys  t2  tmp  usr  var
// Inspect the writeLayer directory structure
root:~# tree writeLayer/
writeLayer/
├── c1
│   └── t1
└── c2
    └── t2

4 directories, 0 files
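The directory layout in these listings follows a simple rule: each container gets /root/mnt/<name> as its file system root and /root/writeLayer/<name> as its read-write layer, and a volume's container path is resolved below the mount point. A minimal sketch of that path arithmetic follows; the type and function names are assumptions, and the sketch computes paths only, it mounts nothing.

```go
package main

import (
	"fmt"
	"path"
)

// Workspace holds the host-side paths that belong to one container.
// The type is hypothetical and mirrors the layout shown in the listings.
type Workspace struct {
	MountPoint   string // root of the container's file system
	WriteLayer   string // the container's read-write layer
	VolumeTarget string // host path where the volume appears, empty if none
}

// workspaceFor derives the paths for a container from the base directory,
// the container name, and the volume's path inside the container.
func workspaceFor(root, name, containerDir string) Workspace {
	ws := Workspace{
		MountPoint: path.Join(root, "mnt", name),
		WriteLayer: path.Join(root, "writeLayer", name),
	}
	if containerDir != "" {
		ws.VolumeTarget = path.Join(ws.MountPoint, containerDir)
	}
	return ws
}

func main() {
	fmt.Printf("%+v\n", workspaceFor("/root", "c1", "/t1"))
	fmt.Printf("%+v\n", workspaceFor("/root", "c2", "/t2"))
}
```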

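Next, use the exec command to enter container c1. Create /t1/test1.txt and write "hello c1" into it (a write to the volume). Then create /t1-1/test1.txt and write "hello t1-1" into it (a write outside the volume).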

root:# can exec c1 sh
{"level":"info","msg":"container pid 29547","time":"2022-04-13T16:20:20+08:00"}
{"level":"info","msg":"command sh","time":"2022-04-13T16:20:20+08:00"}
/ # ls
bin   dev   etc   home  proc  root  sys   t1    tmp   usr   var
/ # echo "hello c1" >> /t1/test1.txt
/ # cat /t1/test1.txt
hello c1
/ # mkdir t1-1
/ # echo "hello t1-1" >> /t1-1/test1.txt
/ # cat /t1-1/test1.txt
hello t1-1

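In another session, inspect the writeLayer directory on the host. It now contains the directory c1/t1-1 and the file c1/t1-1/test1.txt. Then use ls and cat on f1 to check that the volume write reached the host.

root:~# tree writeLayer/
writeLayer/
├── c1
│   ├── root
│   ├── t1
│   └── t1-1
│       └── test1.txt
└── c2
    └── t2

6 directories, 1 file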
root@raspberrypi:~#  ls f1/
test1.txt
root@raspberrypi:~# cat f1/test1.txt
hello c1

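These steps show that volume mounting works and that important data can be kept after a container stops. Next, stop the container and check whether the data in f1 is still there.

// In container c1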

/ # exit
root:# can ps
ID           NAME        PID         STATUS      COMMAND     CREATED
0555186751   c1          29547       running     top         2022-04-13 16:13:58
2251256839   c2          29564       running     top         2022-04-13 16:14:07
root:# can stop c1
root:# can ps
ID           NAME        PID         STATUS      COMMAND     CREATED
0555186751   c1                      stopped     top         2022-04-13 16:13:58
2251256839   c2          29564       running     top         2022-04-13 16:14:07
root:# can rm c1
root:# can ps
ID           NAME        PID         STATUS      COMMAND     CREATED
2251256839   c2          29564       running     top         2022-04-13 16:14:07

The container has now been stopped and removed. Check the data in f1.

root:~# tree writeLayer/
writeLayer/
└── c2
    └── t2

2 directories, 0 files
root:~# ls mnt
c2
root:~# cat f1/test1.txt
hello c1

The test succeeded, which shows that volume mounting works correctly.
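The -v argument used throughout this section has the form hostDir:containerDir, and the log line NewWorkSpace volume urls ["/root/f1" "/t1"] suggests it is split into exactly those two parts. A minimal sketch of such parsing is shown below; the function name and the error message are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVolume splits a "hostDir:containerDir" specification into its two
// parts and rejects anything else. The name is hypothetical.
func parseVolume(spec string) (hostDir, containerDir string, err error) {
	parts := strings.Split(spec, ":")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("volume %q must look like hostDir:containerDir", spec)
	}
	return parts[0], parts[1], nil
}

func main() {
	for _, spec := range []string{"/root/f1:/t1", "/root/f2:/t2", "/root/f3"} {
		hostDir, containerDir, err := parseVolume(spec)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("host:", hostDir, "container:", containerDir)
	}
}
```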

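3 Networking between containers

This part implements container networking: with a bridge network, two containers can ping each other directly, and a container can use the host's network to ping external sites.

First, create a network for the containers to attach to.

// Create a network with the bridge driver, set its subnet to 192.168.10.1/24, and name it testbridge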

root:# can network create --driver bridge --subnet 192.168.10.1/24 testbridge
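The --subnet value 192.168.10.1/24 carries both an address and a network range. Go's standard library separates the two, and the mask and broadcast address that show up later in the ifconfig output can be derived from the same value. The sketch below illustrates this; the type and function names are assumptions, and it handles IPv4 only.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// SubnetInfo holds the values derived from a CIDR string.
// The type is hypothetical.
type SubnetInfo struct {
	Address   string // the address part as written, e.g. 192.168.10.1
	Network   string // the canonical network, e.g. 192.168.10.0/24
	Mask      string // dotted netmask
	Broadcast string // highest address in the network
}

// describeSubnet parses a CIDR string such as "192.168.10.1/24".
func describeSubnet(cidr string) (SubnetInfo, error) {
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return SubnetInfo{}, err
	}
	base := ipnet.IP.To4()
	if base == nil || len(ipnet.Mask) != net.IPv4len {
		return SubnetInfo{}, errors.New("sketch handles IPv4 only")
	}
	// The broadcast address has every host bit set to one.
	broadcast := make(net.IP, net.IPv4len)
	for i := range broadcast {
		broadcast[i] = base[i] | ^ipnet.Mask[i]
	}
	return SubnetInfo{
		Address:   ip.String(),
		Network:   ipnet.String(),
		Mask:      net.IP(ipnet.Mask).String(),
		Broadcast: broadcast.String(),
	}, nil
}

func main() {
	info, err := describeSubnet("192.168.10.1/24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```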

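Container-to-container connectivity

Start two containers on the network just created, obtain the IP of the first container, and access it from the second.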

root:# can run -ti -net testbridge busybox sh
{"level":"info","msg":"createTty true","time":"2022-04-13T16:33:16+08:00"}
{"level":"info","msg":"init come on","time":"2022-04-13T16:33:16+08:00"}
{"level":"info","msg":"command all is sh","time":"2022-04-13T16:33:16+08:00"}
{"level":"info","msg":"Current location is /root/mnt/4431661713","time":"2022-04-13T16:33:16+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2022-04-13T16:33:16+08:00"}
// Check the container's IP address
/ # ifconfig
cif-44316 Link encap:Ethernet  HWaddr 1E:84:07:5C:61:C8
          inet addr:192.168.10.5  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::1c84:7ff:fe5c:61c8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:736 (736.0 B)  TX bytes:516 (516.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

This container's IP address is 192.168.10.5. Next, try to connect to it from another container.

root:#  can run -ti -net testbridge busybox sh
{"level":"info","msg":"createTty true","time":"2022-04-13T16:35:01+08:00"}
{"level":"info","msg":"init come on","time":"2022-04-13T16:35:01+08:00"}
{"level":"info","msg":"command all is sh","time":"2022-04-13T16:35:01+08:00"}
{"level":"info","msg":"Current location is /root/mnt/4805249089","time":"2022-04-13T16:35:01+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2022-04-13T16:35:01+08:00"}
/ # ifconfig
cif-48052 Link encap:Ethernet  HWaddr DA:A0:3D:E6:28:DE
          inet addr:192.168.10.7  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::d8a0:3dff:fee6:28de/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:516 (516.0 B)  TX bytes:516 (516.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

This container's IP address is 192.168.10.7. From it, try to ping the first container.

/ # ping 192.168.10.5
PING 192.168.10.5 (192.168.10.5): 56 data bytes
64 bytes from 192.168.10.5: seq=0 ttl=64 time=0.536 ms
64 bytes from 192.168.10.5: seq=1 ttl=64 time=0.502 ms
64 bytes from 192.168.10.5: seq=2 ttl=64 time=0.404 ms
64 bytes from 192.168.10.5: seq=3 ttl=64 time=0.488 ms
64 bytes from 192.168.10.5: seq=4 ttl=64 time=0.439 ms
64 bytes from 192.168.10.5: seq=5 ttl=64 time=0.436 ms
64 bytes from 192.168.10.5: seq=6 ttl=64 time=0.432 ms
^C
--- 192.168.10.5 ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss

The results show that the two containers can reach each other over this network.
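One detail in the two ifconfig listings is worth noting: the interface names cif-44316 and cif-48052 start with the first five characters of the container IDs 4431661713 and 4805249089 seen in the log lines. The sketch below reproduces that naming rule as inferred from the output; the function name is an assumption, and the project's real code may build the name differently.

```go
package main

import "fmt"

// prefixLen is the number of container ID characters kept in the name,
// inferred from the demo output.
const prefixLen = 5

// vethPeerName builds the name of the interface placed inside the container
// from the container ID. The function name is hypothetical.
func vethPeerName(containerID string) string {
	if len(containerID) > prefixLen {
		containerID = containerID[:prefixLen]
	}
	return "cif-" + containerID
}

func main() {
	for _, id := range []string{"4431661713", "4805249089"} {
		fmt.Println(id, "->", vethPeerName(id))
	}
}
```

Keeping the name short matters because Linux limits interface names to 15 characters.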

Container access to external networks

From the container created above (192.168.10.5), ping Baidu's IP address (14.215.177.39) to test access to an external network.

/ # ping 14.215.177.39
PING 14.215.177.39 (14.215.177.39): 56 data bytes
64 bytes from 14.215.177.39: seq=0 ttl=50 time=37.318 ms
64 bytes from 14.215.177.39: seq=1 ttl=50 time=35.473 ms
64 bytes from 14.215.177.39: seq=2 ttl=50 time=38.993 ms
64 bytes from 14.215.177.39: seq=3 ttl=50 time=36.837 ms
64 bytes from 14.215.177.39: seq=4 ttl=50 time=37.521 ms
64 bytes from 14.215.177.39: seq=5 ttl=50 time=45.154 ms
64 bytes from 14.215.177.39: seq=6 ttl=50 time=36.690 ms
^C
--- 14.215.177.39 ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 35.473/38.283/45.154 ms

The results show that the container can also reach external networks.
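Outbound traffic from a private bridge subnet normally needs source NAT on the host. A common way to get it is an iptables MASQUERADE rule for packets that leave the subnet through any interface other than the bridge, together with IP forwarding enabled on the host. The sketch below only assembles such a command line and does not execute it. Whether can uses exactly this rule, and that the bridge device is named after the network (testbridge), are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// masqueradeArgs returns iptables arguments that source-NAT traffic from the
// subnet when it leaves through any interface other than the bridge.
// The function name and the bridge device name are assumptions.
func masqueradeArgs(subnet, bridge string) []string {
	return []string{
		"-t", "nat", "-A", "POSTROUTING",
		"-s", subnet,
		"!", "-o", bridge,
		"-j", "MASQUERADE",
	}
}

// masqueradeCommand renders the full command line for display.
func masqueradeCommand(subnet, bridge string) string {
	return "iptables " + strings.Join(masqueradeArgs(subnet, bridge), " ")
}

func main() {
	// Printing instead of executing keeps the sketch safe to run anywhere.
	fmt.Println(masqueradeCommand("192.168.10.0/24", "testbridge"))
}
```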

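The figure shows the overall network model, including its components and the workflow. First, two objects of container networking need to be abstracted: the network and the network endpoint.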
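The two abstractions can be sketched as plain Go types, with a driver interface that creates networks and connects endpoints to them. Everything below is illustrative: the type, field, and method names are assumptions, and memoryDriver is a stand-in that only records state instead of creating a real bridge or veth pair.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Network is a set of endpoints that can reach each other.
type Network struct {
	Name    string
	IPRange *net.IPNet
	Driver  string
}

// Endpoint is one container's attachment to a network.
type Endpoint struct {
	ID        string
	IPAddress net.IP
	Network   *Network
}

// NetworkDriver creates networks and connects endpoints to them.
type NetworkDriver interface {
	Name() string
	Create(subnet, name string) (*Network, error)
	Connect(nw *Network, ep *Endpoint) error
}

// memoryDriver is a hypothetical driver that keeps everything in memory.
type memoryDriver struct {
	attached map[string][]string // network name -> endpoint IDs
}

// Compile-time check that memoryDriver satisfies the interface.
var _ NetworkDriver = (*memoryDriver)(nil)

func newMemoryDriver() *memoryDriver {
	return &memoryDriver{attached: make(map[string][]string)}
}

func (d *memoryDriver) Name() string { return "memory" }

// Create parses the subnet and returns a network that uses this driver.
func (d *memoryDriver) Create(subnet, name string) (*Network, error) {
	_, ipnet, err := net.ParseCIDR(subnet)
	if err != nil {
		return nil, err
	}
	return &Network{Name: name, IPRange: ipnet, Driver: d.Name()}, nil
}

// Connect attaches the endpoint if its address belongs to the network.
func (d *memoryDriver) Connect(nw *Network, ep *Endpoint) error {
	if nw == nil || ep == nil {
		return errors.New("network and endpoint are required")
	}
	if !nw.IPRange.Contains(ep.IPAddress) {
		return fmt.Errorf("%s is outside %s", ep.IPAddress, nw.IPRange)
	}
	ep.Network = nw
	d.attached[nw.Name] = append(d.attached[nw.Name], ep.ID)
	return nil
}

// newEndpoint builds an endpoint from a container ID and an address string.
func newEndpoint(id, ip string) *Endpoint {
	return &Endpoint{ID: id, IPAddress: net.ParseIP(ip)}
}

func main() {
	driver := newMemoryDriver()
	nw, err := driver.Create("192.168.10.1/24", "testbridge")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	ep := newEndpoint("4431661713", "192.168.10.5")
	if err := driver.Connect(nw, ep); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("endpoint", ep.ID, "joined", ep.Network.Name)
}
```

Hiding the bridge behind an interface keeps the run command independent of how a particular driver wires up its devices.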

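IV. Problems Encountered and How They Were Addressed

1 Container technology evolves quickly

《自己动手写Docker》 was written in 2017 and uses Ubuntu 14.04, Go 1.7.1, and Linux kernel 3.13.x, whereas the versions in common use today are Ubuntu 20.04, Go 1.17.1, and Linux kernel 5.10.x. As a result, some of the functions used at the time have since been deprecated. I ran into this while reading the source code and resolved it by consulting references and some excellent blogs. I think setting up the environment is very important: with a working environment as the base, the later experiments have something to build on.

Useful blog links: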

https://blog.csdn.net/weixin_43988498/article/details/121044780

https://gitee.com/free-love/docker-demo

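2 An unfamiliar programming language

Docker is written in Go, which I had never studied systematically, so reading the source code was somewhat difficult. Go's syntax was unfamiliar to me, so I need more study and more practice.

Go tutorial: https://m.runoob.com/go/go-tutorial.html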

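Summary

This is what I have achieved so far from studying 《自己动手写Docker》. can currently implements roughly the basic features of Docker. I plan to extend and improve it, for example by implementing pull, push, and cross-host networking.

The project is available at: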

https://gitee.com/ShengHua666/can
