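Linux capabilities explained

1. Capability overview
- 1.1 Viewing the current user's capabilities
- 1.2 Process capabilities
- 1.3 Switching users inside a process (calling setuid and setgid)
  - 1.3.1 Testing the kernel code
- 1.4 File capabilities
  - 1.4.1 Viewing a file's capabilities
  - 1.4.2 Granting capabilities to a file
- 1.5 Capabilities when a process creates a child process
Capabilities in docker
2. Enabling userns-remap in docker
- 2.1 Root user inside the container
  - 2.1.1 Capabilities on the container side
  - 2.1.2 Capabilities on the host side
- 2.2 Non-root user inside the container
  - 2.2.1 Capabilities on the container side
  - 2.2.2 Capabilities on the host side
- 2.3 chmod inside the container
  - 2.3.1 Using --cap-drop and --cap-add together to assign capabilities
    - 2.3.1.1 Root user inside the container
      - 2.3.1.1.1 Without no-new-privileges
      - 2.3.1.1.2 With no-new-privileges
    - 2.3.1.2 Non-root user inside the container
      - 2.3.1.2.1 Without no-new-privileges
      - 2.3.1.2.2 With no-new-privileges
3. Summary
- 3.1 How capabilities change while a docker container starts
- 3.2 Restricting a container's capabilities
  - 3.2.1 Enabling userns-remap
  - 3.2.2 Restricting the user inside the container
  - 3.2.3 Using cap-add and cap-drop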

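1. Capability overview

Many articles already cover this topic, so this one keeps the background short. The following references are a good starting point.

capabilities(7), Linux manual page (the authoritative reference)
Linux Capabilities 入门教程：概念篇 (introductory tutorial, concepts), by 米开朗基杨
Linux Capabilities 入门教程：基础实战篇 (introductory tutorial, basic practice), by 米开朗基杨
Linux Capabilities 入门教程：进阶实战篇 (introductory tutorial, advanced practice), by 米开朗基杨
Linux capability详解, by 弥敦道人 (CSDN)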

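Before Linux kernel 2.2, permission checks divided processes into two classes: privileged processes (euid = 0) and unprivileged processes. A privileged process (typically a set-user-ID-root program) has full root privileges over the system.

Kernel 2.2 introduced the capabilities mechanism, which splits root's privileges into finer-grained units. When a process is not privileged (its effective UID is not root), the kernel checks the process's capabilities to decide whether it may perform a given privileged operation.

Run man capabilities to see the full list of capabilities.

Linux has five capability sets.

Permitted: the upper bound on the capabilities the process may put into its Effective set. Abbreviated as P below.
Effective: the capabilities currently in effect (the set the kernel actually checks). Abbreviated as E below.
Inheritable: the capabilities that can be preserved across exec. Abbreviated as I below.
Bounding: the limit on the capabilities that can be gained during exec. Abbreviated as B below.
Ambient: the capabilities kept across exec of a file that is not privileged. Abbreviated as A below.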

1.1 Viewing the current user's capabilities

Look at the Cap lines in /proc/$$/status.

Unprivileged user

ubuntu@ubuntu-standard-pc:~$ cat /proc/$$/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

root user

root@ubuntu-standard-pc:~# cat /proc/$$/status | grep Cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

CapInh corresponds to I above
CapPrm corresponds to P above
CapEff corresponds to E above
CapBnd corresponds to B above
CapAmb corresponds to A above
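
The same lookup can be done from a program. The sketch below is only an illustration: the function name parseCapLines is made up for this example, and it assumes the "Name:<tab>hex value" layout shown in the output above.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCapLines extracts the Cap* fields (CapInh, CapPrm, CapEff, CapBnd,
// CapAmb) from the text of a /proc/<pid>/status file.
// The function name is hypothetical; the field layout is assumed to match
// the output shown in this article.
func parseCapLines(status string) (map[string]uint64, error) {
	caps := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || !strings.HasPrefix(name, "Cap") {
			continue // not a capability line
		}
		mask, err := strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", name, err)
		}
		caps[name] = mask
	}
	return caps, sc.Err()
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	caps, err := parseCapLines(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range []string{"CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb"} {
		fmt.Printf("%s: %016x\n", name, caps[name])
	}
}
```

Run as an unprivileged user, this should print the same five lines as the cat | grep command, for the program's own process.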

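1.2 Process capabilities

Below, pP, pI, pE, pB and pA denote a process's P, I, E, B and A sets.

First create a process: sleep for 100 seconds, running in the background (the trailing & puts it in the background).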

ubuntu@ubuntu-standard-pc:~$ sleep 100 &
[1] 1968

The pid of the process is 1968. Its state, including its capabilities, is recorded in /proc/<pid>/status, so we extract the capability lines from that file.

If you do not know the pid, list all processes with ps -ef and filter the output with grep.
For this example:

ubuntu@ubuntu-standard-pc:~$ ps -ef | head -1; ps -ef | grep sleep
UID        PID  PPID  C STIME TTY          TIME CMD
root      1595  1638  0 10:35 ?        00:00:00 sleep 60
1030775+  1968  1896  0 10:35 pts/1    00:00:00 sleep 100
1030775+  2065 59302  0 10:35 ?        00:00:00 sleep 5
1030775+  2175  1896  0 10:35 pts/1    00:00:00 grep --color=auto sleep

head -1 prints the header row, that is, the UID PID PPID C STIME TTY TIME CMD line.

ubuntu@ubuntu-standard-pc:~$ cat /proc/1968/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

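Only B is populated for this process; every other set is empty. This matches the capabilities of the user. The reason is explained later (it is not a simple copy).

Now look at the root user.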

ubuntu@ubuntu-standard-pc:~$ sudo -i
root@ubuntu-standard-pc:~# cat /proc/$$/status | grep Cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

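For root, only I and A are empty; every other set is full. This matches the capabilities of the root user itself. Again, the reason is explained later (it is not a simple copy either).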

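1.3 Switching users inside a process (calling setuid and setgid)

When a process switches user while it is running (its code calls the setuid and setgid system calls), its capabilities change accordingly.
Reading the kernel source (worth bookmarking!)

The kernel code that handles this is shown below.

Kernel source location: security/commoncap.c, line 1087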

static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
{
	kuid_t root_uid = make_kuid(old->user_ns, 0);

	if ((uid_eq(old->uid, root_uid) ||
	     uid_eq(old->euid, root_uid) ||
	     uid_eq(old->suid, root_uid)) &&	/* at least one of the old UIDs was root */
	    (!uid_eq(new->uid, root_uid) &&
	     !uid_eq(new->euid, root_uid) &&
	     !uid_eq(new->suid, root_uid))) {	/* none of the new UIDs is root */
		if (!issecure(SECURE_KEEP_CAPS)) {
			/* KEEP_CAPS is not set: clear the P and E sets */
			cap_clear(new->cap_permitted);
			cap_clear(new->cap_effective);
		}

		/*
		 * Pre-ambient programs expect setresuid to nonroot followed
		 * by exec to drop capabilities.  We should make sure that
		 * this remains the case.
		 */
		cap_clear(new->cap_ambient);	/* A is cleared on any root -> non-root switch, with or without KEEP_CAPS */
	}
	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
		cap_clear(new->cap_effective);	/* euid was root and is now non-root: clear E */
	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
		new->cap_effective = new->cap_permitted;	/* euid was non-root and is now root: E = P */
}

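The kernel code above can be summarized as follows:

When a process that was root switches to a non-root user and the KEEP_CAPS flag is not set, the E and P sets are cleared.
If the KEEP_CAPS flag is set, the P set is preserved.

In short, whenever the effective UID changes from root to a non-root user, E is cleared; whether P survives depends on the KEEP_CAPS flag.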
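
To make the rule easier to experiment with, here is a small user-space model of it in Go. It is only a sketch: the creds struct and the emulateSetxuid function are simplified stand-ins invented for this example, not kernel or library APIs, and the capability sets are reduced to plain bit masks.

```go
package main

import "fmt"

// creds is a simplified, hypothetical stand-in for the kernel's struct cred.
type creds struct {
	uid, euid, suid               uint32
	permitted, effective, ambient uint64
}

// hasRoot reports whether any of the three UIDs is 0.
func hasRoot(c creds) bool { return c.uid == 0 || c.euid == 0 || c.suid == 0 }

// emulateSetxuid mirrors the three rules of cap_emulate_setxuid.
// next carries the new UIDs and a copy of the old capability sets.
func emulateSetxuid(old, next creds, keepCaps bool) creds {
	if hasRoot(old) && !hasRoot(next) {
		if !keepCaps {
			next.permitted = 0
			next.effective = 0
		}
		next.ambient = 0 // cleared regardless of KEEP_CAPS
	}
	if old.euid == 0 && next.euid != 0 {
		next.effective = 0 // euid root -> non-root: clear E
	}
	if old.euid != 0 && next.euid == 0 {
		next.effective = next.permitted // euid non-root -> root: E = P
	}
	return next
}

func main() {
	const all = 0x1ffffffffff // full mask as seen in the root output above
	root := creds{permitted: all, effective: all}
	user := root
	user.uid, user.euid, user.suid = 1000, 1000, 1000

	for _, keep := range []bool{true, false} {
		got := emulateSetxuid(root, user, keep)
		fmt.Printf("keepCaps=%v P=%#x E=%#x\n", keep, got.permitted, got.effective)
	}
}
```

With keepCaps set, the model keeps P and clears E; without it, both are cleared, which is the behavior the summary above describes.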

1.3.1 Testing the kernel code

The example in this article is written in Go.

Source file name: setid.go

package main

import (
	"fmt"
	"syscall"
	"time"
)

// SetKeepCaps sets the KEEP_CAPS flag so that the permitted capabilities
// survive a switch away from root.
func SetKeepCaps() error {
	if _, _, err := syscall.RawSyscall(syscall.SYS_PRCTL, syscall.PR_SET_KEEPCAPS, 1, 0); err != 0 {
		return err
	}
	return nil
}

// ClearKeepCaps clears the KEEP_CAPS flag.
func ClearKeepCaps() error {
	if _, _, err := syscall.RawSyscall(syscall.SYS_PRCTL, syscall.PR_SET_KEEPCAPS, 0, 0); err != 0 {
		return err
	}
	return nil
}

func main() {
	fmt.Println("Hello world!")
	fmt.Println("before set, the uid is ", syscall.Getuid())
	fmt.Println("before set, the gid is ", syscall.Getgid())
	fmt.Println("before set, the effective uid is ", syscall.Geteuid())
	fmt.Println("|***********************************|")
	if err := SetKeepCaps(); err != nil {
		fmt.Println(err)
		return
	} else {
		fmt.Println("*     secessfully set keep caps     *")
	}
	fmt.Println("|***********************************|")
	// Switch the group first, then the user: after setuid the process
	// may no longer be allowed to change its gid.
	syscall.Setgid(1000)
	syscall.Setuid(1000)
	//syscall.Setgid(0)
	//syscall.Setuid(0)
	fmt.Println("after set, the uid is ", syscall.Getuid())
	fmt.Println("after set, the gid is ", syscall.Getgid())
	fmt.Println("after set, the effective uid is ", syscall.Geteuid())
	// if err := ClearKeepCaps(); err != nil {
	//  return
	// }
	// fmt.Println("after Clear, the uid is ", syscall.Getuid())
	// fmt.Println("after Clear, the gid is ", syscall.Getgid())
	// Stay alive long enough to inspect /proc/<pid>/status.
	time.Sleep(100 * time.Second)
}

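What the code does:

First, it sets the KEEP_CAPS flag.

It then calls the setgid and setuid system calls to switch the process itself from root to an unprivileged user.

Finally it sleeps for 100 s. During that time you can find the program with ps and inspect its capabilities.

Usage: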

# bash command
ubuntu@ubuntu-standard-pc:~/codes/go/capability$ go build setid.go

go build produces an executable named setid, with no extension.

Then run setid as root. The executable switches itself from root to the user with uid 1000.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ sudo ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

The program runs correctly. Press Ctrl+C to exit, then run it again in the background.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ sudo ./setid &
[1] 3778
ubuntu@ubuntu-standard-pc:~/codes/go/capability$ Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

After it starts, press Enter once more to get the prompt back.

The pid of the program is 3778; look up its capabilities (the Cap lines) in /proc/3778/status.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ cat /proc/3778/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000

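After the program switches from root to an unprivileged user, only P and B remain; E has been cleared by the kernel. This matches the kernel code.

These are the capabilities of the process, so what is preserved is pP and pB.

User switch	KEEP_CAPS set	Sets before the switch	Sets after the switch
root -> root	yes/no	E, I, P, B, A	E, I, P, B, A (nothing cleared)
root -> non-root	yes	E, I, P, B, A	I, P, B (E and A cleared)
root -> non-root	no	E, I, P, B, A	I, B (E, P and A cleared)
non-root -> root	yes/no	E, I, P, B, A	E, I, P, B, A (E = P)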

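1.4 File capabilities

Important: a file has only E, I and P capabilities. It has no A or B set, and the file E is a single bit rather than a set.

1.4.1 Viewing a file's capabilities

Below, fI, fP and fE denote a file's I, P and E capabilities.

Files can carry capabilities too. They determine which privileged operations a user may perform when executing the file, so in practice only the capabilities of executable files matter.

For example, our shell is itself an executable, located at /bin/bash. We can inspect its capabilities.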

ubuntu@ubuntu-standard-pc:~$ getcap /bin/bash
ubuntu@ubuntu-standard-pc:~$

The file has no capabilities.

Now check the setid executable we just built:

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ getcap setid
ubuntu@ubuntu-standard-pc:~/codes/go/capability$

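The file capabilities fI, fE and fP of the setid executable are empty as well.

Important: the file capabilities shown by getcap are the ones that apply to unprivileged users.

For root, the kernel treats the file capabilities as all ones: fE, fI and fP are all considered to be 1. This 1 is the value used in the exec calculation later on; the capabilities root actually ends up with are still limited by its bounding set B (cat /proc/$$/status | grep CapBnd).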

The official explanation for the root case:

   1. If the real or effective user ID of the process is 0 (root),
      then the file inheritable and permitted sets are ignored;
      instead they are notionally considered to be all ones (i.e.,
      all capabilities enabled).  (There is one exception to this
      behavior, described below in Set-user-ID-root programs that
      have file capabilities.)

   2. If the effective user ID of the process is 0 (root) or the
      file effective bit is in fact enabled, then the file effective
      bit is notionally defined to be one (enabled).

1.4.2 Granting capabilities to a file

Take the setid executable from above. Its file capabilities are empty, so let us grant it one.

Use the setcap command to do so.

root@ubuntu-standard-pc:/home/ubuntu/codes/go/capability# setcap CAP_SYS_ADMIN+eip setid
root@ubuntu-standard-pc:/home/ubuntu/codes/go/capability# getcap setid
setid = cap_sys_admin+eip

The +eip suffix adds cap_sys_admin to the fE, fI and fP sets. (For a file that has no capabilities yet, =eip gives the same result.)

The grant succeeded: the E, I and P sets of the setid executable now all contain cap_sys_admin.

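1.5 Capabilities when a process creates a child process

Capabilities can change when a process starts a child process, but not at every step.

fork() changes nothing: the child inherits the parent's capability sets unchanged.

exec() is where the capabilities change, according to the rules below.

If the process calling exec is root, the rules are:

       p'P = pI | pB
       p'E = p'P
       p'I = pI
       p'B = pB

If the process calling exec is an unprivileged user, the rules are:

       p'A = (file is privileged) ? 0 : pA
       p'P = (pI & fI) | (fP & pB) | p'A
       p'E = fE ? p'P : p'A
       p'I = pI
       p'B = pB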
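
The rules can be written down as a small Go function. This is a sketch under assumptions: procCaps, fileCaps and execTransform are names invented for this example, the sets are plain 64-bit masks, and "file is privileged" is taken to mean that the file has capabilities or a set-user-ID/set-group-ID bit, as capabilities(7) defines it.

```go
package main

import "fmt"

// procCaps holds a process's five capability sets as bit masks (hypothetical type).
type procCaps struct{ P, E, I, B, A uint64 }

// fileCaps holds a file's capabilities (hypothetical type).
type fileCaps struct {
	P, I  uint64
	E     bool // the file effective "set" is a single bit
	SetID bool // set-user-ID or set-group-ID bit on the file
}

// execTransform applies the general execve rules quoted above.
func execTransform(p procCaps, f fileCaps) procCaps {
	var next procCaps
	privileged := f.P != 0 || f.I != 0 || f.E || f.SetID
	if !privileged {
		next.A = p.A
	}
	next.P = (p.I & f.I) | (f.P & p.B) | next.A
	if f.E {
		next.E = next.P
	} else {
		next.E = next.A
	}
	next.I = p.I
	next.B = p.B
	return next
}

func main() {
	const dockerDefault = 0xa80425fb // docker's default set, as decoded later in this article

	// Unprivileged user executing a file without capabilities.
	p := procCaps{P: dockerDefault, E: dockerDefault, I: dockerDefault, B: dockerDefault}
	fmt.Printf("non-root: %+v\n", execTransform(p, fileCaps{}))

	// Root: fI and fP are notionally all ones and fE is set.
	rootFile := fileCaps{P: ^uint64(0), I: ^uint64(0), E: true}
	fmt.Printf("root:     %+v\n", execTransform(procCaps{B: dockerDefault}, rootFile))
}
```

Feeding the all-ones file sets into the general rule reduces it to p'P = pI | pB and p'E = p'P, which is why the root case can be stated as a separate, shorter formula.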

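Capabilities in docker

docker (through runc) starts a container as follows:

The runc init process is started as root.
pB is set. At this point pB is already docker's default capability set, while pE, pP and pI still hold their original values. pA is empty by default.
The KEEP_CAPS flag is set so that pP is preserved.
setuid and setgid are called. The switch from root to an unprivileged user drops capabilities: only pB, pI and pP remain, and pE is empty.
Now running as the unprivileged user, the process sets pB, pI, pP and pE again, so all of these sets are populated.
The unprivileged user calls exec(). Capabilities are dropped according to the exec rules for unprivileged users.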
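
The sequence can be replayed with a toy model. The sketch below only follows the steps as this article describes them; capState and simulateStart are invented names, the sets are bit masks, and the final exec is assumed to run a file without capabilities. It is not how runc is implemented.

```go
package main

import "fmt"

// capState is a simplified snapshot of a process's capability sets.
// The ambient set is left out because it stays empty throughout.
type capState struct {
	step       string
	P, E, I, B uint64
}

// simulateStart replays the startup steps for a given capability whitelist.
// hostCaps is root's full set on the host; uid is the user the container
// process finally runs as. All names here are hypothetical.
func simulateStart(hostCaps, whitelist uint64, uid uint32) []capState {
	s := capState{step: "runc init as root", P: hostCaps, E: hostCaps, B: hostCaps}
	trace := []capState{s}
	record := func(step string) {
		s.step = step
		trace = append(trace, s)
	}

	s.B &= whitelist
	record("drop bounding set")

	if uid != 0 {
		s.E = 0 // KEEP_CAPS keeps P, but E is still cleared
	}
	record("setuid/setgid with KEEP_CAPS")

	s.P, s.E, s.I = whitelist, whitelist, whitelist
	record("apply whitelist to P, E, I")

	if uid != 0 {
		s.P, s.E = 0, 0 // exec of a file without capabilities
	} else {
		s.P = s.I | s.B // root: p'P = pI | pB
		s.E = s.P
	}
	record("exec")
	return trace
}

func main() {
	for _, st := range simulateStart(0x1ffffffffff, 0xa80425fb, 20000) {
		fmt.Printf("%-30s P=%011x E=%011x I=%011x B=%011x\n", st.step, st.P, st.E, st.I, st.B)
	}
}
```

For a non-root uid the last row of the trace has empty P and E with I and B equal to the whitelist; for uid 0 all four sets equal the whitelist.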

The corresponding runc code:

func finalizeNamespace(config *initConfig) error {
	// Ensure that all unwanted fds we may have accidentally
	// inherited are marked close-on-exec so they stay out of the
	// container
	if err := utils.CloseExecFrom(config.PassedFilesCount + 3); err != nil {
		return err
	}

	capabilities := config.Config.Capabilities
	if config.Capabilities != nil {
		capabilities = config.Capabilities
	}
	w, err := newCapWhitelist(capabilities)
	if err != nil {
		return err
	}
	// drop capabilities in bounding set before changing user
	if err := w.dropBoundingSet(); err != nil {
		return err
	}
	// preserve existing capabilities while we change users
	if err := system.SetKeepCaps(); err != nil {
		return err
	}
	if err := setupUser(config); err != nil {
		return err
	}
	if err := system.ClearKeepCaps(); err != nil {
		return err
	}
	// drop all other capabilities
	if err := w.drop(); err != nil {
		return err
	}
	if config.Cwd != "" {
		if err := syscall.Chdir(config.Cwd); err != nil {
			return fmt.Errorf("chdir to cwd (%q) set in config.json failed: %v", config.Cwd, err)
		}
	}
	return nil
}

The code that sets the capabilities:

func (c *capsV3) Set(which CapType, caps ...Cap) {
	for _, what := range caps {
		var i uint
		if what > 31 {
			i = uint(what) >> 5
			what %= 32
		}

		if which&EFFECTIVE != 0 {
			c.data[i].effective |= 1 << uint(what)
		}
		if which&PERMITTED != 0 {
			c.data[i].permitted |= 1 << uint(what)
		}
		if which&INHERITABLE != 0 {
			c.data[i].inheritable |= 1 << uint(what)
		}
		if which&BOUNDING != 0 {
			c.bounds[i] |= 1 << uint(what)
		}
	}
}

runc's capability setup neither sets nor removes anything in the A set, so A stays empty throughout.

The top-level runc code for this process:

func (l *linuxSetnsInit) Init() error {
	if !l.config.Config.NoNewKeyring {
		// do not inherit the parent's session keyring
		if _, err := keys.JoinSessionKeyring(l.getSessionRingName()); err != nil {
			return err
		}
	}
	if l.config.NoNewPrivileges {
		if err := system.Prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); err != nil {
			return err
		}
	}
	if l.config.Config.Seccomp != nil {
		if err := seccomp.InitSeccomp(l.config.Config.Seccomp); err != nil {
			return err
		}
	}
	if err := finalizeNamespace(l.config); err != nil {
		return err
	}
	if err := apparmor.ApplyProfile(l.config.AppArmorProfile); err != nil {
		return err
	}
	if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
		return err
	}
	// close the statedir fd before exec because the kernel resets dumpable in the wrong order
	// https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
	syscall.Close(l.stateDirFD)
	return system.Execv(l.config.Args[0], l.config.Args[0:], os.Environ())
}

After execv is called, capabilities are dropped.
The calculation:

       p'A = (file is privileged) ? 0 : pA
       p'P = (pI & fI) | (fP & pB) | p'A
       p'E = fE ? p'P : p'A
       p'I = pI
       p'B = pB

Since all of the file capabilities are 0 and pA is 0 as well, p'A = 0, and therefore:
p'A = 0
p'P = 0
p'E = 0
p'I = pI
p'B = pB

2. Enabling userns-remap in docker

2.1 Root user inside the container

First create the user on the host.

groupadd -g 10000 dockeruser
useradd -u 10000 -g dockeruser -d /home/dockeruser -m dockeruser

Enable userns-remap

# vim /etc/docker/daemon.json
{
    ...
    "userns-remap": "dockeruser",
    ...
}

systemctl stop docker
systemctl daemon-reload
systemctl start docker

With userns-remap enabled, the Dockerfile is as follows:

FROM centos
ADD setid .
RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid

docker build

root@ubuntu-standard-pc:~# docker build -t centos:host-root-origin .

docker run

root@ubuntu-standard-pc:~# docker run -it --name centos-host-root-origin centos:host-root-origin /bin/bash

2.1.1 Capabilities on the container side

[root@e72c2e81500e /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=

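The capabilities are the same as without userns-remap, and the container's /root directory is still accessible.

2.1.2 Capabilities on the host side

Find the pid of the container process on the host.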

ubuntu@ubuntu-standard-pc:~$ ps -ef | grep e72c2e81500e
root        4253       1  0 23:12 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e72c2e81500ec485c5664216ead98eaf5e7b7fd71b4521d4748ea6e87dbac2a3 -address /run/containerd/containerd.sock
ubuntu      4342    4334  0 23:13 pts/2    00:00:00 grep --color=auto e72c2e81500e
ubuntu@ubuntu-standard-pc:~$ ps -ef | grep 4253
root        4253       1  0 23:12 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e72c2e81500ec485c5664216ead98eaf5e7b7fd71b4521d4748ea6e87dbac2a3 -address /run/containerd/containerd.sock
165536      4274    4253  0 23:12 pts/0    00:00:00 /bin/bash
ubuntu      4344    4334  0 23:13 pts/2    00:00:00 grep --color=auto 4253

Check the container's capabilities from the host side.

ubuntu@ubuntu-standard-pc:~$ cat /proc/4274/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
ubuntu@ubuntu-standard-pc:~$ capsh --decode=00000000a80425fb
WARNING: libcap needs an update (cap=40 should have a name).
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap

On the host side, the capabilities are identical to those seen inside the container.

2.2 Non-root user inside the container

Dockerfile

FROM centos
ADD setid .
ADD helloworld .
ADD setid-chmod .
ADD setrootid .
RUN chmod +s setid-chmod
RUN chmod +s setrootid
RUN groupadd -g 20000 dockercentos
RUN useradd -u 20000 -g dockercentos -d /home/dockercentos -m dockercentos
USER dockercentos

docker build

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-usr# docker build -t centos:host-usr-chmod .
Sending build context to Docker daemon  7.139MB
Step 1/10 : FROM centos
 ---> 5d0da3dc9764
Step 2/10 : ADD setid .
 ---> Using cache
 ---> 01aae451ee4a
Step 3/10 : ADD helloworld .
 ---> Using cache
 ---> 91b5dc55ce84
Step 4/10 : ADD setid-chmod .
 ---> Using cache
 ---> a2b67950e1b2
Step 5/10 : ADD setrootid .
 ---> Using cache
 ---> af10aae2cff3
Step 6/10 : RUN chmod +s setid-chmod
 ---> Using cache
 ---> 62b534a30c89
Step 7/10 : RUN chmod +s setrootid
 ---> Using cache
 ---> 9afa21fd8b32
Step 8/10 : RUN groupadd -g 20000 dockercentos
 ---> Running in d233ce98d0f0
Removing intermediate container d233ce98d0f0
 ---> a29952a344b4
Step 9/10 : RUN useradd -u 20000 -g dockercentos -d /home/dockercentos -m dockercentos
 ---> Running in 9bee1a5b9ad6
Removing intermediate container 9bee1a5b9ad6
 ---> 3ccf8e7c7b7d
Step 10/10 : USER dockercentos
 ---> Running in 05feed2c8819
Removing intermediate container 05feed2c8819
 ---> 21b170f459fb
Successfully built 21b170f459fb
Successfully tagged centos:host-usr-chmod

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-usr# docker run -it --name centos-df-usr centos:host-usr-chmod /bin/bash

2.2.1 Capabilities on the container side

[dockercentos@b4b1eccfcd6c /]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=20000(dockercentos)
gid=20000(dockercentos)
groups=

2.2.2 Capabilities on the host side

ubuntu@ubuntu-standard-pc:~$ ps -ef | grep b4b1eccfcd6c
root       12587       1  0 00:57 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id b4b1eccfcd6ce1daadbe5fb6059cb9a6fea631f81eaaf0b3ae97ba839e41f64b -address /run/containerd/containerd.sock
ubuntu     12656    8418  0 00:58 pts/2    00:00:00 grep --color=auto b4b1eccfcd6c
ubuntu@ubuntu-standard-pc:~$ ps -ef | grep 12587
root       12587       1  0 00:57 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id b4b1eccfcd6ce1daadbe5fb6059cb9a6fea631f81eaaf0b3ae97ba839e41f64b -address /run/containerd/containerd.sock
185536     12609   12587  0 00:57 pts/0    00:00:00 /bin/bash
ubuntu     12658    8418  0 00:58 pts/2    00:00:00 grep --color=auto 12587

Check the capabilities from the host side.

ubuntu@ubuntu-standard-pc:~$ cat /proc/12609/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

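When the container runs as an unprivileged user, the only populated sets on the host side are I and B; P and E are empty.

2.3 chmod inside the container

If a file inside the container was granted a capability in the Dockerfile but the container itself does not have that capability, the file cannot be executed.
For example: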

Dockerfile

FROM centos
ADD setid .
ADD helloworld .
RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid
RUN setcap cap_sys_admin+eip helloworld

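The helloworld file has cap_sys_admin, but that capability is not in the container's default set.
The capabilities granted to setid are exactly the container's default capability set.

Build the docker image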

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root-cap# docker build -t centos:host-root-captest .
Sending build context to Docker daemon  3.562MB
Step 1/5 : FROM centos
 ---> 5d0da3dc9764
Step 2/5 : ADD setid .
 ---> Using cache
 ---> fff6dba319f3
Step 3/5 : ADD helloworld .
 ---> Using cache
 ---> e22e26214e9d
Step 4/5 : RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid
 ---> Running in 2094b4999f5f
Removing intermediate container 2094b4999f5f
 ---> 44ce5251d8b7
Step 5/5 : RUN setcap cap_sys_admin+eip helloworld
 ---> Running in 871c82fa3e98
Removing intermediate container 871c82fa3e98
 ---> d2bfa0175e88
Successfully built d2bfa0175e88
Successfully tagged centos:host-root-captest

Run the image

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root-cap# docker run -it --rm centos:host-root-captest /bin/bash
[root@339688eb874d /]# ls
bin  dev  etc  helloworld  home  lib  lib64  lost+found  media mnt  opt  proc  root  run  sbin  setid  srv  sys  tmp  usr  var
[root@339688eb874d /]# ./helloworld
bash: ./helloworld: Operation not permitted
[root@339688eb874d /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

The helloworld file is not allowed to run, while the setid file runs normally.
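
The "Operation not permitted" error is consistent with the safety check described in capabilities(7): when the file effective bit is set, execve() fails with EPERM if the process cannot obtain every capability in the file's permitted set. The Go sketch below models that check; missingFilePermitted and execFailsWithEPERM are names invented for this example, and the masks are the ones used in this experiment.

```go
package main

import "fmt"

// missingFilePermitted returns the file permitted capabilities that the
// process would NOT obtain after execve (hypothetical helper).
func missingFilePermitted(pI, pB, fI, fP uint64) uint64 {
	obtained := (pI & fI) | (fP & pB)
	return fP &^ obtained
}

// execFailsWithEPERM models the check: it only applies when the file
// effective bit fE is set.
func execFailsWithEPERM(pI, pB, fI, fP uint64, fE bool) bool {
	return fE && missingFilePermitted(pI, pB, fI, fP) != 0
}

func main() {
	const (
		dockerDefault = uint64(0xa80425fb) // container's default set
		capSysAdmin   = uint64(1) << 21    // cap_sys_admin is bit 21
	)

	// helloworld: cap_sys_admin+eip, which is outside the bounding set.
	fmt.Println("helloworld blocked:",
		execFailsWithEPERM(dockerDefault, dockerDefault, capSysAdmin, capSysAdmin, true))

	// setid: file capabilities equal to the container's default set.
	fmt.Println("setid blocked:",
		execFailsWithEPERM(dockerDefault, dockerDefault, dockerDefault, dockerDefault, true))
}
```

In this model helloworld is blocked because cap_sys_admin is missing from both pB and pI, while setid passes because every capability in its fP is inside the bounding set.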

2.3.1 Using --cap-drop and --cap-add together to assign capabilities

2.3.1.1 Root user inside the container

The Dockerfile is as follows

FROM centos
ADD setid .
ADD helloworld .
ADD setid-chmod .
ADD setrootid .
RUN chmod +s setid-chmod
RUN chmod +s setrootid

setrootid is built from setid.go with the arguments of setuid and setgid changed to 0.

docker build

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker build -t centos:host-chmod .
Sending build context to Docker daemon  7.139MB
Step 1/7 : FROM centos
 ---> 5d0da3dc9764
Step 2/7 : ADD setid .
 ---> 01aae451ee4a
Step 3/7 : ADD helloworld .
 ---> 91b5dc55ce84
Step 4/7 : ADD setid-chmod .
 ---> a2b67950e1b2
Step 5/7 : ADD setrootid .
 ---> af10aae2cff3
Step 6/7 : RUN chmod +s setid-chmod
 ---> Running in 5cd11d90e4ee
Removing intermediate container 5cd11d90e4ee
 ---> 62b534a30c89
Step 7/7 : RUN chmod +s setrootid
 ---> Running in 39a322c185dd
Removing intermediate container 39a322c185dd
 ---> 9afa21fd8b32
Successfully built 9afa21fd8b32
Successfully tagged centos:host-chmod

2.3.1.1.1 Without no-new-privileges

Run the container

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm centos:host-chmod /bin/bash
[root@07b6e17cb6d7 /]# ls
bin  etc     home  lib64       media  opt   root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt   proc  run   setid  setrootid    sys  usr
[root@07b6e17cb6d7 /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@07b6e17cb6d7 /]# ./setid-chmod
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@07b6e17cb6d7 /]# ./setrootid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

setuid and setgid both work. Inside the container, the euid before the setuid call is already 0 (root).

2.3.1.1.2 With no-new-privileges

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm --security-opt=no-new-privileges centos:host-chmod /bin/bash
[root@1c3a94e2c741 /]# ls
bin  etc     home  lib64       media  opt   root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt   proc  run   setid  setrootid    sys  usr
[root@1c3a94e2c741 /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@1c3a94e2c741 /]# ./setid-chmod
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@1c3a94e2c741 /]# ./setrootid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

The result is the same as without no-new-privileges: the euid is still 0 (root).

2.3.1.2 Non-root user inside the container

2.3.1.2.1 Without no-new-privileges

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --rm --user 10000:10000 --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} centos:host-chmod /bin/bash
bash-4.4$ ./setid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setid-chmod
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
bash-4.4$ ./setrootid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

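When both the uid and the gid used to enter the container are specified and neither is root, the shell runs as a fully unprivileged user. The setid executable was not elevated with chmod +s, so it has no setuid or setgid capability and its setuid and setgid calls have no effect (the uid and gid stay at 10000).
The setid-chmod executable was elevated with chmod +s in the Dockerfile, so it runs with root privileges (euid 0) and can call setuid and setgid. Inside the container this process is effectively root.
A file elevated with chmod +s can therefore call setuid and setgid to switch itself to the root user, as setrootid does.

2.3.1.2.2 With no-new-privileges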

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --user 10000:10000 --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm --security-opt=no-new-privileges centos:host-chmod /bin/bash
bash-4.4$ ls
bin  etc     home  lib64       media  opt   root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt   proc  run   setid  setrootid    sys  usr
bash-4.4$ ./setid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setid-chmod
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setrootid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000

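With no-new-privileges enabled, chmod +s can no longer be used to let an unprivileged user's process switch to root.
When both the uid and the gid used to enter the container are specified and neither is root, the process is fully unprivileged: it has no setuid or setgid capability and cannot perform setuid or setgid, even through a set-user-ID binary.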

docker run -it --name centos-host-runroot-nonewprivileges --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --security-opt=no-new-privileges centos:host-root-origin /bin/bash

[root@LIN-29076BB8489 centos-root]# docker run -it --name centos-host-root-nonewprivileges --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --security-opt=no-new-privileges centos:host-root-origin /bin/bash
[root@3215a191e737 /]# capsh --print
Current: = cap_setgid,cap_setuid,cap_setfcap+eip
Bounding set =cap_setgid,cap_setuid,cap_sys_admin,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=

Check the container's capabilities from the host side

[root@LIN-29076BB8489 centos-org]# ps -ef | grep 3215a191e737
root      8080     1  0 17:11 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3215a191e73770dd26b393ba99acf183d0380d717da6e2c68e167f554c57a418 -address /run/containerd/containerd.sock
root      9644 58947  0 17:13 pts/1    00:00:00 grep --color=auto 3215a191e737
[root@LIN-29076BB8489 centos-org]# ps -ef | grep 8080
root      8080     1  0 17:11 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3215a191e73770dd26b393ba99acf183d0380d717da6e2c68e167f554c57a418 -address /run/containerd/containerd.sock
100000    8100  8080  0 17:11 pts/0    00:00:00 /bin/bash
root      9697 58947  0 17:13 pts/1    00:00:00 grep --color=auto 8080

8100 is the pid of the container's main process, /bin/bash.

[root@LIN-29076BB8489 centos-org]# cat /proc/8100/status | grep Cap
CapInh: 00000000800000c0
CapPrm: 00000000800000c0
CapEff: 00000000800000c0
CapBnd: 00000000800000c0
CapAmb: 0000000000000000
[root@LIN-29076BB8489 centos-org]# capsh --decode=00000000800000c0
0x00000000800000c0=cap_setgid,cap_setuid,cap_setfcap

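On the host side, the container's main process has only the limited capabilities that were granted to it.

3. Summary

3.1 How capabilities change while a docker container starts

If the container runs as an unprivileged user, its capabilities are
p'A = 0
p'P = 0
p'E = 0
p'I = pI
p'B = pB

3.2 Restricting a container's capabilities

There are three ways to restrict a container's capabilities.

Enable userns-remap in docker, which maps root inside the container to an unprivileged user on the host.
Run the container as an unprivileged user. docker calls setuid when it starts the container, which drops capabilities immediately: only the I set remains, and the process has no usable capabilities on the host side.
Use cap-drop all together with cap-add to specify capabilities, so the container has only the individual capabilities listed with cap-add.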

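Enable userns-remap

# vim /etc/docker/daemon.json
{
    ...
    "userns-remap": "<username>",
    ...
}

Enabling userns-remap maps the users inside the container onto the specified host user. When that host user is not root, root inside the container is mapped to an unprivileged user on the host.

If cap-drop is not used, the container has docker's default capabilities, and when the container runs as root it has the E, I, P and B sets.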

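3.2.1 Enabling userns-remap

userns-remap enabled	Effect
yes	On the host the container runs as an `unprivileged user`. To use specific capabilities on the host side, add them to the container with cap-add; otherwise the container has only docker's default capabilities
no	On the host the container runs as `root` and has root's privileges

3.2.2 Restricting the user inside the container

User inside the container	Effect
unprivileged user	Both inside the container and on the host only `I` is populated (apart from the bounding set B). Any operation that needs a capability requires the program to be granted it in the Dockerfile beforehand
root	Both inside the container and on the host the sets `E, I, P, B` are populated. An operation that needs a capability only requires the matching cap-add; nothing has to be granted in the Dockerfile

3.2.3 Using cap-add and cap-drop

The usual approach is to remove docker's default capabilities with cap-drop=all and then add the ones you need with cap-add.

cap-drop and cap-add used	Effect
yes	The container has only the capabilities added by cap-add after cap-drop removed the defaults, and nothing else
no	The container has docker's default capabilities

--security-opt=no-new-privileges used	Effect
yes	An unprivileged process elevated with chmod +s inside the container `(uid != 0, euid = 0)` cannot perform `setuid` and similar operations. `This has no effect on the root user`
no	An unprivileged process elevated with chmod +s inside the container `(uid != 0, euid = 0)` can perform `setuid` and similar operations, switch its uid to 0 and obtain root's full privileges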

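4 Recommendations

4.1 Guiding principle
The container's capabilities must be a superset of the capabilities granted to the program; otherwise the program cannot run.

4.2 Recommended approach 1
Enable userns-remap, run as root inside the container, use cap-drop all with cap-add={required capabilities}, and set no-new-privileges.

Advantage: no setcap is needed in the Dockerfile, so the image has one layer fewer.

Disadvantage: the user inside the container is root, which has the E, I and P sets on the host side and can perform some operations that require capabilities.

4.3 Recommended approach 2
Enable userns-remap, run as an unprivileged user inside the container, use cap-drop all with cap-add={required capabilities}, run setcap in the Dockerfile, and set no-new-privileges.

Advantage: the user inside the container is unprivileged and has only the I set on the host side. A process in the container that was not granted capabilities with setcap in the Dockerfile cannot perform operations that require them.

Disadvantage: setcap is required in the Dockerfile, so the image has one extra layer, and small set-user-ID helper programs cannot be used for privilege elevation.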
