Kubernetes SecurityContext Capabilities Explained [Examples]

Kubernetes SecurityContext Capabilities Introduction

With Kubernetes you can control the level of privilege assigned to each Pod and container. The capabilities field of a container's SecurityContext lets you add or remove Linux capabilities, which limits what an intruder can do if the container is compromised. What a Pod is allowed to request is in turn governed by a Pod Security Policy (PSP), a cluster-level resource. Later we will map our Pods to such a policy and use it to control their privileges.

In this tutorial we will give a brief overview of Pod Security Policy (for a detailed explanation of PSP you can read my earlier article, Create Pod Security Policy Kubernetes [Step-by-Step]). Then we will explore Kubernetes SecurityContext Capabilities in detail with multiple examples covering different scenarios.

Create Pod Security Policy

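First we will create the Pod Security Policy that we will use throughout this article. Note that PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25, so the PSP parts of this tutorial apply to clusters older than that. Here is my PSP definition file along with the ClusterRole and ClusterRoleBinding: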

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: testns-psp-01
spec:
  privileged: true
  allowPrivilegeEscalation: true
  requiredDropCapabilities:
  allowedCapabilities:
  - '*'
  defaultAddCapabilities:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: testns-psp-01
rules:
- apiGroups:
  - policy
  resourceNames:
  - testns-psp-01
  resources:
  - podsecuritypolicies
  verbs:
  - use

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: testns-psp-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: testns-psp-01
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
  - kind: Group
    name: system:serviceaccounts
    apiGroup: rbac.authorization.k8s.io

Here is the output of my installed PSP:

]# kubectl get psp | grep -E 'PRIV|testns'
NAME                                PRIV    CAPS               SELINUX    RUNASUSER          FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
testns-psp-01                       false   *                  RunAsAny   MustRunAsNonRoot   RunAsAny    RunAsAny    false            *

In the policy definition above we have not added any restrictions; basically everything is allowed.

How to create a privileged container inside a Kubernetes Pod

In this example we will first create a privileged Pod, which should have all the capabilities. In most cases the following securityContext definition is enough to start a privileged container:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  namespace: testns
spec:
  selector:
    matchLabels:
      app: dev
  serviceName: test-pod
  replicas: 2
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          ## enable privileged mode
          privileged: true

Create this statefulset:

]# kubectl create -f test-statefulset.yaml 
statefulset.apps/test-statefulset created

Check the list of allowed capabilities:

]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+i
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)

As you can see, every capability is available in our container.

In some cases, if you don’t see all the capabilities in your container, you can add them explicitly with the following securityContext capabilities field:

...
        securityContext:
          runAsUser: 1025
          privileged: true
          allowPrivilegeEscalation: true
          capabilities:
            add:
             - ALL
...

This YAML file assumes that the corresponding Pod Security Policy allows all capabilities.

How to create a non-privileged container inside a Kubernetes Pod

Now you may wonder: if privileged: true enables all the privileges, will setting it to false make the Pod run without any privileges?

Let’s test this theory with a practical example. We have updated our statefulset definition file with the following securityContext fields:

...
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: false
...

So I have disabled privileged mode and any kind of privilege escalation inside the container. Once we create this statefulset, let’s verify the available capabilities on the pod:

]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As you can see, even with privileged: false the container still has multiple capabilities enabled. These are the default capabilities that the container runtime grants to every container, so this is not actually a non-privileged pod.

Solution-1: Drop all capabilities using requiredDropCapabilities inside Pod Security Policy

I would not recommend this solution, because a PSP is a cluster-wide resource and it does not make sense to drop all the privileges in the PSP just for one pod. That said, you can use RBAC to limit the use of this PSP to certain users or service accounts, in which case this method can be used.

Either way, I will share the steps to drop all the privileges using a Pod Security Policy and you may choose your preferred method.

We will edit our testns-psp-01 using the kubectl edit psp testns-psp-01 -n testns command, which opens the PSP definition in your default editor. After the update, this is what the spec of my PSP looks like:

...
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'

So I have removed the allowedCapabilities section and added the requiredDropCapabilities field, which drops all the default capabilities from the containers in the Pod.

We will re-deploy our statefulset to pick up the new changes. Next, verify the available capabilities inside the container:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As you can see, no capabilities are assigned to our container any more, so this is now a proper non-privileged container inside a Kubernetes Pod.

Solution-2: Using Kubernetes SecurityContext Capabilities in the Pod definition file

Next we will use the Pod definition file to start a non-privileged container by using Kubernetes SecurityContext Capabilities field. In addition to privileged: false, we must explicitly drop all the capabilities as shown below:

...
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: false
          capabilities:
            drop:
             - ALL
...

Let us re-deploy our statefulset and verify the applied Linux capabilities inside the container:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As expected, the container has dropped all the capabilities and can be used as a non-privileged container in a Kubernetes Pod.
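If you would rather run this check from a script than read the output by eye, the Bounding set line of capsh --print can be parsed. The following is only a minimal sketch: the function name pod_bounding_caps is hypothetical, and it assumes that capsh is available in the image, as in the examples above.

```shell
# pod_bounding_caps POD NAMESPACE
# Hypothetical helper: prints one capability per line from the
# "Bounding set" line of `capsh --print`, or nothing if the set is empty.
pod_bounding_caps() {
  kubectl exec "$1" -n "$2" -- capsh --print |
    sed -n 's/^Bounding set *= *//p' |
    tr ',' '\n' |
    sed '/^$/d'
}
```

For a container that has dropped everything, pod_bounding_caps test-statefulset-1 testns should print nothing, so a script can test the result with [ -z "$(pod_bounding_caps ...)" ].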

How to assign limited Linux capabilities to a container inside Kubernetes Pod

Now that we know how to create privileged and non-privileged pods, let me show you an example of a pod with limited privileges.

In this example we will add only the SYS_TIME capability to our container inside the Kubernetes Pod. To achieve this, I have modified my Pod Security Policy to allow privileged pods and to allow all capabilities to be added. We don’t want to restrict this at PSP level; rather, we will control it at Pod level.

]# kubectl get psp | grep -E 'PRIV|testns'
NAME                                PRIV    CAPS               SELINUX    RUNASUSER          FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
testns-psp-01                       true    *                  RunAsAny   MustRunAsNonRoot   RunAsAny    RunAsAny    false            *

Here is the snippet of my securityContext, which drops all the capabilities and then adds only the SYS_TIME capability.

IMPORTANT

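The drop: ALL entry removes every default capability, and the add list then grants back only SYS_TIME. The position of the drop and add keys in the YAML file does not change the result, because both are fields of the same capabilities object and a YAML mapping carries no ordering. Listing drop first followed by add simply makes the intent easier to read.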

...
    spec:
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: true
          capabilities:
            drop:
             - ALL
            add:
             - SYS_TIME
...

Let us re-deploy our statefulset and check the applied capabilities:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: = cap_sys_time+i
Bounding set =cap_sys_time
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As expected, the container has dropped all the other capabilities and only applied SYS_TIME.

How to check the list of capabilities applied to a container inside Kubernetes Pod

Let me show you different ways to get the list of capabilities applied to the container of your Kubernetes Pod:

Method-1: Check the list of Linux capabilities in a container using capsh --print command

We will use the capsh command to print the list of capabilities applied to a container.

[user1@test-statefulset-1 /]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

Here, we have two fields:
Current: This field contains the capability sets of the current process, here the shell from which capsh was started
Bounding set: This field contains the limiting superset of capabilities that this process and its children can ever acquire, for example when executing a binary that has file capabilities

You may also notice +i at the end of the Current list. The suffix is a flag that tells which thread capability set the listed capabilities belong to: e for effective, i for inheritable and p for permitted. So +i means that these capabilities are only in the inheritable set. The three thread capability sets are:

Effective - the capabilities used by the kernel to perform permission checks for the thread.
Permitted - the capabilities that the thread may assume (i.e., a limiting superset for the effective and inheritable sets). If a thread drops a capability from its permitted set, it can never re-acquire that capability (unless it exec()s a set-user-ID-root program).
Inheritable - the capabilities preserved across an execve(2). A child created via fork(2) inherits copies of its parent’s capability sets. Using capset(2), a thread may manipulate its own capability sets.
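The flag suffix that capsh and getcap print after a capability list can also be expanded mechanically. The helper below is a small sketch for illustration only; describe_flags is a hypothetical name, and it only looks at the letters after the last + sign of a single clause.

```shell
# describe_flags CLAUSE
# Hypothetical helper: CLAUSE is a capability clause such as
# "cap_net_raw+ep". Prints the names of the sets selected by the flags.
describe_flags() {
  flags=${1##*+}                      # text after the last "+"
  [ "$flags" = "$1" ] && flags=""     # no "+" in the clause at all
  out=""
  case $flags in *e*) out="${out:+$out }effective" ;; esac
  case $flags in *i*) out="${out:+$out }inheritable" ;; esac
  case $flags in *p*) out="${out:+$out }permitted" ;; esac
  echo "$out"
}
```

For example, describe_flags cap_sys_time+i reports the inheritable set, which matches the +i suffix in the Current line above.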

Method-2: Check applied capabilities per process

The above command shows the capabilities of the current shell process. We can also list the capabilities of any individual process. For example, my container has the following processes running:

[user1@test-statefulset-1 /]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
user1          1     0  0 17:38 ?        00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
user1          9     1  0 17:38 ?        00:00:00 /usr/sbin/rsyslogd -n -f /tmp/rsyslog.conf -i /tmp/rsyslog.pid
root        10     1  0 17:38 ?        00:00:00 /usr/sbin/sshd -D -f /opt/ssh/sshd_config -p 5022 -E /tmp/sshd.log
user1        643     0  0 17:48 pts/0    00:00:00 bash
user1       1214   643  0 17:58 pts/0    00:00:00 ps -ef

Now I want to check the list of capabilities used by my SSHD process which has PID 10.

[user1@test-statefulset-1 /]$ grep Cap /proc/10/status 
CapInh: 00000000a82425fb
CapPrm: 00000000a82425fb
CapEff: 00000000a82425fb
CapBnd: 00000000a82425fb
CapAmb: 0000000000000000

Here,

CapInh = Inheritable capabilities
CapPrm = Permitted capabilities
CapEff = Effective capabilities
CapBnd = Bounding set
CapAmb = Ambient capabilities set
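To compare several processes at once, the same fields can be pulled out of each status file in a loop. This is a minimal sketch under the assumption that awk is available in the container; cap_field is a hypothetical helper name, not a standard tool.

```shell
# cap_field FIELD STATUS_FILE
# Hypothetical helper: prints the hex mask of one Cap* line
# (for example CapEff) from a /proc/<pid>/status file.
cap_field() {
  awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

# Print "<pid> <effective mask>" for every process visible in /proc.
for status in /proc/[0-9]*/status; do
  [ -r "$status" ] || continue
  pid=${status#/proc/}
  pid=${pid%/status}
  printf '%s %s\n' "$pid" "$(cap_field CapEff "$status" 2>/dev/null)"
done
```

Processes with an identical mask share the same capability set, so a differing value is a quick hint about which process to inspect more closely.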

Each set is shown as a hexadecimal bit mask. To convert a mask into a human-readable list of capabilities we will use the following command:

[user1@test-statefulset-1 /]$ capsh --decode=00000000a82425fb
0x00000000a82425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

So, now we have the list of capabilities used by the SSHD process.
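If capsh is not installed in an image, the mask can also be decoded by hand, because each bit position corresponds to one capability number from linux/capability.h. The function below is a rough sketch of that idea; decode_caps is a hypothetical name, and the name table only covers capability numbers 0 to 40, so bits beyond that are ignored.

```shell
# Capability names in bit order (0 = cap_chown ... 40 = cap_checkpoint_restore).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEX
# Hypothetical helper: prints a comma-separated list of the capabilities
# whose bit is set in HEX (given without the 0x prefix).
decode_caps() {
  mask=$((0x$1))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"
    fi
    bit=$((bit + 1))
  done
  echo "$out"
}

decode_caps 00000000a82425fb
```

For the mask of the SSHD process this should list the same capabilities as the capsh --decode output above.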

How to assign Linux capability to individual file or binary (setcap)

Many Linux system binaries ship with file capabilities assigned to them. You can check this using the getcap command. For example, to check the capabilities assigned to the ping command we can use:

[user1@test-statefulset-1 /]$ getcap `which ping`
/usr/bin/ping = cap_net_admin,cap_net_raw+p

So on this image the ping binary is granted cap_net_admin and cap_net_raw in its permitted set (+p), which lets it function properly for a non-root user.

Let’s use ping with the default capabilities:

[user1@test-statefulset-1 /]$ capsh -- -c "/bin/ping -c 1 localhost"
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.018 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms

This works. Now let’s try the same command without the cap_net_admin capability:

[user1@test-statefulset-1 /]$ capsh --drop=cap_net_admin -- -c "/bin/ping -c 1 localhost"
unable to raise CAP_SETPCAP for BSET changes: Operation not permitted

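As you can see, the command fails with an Operation not permitted error. Note that the message comes from capsh rather than from ping: capsh needs CAP_SETPCAP in its effective set to remove a capability from the bounding set, and our non-root shell holds its capabilities only in the inheritable set (the +i seen earlier), so ping is never started.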

To add a capability to a file we can use the setcap command. Let us add a capability to the /usr/sbin/sshd binary; as you can see, currently no capabilities are assigned to it:

[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd

Next I will add NET_ADMIN capability to this binary file:

[user1@test-statefulset-1 /]$ setcap cap_net_admin+i /usr/sbin/sshd

Verify the same again:

[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd
/usr/sbin/sshd = cap_net_admin+i
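To find every binary that carries a given file capability, getcap can also walk a directory tree with its -r option. The wrapper below is a small sketch; files_with_cap is a hypothetical name, and it assumes a grep that supports the -w (whole word) option.

```shell
# files_with_cap CAPABILITY DIR
# Hypothetical helper: lists the files under DIR whose file
# capabilities mention CAPABILITY, using the recursive mode of getcap.
files_with_cap() {
  getcap -r "$2" 2>/dev/null | grep -w "$1"
}
```

For example, files_with_cap cap_net_admin /usr would be expected to include both ping and the sshd binary modified above.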

Summary

In this tutorial we explored different areas related to Kubernetes SecurityContext Capabilities. We covered the following topics:

How to create privileged and non-privileged containers inside a Kubernetes Pod
How to add or drop all the capabilities of a container
How to add a single capability or a pre-defined set of capabilities to a container
Understanding more about Linux capabilities
How to check which capabilities are assigned to a container