In this tutorial we will learn about Kubernetes StatefulSets using different examples. StatefulSets were introduced in Kubernetes 1.5; they create a stable bond between each Pod and its Persistent Volume.
Overview of Kubernetes StatefulSets
We learned about ReplicaSets, which create multiple pod replicas from a single pod template. These replicas don't differ from each other, apart from their name and IP address. If the pod template includes a volume which refers to a specific PersistentVolumeClaim, all replicas of the ReplicaSet will use the exact same PersistentVolumeClaim and therefore the same PersistentVolume bound by the claim.

All pods from the same ReplicaSet always use the same PersistentVolumeClaim and PersistentVolume.
Instead of using a ReplicaSet to run these types of pods, we can
create a StatefulSet resource, which is specifically tailored to
applications where instances of the application must be treated as
non-interchangeable individuals, with each one having a stable name and
state.
Each pod created by a StatefulSet is assigned an ordinal index
(zero-based), which is then used to derive the pod’s name and hostname,
and to attach stable storage to the pod. The names of the pods are thus
predictable, because each pod’s name is derived from the StatefulSet’s
name and the ordinal index of the instance. Rather than the pods having
random names, they’re nicely organized.

Pods created by a StatefulSet have predictable names (and hostnames), unlike those created by a ReplicaSet.
When a pod instance managed by a StatefulSet disappears (because the
node the pod was running on has failed, it was evicted from the node, or
someone deleted the pod object manually), the StatefulSet makes sure
it’s replaced with a new instance—similar to how ReplicaSets do it.
But in contrast to ReplicaSets, the replacement pod gets the same name
and hostname as the pod that has disappeared.

A StatefulSet replaces a lost pod with a new one with the same identity, whereas a ReplicaSet replaces it with a completely new unrelated pod.
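Because a replacement Pod keeps the same hostname, each Pod also gets a stable DNS entry through the governing headless Service (more on this under Limitations below). The general form is <pod-name>.<service-name>.<namespace>.svc.cluster.local; for example, assuming the default namespace and the default cluster domain, the first replica of the StatefulSet we create later in this tutorial would be reachable at:
nginx-statefulset-0.nginx-statefulset.default.svc.cluster.local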
To summarise, a Kubernetes StatefulSet manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
Limitations
- The storage for a given Pod must either be provisioned by a PersistentVolume Provisioner based on the requested storage class, or pre-provisioned by an admin.
- Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
- StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
- StatefulSets do not provide any guarantees on the termination of pods when a StatefulSet is deleted. To achieve ordered and graceful termination of the pods in the StatefulSet, it is possible to scale the StatefulSet down to 0 prior to deletion (see the example after this list).
- When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.
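For example, for the nginx-statefulset we create later in this tutorial, an ordered and graceful teardown could look like this sketch:
[root@controller ~]# kubectl scale statefulset nginx-statefulset --replicas=0
[root@controller ~]# kubectl delete statefulset nginx-statefulset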
Creating a StatefulSet resource
It makes sense to use dynamic provisioning and a storage class with
StatefulSets because without them a cluster administrator would have to
provision the actual storage up front. Kubernetes can perform this job
automatically through dynamic provisioning of PersistentVolumes.
Currently (at the time of writing this tutorial) dynamic provisioning is possible only with the following providers:
| Cloud Provider | Default StorageClass Name | Default Provisioner |
|---|---|---|
| Amazon Web Services | gp2 | kubernetes.io/aws-ebs |
| Microsoft Azure | standard | kubernetes.io/azure-disk |
| Google Cloud Platform | standard | kubernetes.io/gce-pd |
| OpenStack | standard | kubernetes.io/cinder |
| VMware vSphere | thin | kubernetes.io/vsphere-volume |
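For example, on Google Cloud Platform a StorageClass for dynamic provisioning might look like the following minimal sketch (the name fast is illustrative, not something this tutorial depends on):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd    # in-tree GCE Persistent Disk provisioner
parameters:
  type: pd-ssd                       # provision SSD-backed disks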
Configure NFS Server
Since I am using Virtual Machines to demonstrate this tutorial, I will use an NFS server as the backend for the Persistent Volumes. The downside is that I must manually create all the PVs required for the number of replicas in the StatefulSet. I had already configured my NFS server on the controller node in the previous article while learning about Kubernetes Persistent Volumes.
Following are the shares which I have exported for the 3 replicas that I plan to create with the StatefulSet:
[root@controller ~]# exportfs -v
/share1 (sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/share2 (sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/share3 (sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
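For reference, the matching entries in /etc/exports on the NFS server could look something like this (a sketch assuming the shares are exported to all clients; adjust the client specification and options to your environment):
/share1 *(rw,sync,no_root_squash)
/share2 *(rw,sync,no_root_squash)
/share3 *(rw,sync,no_root_squash)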
Create Persistent Volume
Next I need to create 3 Persistent Volumes for the respective shares. Again I would like to repeat that if you are using dynamic provisioning then you just need to create a storage class and don’t have to worry about creating volumes for the Pods. Since I am manually creating the Persistent Volumes here, the StatefulSet will not be scalable unless I keep extra Persistent Volumes available.
I have assigned a storage size of 1 GB to each of the shares. We have already covered the different sections of this YAML file earlier. Following is a sample YAML file to create the PV:
[root@controller ~]# cat nfs-pv-share1.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-share1
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /share1
    server: controller
Similarly I have 2 more YAML files to create Persistent Volumes for /share2 and /share3.
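These manifests are identical to the one above apart from the volume name and the NFS path, so for example nfs-pv-share2.yml would differ only in these fields (a sketch):
metadata:
  name: nfs-pv-share2    # changed from nfs-pv-share1
spec:
  # ...capacity, accessModes, reclaim policy etc. unchanged...
  nfs:
    path: /share2        # changed from /share1
    server: controller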
Let’s create these PVs:
[root@controller ~]# kubectl create -f nfs-pv-share1.yml -f nfs-pv-share2.yml -f nfs-pv-share3.yml
persistentvolume/nfs-pv-share1 created
persistentvolume/nfs-pv-share2 created
persistentvolume/nfs-pv-share3 created
Create StatefulSets
I will configure a basic nginx server using StatefulSets just to give you an overview of how StatefulSets work. To get the KIND and apiVersion of StatefulSets you can refer to api-resources:
[root@controller ~]# kubectl api-resources | grep -iE 'KIND|stateful'
NAME           SHORTNAMES   APIGROUP   NAMESPACED   KIND
statefulsets   sts          apps       true         StatefulSet
To get the apiVersion:
[root@controller ~]# kubectl explain StatefulSet | head -n 2
KIND: StatefulSet
VERSION: apps/v1
Now that we have our KIND and apiVersion, we can create our YAML file:
[root@controller ~]# cat nfs-stateful.yml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-statefulset
spec:
  selector:
    matchLabels:
      name: nginx-statefulset
  serviceName: nginx-statefulset
  replicas: 3
  template:
    metadata:
      labels:
        name: nginx-statefulset
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: nginx-statefulset
          image: nginx
          ports:
            - containerPort: 80
              name: "web"
          volumeMounts:
            - name: db-data
              mountPath: /var/www
  volumeClaimTemplates:
    - metadata:
        name: db-data
      spec:
        accessModes: [ "ReadWriteMany" ]
        storageClassName: ""
        resources:
          requests:
            storage: 1Gi
Here we plan to create 3 replicas, which is why we created 3 Persistent
Volumes earlier. If storageClassName is not specified in the PVC, the default storage class will be used
for provisioning. Since we don’t have a storage class, I have set it to
an empty string ("") in the PVC so that no storage class will be used. The
StatefulSet will create a Persistent Volume Claim for each replica using the values
from volumeClaimTemplates. It is important that accessModes matches
the value from the PersistentVolume or else the PVC will not bind to the PV.
We are using ReadWriteMany as our accessMode in the PV, which is why
the same is mentioned here.
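One thing to note: serviceName refers to the headless Service that, as stated under Limitations, we are responsible for creating ourselves so the Pods get their stable network identity. A minimal headless Service matching this StatefulSet could look like the following sketch (the key parts are clusterIP: None and a selector matching the Pod labels):
apiVersion: v1
kind: Service
metadata:
  name: nginx-statefulset
spec:
  clusterIP: None        # headless service, no cluster IP is allocated
  selector:
    name: nginx-statefulset
  ports:
    - port: 80
      name: web
It can be created with kubectl create -f just like the other resources in this tutorial.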
Next let’s go ahead and create this StatefulSet:
[root@controller ~]# kubectl create -f nfs-stateful.yml
statefulset.apps/nginx-statefulset created
List the available StatefulSets
To get the list of available Kubernetes StatefulSets use:
[root@controller ~]# kubectl get statefulsets
NAME                READY   AGE
nginx-statefulset   0/3     36s
Since we have just created this StatefulSet, there are 0 ready Pods out of a total of 3. Next look out for the available PVCs, as it is expected that the StatefulSet will create a Persistent Volume Claim for each replica and bind it to one of the volumes we created earlier:
[root@controller ~]# kubectl get pvc
NAME                          STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-data-nginx-statefulset-0   Bound    nfs-pv-share1   1Gi        RWX                           4s
Here as you can see, we have one PVC created with status Bound, which
means it has successfully bound to one of the Persistent Volumes, which
can be checked under VOLUME, i.e. nfs-pv-share1. The PVC name follows the
pattern <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>, hence
db-data-nginx-statefulset-0.
You can also check the list of available PVs; here nfs-pv-share1 is claimed by default/db-data-nginx-statefulset-0:
[root@controller ~]# kubectl get pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                 STORAGECLASS   REASON   AGE
nfs-pv-share1   1Gi        RWX            Recycle          Bound       default/db-data-nginx-statefulset-0                           2m3s
nfs-pv-share2   1Gi        RWX            Recycle          Available                                                                 2m3s
nfs-pv-share3   1Gi        RWX            Recycle          Available                                                                 2m3s
Next we check the status of Pods:
[root@controller ~]# kubectl get pods
NAME                  READY   STATUS              RESTARTS   AGE
nginx-statefulset-0   1/1     Running             0          10s
nginx-statefulset-1   0/1     ContainerCreating   0          1s
Here the first Pod is created, and you can check the naming convention: it doesn’t contain any random strings as with Deployments or ReplicaSets. Only once the second Pod is up and ready will the third one be started, because with the default OrderedReady pod management policy the Pods are created sequentially.
After waiting for some time, we have all 3 PVCs and Pods up and running, and all our Persistent Volumes are claimed:
[root@controller ~]# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
nginx-statefulset-0   1/1     Running   0          98s
nginx-statefulset-1   1/1     Running   0          89s
nginx-statefulset-2   1/1     Running   0          80s
[root@controller ~]# kubectl get pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                 STORAGECLASS   REASON   AGE
nfs-pv-share1   1Gi        RWX            Recycle          Bound    default/db-data-nginx-statefulset-0                           46m
nfs-pv-share2   1Gi        RWX            Recycle          Bound    default/db-data-nginx-statefulset-1                           46m
nfs-pv-share3   1Gi        RWX            Recycle          Bound    default/db-data-nginx-statefulset-2                           46m
[root@controller ~]# kubectl get pvc
NAME                          STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-data-nginx-statefulset-0   Bound    nfs-pv-share1   1Gi        RWX                           44m
db-data-nginx-statefulset-1   Bound    nfs-pv-share2   1Gi        RWX                           44m
db-data-nginx-statefulset-2   Bound    nfs-pv-share3   1Gi        RWX                           44m
Deleting a Pod
Let us play around with our Pods to make sure that what we learned above actually works. As per the definition of a StatefulSet, the Pod’s name, hostname and attached storage should not change even if a Pod gets deleted (the IP address is not guaranteed to stay the same, although it may, as in this example).
So to verify this let’s first check the details of our Pods:
[root@controller ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
nginx-statefulset-0   1/1     Running   0          45m   10.36.0.1   worker-1.example.com   <none>           <none>
nginx-statefulset-1   1/1     Running   0          45m   10.44.0.1   worker-2.example.com   <none>           <none>
nginx-statefulset-2   1/1     Running   0          45m   10.36.0.3   worker-1.example.com   <none>           <none>
Next let’s create a dummy file on nginx-statefulset-2 Pod:
[root@controller ~]# kubectl exec -it nginx-statefulset-2 -c nginx-statefulset -- touch /var/www/pod3-file
The same file should appear on our NFS share which is used by nginx-statefulset-2:
[root@controller ~]# ls -l /share3/
total 0
-rw-r--r-- 1 root root 0 Jan 9 16:44 pod3-file
Next let’s delete this Pod:
[root@controller ~]# kubectl delete pod nginx-statefulset-2
pod "nginx-statefulset-2" deleted
As expected, a new Pod is automatically created with the same name; in this case it also landed on the same node and, as we’ll see, with the same IP address, though neither the node nor the IP is guaranteed by StatefulSets:
[root@controller ~]# kubectl get pods -o wide
NAME                  READY   STATUS              RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
nginx-statefulset-0   1/1     Running             0          48m   10.36.0.1   worker-1.example.com   <none>           <none>
nginx-statefulset-1   1/1     Running             0          48m   10.44.0.1   worker-2.example.com   <none>           <none>
nginx-statefulset-2   0/1     ContainerCreating   0          2s    <none>      worker-1.example.com   <none>           <none>
The IP is not yet assigned, so let’s check the status again:
[root@controller ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
nginx-statefulset-0   1/1     Running   0          49m   10.36.0.1   worker-1.example.com   <none>           <none>
nginx-statefulset-1   1/1     Running   0          48m   10.44.0.1   worker-2.example.com   <none>           <none>
nginx-statefulset-2   1/1     Running   0          22s   10.36.0.3   worker-1.example.com   <none>           <none>
Next let’s verify that the file is still present within the Pod:
[root@controller ~]# kubectl exec -it nginx-statefulset-2 -c nginx-statefulset -- ls -l /var/www/
total 0
-rw-r--r-- 1 root root 0 Jan 9 11:14 pod3-file
So the Pod seems to be working as expected. Even if the Pod is deleted, the replacement keeps the same name, hostname and storage, unlike with Deployments and ReplicaSets (here the node and IP also happened to stay the same, but those are not part of the StatefulSet guarantee).
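You can also confirm the stable hostname yourself; since a StatefulSet sets each Pod’s hostname to the Pod name, the following should print nginx-statefulset-2:
[root@controller ~]# kubectl exec -it nginx-statefulset-2 -c nginx-statefulset -- hostname
nginx-statefulset-2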
Conclusion
In this tutorial we learned about Kubernetes StatefulSets and how they compare with ReplicaSets and Deployments. We learned that, like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
If you want to use storage volumes to provide persistence for your workload, you can use a StatefulSet as part of the solution. Although individual Pods in a StatefulSet are susceptible to failure, the persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.