Recently, I worked on resizing a StatefulSet. One of our workloads was constantly restarting because its disk was filling up. The process of resizing the persistent volume of a StatefulSet is normally straightforward: you scale down the StatefulSet, modify the size of the underlying PVC, and then scale the StatefulSet back up. In this case, however, there were multiple complications and constraints. To begin with, our PVCs were configured to be deleted when they were not bound to any pod. To complicate matters further, this was deployed in production, and we did not want to risk a loss of user data. The only solution was to figure out a way to scale the storage of the StatefulSet without having to restart the underlying pods. After some research, I found a way to do this.

StatefulSet:

StatefulSet is the workload API object used to manage stateful applications. A StatefulSet runs a group of Pods, and maintains a sticky identity for each of those Pods. This is useful for managing applications that need persistent storage or a stable, unique network identity.

Stateful vs Stateless:

A stateful application saves data to persistent disk storage for use by the server, by clients, and by other applications. Examples of stateful applications include databases and key-value stores to which data is saved and from which it is retrieved by other applications. A stateless application, by contrast, neither reads nor stores information about its state. Stateless applications are a natural fit for containers, since Kubernetes can create and remove containers rapidly and dynamically.

Storage Abstractions in Kubernetes:

Kubernetes handles storage through various abstractions that allow applications to use persistent or ephemeral storage seamlessly. These abstractions decouple storage provisioning and usage, enabling portability and flexibility in deploying workloads. The primary storage abstractions in Kubernetes include:

Volumes: A directory that is accessible to a container and is managed at the pod level. Volumes are tied to the pod’s lifecycle but can survive container restarts.

Persistent Volumes (PVs): A cluster-wide storage resource that is provisioned either dynamically or statically. It abstracts the underlying storage, whether it is a cloud provider, local disk, or network file system.

Persistent Volume Claims (PVCs): A request for storage by a user. PVCs claim PVs and provide an abstraction for users to consume storage without worrying about implementation details.

Storage Classes: Define storage provisioners and parameters for dynamic provisioning of PVs. They allow administrators to configure policies like replication, IOPS, and disk type.

CSI (Container Storage Interface): A standard interface that enables Kubernetes to interact with different storage systems uniformly. CSI allows storage vendors to create plugins for Kubernetes.

The diagram below illustrates the flow:

  1. Pods request storage via PVCs.
  2. PVCs bind to suitable PVs.
  3. Storage classes and provisioners manage the creation and provisioning of PVs dynamically.
  4. PVs interface with backend storage systems (e.g., cloud storage, local storage).
+-----------------------------------------------------------+
|                        Kubernetes Cluster                 |
|                                                           |
|  +-----------------+                                      |
|  | Storage Classes |--------------------------------------|  
|  +-----------------+                                      |
|          |                                                |
|   +------v------+                                         |
|   | Provisioner |                                         |
|   +-------------+                                         |
|          |                                                |
|   +------v-------+     +-------------------------+        |
|   |  Persistent  |<--->| Backend Storage Systems |        |
|   |  Volumes (PV)|     | (EBS, GCE PD, NFS, etc.)|        |
|   +--------------+     +-------------------------+        |
|          ^                                                |
|          |                                                |
|   +------v------+                                         |
|   | Persistent  |                                         |
|   | Volume Claim|                                         |
|   +-------------+                                         |
|          ^                                                |
|          |                                                |
|   +------v-------+                                        |
|   |    Pods      |                                        |
|   | (Containers) |                                        |
|   +--------------+                                        |
+-----------------------------------------------------------+
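
As a concrete illustration of how these objects reference each other, here is a minimal sketch of a StorageClass, a PVC that requests storage from it, and a Pod that mounts the claim. The names, image, and provisioner below are placeholders; adapt them to your cluster.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc                 # placeholder name
provisioner: disk.csi.azure.com    # CSI driver; depends on your cluster
allowVolumeExpansion: true         # needed later if you ever resize PVCs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: example-sc     # ties the claim to the class above
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-pvc       # the Pod consumes storage through the claim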

Resizing the StatefulSet:

As I mentioned at the beginning of this post, I had to ensure that the StatefulSet was resized without deleting the pods that were currently running, so that the underlying volumes were not deleted. Here are the steps I followed:

  1. Identify the PVC being used by the StatefulSet, and make a copy of the StatefulSet configuration.

kubectl get sts <sts> -n <ns> -o yaml > sts

Make sure you have a way to recreate the StatefulSet object. Remove any unnecessary fields from the configuration - this could include the timestamp when the StatefulSet was created and other server-generated metadata.
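
For example, the exported manifest usually contains server-generated fields along these lines that are safe to delete before reapplying (an illustrative list; the exact set depends on your cluster):

# Server-generated fields to strip from the saved manifest:
#   metadata.creationTimestamp
#   metadata.resourceVersion
#   metadata.uid
#   metadata.generation
#   metadata.managedFields
#   status (the entire block)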

  2. Delete the StatefulSet without deleting the pods.

kubectl delete sts <sts> -n <ns> --cascade=orphan

By passing --cascade=orphan to kubectl delete, the Pods managed by the StatefulSet are left behind even after the StatefulSet object itself is deleted.
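
Before moving on, it is worth confirming that the pods were orphaned rather than deleted (the label selector below is illustrative):

# The StatefulSet object is gone, but its pods should still be Running
kubectl get sts <sts> -n <ns>            # should return "not found"
kubectl get pods -n <ns> -l app=<app>    # the pods remain, now without an owning controller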

  3. Modify the PVC object with the desired storage.

kubectl edit pvc <pvc-name> -n <namespace>

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: disk.csi.azure.com
    volume.kubernetes.io/selected-node: <redacted-node>
    volume.kubernetes.io/storage-provisioner: disk.csi.azure.com
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: <redacted-app>
    service: <redacted-service>
    version: v1
  name: <redacted-name>
  namespace: <redacted-namespace>
  resourceVersion: <redacted-resource-version>
  uid: <redacted-uid>
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128Gi ===> 256Gi (Edit)
  storageClassName: managed-premium
  volumeMode: Filesystem
  volumeName: <redacted-volume-name>
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 128Gi # updated to 256Gi by the controller once the resize completes; no manual edit needed
  phase: Bound

Save this configuration once you have modified the storage.
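
Note that the API server will only accept the new size if the storage class allows volume expansion; you can check the allowVolumeExpansion field on the class (managed-premium here, matching the PVC above):

# Prints "true" if the storage class permits PVC expansion
kubectl get storageclass managed-premium -o jsonpath='{.allowVolumeExpansion}'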

  4. Recreate the StatefulSet with the new storage request. The StatefulSet will take charge of the orphaned pods again and pick up the updated storage spec without recreating them.
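
If the saved manifest still carries the old size in its volumeClaimTemplates, update it to match the resized PVC before applying. A sketch of the relevant fragment (placeholder names, other fields omitted):

# Fragment of the saved StatefulSet manifest ("sts" from step 1)
spec:
  volumeClaimTemplates:
  - metadata:
      name: <volume-claim-template-name>
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: managed-premium
      resources:
        requests:
          storage: 256Gi   # was 128Gi - keep in sync with the edited PVC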

kubectl apply -f sts -n <ns>
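
The expansion can be followed through the PVC's events, for example with:

kubectl describe pvc <pvc-name> -n <namespace>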

  Events:
  Type     Reason                      Age    From                                 Message
  ----     ------                      ----   ----                                 -------
  Normal   Resizing                    3m11s  external-resizer disk.csi.azure.com  External resizer is resizing volume pvc-abcd-efg-hijk-lmno-pqrst123
  Warning  ExternalExpanding           3m11s  volume_expand                        waiting for an external controller to expand this PVC
  Normal   FileSystemResizeRequired    37s    external-resizer disk.csi.azure.com  Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful  19s    kubelet                              MountVolume.NodeExpandVolume succeeded for volume "pvc-abcd-efg-hijk-lmno-pqrst123" 

While working with a StatefulSet stored in our repository, I encountered an issue with mounting ConfigMaps for environment variables. The manifest in source control referenced the ConfigMap without its hash suffix, so when the StatefulSet configuration was applied, the controller looked for a ConfigMap named configmap-name. However, the actual ConfigMap had a name like configmap-name-12fef3, and the mismatch caused the controller to fail to mount it.
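
To make the mismatch concrete, the reference looked roughly like the sketch below (names are illustrative); hash-suffixed ConfigMap names like this are typically produced by tools such as kustomize's configMapGenerator:

# What the StatefulSet in source control referenced:
envFrom:
- configMapRef:
    name: configmap-name        # no hash suffix - this ConfigMap does not exist in the cluster

# What actually existed in the cluster:
#   configmap-name-12fef3       # the generated name carries a content-hash suffix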

During this process, one pod in the StatefulSet restarted. While the other two pods continued running without issues, the newly restarted pod couldn’t mount the ConfigMap and repeatedly failed to start.

This could have been avoided if I had used the stored StatefulSet configuration from Step (1), as it included the correct ConfigMap name with the hash. To resolve the issue, I re-applied the saved StatefulSet configuration, which corrected the ConfigMap reference and allowed the pod to start successfully.