Kubernetes Persistent Storage Process

By Sun Zhiheng (Huizhi), Development Engineer at Alibaba Cloud

Kubernetes Persistent Storage: Basic Concepts

Before explaining the Kubernetes storage process, let’s review the basic concepts of persistent storage in Kubernetes.

1) Terms

  • In-tree: The code logic is in the Kubernetes repository.

2) Components

  • PV Controller binds PVs and PVCs and manages their lifecycles. It also performs the Provision and Delete operations on data volumes as needed.

3) How to Use PVs

Kubernetes introduces PVs and PVCs so that applications and developers can request storage resources without having to deal with storage device details. A PV can be created in one of the following ways:

  • A cluster administrator manually and statically creates the PV required by an application.
  • A user creates a PVC, and a provisioner in the cluster dynamically creates the corresponding PV on demand.

Let’s use the shared storage of a network file system (NFS) as an example to explain the differences between the two PV creation methods.

Statically Create a PV

The following figure shows the process of statically creating a PV.

Image for post

Step 1) A cluster administrator creates an NFS PV. NFS is a type of in-tree storage natively supported by Kubernetes. The YAML file is as follows:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.4.1
    path: /nfs_storage

Step 2) A user creates a PVC. The YAML file is as follows:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Run the kubectl get pvc command to check that the PVC is bound to the PV.

[root@huizhi ~]# kubectl get pvc
NAME      STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv-no-affinity   10Gi       RWO                           4s
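
To make the binding above more concrete, here is a minimal Python sketch of how a claim could be matched to a statically created PV. It is an illustration under simplifying assumptions, not the PV Controller's actual code: the PV and PVC classes and the find_best_match function are hypothetical names, and only phase, StorageClass, access modes, and capacity are compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str = ""
    phase: str = "Available"


@dataclass
class PVC:
    name: str
    request_gi: int
    access_modes: frozenset
    storage_class: str = ""


def find_best_match(claim: PVC, volumes: List[PV]) -> Optional[PV]:
    """Return the smallest Available PV that satisfies the claim, or None."""
    candidates = [
        pv for pv in volumes
        if pv.phase == "Available"
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # PV must offer every requested mode
        and pv.capacity_gi >= claim.request_gi
    ]
    # Prefer the smallest volume so that large PVs are not wasted on small claims.
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


volumes = [
    PV("big-pv", 100, frozenset({"ReadWriteOnce"})),
    PV("nfs-pv", 10, frozenset({"ReadWriteOnce"})),
]
claim = PVC("nfs-pvc", 10, frozenset({"ReadWriteOnce"}))
match = find_best_match(claim, volumes)
print(match.name if match else "no match; the claim stays Pending")
```

With the two example volumes, the 10Gi claim is matched to nfs-pv rather than the larger big-pv.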

Step 3) The user creates an application and uses the PVC created in Step 2.

apiVersion: v1
kind: Pod
metadata:
  name: test-nfs
spec:
  containers:
    - image: nginx:alpine
      imagePullPolicy: IfNotPresent
      name: nginx
      volumeMounts:
        - mountPath: /data
          name: nfs-volume
  volumes:
    - name: nfs-volume
      persistentVolumeClaim:
        claimName: nfs-pvc

The NFS remote storage is mounted to the /data directory of the NGINX container in the pod.

Dynamically Create a PV

To dynamically create a PV, ensure that the cluster is deployed with an NFS client provisioner and the corresponding StorageClass.

Compared with static PV creation, dynamic PV creation requires no intervention from the cluster administrator. The following figure shows the process of dynamically creating a PV.

Image for post

The cluster administrator only needs to ensure that the environment contains an NFS-related StorageClass.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-sc
provisioner: example.com/nfs
mountOptions:
  - vers=4.1

Step 1) The user creates a PVC and sets storageClassName to the name of the NFS-related StorageClass.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
  storageClassName: nfs-sc

Step 2) The NFS client provisioner in the cluster dynamically creates the corresponding PV. A PV is created in the environment and bound to the PVC.

[root@huizhi ~]# kubectl get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM         REASON   AGE
pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b   10Mi       RWX           Delete          Bound    default/nfs            4s

Step 3) The user creates an application and uses the PVC created in Step 1. This step is the same as Step 3 of statically creating a PV.

Process of Kubernetes Persistent Storage

1) Overview

The following figure shows the process of Kubernetes persistent storage. This figure is taken from the cloud native storage courses given by Junbao.

Image for post

Let’s take a look at the steps involved in the process.

1) A user creates a pod that references a PVC, which uses a dynamically provisioned PV.
2) Scheduler schedules the pod to an appropriate worker node based on the pod configuration, node status, and PV configuration.
3) PV Controller observes that the PVC used by the pod is in the Pending state and calls Volume Plugin (in-tree) to create the volume and the corresponding PV object. In the out-of-tree case, External Provisioner performs this step.
4) AD Controller detects that the pod and PVC are in the ‘To Be Attached’ state and calls Volume Plugin to attach the storage device to the target worker node.
5) On the worker node, Volume Manager in the kubelet waits until the storage device is attached and uses Volume Plugin to mount the device to the global directory `/var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PV name]` (iscsi is used as an example).
6) The kubelet uses Docker to start the containers in the pod and uses a bind mount to map the volume mounted in the global directory into the containers.

The following diagram shows a detailed process:

Image for post

2) Process Explanation

The persistent storage process varies slightly across Kubernetes versions. This article uses Kubernetes 1.14.8 as an example.

The preceding process map shows the three stages from when a volume is created to when it is used by applications: Provision/Delete, Attach/Detach, and Mount/Unmount.
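
As a quick way to keep the three stages straight, the following Python sketch pairs each setup operation with its teardown counterpart and derives the order in which they run. The STAGES list and the two helper functions are hypothetical names used only for illustration. The reverse teardown order is an assumption that follows from the pairing, since a volume has to be unmounted before it is detached and detached before it is deleted.

```python
# Each stage pairs a setup operation with its teardown counterpart.
STAGES = [
    ("Provision", "Delete"),   # PV Controller or External Provisioner
    ("Attach", "Detach"),      # AD Controller or the kubelet
    ("Mount", "Unmount"),      # Volume Manager in the kubelet
]


def setup_order():
    """Operations run from volume creation to use by the application."""
    return [setup for setup, _ in STAGES]


def teardown_order():
    """Teardown runs the paired operations in reverse order."""
    return [teardown for _, teardown in reversed(STAGES)]


print(setup_order())
print(teardown_order())
```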

Provisioning Volumes

Image for post

PV Controller Workers

  • ClaimWorker processes the Add, Update, and Delete events of PVCs and the status changes of PVCs.

PV Status Changes (UpdatePVStatus)

  • The PV starts in the Available state and changes to the Bound state after being bound to the PVC.

PVC Status Changes (UpdatePVCStatus)

  • The PVC changes to the Pending state when the cluster does not include any PV that matches the PVC. The PVC changes from the Pending to the Bound state after it is bound to a PV.
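
The phase changes described in this article can be written down as a small lookup table. The sketch below is a simplification that covers only the transitions mentioned here (including the Released phase described under Deleting Volumes); the event names bind and claim_deleted and the next_phase function are hypothetical and do not appear in the Kubernetes source code.

```python
# (current phase, event) -> next phase, limited to transitions named in this article.
PV_TRANSITIONS = {
    ("Available", "bind"): "Bound",
    ("Bound", "claim_deleted"): "Released",
}
PVC_TRANSITIONS = {
    ("Pending", "bind"): "Bound",
}


def next_phase(transitions, phase, event):
    """Return the next phase, or keep the current one if the event does not apply."""
    return transitions.get((phase, event), phase)


pv_phase = "Available"
for event in ("bind", "claim_deleted"):
    pv_phase = next_phase(PV_TRANSITIONS, pv_phase, event)
print(pv_phase)
```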

Provisioning Process (Assuming a User Creates a PVC)

Static volume process (FindBestMatch): PV Controller selects a PV in the Available state in the environment to match to the new PVC.

  • DelayBinding: PV Controller determines whether to delay PVC binding. First, PV Controller checks whether the PVC's annotations include volume.kubernetes.io/selected-node. If so, the scheduler has already selected a node for the PVC (the PVC then goes through the ProvisionVolume process), and binding is not delayed. Second, if the PVC's annotations do not include volume.kubernetes.io/selected-node and no StorageClass exists, binding is not delayed. If a StorageClass exists, PV Controller checks its VolumeBindingMode field. If it is set to WaitForFirstConsumer, binding is delayed. If it is set to Immediate, binding is not delayed.
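
The DelayBinding decision can be sketched as a short function. This is a minimal Python illustration of the rules listed above, not the controller's real implementation: should_delay_binding is a hypothetical name, and the claim annotations and StorageClass are modeled as plain dictionaries.

```python
SELECTED_NODE = "volume.kubernetes.io/selected-node"


def should_delay_binding(claim_annotations, storage_class):
    """Decide whether PVC binding is delayed.

    claim_annotations: dict of the PVC's annotations.
    storage_class: dict with a "volumeBindingMode" key, or None if no StorageClass exists.
    """
    if SELECTED_NODE in claim_annotations:
        return False  # the scheduler has already picked a node
    if storage_class is None:
        return False  # no StorageClass, so nothing requests a delay
    return storage_class.get("volumeBindingMode") == "WaitForFirstConsumer"


print(should_delay_binding({}, {"volumeBindingMode": "WaitForFirstConsumer"}))
print(should_delay_binding({SELECTED_NODE: "node-1"},
                           {"volumeBindingMode": "WaitForFirstConsumer"}))
```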

Dynamic volume process (ProvisionVolume): The dynamic provisioning process is initiated if no appropriate PV exists in the environment.

  • Before Provisioning: 1) PV Controller determines whether the StorageClass used by the PVC is in-tree or out-of-tree. To do so, PV Controller checks whether the Provisioner field of the StorageClass contains the kubernetes.io/ prefix. 2) PV Controller updates the PVC's annotation as follows:

claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner
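
The two pre-provisioning steps can be illustrated in a few lines of Python. The function names is_in_tree and annotate_claim are hypothetical, and annotations are modeled as a plain dictionary. The provisioner name kubernetes.io/aws-ebs is used as an example of an in-tree provisioner, and example.com/nfs is the out-of-tree provisioner from the StorageClass shown earlier.

```python
IN_TREE_PREFIX = "kubernetes.io/"
PROVISIONER_ANNOTATION = "volume.beta.kubernetes.io/storage-provisioner"


def is_in_tree(provisioner):
    """A provisioner name with the kubernetes.io/ prefix is treated as in-tree."""
    return provisioner.startswith(IN_TREE_PREFIX)


def annotate_claim(claim_annotations, provisioner):
    """Return a copy of the PVC annotations with the provisioner recorded."""
    updated = dict(claim_annotations)
    updated[PROVISIONER_ANNOTATION] = provisioner
    return updated


print(is_in_tree("kubernetes.io/aws-ebs"))
print(is_in_tree("example.com/nfs"))
print(annotate_claim({}, "example.com/nfs"))
```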

  • In-tree Provisioning (Internal Provisioning): 1) The in-tree provisioner implements the NewProvisioner method of the ProvisionableVolumePlugin interface to return a new provisioner. 2) PV Controller calls the Provision function of the provisioner to return a PV object. 3) PV Controller creates the returned PV object and binds it to the PVC. .Spec.ClaimRef is set to the PVC, .Status.Phase is set to Bound, and .Spec.StorageClassName is set to the name of the PVC's StorageClass. The following annotations are added:

"pv.kubernetes.io/bound-by-controller"="yes" and "pv.kubernetes.io/provisioned-by"=plugin.GetPluginName()

  • Out-of-tree Provisioning (External Provisioning): 1) External Provisioner checks whether claim.Spec.VolumeName in the PVC is empty. If not, the PVC is skipped. 2) External Provisioner checks whether claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] in the PVC is the same as its provisioner name. External Provisioner passes in the --provisioner parameter to determine its provisioner name upon startup. 3) If VolumeMode of the PVC is set to Block, External Provisioner checks whether it supports block devices. 4) External Provisioner calls the Provision function and calls the CreateVolume interface of the CSI storage plug-in through gRPC. External Provisioner creates a PV to represent the volume and binds the PV to the PVC.
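
The checks that External Provisioner performs before provisioning can be summarized in a small predicate. The following Python sketch is only an illustration of the three checks above: should_provision is a hypothetical name, and the PVC is modeled as a simplified dictionary rather than a real API object.

```python
PROVISIONER_ANNOTATION = "volume.beta.kubernetes.io/storage-provisioner"


def should_provision(claim, provisioner_name, supports_block):
    """Return True if this provisioner should create a volume for the claim.

    claim: simplified dict with optional "volumeName", "annotations", and "volumeMode".
    """
    if claim.get("volumeName"):
        return False  # already bound to a volume, skip
    annotations = claim.get("annotations", {})
    if annotations.get(PROVISIONER_ANNOTATION) != provisioner_name:
        return False  # the claim is meant for a different provisioner
    if claim.get("volumeMode") == "Block" and not supports_block:
        return False  # block devices requested but not supported
    return True


claim = {"annotations": {PROVISIONER_ANNOTATION: "example.com/nfs"}}
print(should_provision(claim, "example.com/nfs", supports_block=False))
```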

Deleting Volumes

The volume deletion process is the reverse of the volume provisioning process.

When a user deletes a PVC, PV Controller changes PV.Status.Phase to Released.

When PV.Status.Phase is set to Released, PV Controller checks the value of Spec.PersistentVolumeReclaimPolicy. If it is set to Retain, the PV is skipped. If it is set to Delete, then either of the following options is executed:

  • In-tree Deleting: 1) The in-tree provisioner implements the NewDeleter method of the DeletableVolumePlugin interface to return a new deleter. 2) PV Controller calls the Delete function of the deleter to delete the corresponding volume. 3) PV Controller deletes the PV object after the volume is deleted.
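
The reclaim decision can be sketched as follows. This is a simplified Python illustration, not the controller's code: the reclaim function, its return values, and the dictionary shape of the PV are hypothetical, and the delete_volume callback stands in for whatever actually removes the backing volume.

```python
def reclaim(pv, delete_volume):
    """Handle a PV according to its phase and reclaim policy.

    pv: dict with "name", "phase", and "reclaimPolicy".
    delete_volume: callback that deletes the backing volume by PV name.
    """
    if pv["phase"] != "Released":
        return "skip"      # only released PVs are reclaimed
    if pv["reclaimPolicy"] == "Retain":
        return "retain"    # keep the volume and the PV object
    if pv["reclaimPolicy"] == "Delete":
        delete_volume(pv["name"])  # delete the volume first
        return "deleted"           # then the PV object can be removed
    return "unhandled"


deleted = []
pv = {"name": "pvc-123", "phase": "Released", "reclaimPolicy": "Delete"}
print(reclaim(pv, deleted.append), deleted)
```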

Attaching Volumes

Both the kubelet and AD Controller can perform the Attach and Detach operations. Which one does so depends on the --enable-controller-attach-detach startup parameter of the kubelet: if it is set to false, the kubelet performs these operations; otherwise (the default), AD Controller performs them. The following section explains the Attach and Detach operations using AD Controller as an example.

Image for post

Two Core Variables of AD Controller

  • DesiredStateOfWorld (DSW) indicates the expected volume attachment status in the cluster, including information about nodes->volumes->pods.

The Attaching Process

AD Controller initializes DSW and ASW based on the resource information in the cluster.

AD Controller has three components that periodically update DSW and ASW.

  • The Reconciler component periodically runs a goroutine to ensure that volumes are attached or detached as required. During this period, ASW is continuously updated.
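
The reconcile idea, comparing the desired state with the actual state and acting on the difference, can be sketched in Python as below. This is a simplified model, not the AD Controller's code: reconcile is a hypothetical name, both states are modeled as sets of (node, volume) pairs, and the attach and detach callbacks stand in for the volume plug-in calls. Detaches are handled before attaches so that a volume moving between nodes is released first.

```python
def reconcile(desired, actual, attach, detach):
    """Bring the actual state in line with the desired state.

    desired, actual: sets of (node, volume) pairs; actual is updated in place.
    """
    for node, volume in sorted(actual - desired):
        detach(node, volume)
        actual.discard((node, volume))
    for node, volume in sorted(desired - actual):
        attach(node, volume)
        actual.add((node, volume))


log = []
desired = {("node-1", "pv-a"), ("node-2", "pv-b")}
actual = {("node-1", "pv-a"), ("node-1", "pv-b")}
reconcile(
    desired,
    actual,
    attach=lambda node, volume: log.append(f"attach {volume} to {node}"),
    detach=lambda node, volume: log.append(f"detach {volume} from {node}"),
)
print(log)
```

In a real controller this function would run in a periodic loop; here it runs once for clarity.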

Detaching Volumes

Detaching Process

  • When a pod is deleted, AD Controller observes this event. It checks whether the node where the pod is located carries the volumes.kubernetes.io/keep-terminated-pod-volumes annotation. If yes, no operations are performed. If no, the volume is removed from DSW.

a) In-tree Detaching: 1) AD Controller calls the NewDetacher method of the AttachableVolumePlugin interface, which the in-tree plug-in implements, to obtain a new detacher. 2) AD Controller calls the Detach function of the detacher to perform the Detach operation on the volume. 3) AD Controller updates ASW.
b) Out-of-tree Detaching: 1) AD Controller calls the in-tree CSIAttacher to delete the related VolumeAttachment object. 2) External Attacher watches the VolumeAttachment (VA) resources in the cluster. If a data volume needs to be detached, External Attacher calls the Detach function, which calls the ControllerUnpublishVolume interface of the CSI plug-in through gRPC. 3) AD Controller updates ASW.

Image for post

Volume Manager

It has two core variables:

  • DesiredStateOfWorld (DSW) indicates the expected volume mount status in the cluster, including information about volumes->pods.

The mounting and unmounting processes are as follows:

The global directory (global mount path) exists because a block device can be mounted to a Linux system only once, whereas in Kubernetes a PV may be mounted to multiple pod instances on a node. A formatted block device is therefore mounted once to a temporary global directory on the node, and the global directory is then mounted to the corresponding directory of each pod by using a Linux bind mount. In the preceding process map, the global directory is /var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PVname].

VolumeManager initializes DSW and ASW based on resource information in the cluster.

VolumeManager has two components that periodically update DSW and ASW.

  • DesiredStateOfWorldPopulator periodically runs a goroutine to update DSW.

UnmountVolumes ensures that the volumes are unmounted after the pod is deleted. All the pods in ASW are traversed. If any pod is not in DSW (indicating this pod has been deleted), the following operations are performed (VolumeMode=FileSystem is used as an example):

1) Remove all bind mounts by calling the TearDown interface of Unmounter, or by calling the NodeUnpublishVolume interface of the CSI plug-in in out-of-tree mode.
2) Unmount the volume by calling the UnmountDevice function of DeviceUnmounter, or by calling the NodeUnstageVolume interface of the CSI plug-in in out-of-tree mode.
3) Update ASW.

MountAttachVolumes ensures that the volumes to be used by the pod are successfully mounted. All the pods in DSW are traversed. If any pod is not in ASW (the directory is to be mounted and mapped to the pod), the following operations are performed (VolumeMode=FileSystem is used as an example):

1) Wait until the volume is attached to the node by External Attacher or the kubelet.
2) Mount the volume to the global directory by calling the MountDevice function of DeviceMounter or calling the NodeStageVolume interface of the CSI plug-in in out-of-tree mode.
3) Update ASW if the volume is mounted to the global directory.
4) Mount the volume to the pod through bind-mount by calling the SetUp interface of Mounter or calling the NodePublishVolume interface of the CSI plug-in in out-of-tree mode.
5) Update ASW.
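
The five steps above can be sketched as a single function. The Python below is a simplified model rather than Volume Manager's real code: mount_volume is a hypothetical name, ASW is modeled as a dictionary of two sets, and log entries stand in for the MountDevice/NodeStageVolume and SetUp/NodePublishVolume calls. It also shows why the global directory is useful: a volume shared by two pods on the same node is staged once and bind-mounted twice.

```python
def mount_volume(volume, pod, attached, asw, log):
    """Mount a volume for a pod; return False while the volume is not attached yet.

    attached: set of volumes already attached to this node.
    asw: {"global": set of staged volumes, "pods": set of (volume, pod) bind mounts}.
    """
    if volume not in attached:
        return False  # step 1: keep waiting for the attach operation
    if volume not in asw["global"]:
        log.append(f"stage {volume}")  # step 2: mount to the global directory
        asw["global"].add(volume)      # step 3: update ASW
    if (volume, pod) not in asw["pods"]:
        log.append(f"publish {volume} -> {pod}")  # step 4: bind mount into the pod
        asw["pods"].add((volume, pod))            # step 5: update ASW
    return True


attached = {"pv-1"}
asw = {"global": set(), "pods": set()}
log = []
mount_volume("pv-1", "pod-a", attached, asw, log)
mount_volume("pv-1", "pod-b", attached, asw, log)
print(log)
```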

UnmountDetachDevices ensures that volumes are unmounted. All UnmountedVolumes in ASW are traversed. If any UnmountedVolumes do not exist in DSW (indicating these volumes are no longer used), the following operations are performed:

1) Unmount the volume by calling the UnmountDevice function of DeviceUnmounter, or by calling the NodeUnstageVolume interface of the CSI plug-in in out-of-tree mode.
2) Update ASW.
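
A minimal Python sketch of this cleanup step follows, under the same simplified model of ASW and DSW as sets. The function name unmount_unused_devices is hypothetical, and the log entry stands in for the UnmountDevice or NodeUnstageVolume call.

```python
def unmount_unused_devices(asw_global, asw_pods, dsw_volumes, log):
    """Unmount staged volumes that no pod uses and that DSW no longer wants.

    asw_global: set of volumes mounted to the global directory (updated in place).
    asw_pods: set of (volume, pod) bind mounts that still exist.
    dsw_volumes: set of volumes that should remain mounted.
    """
    for volume in sorted(asw_global):
        still_bound = any(v == volume for v, _ in asw_pods)
        if not still_bound and volume not in dsw_volumes:
            log.append(f"unstage {volume}")  # unmount from the global directory
            asw_global.discard(volume)       # update ASW


asw_global = {"pv-1", "pv-2"}
asw_pods = {("pv-1", "pod-a")}
dsw_volumes = {"pv-1"}
log = []
unmount_unused_devices(asw_global, asw_pods, dsw_volumes, log)
print(log, asw_global)
```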

Summary

This article introduces the basics and usage of Kubernetes persistent storage and analyzes the internal storage process of Kubernetes. In Kubernetes, all storage types require the preceding processes, but the Attach and Detach operations are not performed in certain scenarios. Any storage problem in an environment can be attributed to a fault in one of these processes.

Container storage is complex, especially in private cloud environments. However, through this process, it’s possible to seize more opportunities while braving more challenges. Currently, competition is fierce in the storage landscape of China’s private cloud market. Our agile PaaS container team is always looking for talented professionals to join us and help build a better future.
