Kubernetes Persistent Storage Process

Image for post
Image for post

By Sun Zhiheng (Huizhi), Development Engineer at Alibaba Cloud

Kubernetes Persistent Storage: Basic Concepts

Before explaining the Kubernetes storage process, let’s review the basic concepts of persistent storage in Kubernetes.

1) Terms

2) Components

3) How to Use PVs

Kubernetes introduces PVs and PVCs to allow applications and developers to request storage resources properly without concerning storage device details. Use one of the following ways to create a PV:

Let’s use the shared storage of a network file system (NFS) as an example to explain the differences between the two PV creation methods.

Statically Create a PV

The following figure shows the process of statically creating a PV.

Image for post
Image for post

Step 1) A cluster administrator creates an NFS PV. NFS is a type of in-tree storage natively supported by Kubernetes. The YAML file is as follows:

apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
nfs:
server: 192.168.4.1
path: /nfs_storage

Step 2) A user creates a PVC. The YAML file is as follows:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nfs-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi

Run the kubectl get pv command to check that the PV and PVC are bound.

[root@huizhi ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pvc Bound nfs-pv-no-affinity 10Gi RWO 4s

Step 3) The user creates an application and uses the PVC created in Step 2.

apiVersion: v1
kind: Pod
metadata:
name: test-nfs
spec:
containers:
- image: nginx:alpine
imagePullPolicy: IfNotPresent
name: nginx
volumeMounts:
- mountPath: /data
name: nfs-volume
volumes:
- name: nfs-volume
persistentVolumeClaim:
claimName: nfs-pvc

The NFS remote storage is mounted to the /data directory of the NGINX container in the pod.

Dynamically Create a PV

To dynamically create a PV, ensure that the cluster is deployed with an NFS client provisioner and the corresponding StorageClass.

Compared with static PV creation, dynamic PV creation requires no intervention from the cluster administrator. The following figure shows the process of dynamically creating a PV.

Image for post
Image for post

The cluster administrator only needs to ensure that the environment contains an NFS-related StorageClass.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: nfs-sc
provisioner: example.com/nfs
mountOptions:
- vers=4.1

Step 1) The user creates a PVC and sets storageClassName to the name of the NFS-related StorageClass.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: nfs
annotations:
volume.beta.kubernetes.io/storage-class: "example-nfs"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Mi
storageClassName: nfs-sc

Step 2) The NFS client provisioner in the cluster dynamically creates the corresponding PV. A PV is created in the environment and bound to the PVC.

[root@huizhi ~]# kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b 10Mi RWX Delete Bound default/nfs 4s

Step 3) The user creates an application and uses the PVC created in Step 2. This step is the same as Step 3 of statically creating a PV.

Process of Kubernetes Persistent Storage

1) Overview

The following figure shows the process of Kubernetes persistent storage. This figure is taken from the cloud native storage courses given by Junbao.

Image for post
Image for post

Let’s take a look at the steps involved in the process.

1) A user creates a pod that contains a PVC, which uses a dynamic PV.
2) Scheduler schedules the pod to an appropriate worker node based on the pod configuration, node status, and PV configuration.
3) PV Controller watches that the pod-used PVC is in the Pending state and calls Volume Plugin (in-tree) to create a PV and PV object. The out-of-tree process is implemented by External Provisioner.
4) AD Controller detects that the pod and PVC are in the ‘To Be Attached’ state and calls Volume Plugin to attach the storage device to the target worker node.
5) On the worker node, Volume Manager in the kubelet waits until the storage device is attached and uses Volume Plugin to mount the device to the global directory `/var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PV
name]` (iscsi is used as an example).
6) The kubelet uses Docker to start the containers in the pod and uses the bind mount method to map the volume that is mounted to the local-global directory to the containers.

The following diagram shows a detailed process:

Image for post
Image for post

2) Process Explanation

The persistent storage process varies slightly depending on different Kubernetes versions. This article uses Kubernetes 1.14.8 as an example.

The preceding process map shows the three stages from when a volume is created to when it is used by applications: Provision/Delete, Attach/Detach, and Mount/Unmount.

Provisioning Volumes

Image for post
Image for post

PV Controller Workers

PV Status Changes (UpdatePVStatus)

PVC Status Changes (UpdatePVCStatus)

Provisioning Process (Assuming a User Creates a PVC)

Static volume process (FindBestMatch): PV Controller selects a PV in the Available state in the environment to match to the new PVC.

Dynamic volume process (ProvisionVolume): The dynamic provisioning process is initiated if no appropriate PV exists in the environment.

claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner.

"pv.kubernetes.io/bound-by-controller"="yes" and "pv.kubernetes.io/provisioned-by"=plugin.GetPluginName()".

Deleting Volumes

The deleting volumes process is the reverse of the provisioning volumes process.

When a user deletes a PVC, PV Controller changes PV.Status.Phase to Released.

When PV.Status.Phase is set to Released, PV Controller checks the value of Spec.PersistentVolumeReclaimPolicy. If it is set to Retain, it is skipped. If it is set to Delete, then either of the following options is executed:

Attaching Volumes

Both the kubelet and AD Controller perform the Attach and Detach operations. These operations are performed when kubelet if — enable-controller-attach-detach is specified in the startup parameters of the kubelet. Otherwise, these operations are performed by AD Controller. The following section explains the Attach and Detach operations using AD Controller as an example.

Image for post
Image for post

Two Core Variables of AD Controller

The Attaching Process

AD Controller initializes DSW and ASW based on the resource information in the cluster.

AD Controller has three components that periodically update DSW and ASW.

Detaching Volumes

Detaching Process

a) In-tree Detaching: 1) AD Controller implements the NewDetacher method of the AttachableVolumePlugin interface to return a new detacher. 2) AD Controller calls the Detach function of the detacher to perform the Detach operation on the volume. 3) AD Controller updates ASW.
b) Out-of-tree Detaching: 1) AD Controller calls the in-tree CSIAttacher to delete the related VolumeAttachement object. 2) External Attacher watches the VolumeAttachement (VA) resource in the cluster. If a data volume needs to be deleted, External Attacher calls the Detach function and calls the ControllerUnpublishVolume interface of the CSI plug-in through gRPC. 3) AD Controller updates ASW.

Image for post
Image for post

Volume Manager

It has two core variables:

The mounting and unmounting processes are as follows:

The global directory (global mount path) is a block device mounted to the Linux system only once. In Kubernetes, a PV may be mounted to multiple pod instances on a node. A formatted block device is mounted to a temporary global directory on a node. Then, the global directory is mounted to the corresponding directory of the pod by using the bind mount technology of Linux. In the preceding process map, the global directory is /var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PVname].

VolumeManager initializes DSW and ASW based on resource information in the cluster.

VolumeManager has two components that periodically update DSW and ASW.

UnmountVolumes ensures that the volumes are unmounted after the pod is deleted. All the pods in ASW are traversed. If any pod is not in DSW (indicating this pod has been deleted), the following operations are performed (VolumeMode=FileSystem is used as an example):

1) Remove all bind-mounts by calling the TearDown interface of Unmounter, or calling the NodeUnpublishVolume interface of the CSI plug-in in out-of-tree mode.
2) Unmount volume by calling the UnmountDevice function of DeviceUnmounter, or calling the NodeUnstageVolume interface of the CSI plug-in in out-of-tree mode.
3) ASW is updated.

MountAttachVolumes ensures that the volumes to be used by the pod are successfully mounted. All the pods in DSW are traversed. If any pod is not in ASW (the directory is to be mounted and mapped to the pod), the following operations are performed (VolumeMode=FileSystem is used as an example):

1) Wait until the volume is attached to the node by External Attacher or the kubelet.
2) Mount the volume to the global directory by calling the MountDevice function of DeviceMounter or calling the NodeStageVolume interface of the CSI plug-in in out-of-tree mode.
3) Update ASW if the volume is mounted to the global directory.
4) Mount the volume to the pod through bind-mount by calling the SetUp interface of Mounter or calling the NodePublishVolume interface of the CSI plug-in in out-of-tree mode.
5) Update ASW.

UnmountDetachDevices ensures that volumes are unmounted. All UnmountedVolumes in ASW are traversed. If any UnmountedVolumes do not exist in DSW (indicating these volumes are no longer used), the following operations are performed:

1) Unmount volume by calling the UnmountDevice function of DeviceUnmounter, or calling the NodeUnstageVolume interface of the CSI plug-in in out-of-tree mode.
2) ASW is updated.

Summary

This article introduces the basics and usage of Kubernetes persistent storage and analyzes the internal storage process of Kubernetes. In Kubernetes, all storage types require the preceding processes, but the Attach and Detach operations are not performed in certain scenarios. Any storage problem in an environment can be attributed to a fault in one of these processes.

Container storage is complex, especially in private cloud environments. However, through this process, it’s possible to seize more opportunities while braving more challenges. Currently, competition is fierce in the storage landscape of China’s private cloud market. Our agile PaaS container team is always looking for talented professionals to join us and help build a better future.

References

1) Source code of the Kubernetes community
2) kubernetes-design-proposals volume-provisioning
3) kubernetes-design-proposals CSI Volume Plugins in Kubernetes Design Doc

Original Source:

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store