Kubernetes Persistent Storage Process

Kubernetes Persistent Storage: Basic Concepts

1) Terms

  • In-tree: The code logic is in the Kubernetes repository.
  • Out-of-tree: The code logic is outside the Kubernetes repository and decoupled from the Kubernetes code.
  • PersistentVolume (PV): This is a cluster-level resource created by the cluster administrator or external provisioner. The lifecycle of a PV is independent of the pods that use the PV. The .Spec of the PV stores details about storage devices.
  • PersistentVolumeClaim (PVC): This is a namespace-level resource that is created by a user or StatefulSet controller based on VolumeClaimTemplate. A PVC is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods may request specific levels of resources (CPU and memory). PVCs may request the size and access mode of a specific volume.
  • StorageClass: This is a cluster-level resource created by the cluster administrator. A StorageClass provides administrators with a class template used to dynamically provision volumes. The .Spec of the StorageClass defines the different quality of service (QoS) and backup policies of PVs.
  • CSI: The Container Storage Interface is an industry-standard interface that allows storage providers (SPs) to write one CSI-based plug-in and have it work across different container orchestration (CO) systems, such as Kubernetes, Mesos, and Swarm.

2) Components

  • PV Controller binds PVs and PVCs and manages their lifecycles. It also performs the Provision and Delete operations on data volumes as needed.
  • AD Controller performs the Attach and Detach operations on data volumes, and attaches devices to target nodes.
  • Kubelet is the main node agent running on each node. It manages pod lifecycles, checks container health, and monitors containers.
  • Volume Manager is a component of the kubelet. It performs the Mount and Unmount operations on data volumes and formats volume devices. It also performs the Attach and Detach operations, but only when the kubelet is started with --enable-controller-attach-detach=false. Otherwise, AD Controller performs them.
  • Volume Plugins are storage plug-ins developed by storage vendors. They extend the volume management capabilities of various storage types and implement the operations on third-party storage that are named above: Provision, Delete, Attach, Detach, Mount, and Unmount. Volume plug-ins are either in-tree or out-of-tree.
  • External Provisioner is a sidecar container that calls the CreateVolume and DeleteVolume functions of an out-of-tree volume plug-in to perform the Provision and Delete operations. PV Controller cannot call the functions of out-of-tree plug-ins directly, so External Provisioner calls them through gRPC.
  • External Attacher is a sidecar container that calls the ControllerPublishVolume and ControllerUnpublishVolume functions of an out-of-tree volume plug-in to perform the Attach and Detach operations. AD Controller cannot call the functions of out-of-tree plug-ins directly, so External Attacher calls them through gRPC.

3) How to Use PVs

  • A cluster administrator manually and statically creates the PV required by an application.
  • A user manually creates a PVC and the Provisioner component dynamically creates the corresponding PV.

Static provisioning example (a PV, a PVC, and a pod that mounts the claim):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.4.1
    path: /nfs_storage

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

[root@huizhi ~]# kubectl get pvc
NAME      STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv-no-affinity   10Gi       RWO                           4s

apiVersion: v1
kind: Pod
metadata:
  name: test-nfs
spec:
  containers:
    - image: nginx:alpine
      imagePullPolicy: IfNotPresent
      name: nginx
      volumeMounts:
        - mountPath: /data
          name: nfs-volume
  volumes:
    - name: nfs-volume
      persistentVolumeClaim:
        claimName: nfs-pvc

Dynamic provisioning example (a StorageClass, a PVC that references it, and the PV that the provisioner creates):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-sc
provisioner: example.com/nfs
mountOptions:
  - vers=4.1

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
  storageClassName: nfs-sc

[root@huizhi ~]# kubectl get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM         REASON   AGE
pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b   10Mi       RWX           Delete          Bound    default/nfs            4s

Process of Kubernetes Persistent Storage

1) Overview

2) Process Explanation

  • PV Controller runs two workers. ClaimWorker processes the Add, Update, and Delete events of PVCs and drives the status changes of PVCs.
  • VolumeWorker drives the status changes of PVs.
  • The PV starts in the Available state and changes to the Bound state after being bound to the PVC.
  • The PV changes to the Released state after the bound PVC is deleted.
  • The PV changes to the Available state when the PV reclaim policy is Recycle or the .Spec.ClaimRef of the PV is manually deleted.
  • The PV changes to the Failed state when the PV reclaim policy is unknown, the recycle operation fails, or the volume cannot be deleted.
  • The PV changes to the Available state when the .Spec.ClaimRef of the PV is manually deleted.
  • The PVC changes to the Pending state when the cluster does not include any PV that matches the PVC. The PVC changes from the Pending to the Bound state after it is bound to a PV.
  • The PVC changes to the Lost state when the bound PV is deleted from the environment.
  • A PVC in the Lost state returns to the Bound state after it is bound to a PV that has the same name as the previous PV.
  • DelayBinding: PV Controller determines whether to delay PVC binding. 1) If the PVC's annotations include volume.kubernetes.io/selected-node, the scheduler has already selected a node for the PVC and the PVC is waiting to be provisioned, so binding is not delayed. 2) Otherwise, if the PVC has no StorageClass, binding is not delayed. 3) If the PVC has a StorageClass, PV Controller checks its VolumeBindingMode field. WaitForFirstConsumer delays binding, and Immediate does not.
  • FindBestMatchPVForClaim: PV Controller tries to find a PV in the environment that matches the PVC. It traverses all PVs, filters out those that cannot satisfy the PVC, and selects the best of the remaining candidates. Filter rules: 1) The VolumeMode of the PV must match that of the PVC. 2) PV Controller checks whether the PV has already been bound to the PVC. 3) The .Status.Phase of the PV must be Available. 4) The labels of the PV must match the LabelSelector of the PVC. 5) The PV and PVC must have the same StorageClass. 6) In each iteration, PV Controller keeps track of the smallest PV whose capacity meets the requested size of the PVC, and returns it as the final result.
  • Bind: PV Controller binds the selected PV to the PVC. 1) The .Spec.ClaimRef of the PV is updated to the current PVC. 2) The .Status.Phase of the PV is updated to Bound. 3) The annotation pv.kubernetes.io/bound-by-controller: "yes" is added to the PV. 4) The .Spec.VolumeName of the PVC is updated to the name of the PV. 5) The .Status.Phase of the PVC is updated to Bound. 6) The annotations pv.kubernetes.io/bound-by-controller: "yes" and pv.kubernetes.io/bind-completed: "yes" are added to the PVC.
  • Before Provisioning: 1) PV Controller determines whether the StorageClass used by the PVC is in-tree or out-of-tree by checking whether the Provisioner field of the StorageClass contains the kubernetes.io/ prefix. 2) PV Controller sets the volume.beta.kubernetes.io/storage-provisioner annotation of the PVC to the provisioner name of the StorageClass.
  • In-tree Provisioning (Internal Provisioning): 1) The in-tree provisioner implements the NewProvisioner method of the ProvisionableVolumePlugin interface to return a new provisioner. 2) PV Controller calls the Provision function of the provisioner, which returns a PV object. 3) PV Controller creates the returned PV object and binds it to the PVC. The .Spec.ClaimRef of the PV is set to the PVC, .Status.Phase is set to Bound, and .Spec.StorageClassName is set to the name of the PVC's StorageClass. The annotations pv.kubernetes.io/bound-by-controller: "yes" and pv.kubernetes.io/provisioned-by, whose value is the name of the plug-in, are added to the PV.
  • Out-of-tree Provisioning (External Provisioning): 1) External Provisioner checks whether claim.Spec.VolumeName in the PVC is empty. If it is not empty, the PVC is skipped. 2) External Provisioner checks whether claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] in the PVC is the same as its own provisioner name. The provisioner name is set through the --provisioner parameter when External Provisioner starts. 3) If the VolumeMode of the PVC is set to Block, External Provisioner checks whether it supports block devices. 4) External Provisioner calls the Provision function, which calls the CreateVolume interface of the CSI storage plug-in through gRPC. External Provisioner then creates a PV to represent the volume and binds the PV to the PVC.
  • In-tree Deleting: 1) The in-tree provisioner implements the NewDeleter method of the DeletableVolumePlugin interface to return a new deleter. 2) PV Controller calls the Delete function of the deleter to delete the corresponding volume. 3) PV Controller deletes the PV object after the volume is deleted.
  • Out-of-tree Deleting: 1) External Provisioner calls the Delete function and calls the DeleteVolume interface of the CSI plug-in through gRPC. 2) External Provisioner deletes the PV object after the volume is deleted.
  • In AD Controller, DesiredStateOfWorld (DSW) indicates the expected volume attachment status in the cluster, including information about nodes->volumes->pods.
  • In AD Controller, ActualStateOfWorld (ASW) indicates the actual volume attachment status in the cluster, including information about volumes->nodes.
  • The Reconciler component runs a goroutine periodically to ensure that each volume is attached or detached as DSW requires. During this period, ASW is continuously updated.
  • In-tree Attaching: 1) The in-tree attacher implements the NewAttacher method of the AttachableVolumePlugin interface to return a new attacher. 2) AD Controller calls the Attach function of the attacher to attach the device. 3) ASW is updated.
  • Out-of-tree Attaching: 1) The in-tree CSIAttacher is called to create a VolumeAttachment (VA) object, which contains the attacher information, node name, and information about the PV to be attached. 2) External Attacher watches VolumeAttachment resources in the cluster. If there are data volumes to be attached, External Attacher calls the Attach function, which calls the ControllerPublishVolume interface of the CSI plug-in through gRPC.
  • The DesiredStateOfWorldPopulator component runs a goroutine periodically to update DSW.
  • FindAndRemoveDeletedPods traverses all the pods in DSW. Any pods that have been deleted from the cluster are removed from DSW.
  • FindAndAddActivePods traverses the pods in all PodListers. Any pods that do not exist in DSW are added to DSW.
  • The PVC Worker component watches the Add and Update events of PVCs, processes PVC-related pods, and updates DSW in real time.
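
The DelayBinding and FindBestMatchPVForClaim rules in the list above can be sketched in Python. This is a simplified model, not the controller's code: the StorageClass, PV, and PVC classes are hypothetical and hold only the fields that the rules mention, capacities are plain integers, and returning a PV immediately when it is already bound to the claim is an assumption about how rule 2 is applied.

```python
from dataclasses import dataclass, field
from typing import Optional

SELECTED_NODE_ANNOTATION = "volume.kubernetes.io/selected-node"


@dataclass
class StorageClass:
    name: str
    volume_binding_mode: str = "Immediate"  # or "WaitForFirstConsumer"


@dataclass
class PV:
    name: str
    capacity: int
    volume_mode: str = "Filesystem"
    phase: str = "Available"
    storage_class: str = ""
    labels: dict = field(default_factory=dict)
    claim_ref: Optional[str] = None  # key of the PVC this PV is bound to


@dataclass
class PVC:
    key: str  # "namespace/name"
    request: int
    volume_mode: str = "Filesystem"
    storage_class: str = ""
    selector: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def should_delay_binding(claim, classes):
    """Return True only for an unscheduled claim whose class waits for a consumer."""
    if SELECTED_NODE_ANNOTATION in claim.annotations:
        return False  # the scheduler has already picked a node
    storage_class = classes.get(claim.storage_class)
    if storage_class is None:
        return False  # no StorageClass, so nothing to wait for
    return storage_class.volume_binding_mode == "WaitForFirstConsumer"


def find_best_match(claim, volumes):
    """Return the smallest PV that satisfies the claim, or None."""
    best = None
    for pv in volumes:
        if pv.volume_mode != claim.volume_mode:
            continue
        if pv.claim_ref == claim.key:
            return pv  # assumption: a PV already bound to this claim wins
        if pv.claim_ref is not None or pv.phase != "Available":
            continue
        if any(pv.labels.get(k) != v for k, v in claim.selector.items()):
            continue
        if pv.storage_class != claim.storage_class:
            continue
        if pv.capacity < claim.request:
            continue
        if best is None or pv.capacity < best.capacity:
            best = pv  # keep the smallest PV that is still large enough
    return best
```

Picking the smallest sufficient PV keeps larger volumes free for claims that actually need them. For example, a claim that requests 5 units matches a 10-unit PV in preference to a 100-unit PV.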

Detaching Volumes

  • AD Controller watches pod deletion events. When a pod is deleted, it checks whether the node where the pod is located has the volumes.kubernetes.io/keep-terminated-pod-volumes annotation. If it does, no operation is performed. If it does not, the volume is removed from DSW.
  • AD Controller uses Reconciler to drive ASW toward DSW. The Detach operation is performed if ASW contains any volume that does not exist in DSW.
  • In Volume Manager, DesiredStateOfWorld (DSW) indicates the expected volume mount status in the cluster, including information about volumes->pods.
  • In Volume Manager, ActualStateOfWorld (ASW) indicates the actual volume mount status in the cluster, including information about volumes->pods.
  • DesiredStateOfWorldPopulator periodically runs a goroutine to update DSW.
  • Reconciler periodically runs a goroutine to ensure that each volume is mounted or unmounted as DSW requires. During this period, ASW is continuously updated.
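
The reconcile pattern is the same for attach and detach as for mount and unmount: compare DSW with ASW and act on the difference. The following Python function is a minimal sketch of one reconcile pass, assuming that both states can be flattened into sets of (volume, target) pairs. The reconcile function and its add and remove callbacks are hypothetical names, and running removals before additions is a choice made for this sketch.

```python
from typing import Callable, Set, Tuple

Entry = Tuple[str, str]  # (volume, target); the target is a node or a pod


def reconcile(desired: Set[Entry], actual: Set[Entry],
              add: Callable[[str, str], None],
              remove: Callable[[str, str], None]) -> None:
    """Run one pass that moves the actual state toward the desired state."""
    # Entries in ASW that are missing from DSW are no longer wanted.
    for volume, target in sorted(actual - desired):
        remove(volume, target)            # detach or unmount
        actual.discard((volume, target))  # record the change in ASW
    # Entries in DSW that are missing from ASW still need to be created.
    for volume, target in sorted(desired - actual):
        add(volume, target)               # attach or mount
        actual.add((volume, target))      # record the change in ASW
```

Because each pass only acts on the difference between the two sets, running it again after the states converge performs no operations, which is what makes it safe to run periodically.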

Summary
