Backup and Recovery Solution for Disk-based Data Volumes in Kubernetes Clusters


By Junbao

Stateful services deployed to Alibaba Cloud Container Service for Kubernetes (ACK) clusters often store their data on disk-based data volumes. Although disks already support backup (snapshotting) and recovery, cloud-native storage services still face the challenge of integrating these underlying capabilities with Kubernetes and provisioning them to apps in a flexible manner. Kubernetes provides backup and recovery through the following features:

  • VolumeSnapshot objects for the backup of disks (snapshotting).
  • DataSource in a persistent volume claim (PVC) for data recovery (snapshot-based recovery).

The VolumeSnapshot feature is still in the Alpha phase in Kubernetes 1.16, so it is not deployed to ACK clusters by default. To use it, you must install the plug-in manually.

Kubernetes Snapshot Description

Kubernetes defines the following three resource types as Custom Resource Definitions (CRDs) to implement snapshot functions:

  • VolumeSnapshotContent: It describes the snapshot instance that acts as the storage backend. VolumeSnapshotContent objects are created and maintained by system administrators and have no namespaces. These objects are analogous to persistent volume (PV) objects.
  • VolumeSnapshot: It declares a snapshot instance. VolumeSnapshot objects are created and maintained by users and belong to a specific namespace. These objects are analogous to PVC objects.
  • VolumeSnapshotClass: It defines a snapshot class and describes the attributes and the controller used for creating the snapshot. It is analogous to a storage class.

Let’s take a look at the key rules for binding snapshot resources.

1) Before a snapshot can be used, the VolumeSnapshot object must be bound to a VolumeSnapshotContent object, much as a PVC is bound to a PV.
2) If no static VolumeSnapshotContent object is available to bind with the VolumeSnapshot object, Kubernetes creates a dynamic VolumeSnapshotContent object for this purpose.
3) VolumeSnapshotContent and VolumeSnapshot objects are bound in a one-to-one manner.

If you delete a VolumeSnapshotContent object, its backend snapshot will also be deleted.
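
The one-to-one binding can be inspected from the command line. The sketch below assumes the v1alpha1 CRDs, in which the bound content name is recorded in the `spec.snapshotContentName` field of the VolumeSnapshot object. The helper name `bound_content` is illustrative only, and the field path should be checked against the CRD version installed in your cluster.

```shell
# Print the name of the VolumeSnapshotContent bound to a VolumeSnapshot.
# Assumption: v1alpha1 CRDs, which record the binding in
# .spec.snapshotContentName of the VolumeSnapshot object.
bound_content() {
  # $1: VolumeSnapshot name, $2: namespace (defaults to "default")
  kubectl get volumesnapshot "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.snapshotContentName}'
}
```

Calling `bound_content snapshot-test` should print a single content name once the binding exists. Empty output suggests that the snapshot has not been bound yet.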

The following snippet shows a VolumeSnapshotClass definition template.

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: default-snapclass
snapshotter: disk-snapshot
parameters:
  forceDelete: "false"

The key terms in the above snippet are given below:

  • snapshotter: It defines the controller used by the VolumeSnapshot object that falls into this snapshot class.
  • forceDelete: It indicates whether the snapshot can be deleted while it is referenced by a disk. The default value is “false”, because creating a disk that uses a snapshot as its data source incurs latency, and forcibly deleting the snapshot in the meantime may lead to data loss.

The following snippet shows a VolumeSnapshot definition template.

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: snapshot-test
spec:
  snapshotClassName: default-snapclass
  source:
    name: pvc-disk
    kind: PersistentVolumeClaim

The key terms in the above snippet are listed below:

  • VolumeSnapshot: It defines the data source (PVC object name) and a class of the created snapshot.
  • Snapshot data source (PVC): It defines the disk volume for which a snapshot is created. The disk ID is resolved by following the PVC to its bound PV and reading the volume handle of that PV.
  • snapshotClassName: It defines the class of the created snapshot.

Creating a VolumeSnapshot resource creates a snapshot instance for the disk that is associated with the PVC.
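
When several volumes need snapshots, the same template can be rendered with different names. The following is a minimal sketch, not part of kubectl or ACK: `render_snapshot` is a hypothetical helper that fills the v1alpha1 template above with a snapshot name, a source PVC name, and an optional snapshot class.

```shell
# Render a v1alpha1 VolumeSnapshot manifest for a given PVC.
# render_snapshot is a hypothetical helper used for illustration only.
render_snapshot() {
  # $1: snapshot name, $2: source PVC name, $3: snapshot class (optional)
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: $1
spec:
  snapshotClassName: ${3:-default-snapclass}
  source:
    name: $2
    kind: PersistentVolumeClaim
EOF
}

# Example: print the manifest for the pvc-disk claim used in the template.
render_snapshot snapshot-test pvc-disk
```

The rendered manifest can be piped to `kubectl apply -f -` instead of being saved to a file first.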

Snapshot-based creation of disks is a basic function provided by Alibaba Cloud disks. The Alibaba Cloud Container Service for Kubernetes allows specifying the snapshot for a data source in the PVC to enable snapshot-based dynamic creation of disks.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-snapshot
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: alicloud-disk-ssd
  dataSource:
    name: snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 20Gi

The key terms in the above snippet are listed below:

  • storageClassName: It defines the storage class of the created PV. The target disk controller must support the DataSource feature.
  • dataSource: It specifies the snapshot resources to ensure that the data of the snapshot will be used for creating disks.
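
The PVC above can be parameterized in the same way, so that one snapshot can be restored under different claim names or storage classes. This is a sketch under stated assumptions: `render_restore_pvc` is a hypothetical helper, and the 20Gi default simply mirrors the example above. The requested size should be at least the size of the disk that the snapshot was taken from.

```shell
# Render a PVC that uses a VolumeSnapshot as its data source.
# render_restore_pvc is a hypothetical helper used for illustration only.
render_restore_pvc() {
  # $1: PVC name, $2: VolumeSnapshot name, $3: storage class,
  # $4: requested size (optional, defaults to 20Gi as in the example above)
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: $3
  dataSource:
    name: $2
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: ${4:-20Gi}
EOF
}

# Example: reproduce the PVC shown above.
render_restore_pvc disk-snapshot snapshot-test alicloud-disk-ssd
```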

Deploying the Plug-in

Before deploying the CSI snapshotter, create an ACK 1.16 cluster and enable the CSI plug-in during cluster creation. For more information about how to create a cluster, see Create a Kubernetes Cluster.

Download the CSI snapshotter template here.

Deploy the plug-in using the command below:

$ kubectl apply -f csi-snapshotter.yaml

After the deployment, the CSI plug-in appears as follows in the cluster:

$ kubectl get pod -n kube-system | grep csi
csi-plugin-25xhh 9/9 Running 0 28h
csi-plugin-5xjqh 9/9 Running 0 28h
csi-plugin-9p4kd 9/9 Running 0 28h
csi-plugin-tmlmg 9/9 Running 0 28h
csi-plugin-tw57q 9/9 Running 0 28h
csi-provisioner-577d66cbb7-zks24 8/8 Running 0 161m
csi-provisioner-577d66cbb7-kja32 8/8 Running 0 161m
csi-snapshotter-859bdf8888-mq4dk 2/2 Running 0 161m
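
Checking the listing by eye becomes tedious on larger clusters. As a rough sketch, the hypothetical helper below reads `kubectl get pod` output on standard input and reports every `csi-` pod whose STATUS is not Running or whose READY counts differ. It assumes the default column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Read `kubectl get pod` output on stdin and report csi pods that are not
# fully ready. Returns non-zero if at least one such pod is found.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
check_csi_pods() {
  awk '
    $1 ~ /^csi-/ {
      split($2, ready, "/")    # READY column, for example "9/9"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "not ready: " $1 " (" $2 " " $3 ")"
        bad = 1
      }
    }
    END { exit bad + 0 }
  '
}
```

A typical invocation would be `kubectl get pod -n kube-system | check_csi_pods`, which prints nothing and returns 0 when all CSI pods are ready.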

Using the Plug-in

The following figure shows the three-step process for using the plug-in.

[Figure: three-step process for using the plug-in]

  • Step 1) Create an original app and a disk volume for storing data.
  • Step 2) Create a VolumeSnapshot object. The VolumeSnapshotContent object and the storage snapshot instance will be created automatically.
  • Step 3) Create another app and configure the PVC reference to the snapshot object created in Step 2.

The preceding steps fulfill the following purposes:

  • Backup: Volume1 data is backed up to Snapshot1.
  • Recovery: Snapshot1 data (Volume1 data) is restored to Volume2.

Creating a VolumeSnapshotClass

Download the VolumeSnapshotClass template.

$ kubectl apply -f volumesnapshotclass.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: default-snapclass
snapshotter: diskplugin.csi.alibabacloud.com
parameters:
  forceDelete: "true"

$ kubectl get VolumeSnapshotClass
NAME AGE
default-snapclass 4h40m

Step 1) Create an original app and write data to it

$ kubectl apply -f sts.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "alicloud-disk-snap"
      resources:
        requests:
          storage: 20Gi

Write data to the pod as shown below.

$ kubectl exec -ti web-0 -- touch /data/test
$ kubectl exec -ti web-0 -- ls /data
lost+found test

Step 2) Create a VolumeSnapshot object

$ kubectl apply -f snapshot.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  snapshotClassName: default-snapclass
  source:
    name: disk-ssd-web-0
    kind: PersistentVolumeClaim

Check the cluster status to ensure that the VolumeSnapshot and VolumeSnapshotContent objects have been successfully created. Additionally, log on to the ECS console to check that the snapshot instance has been created.

$ kubectl get VolumeSnapshot
NAME AGE
new-snapshot-test 173m

$ kubectl get VolumeSnapshotContent
NAME AGE
snapcontent-b9bcccde-9ea4-41f0-967d-3647b8a5cc29 173m
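
The objects appearing in the list does not by itself mean that the backend snapshot is complete. The sketch below polls the snapshot until it reports readiness. It assumes that the v1alpha1 CRDs expose this as `.status.readyToUse`. The helper name `wait_snapshot_ready`, the 5-second interval, and the 300-second default timeout are illustrative choices.

```shell
# Poll until a VolumeSnapshot reports readyToUse, or give up after a timeout.
# Assumption: v1alpha1 CRDs expose readiness as .status.readyToUse.
wait_snapshot_ready() {
  # $1: VolumeSnapshot name, $2: timeout in seconds (defaults to 300)
  waited=0
  while [ "$waited" -lt "${2:-300}" ]; do
    ready="$(kubectl get volumesnapshot "$1" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null || true)"
    if [ "$ready" = "true" ]; then
      echo "snapshot $1 is ready"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for snapshot $1" >&2
  return 1
}
```

Running `wait_snapshot_ready new-snapshot-test` before Step 3 avoids restoring from a snapshot that is still being created.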

Step 3) Restore the data

$ kubectl apply -f sts-snapshot.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-snapshot-restore
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: alicloud-disk-snap
  resources:
    requests:
      storage: 20Gi
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-restore
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: pvc-disk
          mountPath: /data
      volumes:
      - name: pvc-disk
        persistentVolumeClaim:
          claimName: disk-snapshot-restore

Specify dataSource as the VolumeSnapshot type in the PVC definition, and select the VolumeSnapshot object named “new-snapshot-test” created in Step 2.

Check the pod data with the command below to verify that the recovery was successful.

$ kubectl exec -ti web-restore-0 -- ls /data
lost+found test

The test file written in Step 1 is present in the new pod, which shows that the data has been restored.
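
For volumes with more than one file, the comparison can be scripted. The following is a minimal sketch: `compare_data_dirs` is a hypothetical helper that compares only file names in the top-level directory, not file contents, and it assumes that the original volume has not changed since the snapshot was taken.

```shell
# Compare the file listing of a directory in the original and restored pods.
# Only names are compared; file contents are not checked.
compare_data_dirs() {
  # $1: original pod, $2: restored pod, $3: directory (defaults to /data)
  src="$(kubectl exec "$1" -- ls -A "${3:-/data}" | sort)"
  dst="$(kubectl exec "$2" -- ls -A "${3:-/data}" | sort)"
  if [ "$src" = "$dst" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}
```

For the example in this article, the call would be `compare_data_dirs web-0 web-restore-0`.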

This solution covers only the scenario in which a snapshot is created manually and then used for data recovery. A solution for creating snapshots on a schedule will be provided later.
