By Bruce Wu
You can conveniently deploy a highly available and scalable distributed stateless service in Kubernetes (K8s) by using Deployment and ReplicationController. Such applications do not store data locally, and requests are distributed among their instances based on simple load balancing policies. With the popularization of K8s and the rise of cloud-native architectures, more and more users want to orchestrate stateful services such as databases in K8s. This is not easy because stateful services are complex. This article shows you how to deploy stateful services in K8s by taking the most popular open-source database, MySQL, as an example. The study in this article is based on
Use StatefulSet to Deploy MySQL
This section shows you how to deploy a highly available MySQL service by using StatefulSet. The example is taken from the official K8s tutorial Run a Replicated Stateful Application.
Introduction to StatefulSet
Deployment and ReplicationController are designed for stateless services. Their pod names, host names, and storage are unstable, and their pod startup and destruction orders are random, so they are not suitable for stateful applications such as databases. To address this problem, K8s offers a StatefulSet controller for stateful services. Pods managed by this controller have the following characteristics:
1. Uniqueness: If a StatefulSet has N pods, each pod is assigned a unique ordinal number in the range [0, N).
2. Sequence: By default, the pods in a StatefulSet are started, updated, and destroyed in sequence.
3. Stable network identity: The host name and DNS address of a pod do not change when the pod is rescheduled.
4. Stable persistent storage: After a pod is rescheduled, its original PersistentVolume is mounted to it again, which ensures data integrity and consistency.
The highly available MySQL service used in this example consists of one master node and multiple slave nodes that asynchronously replicate data from the master node. This is the one-master-multiple-slave replication model. The master node is used to handle read and write requests from users, but the slave nodes can only be used to handle read requests.
To deploy such a service, you need many other K8s resource objects apart from StatefulSet, such as ConfigMap, Headless Service, and ClusterIP Service. Their collaboration allows stateful services such as MySQL to be run on K8s.
To make application configuration easier to maintain, large systems and distributed applications often manage configuration centrally. In the K8s environment, you can use ConfigMap to decouple configuration from pods. This keeps the workload portable and makes its configuration easy to modify and manage.
This example has a ConfigMap named mysql. When pods of the StatefulSet controller are started, they read the corresponding configuration from the ConfigMap according to their roles.
Headless Service provides a DNS address for each associated pod in the format <pod-name>.<service-name>. This allows a client to choose a specific application instance as needed, and solves the problem of identifying different instances in a distributed environment.
This example has a Headless Service named mysql, which is associated with the pods of the StatefulSet controller. These pods are assigned the DNS addresses mysql-0.mysql, mysql-1.mysql, and mysql-2.mysql. This allows the client to access the master node by using mysql-0.mysql, and the slave nodes by using mysql-1.mysql and mysql-2.mysql.
To make read-only access more convenient, the example also provides a regular Service named mysql-read. This Service has its own cluster IP address, receives user requests, and distributes them to all associated pods, including both the master node and the slave nodes. It also hides pod access details from the user.
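The two Services give clients a simple routing rule: writes go to the master through its stable Headless Service address, and reads go to mysql-read. The following Python sketch illustrates that rule. The function names and the keyword-based classification of statements are hypothetical and deliberately naive; they are not part of the tutorial.

```python
# Hypothetical endpoint selection for the one-master-multiple-slave topology.

HEADLESS_SERVICE = "mysql"   # Headless Service from the example
READ_SERVICE = "mysql-read"  # ClusterIP Service for read-only traffic
MASTER_ORDINAL = 0           # the master is the pod with ordinal 0


def pod_dns_name(statefulset: str, ordinal: int, service: str) -> str:
    """Return the <pod-name>.<service-name> address of a StatefulSet pod."""
    return f"{statefulset}-{ordinal}.{service}"


def choose_endpoint(sql: str) -> str:
    """Send reads to mysql-read and everything else to the master.

    Deliberately naive: only statements starting with SELECT or SHOW count
    as reads. A real client or proxy needs a much more careful policy.
    """
    words = sql.split()
    first_word = words[0].upper() if words else ""
    if first_word in ("SELECT", "SHOW"):
        return READ_SERVICE
    return pod_dns_name("mysql", MASTER_ORDINAL, HEADLESS_SERVICE)


print(choose_endpoint("INSERT INTO t VALUES (1)"))  # mysql-0.mysql
print(choose_endpoint("SELECT * FROM t"))           # mysql-read
```

Because the slaves replicate asynchronously, a read sent to mysql-read may not yet see a write that was just committed on the master; reads that must see their own writes should also go to mysql-0.mysql.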
StatefulSet is the key to the deployment of the service. Each pod that it manages is assigned a unique name in the format <statefulset-name>-<ordinal-index>. In this example, the StatefulSet is named mysql, so the pods are named mysql-0, mysql-1, and mysql-2. By default, these pods are created in ascending order and destroyed in descending order.
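The naming and ordering rules can be modeled in a few lines of Python. This is an illustration, not Kubernetes source code, and it assumes the default OrderedReady pod management policy. The descending order applies when the StatefulSet is scaled down; Kubernetes does not guarantee a termination order when the StatefulSet object itself is deleted.

```python
# Model of StatefulSet pod naming and ordering under the default
# OrderedReady pod management policy (illustrative only).


def pod_names(statefulset: str, replicas: int) -> list[str]:
    """Each pod is named <statefulset-name>-<ordinal-index>, ordinals 0..replicas-1."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def creation_order(statefulset: str, replicas: int) -> list[str]:
    """Pods are created one by one; pod i+1 starts only after pod i is Running and Ready."""
    return pod_names(statefulset, replicas)


def termination_order(statefulset: str, replicas: int) -> list[str]:
    """On scale-down, pods are terminated one by one, highest ordinal first."""
    return pod_names(statefulset, replicas)[::-1]


print(creation_order("mysql", 3))     # ['mysql-0', 'mysql-1', 'mysql-2']
print(termination_order("mysql", 3))  # ['mysql-2', 'mysql-1', 'mysql-0']
```

This ordering is what lets mysql-0 come up as the master before any slave tries to clone data from it.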
As shown in the following figure, a pod contains two init containers and two app containers. The pod is bound to the PersistentVolume provided by the volume provider by using a unique PersistentVolumeClaim.
Functions of components related to the pod are described as follows:
- The init-mysql container generates configuration files. It extracts the pod's ordinal number from its hostname and uses it to write the MySQL server ID to the /mnt/conf.d/server-id.cnf file. In addition, it copies master.cnf or slave.cnf from the ConfigMap to the /mnt/conf.d directory, depending on whether the pod is the master node (ordinal 0) or a slave node.
- The clone-mysql container clones data. The clone-mysql container of Pod N+1 clones data from Pod N to the PersistentVolume that is bound to Pod N+1. Pod 0, the master node, does not clone data.
- After the init containers finish, the app containers start. The mysql container runs the mysqld service.
- The xtrabackup container runs as a sidecar. On a slave node, when it detects that mysqld in the mysql container is ready, it runs the START SLAVE command to replicate data from the master node. In addition, it listens for data cloning requests from other pods.
- StatefulSet uses volumeClaimTemplates to associate each pod with a unique PersistentVolumeClaim (PVC). In this example, Pod N is associated with the data-mysql-N PVC. This PVC is then bound to a persistent volume (PV) provided by the system. This mechanism ensures that the pod can still mount its existing data after it is rescheduled.
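All of the per-pod decisions above depend only on the pod's hostname, which is why they survive rescheduling. The following Python sketch reproduces that logic for illustration. In the official tutorial these steps are Bash scripts; the function names here are hypothetical. The sketch assumes the Headless Service is named mysql and the volumeClaimTemplate is named data (which yields the data-mysql-N PVC names). The offset of 100 added to the ordinal follows the tutorial, which uses it to avoid the reserved server-id value 0, and skipping the clone when data already exists also mirrors the tutorial.

```python
import re
from typing import Optional

HEADLESS_SERVICE = "mysql"  # Headless Service name used in the example
CLAIM_TEMPLATE = "data"     # assumed volumeClaimTemplates name -> data-mysql-N


def ordinal_from_hostname(hostname: str) -> tuple[str, int]:
    """Split a StatefulSet hostname such as 'mysql-2' into ('mysql', 2)."""
    match = re.fullmatch(r"(.+)-(\d+)", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} has no ordinal suffix")
    return match.group(1), int(match.group(2))


def server_id_cnf(hostname: str) -> str:
    """Contents of /mnt/conf.d/server-id.cnf as written by init-mysql."""
    _, ordinal = ordinal_from_hostname(hostname)
    # Offset by 100 so that no pod uses the reserved server-id 0.
    return f"[mysqld]\nserver-id={100 + ordinal}\n"


def role_config_file(hostname: str) -> str:
    """init-mysql copies master.cnf for pod 0 and slave.cnf for all others."""
    _, ordinal = ordinal_from_hostname(hostname)
    return "master.cnf" if ordinal == 0 else "slave.cnf"


def clone_source(hostname: str, data_exists: bool) -> Optional[str]:
    """Address that clone-mysql pulls data from, or None if no clone is needed."""
    name, ordinal = ordinal_from_hostname(hostname)
    if data_exists or ordinal == 0:
        # Existing data (e.g. after rescheduling) is kept; the master never clones.
        return None
    return f"{name}-{ordinal - 1}.{HEADLESS_SERVICE}"


def pvc_name(hostname: str) -> str:
    """PVC created from the volumeClaimTemplate: <template-name>-<pod-name>."""
    return f"{CLAIM_TEMPLATE}-{hostname}"


for host in ("mysql-0", "mysql-1", "mysql-2"):
    print(host, role_config_file(host), clone_source(host, data_exists=False),
          pvc_name(host))
```

Because pvc_name depends only on the hostname, a rescheduled mysql-1 claims data-mysql-1 again, finds its data directory populated, and skips cloning.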
To ensure service performance and system reliability, ongoing maintenance is required after deployment. For database services, common maintenance work includes service failure recovery, service scaling, service status monitoring, and data backup and recovery.
Service Failure Recovery
The service failure recovery capability is a key metric of a system's degree of automation. In this architecture, the MySQL service can automatically recover when the host, the master node, or a slave node goes down. If any of these failures occurs, K8s reschedules the affected pod and restarts it. Because these pods are managed by the StatefulSet controller, their pod names, host names, and storage remain unchanged.
Service Scaling
Under the one-master-multiple-slave model, scaling means adjusting the number of slave nodes. StatefulSet guarantees the order in which pods are started and destroyed, so you can easily scale your service up or down by running the following command:
kubectl scale statefulset mysql --replicas=<NumOfReplicas>
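The effect of this command on individual pods can be sketched as follows. The scale_plan helper is hypothetical and only models the ordering guarantee: new pods receive the next ordinals in ascending order, and a scale-down removes the highest ordinals first.

```python
# Hypothetical model of the pod-level steps behind
# `kubectl scale statefulset mysql --replicas=<NumOfReplicas>`.


def scale_plan(statefulset: str, current: int, desired: int) -> list[tuple[str, str]]:
    """Ordered (action, pod) steps for scaling from `current` to `desired` replicas."""
    if desired < 0 or current < 0:
        raise ValueError("replica counts cannot be negative")
    if desired >= current:
        # Scale up: new pods are added with increasing ordinals.
        return [("create", f"{statefulset}-{i}") for i in range(current, desired)]
    # Scale down: the highest ordinals are removed first.
    return [("delete", f"{statefulset}-{i}") for i in range(current - 1, desired - 1, -1)]


print(scale_plan("mysql", 3, 5))  # [('create', 'mysql-3'), ('create', 'mysql-4')]
print(scale_plan("mysql", 5, 3))  # [('delete', 'mysql-4'), ('delete', 'mysql-3')]
```

Since the master is always mysql-0, scaling down to one or more replicas never removes it. By default, Kubernetes keeps the data-mysql-N PVCs of removed pods, so a later scale-up reattaches the same data.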
Service Status Monitoring
To ensure service stability, you must closely monitor the service status. Apart from readiness and liveness probes, you usually need finer-grained monitoring metrics to check whether the service is running normally. You can use mysqld-exporter to expose core MySQL metrics to Prometheus, and then configure alerts in Prometheus. We recommend that you deploy mysqld-exporter as a sidecar in the same pod as the mysql container.
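In a real deployment, alert conditions are written as Prometheus alerting rules in PromQL. The Python sketch below only illustrates the kind of checks involved by reading the Prometheus text format that mysqld-exporter serves. The metric names mysql_up and mysql_slave_status_seconds_behind_master are assumed from mysqld-exporter's defaults. The 30-second lag threshold is an arbitrary example, and the parser assumes one sample per line with no timestamps.

```python
# Hypothetical alert check over mysqld-exporter output (Prometheus text
# format). Assumes one sample per line and no timestamps.


def samples(text: str):
    """Yield (metric_name, value) pairs, skipping comments and dropping labels."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(None, 1)
        yield series.split("{", 1)[0], float(value)


def alerts(text: str, max_lag_seconds: float = 30.0) -> list[str]:
    """Return human-readable alerts; the lag threshold is an arbitrary example."""
    found = []
    for name, value in samples(text):
        if name == "mysql_up" and value != 1:
            found.append("mysqld is down")
        elif (name == "mysql_slave_status_seconds_behind_master"
              and value > max_lag_seconds):
            found.append(f"replication lag {value:.0f}s exceeds {max_lag_seconds:.0f}s")
    return found


EXAMPLE = """\
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
mysql_slave_status_seconds_behind_master{channel_name="",master_host="mysql-0.mysql"} 45
"""
print(alerts(EXAMPLE))  # ['replication lag 45s exceeds 30s']
```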
Data Backup and Recovery
Data backup and recovery are effective measures to ensure data security. In this example, you can directly use volume APIs or use the VolumeSnapshot feature to back up and recover data.
Use the Volume API
Many volume providers offer features of saving data snapshots and recovering data based on such snapshots. These features are usually provided to users in the form of APIs. To use this method, you must be familiar with APIs that are provided by volume providers. For example, if you use Alibaba Cloud disk as the external volume, you need to know how to use the snapshot API of Alibaba Cloud disk.
Use VolumeSnapshot
K8s v1.12 introduces three snapshot-related resource objects, VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass, and provides standard operations through these objects. With them, you can create snapshots of the volumes that store MySQL data, or restore data from such snapshots, without needing to know which external volume is used underneath.
Compared with directly using the underlying volume APIs, VolumeSnapshot is clearly the better choice. However, VolumeSnapshot is still in the Alpha stage, and not all external volumes support snapshot operations. Both factors limit its use. For more information about VolumeSnapshot, see Volume Snapshots.
Use Operator to Deploy MySQL
StatefulSet allows you to deploy a highly available MySQL service in K8s, but the process is relatively complex. You must be familiar with various K8s resource objects, understand many MySQL operational details, and maintain a complex set of management scripts. Kubernetes Operator was developed to reduce the difficulty of deploying complex applications in K8s.
Introduction to Operator
Operator was developed by CoreOS to package, deploy, and manage complex applications that run in K8s. An Operator turns the operational knowledge of maintenance personnel into code, and combines various K8s resource objects to deploy and maintain complex applications.
Operator defines new resource objects for services by using CustomResourceDefinition. In addition, it ensures that the application runs in the desired state by using custom controllers.
The workflow of Operator can be summarized in the following steps:
1. Observe: Operator observes the status of the target object by using a K8s API.
2. Analyze: Operator analyzes the differences between the current status and the desired status of the service.
3. Act: Operator orchestrates the service to adjust the current status to the desired status.
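The Observe-Analyze-Act cycle is a control loop that repeats until the observed state matches the desired state. The Python sketch below models it with an in-memory object instead of the K8s API. FakeCluster and its fields are hypothetical. A real Operator reacts to watch events from the API server rather than polling in a tight loop, and it acts by creating, updating, or deleting K8s objects.

```python
from dataclasses import dataclass


@dataclass
class FakeCluster:
    """Hypothetical stand-in for state an Operator would read via the K8s API."""
    desired_members: int  # from the custom resource spec
    running_members: int  # what actually exists


def observe(cluster: FakeCluster) -> tuple[int, int]:
    """Observe: read the desired and current state."""
    return cluster.desired_members, cluster.running_members


def analyze(desired: int, running: int) -> int:
    """Analyze: >0 means members must be added, <0 means members must be removed."""
    return desired - running


def act(cluster: FakeCluster, diff: int) -> None:
    """Act: change one member per step to mimic ordered, incremental changes."""
    if diff > 0:
        cluster.running_members += 1
    elif diff < 0:
        cluster.running_members -= 1


def reconcile(cluster: FakeCluster, max_iterations: int = 100) -> int:
    """Repeat observe-analyze-act until converged; return the number of actions taken."""
    for step in range(max_iterations):
        desired, running = observe(cluster)
        diff = analyze(desired, running)
        if diff == 0:
            return step
        act(cluster, diff)
    raise RuntimeError("cluster did not converge")


cluster = FakeCluster(desired_members=3, running_members=0)
print(reconcile(cluster), cluster.running_members)  # 3 3
```

Because each pass re-reads the state before acting, the loop also repairs drift that it did not cause, such as a member that disappears after a failure.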
Oracle MySQL Operator
For MySQL services, many outstanding open-source Operator solutions are available, such as grtl/mysql-operator, oracle/mysql-operator, presslabs/mysql-operator, and kubedb/mysql. The Oracle MySQL Operator is a typical example of these solutions.
Working Mechanism of Oracle MySQL Operator
Oracle MySQL Operator supports two MySQL deployment modes:
- Single-Primary: The service consists of one read/write primary node and multiple read-only secondary nodes.
- Multi-Primary: All nodes in the cluster have the same role. Every node is a primary node and can process read and write requests from users.
The following figure shows how the Operator works in Multi-Primary mode.
The following procedure is the key to understanding the working mechanism of the Operator:
1. Use the CustomResourceDefinition (CRD) feature of K8s to define several resource objects related to the deployment and maintenance of the MySQL service.
- [mysqlclusters](https://github.com/oracle/mysql-operator/blob/0.3.0/contrib/manifests/custom-resource-definitions.yaml#L5) - Describes the desired status of the cluster, including the deployment mode and the number of nodes.
- [mysqlbackups](https://github.com/oracle/mysql-operator/blob/0.3.0/contrib/manifests/custom-resource-definitions.yaml#L18) - Describes the on-demand backup policy and sets the location to store the backup data, for example AWS S3.
- [mysqlrestores](https://github.com/oracle/mysql-operator/blob/0.3.0/contrib/manifests/custom-resource-definitions.yaml#L31) - Describes the data recovery policy. You need to set the backup data and target cluster.
- [mysqlbackupschedules](https://github.com/oracle/mysql-operator/blob/0.3.0/contrib/manifests/custom-resource-definitions.yaml#L44) - Describes the scheduled backup policy and sets the time interval for backup.
2. Deploy an Operator instance in K8s. This Operator will continuously monitor the CREATE, READ, UPDATE, and DELETE (CRUD) operations on these resource objects, and observe the status of these objects.
3. When you perform an operation, for example creating a MySQL cluster, a new MySQLCluster resource object is created. When the Operator detects that the MySQLCluster object has been created, it creates a cluster that meets the requirements in the user configuration. In this example, a highly available MySQL cluster is created based on Group Replication, using several native K8s resource objects such as StatefulSet and Headless Service.
4. When Operator detects any differences between the current status of MySQLCluster and the desired status, it performs the corresponding orchestration operation to ensure status consistency.
Operator encapsulates the complex details of application deployment, which allows you to create a cluster easily. For example, you can deploy a MySQL Multi-Primary cluster of three nodes with a single short MySQLCluster configuration.
Clusters deployed by an Operator also require maintenance work, such as service failure recovery, service scaling, service status monitoring, and data backup and recovery.
Service Failure Recovery
StatefulSet allows K8s to reschedule a MySQL service instance when it becomes unavailable. In addition, if StatefulSet is deleted by mistake, Operator can recreate it.
Service Scaling
You can modify the spec.members field of MySQLCluster to easily scale the service up or down. In this example, only MySQLCluster is exposed to users, and the underlying K8s resource objects are hidden.
Service Status Monitoring
You can monitor the status of Operator and each MySQL cluster by deploying Prometheus in K8s. For more information about the detailed procedure, see Monitoring.
Data Backup and Recovery
You can use the MySQLBackup and MySQLRestore resource objects to back up and restore data without worrying about the operational differences between volumes. In addition, you can create scheduled backup jobs by using MySQLBackupSchedule.
For example, you can use a MySQLBackupSchedule to back up the test database of a MySQL cluster every 30 minutes. The interval is set with a cron expression in the schedule field:
schedule: '*/30 * * * *'
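To make the schedule concrete, the following Python sketch lists the next few times a cron expression of the form `*/N * * * *` fires after a given moment. It handles only this one pattern (a step in the minute field and wildcards elsewhere), which is all the example needs. It is not a general cron parser, and it does not show how the Operator itself schedules backups.

```python
from datetime import datetime, timedelta


def next_runs_every_n_minutes(start: datetime, step: int, count: int) -> list[datetime]:
    """Next `count` times strictly after `start` matching '*/step * * * *'."""
    if not 1 <= step <= 59:
        raise ValueError("step must be between 1 and 59")
    # '*/step' in the minute field matches minutes 0, step, 2*step, ... below 60.
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs: list[datetime] = []
    while len(runs) < count:
        if t.minute % step == 0:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


for run in next_runs_every_n_minutes(datetime(2024, 1, 1, 10, 5, 30), 30, 3):
    print(run)  # 10:30, 11:00, and 11:30 on 2024-01-01
```

With step 30 the expression fires at minute 0 and minute 30 of every hour, which is why it is read as "every 30 minutes".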
This article shows you how to deploy and maintain a highly available MySQL service in two ways: by using the native K8s resource object StatefulSet, and by using a MySQL Operator. As you can see, an Operator hides the orchestration details of complex applications and significantly reduces the difficulty of deploying them in K8s. If you need to deploy complex applications in K8s, we recommend that you use an Operator.