By Wu Bo (Bruce Wu)
With Deployments and ReplicationControllers, users can conveniently deploy a highly available and scalable distributed stateless service in Kubernetes. This type of application does not store data locally, so requests can be distributed among its instances with simple load-balancing policies. With the popularization of k8s and the rise of cloud-native architectures, more and more people want to orchestrate stateful services such as databases with k8s. However, this is not easy because of the complexity of stateful services. This article uses MySQL, the most popular open-source database, as an example to describe how to deploy and maintain stateful services on k8s. The content of this article is based on
Use StatefulSets to Deploy MySQL
This section uses the sample in the official k8s tutorial Run a Replicated Stateful Application to describe how to deploy highly available MySQL services by using StatefulSets.
Deployments and ReplicationControllers are designed for stateless services. The Pod names, hostnames, and storage in Deployments and ReplicationControllers are not stable, and Pods are started and destroyed in no particular order. Therefore, they are not suitable for stateful applications like databases. K8s provides the StatefulSet workload for managing stateful services. The Pods that a StatefulSet manages have the following features:
1. Uniqueness: For a StatefulSet with N replicas, each Pod in the StatefulSet is assigned a unique integer ordinal, from 0 up through N-1.
2. Ordering: By default, Pods in a StatefulSet are started, updated, and destroyed sequentially.
3. Stable network identity: The hostname and DNS name of a Pod do not change after the Pod is rescheduled.
4. Stable persistent storage: When a Pod is rescheduled, it can still mount its original PersistentVolume, which ensures data integrity and consistency.
In this example, the highly available MySQL service consists of one master node and multiple slave nodes that asynchronously replicate data from the master node (that is, the one-master-multiple-slave replication model). The master node can process read/write requests from users, while the slave nodes can only process read requests from users.
To deploy such a service, many other k8s resource objects are required in addition to StatefulSets, including ConfigMaps, Headless Services, and ClusterIP Services. Working together, these objects make it possible to run stateful services like MySQL on k8s.
To make application configuration easy to maintain, large systems and distributed applications usually adopt centralized configuration management. In k8s, users can separate configuration from Pods by using ConfigMaps, which keeps workloads portable and simplifies configuration changes and management.
The sample contains a ConfigMap called mysql. When a Pod in the StatefulSet starts, it reads the appropriate configuration from the ConfigMap based on its own role.
A Headless Service provides each associated Pod with a DNS address of the form <pod-name>.<service-name>. This allows clients to reach a specific application instance and lets instances in a distributed environment identify one another.
The sample contains a Headless Service called mysql, which is associated with the StatefulSet's Pods. These Pods are assigned the DNS addresses mysql-0.mysql, mysql-1.mysql, and mysql-2.mysql. Clients can therefore access the master node through mysql-0.mysql and the slave nodes through mysql-1.mysql and mysql-2.mysql.
To simplify access in read-only scenarios, the sample also provides an ordinary Service called mysql-read. This Service has its own cluster IP and distributes requests across all associated Pods (both the master and the slaves), hiding Pod access details from users.
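The naming scheme and the read/write split described above can be sketched in Python. Both helpers are hypothetical: headless_dns_names derives each Pod's DNS name (the namespace parameter and the default cluster.local cluster domain are assumptions about a typical cluster), and choose_host applies a naive first-keyword check to send writes to mysql-0.mysql and plain reads to mysql-read. Because replication is asynchronous, reads through mysql-read may return slightly stale data from a slave.

```python
from typing import List, Optional

READ_ONLY_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def headless_dns_names(statefulset: str, service: str, replicas: int,
                       namespace: Optional[str] = None,
                       cluster_domain: str = "cluster.local") -> List[str]:
    """DNS name that a headless Service gives each StatefulSet Pod."""
    names = []
    for ordinal in range(replicas):
        name = f"{statefulset}-{ordinal}.{service}"  # <pod-name>.<service-name>
        if namespace is not None:
            # Fully qualified form, resolvable from any namespace.
            name += f".{namespace}.svc.{cluster_domain}"
        names.append(name)
    return names


def choose_host(sql: str, master: str = "mysql-0.mysql",
                read_service: str = "mysql-read") -> str:
    """Route one SQL statement: writes to the master, plain reads to mysql-read."""
    words = sql.strip().upper().split()
    if not words:
        raise ValueError("empty SQL statement")
    # Locking reads (SELECT ... FOR UPDATE) take row locks, so keep them on the master.
    locking = "FOR UPDATE" in " ".join(words)
    if words[0] in READ_ONLY_KEYWORDS and not locking:
        return read_service
    return master


print(headless_dns_names("mysql", "mysql", 3))
# ['mysql-0.mysql', 'mysql-1.mysql', 'mysql-2.mysql']
print(choose_host("SELECT * FROM orders"))           # mysql-read
print(choose_host("INSERT INTO orders VALUES (1)"))  # mysql-0.mysql
```

A real application would usually keep two connection pools, one per host, rather than inspecting every statement.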
The StatefulSet is the critical part of the deployment. Each Pod that a StatefulSet manages is assigned a unique name of the form <statefulset-name>-<ordinal-index>. In this example, the StatefulSet is named mysql, so its Pods are named mysql-0, mysql-1, and mysql-2. By default, they are created in sequential order and destroyed in reverse sequential order.
As shown in the following figure, each Pod contains two init containers and two app containers, and is bound through its own PersistentVolumeClaim to a PersistentVolume provided by the volume vendor.
The functions of Pod-related components are as follows:
- The init-mysql container generates configuration files. It extracts the Pod ordinal from the hostname and uses it to write a unique server ID into the /mnt/conf.d/server-id.cnf file. It also applies either master.cnf or slave.cnf (depending on the node type) from the ConfigMap by copying its contents into the same configuration directory.
- The clone-mysql container clones data. Pod N+1 clones data from Pod N into its bound PersistentVolume.
- After the init containers complete successfully, the app containers run. The mysql container runs the actual mysqld server.
- The xtrabackup container acts as a sidecar. It waits for mysqld in the mysql container to be ready and then runs the START SLAVE command to start data replication on the slave. The xtrabackup container also listens for connections from other Pods requesting a data clone.
- The StatefulSet associates a unique PVC with each Pod by using volumeClaimTemplates. In this sample, Pod N is associated with a PVC named data-mysql-N, which is bound to a PV provided by the storage system. This mechanism ensures that a rescheduled Pod can still mount its original data.
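The per-Pod decisions in the list above can be summarized in a short Python sketch. In the tutorial these steps are shell scripts inside the init containers; the functions below are hypothetical helpers that only mirror the decisions (ordinal parsing, server ID, role configuration, clone source, PVC name) and do not touch files or the network. The offset of 100 added to the ordinal is an assumption, chosen so that no Pod ends up with server-id=0, which prevents MySQL from taking part in replication.

```python
import re
from typing import Optional

# Assumption: offset the ordinal so that no Pod gets server-id=0.
SERVER_ID_OFFSET = 100


def ordinal_from_hostname(hostname: str) -> int:
    """Extract N from a StatefulSet hostname such as 'mysql-2'."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no ordinal suffix in hostname {hostname!r}")
    return int(match.group(1))


def server_id_cnf(ordinal: int) -> str:
    """Contents of server-id.cnf: a unique server-id per Pod."""
    return f"[mysqld]\nserver-id={SERVER_ID_OFFSET + ordinal}\n"


def role_cnf(ordinal: int) -> str:
    """ConfigMap key to copy: Pod 0 is the master, all others are slaves."""
    return "master.cnf" if ordinal == 0 else "slave.cnf"


def clone_source(statefulset: str, service: str, ordinal: int) -> Optional[str]:
    """Pod N+1 clones from Pod N; the master (Pod 0) has nothing to clone."""
    if ordinal == 0:
        return None
    return f"{statefulset}-{ordinal - 1}.{service}"


def pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    """volumeClaimTemplates yield <template>-<statefulset>-<ordinal>."""
    return f"{claim_template}-{statefulset}-{ordinal}"


n = ordinal_from_hostname("mysql-2")
print(server_id_cnf(n), end="")
print(role_cnf(n))                        # slave.cnf
print(clone_source("mysql", "mysql", n))  # mysql-1.mysql
print(pvc_name("data", "mysql", n))       # data-mysql-2
```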
To ensure service performance and improve system reliability, proper maintenance is required after the deployment completes successfully. Common maintenance work related to database services includes service fault recovery, service scaling, service status monitoring, and data backup and recovery.
Service Fault Recovery
Whether a service can recover from faults by itself is a key indicator of how automated a system is. In the current architecture, the MySQL service can be restored automatically when a host goes down or when the master or slave nodes stop responding. In these cases, k8s reschedules and restarts the affected Pods, and the StatefulSet ensures that their names, hostnames, and volumes remain the same as before.
Service Scaling
In the one-master-multiple-slave MySQL replication model, scaling means adjusting the number of slaves. Thanks to the ordered Pod startup and termination that the StatefulSet guarantees, the number of slaves can be scaled simply by using the following command.
kubectl scale statefulset mysql --replicas=<NumOfReplicas>
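To show what the ordering guarantee means for a scale operation, the Python sketch below computes which Pods a change in replica count touches. It is a hypothetical illustration of the default StatefulSet behavior, not Kubernetes code: new Pods are created from the lowest missing ordinal upward, Pods are removed from the highest ordinal downward, and by default their PVCs are retained. The claim template name "data" matches the sample's data-mysql-N PVCs.

```python
from typing import List, NamedTuple


class ScalingPlan(NamedTuple):
    create: List[str]         # Pods to create, lowest ordinal first
    delete: List[str]         # Pods to delete, highest ordinal first
    retained_pvcs: List[str]  # PVCs left in place after a scale-down


def scaling_plan(statefulset: str, current: int, desired: int,
                 claim_template: str = "data") -> ScalingPlan:
    """Pods affected when a StatefulSet goes from `current` to `desired` replicas."""
    if current < 0 or desired < 0:
        raise ValueError("replica counts must be non-negative")
    create = [f"{statefulset}-{i}" for i in range(current, desired)]
    delete = [f"{statefulset}-{i}" for i in range(current - 1, desired - 1, -1)]
    # Scaling down removes Pods but, by default, not their PersistentVolumeClaims.
    retained = [f"{claim_template}-{pod}" for pod in delete]
    return ScalingPlan(create, delete, retained)


print(scaling_plan("mysql", 3, 5).create)  # ['mysql-3', 'mysql-4']
down = scaling_plan("mysql", 5, 3)
print(down.delete)         # ['mysql-4', 'mysql-3']
print(down.retained_pvcs)  # ['data-mysql-4', 'data-mysql-3']
```

Because removal starts at the highest ordinal, the master mysql-0 is removed only when scaling to zero, and a later scale-up reattaches the retained PVCs to the recreated Pods.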
Service Status Monitoring
Monitoring service status is essential to service stability. In addition to readiness probes and liveness probes, more fine-grained metrics are often needed to assess service health. Users can expose key MySQL metrics to Prometheus by using mysqld-exporter and implement monitoring and alerting based on Prometheus. We recommend deploying mysqld-exporter as a sidecar alongside the mysqld container in the same Pod.
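In practice, alerts are written as Prometheus alerting rules; the Python sketch below only illustrates which exporter metrics such rules typically look at. It parses a scrape in the Prometheus text exposition format and flags a Pod whose mysqld is down or whose replication lag is high. The metric names mysql_up and mysql_slave_status_seconds_behind_master are assumed to be the ones mysqld-exporter exposes (check your exporter version); the function names, the 30-second threshold, and the sample scrape are made up.

```python
import re
from typing import Dict, List

# Sample line: name{labels} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition; keep the max value per metric name."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, value = match.group(1), float(match.group(3))
        samples[name] = max(value, samples.get(name, value))
    return samples


def check_mysql(samples: Dict[str, float], max_lag_seconds: float = 30.0) -> List[str]:
    """Return human-readable alerts for a single MySQL Pod."""
    alerts = []
    if samples.get("mysql_up") != 1.0:
        alerts.append("mysqld is down or unreachable")
    lag = samples.get("mysql_slave_status_seconds_behind_master")
    if lag is not None and lag > max_lag_seconds:
        alerts.append(f"replication lag {lag:.0f}s exceeds {max_lag_seconds:.0f}s")
    return alerts


scrape = """
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
mysql_slave_status_seconds_behind_master{channel_name="",master_host="mysql-0.mysql"} 120
"""
print(check_mysql(parse_metrics(scrape)))  # ['replication lag 120s exceeds 30s']
```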
Data Backup and Recovery
Data backup and recovery is an effective means to ensure data security. Users can implement data backup and recovery by using either volume interfaces or VolumeSnapshots. The following part describes the two methods.
Use Volume Interfaces
Many volume vendors provide features for taking data snapshots and restoring data from them. These features are usually exposed through vendor-specific interfaces, so users must be familiar with the operation interfaces of the corresponding vendor. For example, if a service uses Alibaba Cloud disks as external volumes, users need to understand the snapshot interface provided for those disks.
Use VolumeSnapshots
Three snapshot-related resource objects were introduced in K8s v1.12: VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass. These objects provide standard methods for performing snapshot operations. Users can create snapshots of the volumes that store MySQL data, or restore data from snapshots, without needing to know the details of the external volumes.
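As a sketch of what these standard objects look like, the Python snippet below builds a VolumeSnapshot of the data-mysql-0 PVC and a new PVC restored from it, as plain dictionaries that could be serialized to JSON and passed to kubectl apply -f. The field names follow the GA snapshot.storage.k8s.io/v1 API (Kubernetes 1.20 and later), not the v1alpha1 API available in v1.12, whose fields differ. The snapshot class "csi-snapclass", storage class "csi-disk", object names, and size are hypothetical.

```python
import json
from typing import Any, Dict

SNAPSHOT_GROUP = "snapshot.storage.k8s.io"


def volume_snapshot(name: str, pvc: str, snapshot_class: str) -> Dict[str, Any]:
    """VolumeSnapshot of an existing PVC (GA v1 field names)."""
    return {
        "apiVersion": f"{SNAPSHOT_GROUP}/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


def pvc_from_snapshot(name: str, snapshot: str, storage_class: str,
                      size: str) -> Dict[str, Any]:
    """New PVC whose initial contents come from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "dataSource": {"name": snapshot, "kind": "VolumeSnapshot",
                           "apiGroup": SNAPSHOT_GROUP},
        },
    }


# Hypothetical names: snapshot class "csi-snapclass", storage class "csi-disk".
snap = volume_snapshot("data-mysql-0-snap", "data-mysql-0", "csi-snapclass")
restore = pvc_from_snapshot("data-mysql-0-restored", "data-mysql-0-snap",
                            "csi-disk", "10Gi")
print(json.dumps(snap, indent=2))  # JSON is valid input for kubectl apply -f
```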
Using VolumeSnapshots is generally preferable to calling the underlying volume interfaces directly. However, VolumeSnapshot is still in the Alpha stage, and only a limited number of external volumes support standard snapshot operations, which limits where it can be applied. For more information about VolumeSnapshots, see the Volume Snapshots document.
Deploy MySQL by Using Operators
Although users can deploy and maintain highly available MySQL services in k8s based on StatefulSets, the process is relatively complex. It requires users to be familiar with various k8s resource objects, learn many MySQL operation details, and maintain a set of complex management scripts. Kubernetes Operators are designed to lower the barrier to deploying complex applications on k8s.
An Operator is a method introduced by CoreOS to package, deploy and manage a complex application running on Kubernetes. Operators express the maintainers’ knowledge of software operations in the form of code and comprehensively use various k8s resource objects to deploy and maintain complex applications.
An Operator defines new resource objects for a service by using a CustomResourceDefinition and ensures that applications are in the expected state by using custom controllers.
The Operator's workflow can be divided into three steps:
1. Observe: Observe the current state of the target object by using the k8s API.
2. Analyze: Find the differences between the desired state and the current state.
3. Act: Take the necessary steps to make the running state of the application match its desired state.
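The three steps can be sketched as a tiny reconcile loop in Python. FakeStatefulSet stands in for the StatefulSet that a real Operator would read and update through the k8s API, and all names here are hypothetical; a real controller is triggered by watch events and periodic resyncs rather than by direct calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeStatefulSet:
    """Stand-in for the StatefulSet a real Operator reads via the k8s API."""
    replicas: int


def observe(sts: FakeStatefulSet) -> int:
    # 1. Observe: read the current state.
    return sts.replicas


def analyze(desired: int, current: int) -> int:
    # 2. Analyze: positive means scale up, negative means scale down, 0 means converged.
    return desired - current


def act(sts: FakeStatefulSet, diff: int) -> List[str]:
    # 3. Act: move the running state toward the desired state.
    actions = []
    if diff != 0:
        sts.replicas += diff
        actions.append(f"set replicas to {sts.replicas}")
    return actions


def reconcile(desired_members: int, sts: FakeStatefulSet) -> List[str]:
    return act(sts, analyze(desired_members, observe(sts)))


sts = FakeStatefulSet(replicas=3)
print(reconcile(5, sts))  # ['set replicas to 5']
print(reconcile(5, sts))  # [] (already converged)
```

Because each pass compares desired and current state instead of replaying commands, running the loop repeatedly is safe: once the states match, it does nothing.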
Oracle MySQL Operator
Many excellent open-source Operators are already available for MySQL services, including grtl/mysql-operator, oracle/mysql-operator, presslabs/mysql-operator, and kubedb/mysql. The Oracle MySQL Operator described in this section is a typical example of these open-source solutions.
How the Oracle MySQL Operator Works
The Oracle MySQL Operator supports the following two MySQL deployment modes.
- Single-Primary: In this mode, the group consists of one read-write primary node and multiple read-only secondary nodes.
- Multi-Primary: In multi-primary mode, each node in the cluster plays the same role and the notion of primary-secondary does not apply. Each node can process read/write requests from users.
The following figure shows how the Operator works in Multi-Primary mode.
The following steps explain how the Operator works:
1. Use k8s CustomResourceDefinitions (CRDs) to define several resource objects related to MySQL deployment and maintenance:
- mysqlclusters: Describes the expected cluster state, including the deployment mode and the number of nodes.
- mysqlbackups: Describes on-demand backup policies and configures where backup data is stored (for example, in AWS S3).
- mysqlrestores: Describes data recovery policies, specifying the backup data to restore and the target cluster.
- mysqlbackupschedules: Describes scheduled backup policies and configures the backup interval.
2. Deploy an instance of the Operator in k8s. The Operator continuously watches create, update, and delete operations on these resource objects and observes their state.
3. When a user performs an operation (for example, creating a MySQL cluster), a new MySQLCluster resource object is created. When the Operator receives the MySQLCluster creation event, it creates a cluster that matches the user's configuration. In this example, the Operator creates a highly available MySQL cluster based on MySQL Group Replication, using native k8s resource objects such as StatefulSets and Headless Services.
4. When the Operator finds differences between the desired state and the current state, it performs the necessary orchestration operations to bring the two back in line.
Because the Operator encapsulates complex deployment details, it is now very easy to create a cluster. For example, a user can easily create a multi-primary MySQL cluster consisting of three nodes by using the following configuration.
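As a rough illustration of what such a configuration contains, the Python sketch below builds a three-member multi-primary cluster resource as a dictionary and checks the member count. Only the spec.members field is named in this article; the apiVersion, the kind, and the multiMaster field are assumptions modeled on the oracle/mysql-operator project and should be checked against the CRD installed in your cluster. The upper bound of nine comes from MySQL Group Replication, which supports at most nine members per group. The function name is hypothetical.

```python
from typing import Any, Dict

# MySQL Group Replication allows at most nine members in a group.
MAX_GROUP_MEMBERS = 9


def mysql_cluster(name: str, members: int, multi_primary: bool) -> Dict[str, Any]:
    """Build a MySQLCluster-style custom resource as a plain dictionary."""
    if not 1 <= members <= MAX_GROUP_MEMBERS:
        raise ValueError(
            f"members must be between 1 and {MAX_GROUP_MEMBERS}, got {members}")
    return {
        "apiVersion": "mysql.oracle.com/v1alpha1",  # assumption: check the installed CRD
        "kind": "MySQLCluster",                      # kind name as used in this article
        "metadata": {"name": name},
        "spec": {
            "members": members,            # field named in this article
            "multiMaster": multi_primary,  # assumption: field name for multi-primary mode
        },
    }


cluster = mysql_cluster("mysql-cluster", members=3, multi_primary=True)
print(cluster["spec"])  # {'members': 3, 'multiMaster': True}
```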
When Operators are used, maintenance is also necessary, including service fault recovery, service scaling, service status monitoring, and data backup and recovery.
Service Fault Recovery
Because the cluster runs on a StatefulSet, k8s reschedules a MySQL service instance when it fails to respond. In addition, if the StatefulSet is accidentally deleted, the Operator recreates it.
Service Scaling
Users can easily scale services by changing the spec.members field of the MySQLCluster resource object. Only the MySQLCluster is exposed to users; the underlying k8s resource objects are hidden.
Service Status Monitoring
Prometheus can be deployed on k8s to monitor the state of Operators and individual MySQL clusters. For more information, see Monitoring
Data Backup and Recovery
MySQLBackups and MySQLRestores can be used to back up and restore data, hiding the differences between operations on different volumes. MySQLBackupSchedules can also be used to create scheduled backup tasks.
For example, the following configuration performs a backup of the test database in the mysql-cluster MySQL cluster every 30 minutes.
schedule: '*/30 * * * *'
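To make the schedule expression concrete, the Python sketch below computes upcoming run times for a cron expression of the form '*/N * * * *'. With N = 30 it fires at minutes 0 and 30 of every hour. The function name is hypothetical, it deliberately supports only this one pattern, and it is an illustration rather than the Operator's scheduler.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(schedule: str, start: datetime, count: int) -> List[datetime]:
    """Next `count` run times strictly after `start` for a '*/N * * * *' schedule."""
    fields = schedule.split()
    if (len(fields) != 5 or not fields[0].startswith("*/")
            or any(f != "*" for f in fields[1:])):
        raise ValueError("only '*/N * * * *' schedules are supported by this sketch")
    step = int(fields[0][2:])
    if not 1 <= step <= 59:
        raise ValueError("minute step must be between 1 and 59")
    runs = []
    # Start from the first whole minute after `start`.
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while len(runs) < count:
        if t.minute % step == 0:  # '*/N' matches minutes 0, N, 2N, ...
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


upcoming = next_runs("*/30 * * * *", datetime(2024, 1, 1, 10, 5), 3)
print([t.strftime("%H:%M") for t in upcoming])  # ['10:30', '11:00', '11:30']
```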
This article describes how to deploy and maintain highly available MySQL services by using the native k8s StatefulSet resource object and by using a MySQL Operator. The Operator hides the orchestration details of complex applications and greatly lowers the barrier to running them in k8s. If you need to deploy other complex applications, we recommend that you consider using an Operator.