By Bruce Wu
In software development, the model of small, frequent updates and fast iteration has been widely adopted by Internet companies, and applications now change and upgrade more often than ever. To address different upgrade requirements and ensure that diverse applications upgrade smoothly, a series of deployment and release models have been developed.
- Downtime release — Completely shuts down the old application instance, and then releases the new version. This release model mainly solves the problem that the old and new application versions are incompatible with each other. The drawback is that the application is completely unavailable for a period of time.
- Blue-green release — Deploys the same number of old and new application instances. After a new version passes the test, data traffic is switched to new application instances all at once. This release model solves the problem where the application is completely unavailable during the downtime release. However, it causes significant resource consumption.
- Rolling release — Gradually replaces application instances in batches. This release model does not interrupt the running service or consume too many additional resources. However, it may cause compatibility problems when requests from the same client are handled by both old and new application versions.
- Canary release — Gradually shifts traffic from old instances to new instances. If no problems are found for some time after the release, the traffic sent to the new version is increased while the traffic sent to the old version is reduced.
- A/B testing — Releases two or more versions simultaneously, collects user feedback on each version, and determines the best version for official release through analysis and evaluation.
As more and more applications are containerized, smoothly upgrading container-based applications has attracted extensive attention. This article describes the upgrade methods for different Kubernetes (K8s) deployment models, focusing on how to implement a rolling release of applications managed by a Deployment.
K8s Application Upgrade
In K8s, the pod is the basic unit of application deployment and upgrade. Generally, a pod represents one application instance and is deployed and run in the form of a Deployment, StatefulSet, DaemonSet, or Job. The following sections describe how to upgrade pods under these different deployment models.
Upgrading Pods in a Deployment

In most cases, Deployment is the most common deployment form for pods. This article describes this method by using a Spring Boot-based Java application as an example. This application is a simplified version of a real application and is very representative. It has the following characteristics:
1. After the application is started, it takes some time to load the configuration. During this period, it cannot provide external services.
2. An application may not always be able to normally provide services after it is started.
3. The application may not be able to automatically exit when it is unable to provide services.
4. During the upgrade process, the old application instance must not receive new requests, but it must have sufficient time to process existing requests.
To ensure zero-downtime and uninterrupted upgrade of applications with the preceding characteristics, you need to carefully set the relevant parameters of Deployment. The upgrade-related configuration used in this example is provided as follows. For more information about the complete configuration, see spring-boot-probes-v1.yaml.
```yaml
containers:
- name: spring-boot-probes
  ports:
  - containerPort: 8080
```
You can configure the pod replacement strategies by setting various strategy parameters, such as:
.spec.strategy.type — Specifies the strategy used to replace pods. Valid values: Recreate and RollingUpdate. Default: RollingUpdate.
- Recreate — K8s deletes all existing pods and then creates new ones. This method is suitable for scenarios where the old and new application versions are incompatible with each other. Take caution if you want to use this method in a different scenario, because it may cause the service to be completely unavailable for a period of time.
- RollingUpdate — K8s gradually replaces existing pods in batches. This method can be used to implement hot upgrade of services.
.spec.strategy.rollingUpdate.maxSurge — Specifies the maximum number of additional pods that can be created during the rolling update. The value can be an absolute number or a percentage. A higher value leads to a faster upgrade but consumes more system resources.
.spec.strategy.rollingUpdate.maxUnavailable — Specifies the maximum number of pods that are allowed to be unavailable during the rolling update. The value can be an absolute number or a percentage. A higher value leads to a faster upgrade but makes the service less stable.
You can meet the upgrade requirements in different scenarios by adjusting the values of maxSurge and maxUnavailable.
1. To upgrade the application as quickly as possible while ensuring high system availability and stability, set maxUnavailable to 0, and specify a larger value for maxSurge.
2. To speed up the upgrade in the case of insufficient system resources and low pod load, set maxSurge to 0 and specify a larger value for maxUnavailable. Note that if maxSurge is 0 and maxUnavailable is equal to DESIRED, the entire service may become unavailable. In this case, the rolling update degenerates into a downtime release.
In this example, maxSurge is set to 3 and maxUnavailable is set to 2 to balance the stability, resource consumption, and upgrade speed.
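The strategy settings described above might be sketched in the Deployment manifest as follows; the Deployment name and replica count are illustrative, while the maxSurge and maxUnavailable values match this example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-boot-probes   # illustrative name
spec:
  replicas: 7                # illustrative replica count (DESIRED)
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3            # up to 3 extra pods may exist during the update
      maxUnavailable: 2      # at most 2 pods may be unavailable at any time
```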
K8s provides the following two types of probes:
- ReadinessProbe — If all containers of a pod are started, K8s considers the pod ready and forwards traffic to it. However, some applications need to load data or configuration files after startup before they are truly ready to provide services. Therefore, it is inaccurate to determine whether a pod is ready based only on whether its containers have started. By configuring readiness probes for the containers, you can accurately determine whether they are ready, which helps you build robust applications. K8s allows a Service to send traffic to a pod only after all containers of the pod pass readiness detection. If a pod fails readiness detection, K8s stops sending traffic to it.
- LivenessProbe — By default, K8s considers a running container available. However, this logic has a flaw: the application may keep running but be unable to exit on its own when it encounters an error or becomes unhealthy (for example, in the case of a serious deadlock). By configuring liveness probes for the containers, you can allow K8s to accurately determine whether they are running normally. If a container fails liveness detection, the kubelet terminates it and restarts it according to the restart policy.
The probe configuration is very flexible. You can specify the detection frequency, success threshold, and failure threshold of a probe. For more information about parameter description and configuration methods, see Configure Liveness and Readiness Probes.
In this example, the target containers are configured with readiness probes and liveness probes:
1. Set initialDelaySeconds of the readiness probe to 30, because the application takes about 30 seconds on average to complete its initialization.
2. When you configure liveness probes, ensure the target containers have sufficient time to get ready. If the values of the initialDelaySeconds, periodSeconds, and failureThreshold parameters are too small, the containers may be restarted before they get ready. In this case, they can never get ready. The example configuration ensures that a container will not be restarted if it gets ready within 80s after it is started. This provides a sufficient buffer in addition to the average initialization time of 30s.
3. For readiness probes, set periodSeconds to 10 and failureThreshold to 1. No traffic will be sent to a container if it remains abnormal for 10 seconds.
4. For the liveness probe, set periodSeconds to 20 and failureThreshold to 3. A container that throws exceptions is restarted only after it remains abnormal for about 60 seconds (three consecutive failed checks at 20-second intervals).
Generally, after a newly created pod becomes ready, K8s considers it available and deletes the old pod. However, some problems can only be observed when the new pod processes user requests. To be safe, do not delete the old pod until you verify that no problem exists after observing the newly ready pod for a while.
The minReadySeconds parameter controls how long a new pod must be observed after it becomes ready. If the containers within the new pod run properly during this period, K8s considers the pod available and deletes an old pod. Take caution when you configure this parameter: a smaller value may make the observation insufficient, while a larger value slows down the upgrade. In this example, minReadySeconds is set to 120 so that a ready pod can complete a full liveness detection cycle.
When K8s is about to delete a pod, it sends a SIGTERM signal to the containers inside the pod and removes the pod from the Service's endpoint list. If the containers are not terminated within the specified grace period (30 seconds by default), K8s sends a SIGKILL signal to forcibly terminate them. For more information about the detailed procedure, see Termination of Pods.
Generally, an application takes no more than 40s to process a request. To ensure the application can complete processing requests that have already been sent to the server, a graceful shutdown time is set in this example. You can determine the value of the terminationGracePeriodSeconds parameter based on the actual situation of different applications.
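As a sketch, the grace period could be set in the pod template as follows; the value of 60 seconds is an assumption chosen to exceed the 40-second worst-case request processing time mentioned above:

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # assumed value; must exceed the ~40s worst-case request time
```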
Viewing the Upgrade Progress
The preceding configurations ensure a smooth upgrade of the target application. You can change any field of PodTemplateSpec in the Deployment object to trigger a pod upgrade, and run the command kubectl get rs -w to view the upgrade progress. The numbers of old and new pod replicas change as follows:
1. K8s creates new pods, the number of which is equal to maxSurge. The total number of pods now reaches the upper limit: DESIRED + maxSurge.
2. Before the new pods become ready or available, K8s immediately starts deleting old pods, the number of which is equal to maxUnavailable. The number of available pods is now DESIRED - maxUnavailable.
3. When an old pod is completely deleted, a new pod is immediately created.
4. When a new pod passes the readiness detection and gets ready, K8s sends traffic to this pod. However, this pod is not considered available until the specified observation period ends.
5. If a ready pod runs properly during the observation period and is considered available, another old pod is deleted.
6. Repeat steps 3, 4, and 5 until all old pods are deleted and the number of new available pods reaches the target number of replicas.
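The steps above can be triggered and observed with standard kubectl commands; the Deployment name, container name, and image tag below are placeholders:

```shell
# Trigger a rolling update by changing the container image
# (any change to PodTemplateSpec works).
kubectl set image deployment/spring-boot-probes \
    spring-boot-probes=registry.example.com/app:v2

# Watch the old and new ReplicaSets scale during the update.
kubectl get rs -w

# Block until the rollout completes (or fails).
kubectl rollout status deployment/spring-boot-probes
```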
Rolling Back Upgrades
Application upgrades cannot be successful every time. You may find the new version unable to meet your expectations during or after an upgrade. In this case, you may want to roll back to a previous version that is more stable. K8s records every change of PodTemplateSpec (such as changes in template labels or container images). This allows you to conveniently roll back to a stable version based on the version number if you find any problems with the new version. For more information about the detailed procedure, see Rolling Back a Deployment.
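The rollback itself is a short sequence of kubectl commands; the Deployment name and revision number below are placeholders:

```shell
# List the recorded revisions of the Deployment.
kubectl rollout history deployment/spring-boot-probes

# Roll back to the previous revision...
kubectl rollout undo deployment/spring-boot-probes

# ...or to a specific revision number.
kubectl rollout undo deployment/spring-boot-probes --to-revision=2
```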
Upgrading Pods in a StatefulSet

StatefulSet is a common deployment method for stateful pods. For these pods, K8s also provides many parameters to flexibly control the upgrade process. Most of these parameters are the same as those used for upgrading pods managed by a Deployment, so this section mainly describes the differences.
In K8s 1.7 and later versions, StatefulSet supports two policies: OnDelete and RollingUpdate.
- OnDelete: After you update PodTemplateSpec of StatefulSet, K8s creates new pods only after you manually delete old pods. This is the default update strategy. This strategy is designed to ensure compatibility with K8s 1.6 and earlier versions. In addition, it avoids the problem that pods of old and new application versions are incompatible with each other during the upgrade.
- RollingUpdate — K8s gradually replaces pods managed by the StatefulSet in batches. The difference between this RollingUpdate strategy and that of a Deployment is that this strategy replaces pods in order. For example, if a StatefulSet runs N pods, each pod is assigned a monotonically increasing ordinal number when it is deployed, and during the rolling update the pods are replaced in descending order of their ordinals.
You can set the .spec.updateStrategy.rollingUpdate.partition parameter to upgrade only some of the pods. After you set the partition parameter, only pods with an ordinal number greater than or equal to the partition value undergo the rolling upgrade. The rest of the pods remain unchanged.
You can also implement the canary upgrade by continuously decreasing the value of the partition parameter. For more information about the detailed operation steps, see Rolling Out a Canary.
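A canary on a StatefulSet with, say, 5 replicas might be sketched as follows (the replica count is an assumption); with partition set to 4, only the pod with the highest ordinal is replaced at first:

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 4   # only pods with ordinal >= 4 are updated
```

Lowering partition step by step (4, 3, 2, ...) rolls the new version out to more pods; setting it to 0 completes the rollout.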
Upgrading Pods in a DaemonSet

DaemonSet allows you to run a replica of one pod on some or all K8s worker nodes. DaemonSet is usually used to run monitoring or log collection applications. For pods in a DaemonSet, the parameters that control the upgrade process are basically the same as those of a Deployment, although slight differences exist in the supported strategy types. DaemonSet supports the following two strategy types:
- OnDelete: After you update PodTemplateSpec for DaemonSet, K8s creates new pods only after you manually delete old pods. This is the default update strategy. This strategy is designed to ensure compatibility with K8s 1.5 and earlier versions. In addition, it avoids the problem where pods of old and new application versions are incompatible with each other during the upgrade.
- RollingUpdate: The meaning and configurable parameters of this strategy are the same as those of the RollingUpdate strategy of Deployment.
For more information about the detailed procedure, see Perform a Rolling Update on a DaemonSet.
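The DaemonSet update strategy might be sketched as follows; the DaemonSet name is a placeholder, and maxUnavailable of 1 is an assumed value that replaces one node's pod at a time:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector    # illustrative name
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # assumed: replace one node's pod at a time
```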
Upgrading Pods in a Job

Generally, Deployment, StatefulSet, and DaemonSet are used to deploy long-running processes. In contrast, pods in a Job exit after executing their specified tasks and do not involve rolling updates. After you change PodTemplateSpec of a Job, you need to delete the old Job and its pods, and then run the job again with the new configuration.
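In practice this amounts to a delete-and-recreate cycle; the Job name and manifest file below are placeholders:

```shell
# Jobs do not support rolling updates: delete the old Job
# (which also deletes its pods)...
kubectl delete job data-migration

# ...then re-create it from the updated spec.
kubectl apply -f data-migration-v2.yaml
```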
Summary

K8s supports zero-downtime and uninterrupted upgrade of most applications, but it also has some problems:
1. Currently, K8s natively supports only two deployment upgrade strategies: downtime release and rolling release. For applications with additional requirements, such as blue-green release, canary release, and A/B testing, custom development or third-party tools are required.
2. K8s offers a rollback feature, but rollbacks must be performed manually; automatic rollback based on specified conditions is not supported.
3. Some applications also need to gradually scale up or down in batches, which is currently not supported by K8s.
For how to solve these problems, see the next article of this series.