OpenKruise V0.5.0: Lossless Streaming and Batch Release Policies Now Supported


By Jiuzhu, Technical Expert at Alibaba Cloud

OpenKruise is an open-source automatic management engine for large-scale applications provided by Alibaba Cloud. In addition to features similar to those of Kubernetes native controllers such as Deployment and StatefulSet, OpenKruise provides more enhancements, including graceful in-place upgrade, release priority and dispersion policy, multi-zone workload abstraction management, and unified sidecar container injection management. All these core features have been tested in ultra-large-scale application scenarios at Alibaba Cloud. These features help to cope with more diverse deployment environments and requirements and bring more flexible deployment and release policies for cluster maintainers and application developers.

Currently, in Alibaba’s cloud-native environment, most applications use OpenKruise for pod deployment and release management. Besides many Alibaba Cloud customers, several companies across industries use OpenKruise to deploy applications when native Kubernetes Deployment doesn’t fully meet their requirements.

Background

First, let’s take a look at the release capabilities provided by the native Kubernetes workloads.

  • Deployment currently supports maxUnavailable and maxSurge.
  • StatefulSet currently supports the partition policy.
  • Other workloads such as DaemonSet only support maxUnavailable.
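
To make these native policies concrete, here is a minimal Python sketch of how they bound a rollout. It assumes the documented Kubernetes behavior that a percentage maxSurge is rounded up, a percentage maxUnavailable is rounded down, and a StatefulSet updates only pods whose ordinal is greater than or equal to partition. The function names are hypothetical and are not part of any Kubernetes API.

```python
def resolve(value, replicas, round_up):
    """Resolve an int or a percentage string such as '25%' against replicas."""
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        if round_up:
            return -(-replicas * pct // 100)  # integer ceiling
        return replicas * pct // 100          # integer floor
    return int(value)


def deployment_bounds(replicas, max_surge, max_unavailable):
    """Pod-count limits that a Deployment rolling update stays within."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total": replicas + surge,
        "min_available": replicas - unavailable,
    }


def statefulset_updated_ordinals(replicas, partition):
    """Ordinals that a StatefulSet updates for a given partition."""
    return [i for i in range(replicas) if i >= partition]


print(deployment_bounds(10, "25%", "25%"))
print(statefulset_updated_ordinals(5, 3))
```

With 10 replicas and both values set to 25%, the rollout may run up to 13 pods and must keep at least 8 available. A StatefulSet with 5 replicas and partition 3 updates only ordinals 3 and 4.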

These policies are feasible in test environments or small-scale application scenarios, but they cannot meet the requirements of large-scale application scenarios. For example:

  • Deployment does not support phased release. For example, you cannot update only 20% of the pods and then hold the release at that point. Before the release completes, your only options are to set a smaller maxUnavailable or to pause the release when errors occur.

What’s New in V0.5.0

This section describes the two main features that V0.5.0 adds to CloneSet and SidecarSet. For all other changes, see the version update details.

CloneSet Supports maxSurge

In Alibaba’s cloud-native environment, most stateless applications are managed by CloneSet. To meet the deployment requirements of ultra-large-scale applications, we use the following methods:

  • In-place upgrade, wherein the pod objects, IP addresses, and volumes remain unchanged and only container images are upgraded after a release.

We open-sourced CloneSet in Kruise V0.4.0, released in February 2020. CloneSet has attracted a lot of attention since then and has been adopted by many well-known Internet companies.

The initial version of CloneSet supported only policies such as maxUnavailable and partition; it did not support maxSurge (scaling out before scaling in). This is not a problem for large-scale applications in Alibaba Group. However, many community users run small-scale applications on their platforms, and without scale-out-then-scale-in, application availability may suffer during a release.

In response to community feedback in issues #250 and #260, we added support for the maxSurge policy to CloneSet in V0.5.0. We thank community members such as fatedier and shiyan2016 for their contributions and valuable suggestions. CloneSet now covers all the release policies of the native Kubernetes workloads. The following figure shows the release features of CloneSet.

[Figure: release features of CloneSet]

We will elaborate on the release policies of CloneSet in a later article. For now, let’s use some examples to see how maxSurge works together with streaming and phased release:

1) Release Based on the maxSurge, maxUnavailable, and Partition Policies

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  replicas: 5 # The total number of pods is 5.
  updateStrategy:
    maxSurge: 20% # One extra pod is created: 5 x 20% = 1 (rounded up).
    maxUnavailable: 0 # At least five pods are available during the release: 5 - 0 = 5.
    partition: 3 # Three old pods are reserved (two pods are released: 5 - 3 = 2).
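
The arithmetic in the comments above can be checked with a small Python sketch. It is only an illustration of the three numbers: `release_plan` is a hypothetical helper, maxUnavailable is assumed to be a plain integer, and a percentage maxSurge is rounded up as the comment states.

```python
def release_plan(replicas, max_surge, max_unavailable, partition):
    """Derive the pod counts implied by a CloneSet updateStrategy."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        pct = int(max_surge[:-1])
        surge = -(-replicas * pct // 100)  # round the percentage up
    else:
        surge = int(max_surge)
    return {
        "surge_pods": surge,                         # extra pods created first
        "min_available": replicas - max_unavailable, # pods that must stay available
        "old_pods_reserved": partition,              # pods kept on the old version
        "pods_to_update": replicas - partition,      # pods moved to the new version
    }


print(release_plan(replicas=5, max_surge="20%", max_unavailable=0, partition=3))
```

For the spec above, this yields one surge pod, at least five available pods, three reserved old pods, and two pods to update.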

When a release starts, CloneSet first creates one extra pod based on maxSurge. The total number of pods is then 6 (five old pods and one new pod).

$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         1         0               5       6       17m

While keeping maxUnavailable satisfied, CloneSet gradually deletes old pods and creates new ones until only three old pods remain (partition = 3). At that point, CloneSet deletes one new pod so that the total number of pods returns to 5 (three old pods and two new pods), as required.

$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         2         2               5       5       17m

To continue the release, set partition to 0 so that no old pods are reserved. CloneSet again creates one extra pod based on maxSurge, which brings the total number of pods to 6 (three old pods and three new pods).

$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         3         2               5       6       17m

While keeping maxUnavailable satisfied, CloneSet gradually deletes old pods and creates new ones until all pods are new (partition = 0). Finally, CloneSet deletes one new pod so that the total number of pods returns to 5 (five new pods).

$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         5         5         5       5       17m
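
The two phases walked through above can be replayed with a toy Python model. This is a sketch that follows the prose of this article, not the actual CloneSet controller logic: it assumes at least one surge pod, replaces one old pod at a time, and treats every old pod as ready. The names `row` and `run_phase` are hypothetical.

```python
def row(replicas, old, new_ready, new_pending):
    """Return (DESIRED, UPDATED, UPDATED_READY, READY, TOTAL)."""
    return (
        replicas,
        new_ready + new_pending,
        new_ready,
        old + new_ready,
        old + new_ready + new_pending,
    )


def run_phase(old, new_ready, replicas, surge, max_unavailable, partition):
    """Release until only `partition` old pods remain; return the new state."""
    if old <= partition:
        return old, new_ready, [row(replicas, old, new_ready, 0)]

    pending = surge  # step 1: create the surge pods
    rows = [row(replicas, old, new_ready, pending)]

    while old > partition:  # step 2: replace old pods one by one
        new_ready, pending = new_ready + pending, 0  # pending pods become ready
        old -= 1                                     # delete one old pod
        assert old + new_ready >= replicas - max_unavailable
        pending = 1                                  # create its replacement

    new_ready += pending  # the last replacement becomes ready
    new_ready -= surge    # step 3: delete the surplus new pods
    rows.append(row(replicas, old, new_ready, 0))
    return old, new_ready, rows


old, new, phase1 = run_phase(5, 0, replicas=5, surge=1, max_unavailable=0, partition=3)
old, new, phase2 = run_phase(old, new, replicas=5, surge=1, max_unavailable=0, partition=0)
for r in phase1 + phase2:
    print(r)
```

Under these assumptions, the first and last rows of each phase match the four kubectl outputs above, and the assertion inside the loop checks that the number of ready pods never drops below replicas minus maxUnavailable.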

2) In-place Upgrade Using maxSurge

CloneSet supports both in-place upgrade and upgrade by pod recreation. Either mode can be combined with policies such as maxSurge, maxUnavailable, and partition.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%

If maxSurge is configured in in-place upgrade mode, CloneSet first creates the number of extra pods specified by maxSurge, then upgrades the old pods in place (by updating the images in the pod spec), and finally deletes the extra pods once the specified partition is met.

This ensures service availability and keeps information such as IP addresses and volumes unchanged during the release.
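
The following Python sketch illustrates why pod identity survives an in-place release with maxSurge. It is a toy model under stated assumptions: the `Pod` class, the pod names, the IP addresses, and the image tags are all hypothetical, and partition is left out (as if partition = 0).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    image: str


def in_place_release(pods, new_image, surge):
    """Add surge pods, update the old pods in place, then drop the surge pods."""
    surge_pods = [
        Pod(f"surge-{i}", f"10.0.1.{i + 1}", new_image) for i in range(surge)
    ]
    running = pods + surge_pods  # extra capacity while the upgrade runs

    for pod in pods:
        pod.image = new_image  # same pod object: name and IP are untouched

    surge_ids = {id(p) for p in surge_pods}
    return [p for p in running if id(p) not in surge_ids]


pods = [Pod("demo-a", "10.0.0.1", "app:v1"), Pod("demo-b", "10.0.0.2", "app:v1")]
for pod in in_place_release(pods, "app:v2", surge=1):
    print(pod)
```

After the release, the same pod names and IP addresses remain and only the image has changed. Upgrade by pod recreation would instead produce new pod objects.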

SidecarSet Supports Volume Injection and Merging

SidecarSet is another key feature provided by Kruise. Unlike CloneSet and StatefulSet workloads that manage business pods, SidecarSet manages the sidecar container versions and injections in a cluster in a centralized manner.

The new feature in V0.5.0 resolves duplicate volume definitions between a SidecarSet and a pod during sidecar container injection. It addresses community issue #254: the users who raised it manage log collection sidecar containers with SidecarSet and want to inject them into all pods in bypass mode.

For example, suppose we need to inject a log collection sidecar container into each pod in a cluster. We cannot expect every application developer to add the container definition to their CloneSets and Deployments. Even if every workload did include the definition, upgrading the image version of the log collection container would require updating all those workloads, which is costly.

SidecarSet provided by OpenKruise is designed to solve this problem. We only need to write the sidecar definition into a global SidecarSet. Whether you deploy with CloneSet, Deployment, or StatefulSet, the defined sidecar container is injected into every newly created pod that the SidecarSet selects.


Taking log collection as an example, first define a SidecarSet.

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-sidecar
spec:
  selector:
    matchLabels:
      app-type: long-term # Inject the container to all pods with the long-term label.
  containers:
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log # Mount log-volume to the /var/log path and collect logs from the path.
  volumes:
  - name: log-volume # Define a volume named log-volume.
    emptyDir: {}
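
The selector in this SidecarSet decides which pods receive the sidecar. As a minimal sketch of standard Kubernetes matchLabels semantics (every listed key must be present on the pod with the same value), the check can be written as follows. The helper name `selector_matches` is hypothetical.

```python
def selector_matches(match_labels, pod_labels):
    """Return True if the pod carries every label listed in match_labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"app-type": "long-term"}
print(selector_matches(selector, {"app-type": "long-term", "app": "demo"}))
print(selector_matches(selector, {"app-type": "batch"}))
```

The first pod is selected because it carries the app-type: long-term label (extra labels do not matter). The second is skipped.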

You may wonder what to do if the log file directory varies for each application. This is why volume merge is required.

The original pod of an application, before injection, is as follows:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs # The log path of the application.
  volumes:
  - name: log-volume # Define a volume named log-volume.
    persistentVolumeClaim:
      claimName: pvc-xxx

The Kruise webhook will inject the log sidecar container defined in the SidecarSet into the pod.

apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs # The log path of the application.
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log
  volumes:
  - name: log-volume # Define a volume named log-volume.
    persistentVolumeClaim:
      claimName: pvc-xxx

The log volumes defined in the SidecarSet and in the pod are both named log-volume, so the volume defined in the pod prevails during injection. In this example, the pod's volume is backed by a persistent volume (PV) through a persistent volume claim (PVC). After sidecar injection, the same volume is also mounted at the /var/log directory in the sidecar container, where the logs are collected.
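
The merge rule described above (on a name clash, the pod's volume wins) can be sketched in a few lines of Python. This is an illustration of the rule only, not the Kruise webhook code: `inject_sidecar` is a hypothetical function that works on plain dictionaries shaped like the pod and SidecarSet specs in this article.

```python
import copy


def inject_sidecar(pod_spec, sidecar_spec):
    """Append sidecar containers and merge volumes; pod volumes win on name clash."""
    merged = copy.deepcopy(pod_spec)  # leave the caller's pod spec untouched
    merged["containers"].extend(copy.deepcopy(sidecar_spec["containers"]))

    existing = {volume["name"] for volume in merged.get("volumes", [])}
    for volume in sidecar_spec.get("volumes", []):
        if volume["name"] not in existing:  # only add volumes the pod lacks
            merged.setdefault("volumes", []).append(copy.deepcopy(volume))
            existing.add(volume["name"])
    return merged


pod = {
    "containers": [{
        "name": "app",
        "volumeMounts": [{"name": "log-volume", "mountPath": "/app/logs"}],
    }],
    "volumes": [{"name": "log-volume", "persistentVolumeClaim": {"claimName": "pvc-xxx"}}],
}
sidecar = {
    "containers": [{
        "name": "log-collector",
        "volumeMounts": [{"name": "log-volume", "mountPath": "/var/log"}],
    }],
    "volumes": [{"name": "log-volume", "emptyDir": {}}],
}

injected = inject_sidecar(pod, sidecar)
print([c["name"] for c in injected["containers"]])
print(injected["volumes"])
```

In this sketch the injected pod ends up with both containers but only one log-volume, the PVC-backed one from the pod, so the sidecar's /var/log mount points at the application's log data. The emptyDir volume from the SidecarSet would be used only for pods that do not define log-volume themselves.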

In this way, sidecar containers are managed by SidecarSet. On the one hand, sidecar containers are decoupled from application deployment and release. On the other hand, sidecar containers share volumes with application containers to implement related sidecar functions such as log collection and monitoring.

Summary

Upgrading to the latest version, V0.5.0, enables lossless application release and more convenient management of sidecar containers.

OpenKruise will be further optimized in terms of application deployment and release capabilities. We welcome the participation of more users in the OpenKruise community to build complete Kubernetes application management, delivery, and expansion capabilities for various larger-scale and more complex scenarios with extreme performance.
