OpenKruise: The Cloud-Native Platform for the Comprehensive Process of Alibaba’s Double 11

Over 95% of the OpenKruise Code Running Internally Comes from the Community

  • All common capabilities are developed and released directly in the open source repository, and then synchronized to the internal environment.
  • Every line of code that community members contribute to OpenKruise runs in Alibaba’s internal environment.

Automate Workload Management on Kubernetes

The Core Applications of Double 11 Are Fully Deployed Based on OpenKruise

  • CloneSet: A workload for stateless applications, and the largest workload by scale. Most common e-commerce businesses are deployed and released through CloneSet.
  • Advanced StatefulSet: A workload for stateful applications, currently used to deploy middleware in the cloud-native environment.
  • SidecarSet: A workload for sidecar lifecycle management. It defines sidecar containers and injects them into newly created pods. The O&M containers and mesh containers in the cloud are added to business pods this way.
  • Advanced DaemonSet: This workload is used to deploy host-level daemons on all nodes, including various basic components for network configuration and storage for business containers.
  • Even if only the controller fails, all scaling and publishing operations fail.
  • If a major bug in the controller causes the replica count or version number to be calculated incorrectly, business containers can be mistakenly deleted at large scale or upgraded to the wrong version.
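To make the CloneSet workload listed above more concrete, here is a minimal sketch in Python that builds a CloneSet manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper name `build_cloneset`, the application name, and the image URL are hypothetical and exist only for illustration. The `updateStrategy` fields (`type`, `partition`, `maxUnavailable`) follow the public `apps.kruise.io/v1alpha1` CloneSet API as commonly documented, so check them against the OpenKruise version you run.

```python
import json


def build_cloneset(name, image, replicas, partition=0, max_unavailable=1):
    """Return a CloneSet manifest as a plain dict (illustrative helper, not an OpenKruise API)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps.kruise.io/v1alpha1",
        "kind": "CloneSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "main", "image": image}]},
            },
            "updateStrategy": {
                # Prefer upgrading containers in place; fall back to re-creation otherwise.
                "type": "InPlaceIfPossible",
                # Number of pods to keep on the old revision (staged rollout).
                "partition": partition,
                # Upper bound on pods that may be unavailable during the rollout.
                "maxUnavailable": max_unavailable,
            },
        },
    }


# Hypothetical application: keep 8 of 10 pods on the old revision for now.
manifest = build_cloneset(
    "demo-shop", "registry.example.com/demo-shop:v2", replicas=10, partition=8
)
print(json.dumps(manifest, indent=2))
```

Lowering `partition` step by step toward 0 widens the rollout, which is one way to bound the impact of a bad release.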

Major Capabilities

  • When an application is released, all of its containers need to be migrated and rebuilt. This is nearly unacceptable. If Alibaba’s large-scale applications were all rebuilt during the release peak, the result would be disastrous for the business and for other components, such as schedulers, middleware, network, and storage components.
  • Deployment workloads do not support grayscale upgrades.
  • StatefulSet cannot upgrade pods in parallel.
  • Publishing is much more efficient. According to statistics from the Alibaba environment, publishing through an in-place upgrade is at least 80% faster than publishing through full re-creation. An in-place upgrade saves the time spent on scheduling, network allocation, and remote disk allocation. In addition, it only needs to pull a small number of incremental layers for the new image because the node already has the old image.
  • The IP address is the same before and after publishing, and the pod network stays connected. All containers in the pod, except for those being upgraded, keep running.
  • Volumes also remain unchanged. The mounted devices of the original containers are fully reused.
  • Cluster determinism is preserved. The cluster topology that passed the comprehensive-process test stays the same through releases, which serves as a guarantee for Double 11.
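The in-place upgrade benefits above depend on the pod itself being kept, with only the changed containers restarted. The sketch below illustrates that idea under a simplifying assumption: an upgrade qualifies as in-place only when the old and new pod specs differ in nothing but container images. This is a simplified model written for this article, not OpenKruise’s actual implementation, and the function names and sample specs are hypothetical.

```python
from copy import deepcopy
from typing import List


def _without_images(pod_spec: dict) -> dict:
    """Copy a pod spec with every container's image field removed."""
    stripped = deepcopy(pod_spec)
    for container in stripped.get("containers", []):
        container.pop("image", None)
    return stripped


def can_update_in_place(old_spec: dict, new_spec: dict) -> bool:
    """True when the two specs are identical apart from container images."""
    return _without_images(old_spec) == _without_images(new_spec)


def containers_to_restart(old_spec: dict, new_spec: dict) -> List[str]:
    """Names of containers whose image changed; all others keep running."""
    old_images = {c["name"]: c.get("image") for c in old_spec.get("containers", [])}
    return [
        c["name"]
        for c in new_spec.get("containers", [])
        if old_images.get(c["name"]) != c.get("image")
    ]


# Hypothetical pod spec: one business container plus one sidecar.
old = {
    "containers": [
        {"name": "app", "image": "registry.example.com/app:v1"},
        {"name": "mesh-sidecar", "image": "registry.example.com/mesh:v3"},
    ]
}

# Image-only change: the pod, its IP address, and its volumes can be kept.
image_only = deepcopy(old)
image_only["containers"][0]["image"] = "registry.example.com/app:v2"

# Any other change (here, a new environment variable) falls back to re-creation.
env_change = deepcopy(old)
env_change["containers"][0]["env"] = [{"name": "MODE", "value": "canary"}]

print(can_update_in_place(old, image_only), containers_to_restart(old, image_only))
print(can_update_in_place(old, env_change))
```

In this model the sidecar is never touched when only the business image changes, which matches the point that containers not being upgraded remain running.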

OpenKruise Is in the CNCF Sandbox

  • Five maintainers from Alibaba, Tencent, and Lyft
  • 44 contributors
  • Enterprises in China: Alibaba Cloud, Ant Group, Ctrip, Tencent, and Pinduoduo
  • Enterprises overseas: Microsoft, Lyft, Spectro Cloud, and Discord
  • More than 1,900 GitHub stars
  • More than 300 forks
  • OpenKruise will continue to incorporate Alibaba’s general automation capabilities for cloud-native applications, in keeping with the “trinity” strategy mentioned above.
  • Workload requirements in specific segments will be explored as well. For example, Alibaba is exploring pooling capabilities for Functions as a Service (FaaS) scenarios.
  • OpenKruise will also fully integrate with other open source products from related fields, such as OAM and KubeVela, to build a more complete cloud-native application system.

