The Story Behind How Alibaba Cloud Implemented Cloud Native at Large Scale
Alibaba Cloud has implemented cloud native technologies on a large scale. At the KubeCon + CloudNativeCon + Open Source Summit held on June 26, 2019, Xiang Li, a CNCF TOC representative and senior staff engineer at Alibaba Cloud, delivered a keynote speech. Mr. Li shared Alibaba’s experience in scalability, reliability, development efficiency, and migration strategy, and discussed how to implement cloud native technologies and address technical challenges.
Questions Addressed in Xiang Li’s Presentation Include:
Why go cloud native? What benefits can cloud native technologies bring? On the way from finding its path in the dark to embracing open-source standards and contributing back to the community, what challenges did Alibaba face in implementing cloud native technologies? What technical experience can Alibaba share?
Developing and Using Cloud Native Technology
In 2011, Alibaba began putting cloud native technologies into practice by adopting containers. As an early adopter in the industry, Alibaba developed over time a containerized infrastructure architecture that ranks among those of the world’s leading technology companies. This architecture is now the technological backbone of the entire Alibaba Group. Alibaba believes exploration is intrinsic to developing and discovering new technologies. Through sustained exploration, Alibaba’s technical team has changed many of the ways that technology is used today, becoming a leader in developing cloud native technologies in China.
Alibaba’s businesses are large and complex, so a suitable starting point had to be found for adopting cloud native. Motivated by the cost pressure of the 11.11 global shopping festival, Alibaba chose resource cost and efficiency optimization as the starting point of its journey into cloud native.
Alibaba leverages containers to develop low-cost virtualization and scheduling technologies. It provides flexible and standard deployment units, and changes the resource scheduling mode from static to dynamic and on demand. This improves deployment efficiency, solves the problem of resource fragmentation, and increases the deployment density. Employing technologies such as storage network virtualization and separation of storage and computing, Alibaba not only enhances the portability of tasks and improves resource reliability, but also reduces operating costs.
Motivated to reduce the cost of resources, Alibaba has completed overall containerization and replaced resource allocation with a highly efficient scheduling platform. Alibaba’s cloud native exploration is still ongoing, however. Increasing research and development (R&D) efficiency and accelerating iteration are key to boosting Alibaba’s business. Alibaba hopes to leverage cloud native technologies to improve the efficiency of developers.
To improve automation and simplify application deployment, Alibaba adopted Kubernetes as its container orchestration platform. Since then, Alibaba has dedicated its efforts to improving the performance and scalability of Kubernetes. Kubernetes also enables Alibaba to refine its R&D and deployment processes. To make Continuous Integration and Continuous Delivery (CI/CD) more cloud native and to further standardize and automate it, Alibaba has introduced standardized application management tools such as Helm to manage the entire process, from R&D all the way to product launch. Alibaba is also trying newer deployment patterns such as GitOps, and is pushing forward final-state-oriented automation in the PaaS layer. Additionally, Alibaba is beginning to explore service mesh, aiming to further improve the universality and standardization of service governance, lower the adoption threshold for developers, and further popularize microservices in multiple languages and environments.
In 2019, Alibaba launched the All-in-Cloud initiative. Through cloud native exploration and reconstruction, Alibaba is building a modern and standard infrastructure system. Container technology decouples applications from hosts at runtime. Kubernetes abstracts resources into Pods and volumes to unify the implementation of various resources. Intelligent scheduling on the PaaS layer makes it possible to automatically migrate applications and fix instabilities. By using cloud native, Alibaba greatly simplifies migration to the cloud.
In the process of improving resource and personnel efficiency, Alibaba’s entire infrastructure has become more open and connected to multiple open-source ecosystems. Alibaba also integrates and shares beneficial concepts, technologies, and ideas with open-source communities. Now, Alibaba Cloud operates China’s largest cloud native application, the 11.11 global shopping festival. It also boasts the largest public cloud cluster and image repository in China. As the only vendor in China listed in Gartner’s Competitive Landscape: Public Cloud Container Services Market, Alibaba Cloud has accumulated the most extensive and valuable customer practices.
Scaling and Optimizing Kubernetes
Scaling and performance optimization help Alibaba cope with traffic peaks encountered in various complex scenarios.
After many years of work, Alibaba has made great achievements in the scaling and performance of Kubernetes. Compared with the original version, the number of objects that can be stored has increased 25-fold. At the same time, the number of supported nodes has grown from 5,000 to tens of thousands, and end-to-end latency has dropped from 5 s to 100 ms. Much of the R&D work is carried out as a collaboration between Alibaba and the open-source community. Most of these R&D achievements have been contributed back to the community, and Alibaba hopes that other companies and developers can benefit from these large-scale optimizations.
Alibaba has worked hard to constantly optimize the performance of Kubernetes in terms of workload tracking, performance analysis, customized scheduling, and large-scale image distribution. Alibaba provides a complete tracking and replay mechanism for workload scheduling, and analyzes all performance problems in detail to overcome technical bottlenecks. Additionally, building on the customizability of Kubernetes, Alibaba has developed customized scheduling and an image distribution system, Dragonfly, based on its business scenarios. The open-source Dragonfly project was initially launched to address the needs of the 11.11 global shopping festival, and is equipped with excellent image distribution capabilities. During the 11.11 global shopping festival, up to dozens of super clusters are used. Each of these has tens of thousands of nodes and millions of containers.
Alibaba implements Kubernetes in three stages. First, Alibaba supplies resources through Kubernetes but does not interfere much with the O&M process. Doing so lets system containers remain rich containers and at the same time brings capabilities such as image standardization and lightweight virtualization to the upper-layer PaaS platform. Second, Alibaba uses Kubernetes controllers to transform the O&M process of the PaaS platform. This gives the PaaS platform stronger capabilities for final-state-oriented automation. Finally, Alibaba changes the traditional heavyweight mode of operating environments to the lightweight mode of native containers and Pods. Additionally, Alibaba completely hands over the PaaS capabilities to Kubernetes controllers. In this way, Alibaba builds a complete cloud native architecture.
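The final-state-oriented automation mentioned above follows the general controller pattern: compare a declared desired state with the observed state and act only on the difference. The following is a minimal sketch of that idea, not Alibaba's implementation. The AppState type, the reconcile and apply functions, and the image names are hypothetical, and actions are plain strings rather than Kubernetes API calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppState:
    """Hypothetical, simplified view of one application's state."""
    image: str
    replicas: int


def reconcile(desired: AppState, actual: AppState) -> List[str]:
    """Return the actions needed to move `actual` toward `desired`.

    A real controller would call the Kubernetes API; here actions are
    plain strings so the sketch stays self-contained.
    """
    actions = []
    if actual.image != desired.image:
        actions.append(f"update image {actual.image} -> {desired.image}")
    if actual.replicas < desired.replicas:
        actions.append(f"scale up by {desired.replicas - actual.replicas}")
    elif actual.replicas > desired.replicas:
        actions.append(f"scale down by {actual.replicas - desired.replicas}")
    return actions


def apply(desired: AppState) -> AppState:
    """Assume every action succeeds, so the observed state equals the desired one."""
    return AppState(image=desired.image, replicas=desired.replicas)


desired = AppState(image="shop:v2", replicas=5)
actual = AppState(image="shop:v1", replicas=3)

plan = reconcile(desired, actual)              # what has to change
actual = apply(desired)                        # carry out the plan
converged = reconcile(desired, actual) == []   # nothing left to do
```

Because the loop is driven by the declared final state rather than by a script of steps, running it again after convergence produces no further actions.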
Overcoming Challenges of Cloud Native
In the process of implementing cloud native technologies, Alibaba has moved from self-developed containers and scheduling systems to open-source, standardized technologies. Alibaba recommends that developers use Kubernetes to directly build cloud native architectures, for two reasons. First, Kubernetes is developed for platform builders and has become the mainstay of the cloud native ecosystem. In this way, Kubernetes not only shields the underlying details in the downstream direction, but also supports various peripheral business ecosystems in the upstream direction. Second, increasing numbers of open-source projects developed by the community are built around Kubernetes, such as Service Mesh and Kubeflow.
What Are Alibaba’s Suggestions to Help Developers Avoid Pitfalls?
The toughest challenge in the evolution to a cloud native technical architecture lies in the management of Kubernetes. Kubernetes is still a relatively young system, and does not have a mature ecosystem for O&M and management. Managing tens of thousands of clusters is crucial for Alibaba to succeed. Alibaba settled on the following practices:
- Use Kubernetes’s self-management capabilities
- Adopt a node release rollback policy, and perform phased release according to rules
- Perform image splitting to divide environments into simulated and production environments
- Focus on the monitoring side to make Kubernetes more transparent and to discover, prevent, and solve problems quickly
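The phased release with rollback in the list above can be sketched as follows. This is an illustrative sketch under stated assumptions, not Alibaba's tooling: the phased_release function, its callbacks, the batch sizes, and the node names are all hypothetical. It upgrades nodes in growing batches, checks health after each batch, and rolls back every upgraded node if a batch fails.

```python
from typing import Callable, List


def phased_release(
    nodes: List[str],
    batch_sizes: List[int],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Upgrade nodes batch by batch; roll back all upgraded nodes on failure."""
    done: List[str] = []
    start = 0
    # After the listed batch sizes, the final batch is whatever is left.
    sizes = list(batch_sizes) + [len(nodes)]
    for size in sizes:
        batch = nodes[start:start + size]
        start += size
        if not batch:
            break
        for node in batch:
            upgrade(node)
            done.append(node)
        if not all(healthy(node) for node in batch):
            # Undo in reverse order so the newest change is reverted first.
            for node in reversed(done):
                rollback(node)
            return False
    return True


# Hypothetical in-memory "cluster" used only to exercise the sketch.
versions = {f"node-{i}": "v1" for i in range(6)}
bad_nodes = {"node-2"}


def upgrade(node: str) -> None:
    versions[node] = "v2"


def rollback(node: str) -> None:
    versions[node] = "v1"


def healthy(node: str) -> bool:
    return node not in bad_nodes


first_ok = phased_release(list(versions), [1, 2], upgrade, healthy, rollback)
after_failed = dict(versions)   # snapshot: the failed rollout was reverted

bad_nodes.clear()               # pretend the faulty node was repaired
second_ok = phased_release(list(versions), [1, 2], upgrade, healthy, rollback)
```

Starting with a small batch limits the blast radius: in the first run the failure is caught after three nodes, before the remaining nodes are touched.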
Multi-tenant management of Kubernetes is another key technical issue for Alibaba Cloud. Given the limits of Namespaces, such as poor scalability and naming conflicts, virtual clusters can be set up on top of Kubernetes. In addition to high scalability, virtual clusters provide strong API-layer isolation. A syncer links the virtual clusters to the real clusters, and agents are added to nodes to improve multi-tenant management and resource utilization.
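To illustrate how a syncer can avoid the naming conflicts mentioned above, here is a minimal sketch in which each tenant's objects are copied into the shared real cluster under a tenant-scoped namespace. The source does not describe the actual mapping, so the function names, the tenant-prefix scheme, and the in-memory dictionaries are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

# (namespace, name) -> object spec, as seen inside one virtual cluster
Objects = Dict[Tuple[str, str], dict]


def real_namespace(tenant: str, namespace: str) -> str:
    """Hypothetical mapping: prefix the tenant's namespace with the tenant name.

    A production scheme would also need to guard against ambiguous
    prefixes, for example by appending a hash of the tenant identity.
    """
    return f"{tenant}-{namespace}"


def sync(virtual_clusters: Dict[str, Objects]) -> Objects:
    """Copy every tenant object into the real cluster under a tenant-scoped namespace."""
    real: Objects = {}
    for tenant, objects in virtual_clusters.items():
        for (namespace, name), spec in objects.items():
            real[(real_namespace(tenant, namespace), name)] = spec
    return real


# Two tenants both create "web" in their own "default" namespace.
virtual_clusters = {
    "team-a": {("default", "web"): {"image": "shop:v1"}},
    "team-b": {("default", "web"): {"image": "blog:v3"}},
}

real = sync(virtual_clusters)
```

Each tenant sees only its own "default" namespace, while the real cluster holds both objects side by side without a conflict.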
At the KubeCon + CloudNativeCon + Open Source Summit, Alibaba Cloud announced two major projects: App Hub and OpenKruise. App Hub is a Kubernetes application management center open to all developers. OpenKruise is a set of open source Kubernetes automation projects developed based on Internet scenarios worldwide.
Cloud Native App Hub can be considered a mirror of Helm Hub in China. It allows users to easily obtain application resources and significantly simplifies the Kubernetes installation procedure. OpenKruise is committed to becoming a cloud native application automation engine that can solve most O&M pain points in large-scale application scenarios. At the conference, developers downloaded OpenKruise from Alibaba Cloud Container Registry by using Helm, and gained hands-on experience with various features in a Kruise application scenario, such as the in-place upgrade of stateful containers, sidecar container injection, and one-time broadcasting to all nodes. They also developed an initial understanding of the automation capabilities of the OpenKruise project in large-scale scenarios.
The Kruise project is based on the best practices of Alibaba’s large-scale application deployment, release, and management. The project also draws on the large-scale application O&M and website construction capabilities of the Alibaba Cloud container platform team, as well as the requirements Alibaba Cloud faces in using Kubernetes to serve thousands of customers. The project addresses Kubernetes automation issues across deployment, upgrade, elastic scaling, QoS scheduling, health checks, and the troubleshooting of migration errors.