Collaborative Cloud-Native Application Distribution across Tens of Thousands of Nodes in Minutes
By Xie Yuning, Luo Jing, and Deng Juan.
During the 2019 Double 11 Shopping Festival, all of Alibaba’s core systems ran entirely on the cloud for the first time. During the event, which is the world’s largest and busiest annual online shopping event, Alibaba Cloud withstood a peak of 544,000 transactions per second, proving once again that “cloud native” is the right solution for Double 11.
As an important piece of the infrastructure in the cloud-native domain of the Alibaba ecosystem and e-commerce machine, Alibaba Cloud’s Container Registry (ACR) satisfied all of Alibaba’s requirements in terms of large-scale distribution in the run-up to Double 11.
To meet these requirements, ACR was planned and upgraded well in advance to improve its performance, observability, and stability in large-scale distribution scenarios. In preparation for the 2019 Double 11 event, several petabytes of image data were also added to the registry, which served hundreds of millions of image pulls each month. As a result of these efforts, Alibaba Cloud Container Registry now provides a cloud-native application delivery pipeline, along with several other features, to meet the demands of Alibaba and its customers in the cloud-native era.
In this article, we continue our look at Alibaba Cloud Container Registry (ACR). Specifically, we discuss how this product addresses the new development needs and challenges that both Alibaba and its customers face in the cloud-native era, and how it can improve your workflow by distributing cloud-native applications across tens of thousands of nodes in minutes.
New Development Needs and Challenges
As cloud-native technologies grow rapidly in popularity, Kubernetes has become the de facto standard for containerized applications and a leader in the cloud-native space. It uses a declarative container orchestration and management system to standardize software delivery. Kubernetes provides a unified API model in which the resources in a cluster are defined in YAML files. These YAML resource definitions make it easier to integrate Kubernetes with upstream and downstream systems, and they let you quickly complete operations that previously had to be performed manually or with non-standard scripts. At the same time, based on application delivery scenarios and requirements, the Kubernetes community has produced a series of cloud-native application delivery standards in addition to native YAML resource definition files, such as Helm Chart, Operator, and Open Application Model.
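To make the idea of a declarative resource definition concrete, here is a minimal sketch in Python that builds a Deployment definition as a plain dictionary and prints it as JSON. Kubernetes accepts JSON as well as YAML, and JSON is used here only so that the example needs nothing beyond the standard library. The function name, the application name `web`, and the image reference `registry.example.com/demo/web:1.0.0` are assumptions made for illustration and do not come from ACR.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3) -> dict:
    """Build a minimal apps/v1 Deployment as a plain dictionary.

    The manifest only declares the desired state (which image, how many
    replicas); Kubernetes is responsible for reaching that state.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and image reference.
    manifest = deployment_manifest("web", "registry.example.com/demo/web:1.0.0")
    print(json.dumps(manifest, indent=2))
```

Because the definition is ordinary structured data, an upstream system such as a build pipeline can generate or modify it programmatically instead of relying on hand-written scripts.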
In addition to the new delivery standards for cloud-native applications, users now also have higher requirements for delivery methods. More and more users want and even require cloud-native applications to be delivered in a more secure, process-based, and automated manner. As a result, what was originally distribution across tens of thousands of nodes in minutes has progressed into what is now multi-stage collaborative distribution across tens of thousands of nodes in minutes. In addition, globalized business development means that, on top of the completion of each stage in minutes, global distribution is also a basic requirement. As such, higher requirements are placed on platforms that support the distribution of cloud-native applications.
By controlling container image sizes, using P2P for image layer distribution, and optimizing the Registry server, we at Alibaba Cloud have significantly improved the performance of large-scale distribution and can now complete distribution across tens of thousands of nodes in minutes. In particular, we did the following:
- Optimized container image sizes to reduce image transmission costs through the creation of base images. Base images of frequently used applications and environments are reused to minimize the number of image layers and control the number of layers that change with each release. Application images are streamlined through multi-stage image builds, which separate intermediate products from final products in the image creation process.
- Optimized the server-side processing performance to raise the request response rate. Servers now use multiple methods such as identifying hot images and caching popular data to handle concurrent pulls of large-scale image manifests.
- Optimized how clients download image layers to reduce image transmission time. Clients use Dragonfly to download container images, which greatly reduces the download time for image layers through a P2P-based method.
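The server-side idea of identifying hot images and caching popular data can be sketched as follows. This is an illustrative sketch only, not ACR's actual implementation: the class name `HotManifestCache`, the `hot_threshold` parameter, and the loader callback are all assumptions. The cache counts pulls per image and starts serving a manifest from memory once the image has been pulled often enough to count as hot.

```python
from collections import Counter
from typing import Callable, Dict


class HotManifestCache:
    """Cache manifests only for images that are pulled frequently (hypothetical)."""

    def __init__(self, load_manifest: Callable[[str], str], hot_threshold: int = 3):
        self._load = load_manifest          # reads a manifest from backend storage
        self._threshold = hot_threshold     # pulls needed before an image is "hot"
        self._pulls: Counter = Counter()    # pull count per image reference
        self._cache: Dict[str, str] = {}    # manifests of hot images
        self.backend_reads = 0              # how often backend storage was hit

    def get(self, image: str) -> str:
        self._pulls[image] += 1
        if image in self._cache:
            return self._cache[image]       # hot image: served from memory
        manifest = self._load(image)
        self.backend_reads += 1
        if self._pulls[image] >= self._threshold:
            self._cache[image] = manifest   # image just became hot
        return manifest


if __name__ == "__main__":
    cache = HotManifestCache(lambda ref: "manifest-of-" + ref, hot_threshold=2)
    for _ in range(1000):
        cache.get("app:v1")                 # simulate many concurrent pulls
    print(cache.backend_reads)
```

In this sketch, rarely pulled images never occupy cache memory, while a burst of pulls for the same image reaches backend storage only until the threshold is crossed.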
To enable enterprise users to enjoy these distribution capabilities, ACR officially launched ACR Enterprise Edition in March 2019. ACR Enterprise Edition provides enterprise-level cloud-native asset management, as well as the global and large-scale distribution of cloud-native applications. This service is suitable for enterprise-level customers who require a high level of security, need to deploy services across multiple regions, and have a large number of clusters and nodes. In addition, ACR Enterprise Edition further improves collaboration in terms of the hosting, delivery, and distribution of cloud-native assets during the distribution of cloud-native applications across tens of thousands of nodes in minutes.
Cloud-Native Application Hosting
- ACR Enterprise Edition currently supports the full-lifecycle management of two types of cloud-native application assets: container images and Helm charts.
- The product provides independent network access control, which can control access policies for public and VPC networks in a fine-grained manner, allowing only compliant sources to access assets. This further ensures access security for cloud-native assets.
- The product also provides a transparent pull plug-in that allows users to pull container images without configuring credentials manually. This ensures that businesses can quickly pull images in elastic scenarios, and that incorrect credential configurations do not cause failed business updates or scaling errors.
The Delivery of Cloud-Native Applications
In the production stage of cloud-native applications, you can directly upload cloud-native assets such as managed container images and Helm charts. You can also use the build function to automatically build container images from source code hosted on GitHub, Alibaba Cloud, or GitLab and upload them as your own cloud-native assets. To meet the need for more secure, process-based, and automated delivery of cloud-native applications, ACR Enterprise Edition introduced the cloud-native application delivery pipeline. The cloud-native application delivery pipeline starts with the hosting of cloud-native applications and ends with the distribution of cloud-native applications. The delivery pipeline is observable, traceable, and customizable. It allows you to implement global, multi-scenario automated delivery for a single change to an application. This greatly improves the efficiency and security of distributing cloud-native applications across tens of thousands of nodes.
In the cloud-native application delivery stage, you can automatically initiate static security scans and customize blocking policies. Once a static scan detects a high-risk vulnerability in an application, the service automatically blocks the subsequent deployment steps. You can update and optimize the application based on the suggestions in the vulnerability report, build a new image version, and then re-deliver the image.
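A customizable blocking policy of this kind can be thought of as a gate between the scan stage and the deployment stage. The following is a minimal sketch under stated assumptions: the severity levels, the `Vulnerability` type, the `should_block` function, and the placeholder identifiers are hypothetical and do not reflect the actual ACR policy format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high"]


@dataclass
class Vulnerability:
    identifier: str   # placeholder identifier, not a real advisory
    severity: str     # one of SEVERITY_ORDER


def should_block(findings: List[Vulnerability], block_at: str = "high") -> bool:
    """Return True if any finding is at or above the configured severity."""
    threshold = SEVERITY_ORDER.index(block_at)
    return any(SEVERITY_ORDER.index(f.severity) >= threshold for f in findings)


if __name__ == "__main__":
    report = [
        Vulnerability("EXAMPLE-0001", "low"),
        Vulnerability("EXAMPLE-0002", "high"),
    ]
    if should_block(report, block_at="high"):
        print("delivery blocked: fix the image and re-deliver")
    else:
        print("delivery continues to the distribution stage")
```

Lowering `block_at` to `"medium"` makes the gate stricter, which is one way a team could tune the policy to its own risk tolerance.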
The Distribution of Cloud-Native Applications
In the cloud-native application distribution stage, after the preceding stages are completed without being blocked, cloud-native applications officially enter the global and large-scale distribution stage. To ensure that distribution across tens of thousands of nodes can be accomplished in minutes, ACR works seamlessly with other Alibaba Cloud products, including Alibaba Cloud Container Service and Elastic Container Instance (ECI), to provide an exceptional peer-to-peer distribution experience. For global distribution, the global synchronization efficiency of cloud-native applications is seven times higher than that of manual synchronization due to optimizations such as fine-grained synchronization policy scheduling and synchronization link optimization.
To implement large-scale peer-to-peer distribution, Dragonfly-based distribution solutions were repeatedly optimized for cloud environments. At Alibaba, we also incorporated multiple innovative technologies to resolve file distribution issues in scenarios such as large-scale file downloading and cross-network isolation, greatly improving the capability of large-scale container image distribution. ACR’s average large-scale image distribution efficiency is several times higher than that of conventional methods, and it suits scenarios where an individual container cluster has 100 or more nodes.
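A rough back-of-the-envelope model shows why peer-to-peer distribution scales so well. The sketch below is a deliberately idealized model, not Dragonfly's actual scheduling algorithm: it assumes that the registry can serve a fixed number of nodes per round (`registry_slots`, a hypothetical parameter) and that every node that already holds a layer can serve exactly one other node per round.

```python
import math


def rounds_registry_only(nodes: int, registry_slots: int) -> int:
    """Rounds needed when every node downloads directly from the registry."""
    return math.ceil(nodes / registry_slots)


def rounds_p2p(nodes: int, registry_slots: int) -> int:
    """Rounds needed when nodes that hold the layer also serve one peer each."""
    have = 0      # nodes that already hold the layer
    rounds = 0
    while have < nodes:
        # The registry serves `registry_slots` nodes and each of the
        # `have` peers serves one more node in the same round.
        have = min(nodes, have + registry_slots + have)
        rounds += 1
    return rounds


if __name__ == "__main__":
    n, slots = 10_000, 10
    print("registry only:", rounds_registry_only(n, slots), "rounds")
    print("with P2P:     ", rounds_p2p(n, slots), "rounds")
```

Under these assumptions, the number of nodes that hold the layer roughly doubles each round, so the required rounds grow logarithmically with the cluster size instead of linearly. For 10,000 nodes and 10 registry slots, the model gives 1,000 rounds without P2P and 10 rounds with it.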
In addition to large-scale peer-to-peer distribution, this product also supports large-scale distribution based on image snapshots to better meet the need for large-scale distribution in specific scenarios. The image snapshot-based distribution method can avoid or reduce image layer downloads, greatly accelerating the creation of container groups. When working with Container Service for Kubernetes (ACK) and Elastic Container Instance (ECI), ACR can pull images on 500 nodes in seconds, enabling rapid scaling in response to sudden business changes.
Specific stability improvements and optimizations are being made in several areas, including monitoring and alerting, fault tolerance and disaster recovery, dependency management, throttling and degradation, and capacity planning.
- In terms of dependency management, the platform provides unified management for key stages and external dependencies in the cloud-native application delivery pipeline. This improves the overall delivery capability of the delivery pipeline and helps users identify hot repositories and track specific execution results of the delivery pipeline.
- In terms of throttling and degradation, the platform analyzes and identifies primary and secondary business functions in the core stages of cloud-native application distribution. It gives priority to ensuring that the main business logic is completed, while the secondary business logic can be degraded and handled later.
- In terms of capacity planning, the platform scales resources on demand based on upstream and downstream business changes to ensure the normal delivery of cloud-native applications.
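The throttling and degradation approach described above, in which primary business logic is completed first and secondary logic is deferred, can be sketched as follows. The class name `DegradingDispatcher`, the `capacity` parameter, and the task names are assumptions made for illustration and are not part of ACR.

```python
from collections import deque
from typing import Deque, List, Tuple


class DegradingDispatcher:
    """Always run primary tasks; run secondary tasks only with spare capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity               # tasks that can run right now
        self.deferred: Deque[str] = deque()    # degraded tasks, handled later

    def dispatch(self, tasks: List[Tuple[str, bool]]) -> List[str]:
        """Each task is (name, is_primary). Returns the tasks that run now."""
        # Primary tasks are never degraded in this sketch.
        run_now = [name for name, is_primary in tasks if is_primary]
        for name, is_primary in tasks:
            if is_primary:
                continue
            if len(run_now) < self.capacity:
                run_now.append(name)           # spare capacity is available
            else:
                self.deferred.append(name)     # degrade: postpone the task
        return run_now


if __name__ == "__main__":
    dispatcher = DegradingDispatcher(capacity=2)
    running = dispatcher.dispatch([
        ("update-pull-statistics", False),     # hypothetical secondary task
        ("serve-manifest", True),              # hypothetical primary task
        ("serve-layer", True),                 # hypothetical primary task
        ("write-audit-log", False),            # hypothetical secondary task
    ])
    print(running, list(dispatcher.deferred))
```

A real system would also need to drain the deferred queue once load drops, which this sketch leaves out for brevity.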
Alibaba Cloud Ecosystem Integration
Based on the rich integration capabilities provided by the Alibaba Cloud platform, you can use ACR Enterprise Edition as a piece of your infrastructure for cloud-native asset hosting and distribution so that you can deliver cloud-native applications to your customers. ACR Enterprise Edition is helping build a container application market in Alibaba Cloud Marketplace. It supports container product hosting and commercial distribution in that market and contributes to a closed-loop cloud-native ecosystem. Independent software vendors (ISVs), such as Intel, Fortinet, and Authine, have already released containerized products on the cloud marketplace in the form of container images or Helm charts, achieving standardized delivery and commercialization. Customers can also obtain high-quality official Alibaba Cloud and ISV-provided container images from the container application market and quickly deploy them to Container Service clusters. As such, they can enjoy the rich cloud-native ecosystem of Alibaba Cloud.
Having supported the large-scale distribution demands of Double 11, Alibaba Cloud Container Registry (ACR) can also provide comprehensive solutions for the cloud-native asset hosting and distribution needs of Alibaba and its customers. ACR can support the construction of a closed-loop cloud container ecosystem, making it a core piece of the infrastructure of the cloud-native space. In the future, Alibaba Cloud will continue to enrich ACR to provide users with an exceptional cloud-native application distribution experience that also offers superior performance.