Collaborative Cloud-Native Application Distribution across Tens of Thousands of Nodes in Minutes

By Xie Yuning, Luo Jing, and Deng Juan.

During the 2019 Double 11 Shopping Festival, all of Alibaba’s core systems ran entirely on the cloud for the first time. During the event, the world’s largest and busiest annual online shopping event, Alibaba Cloud withstood a peak of 544,000 transactions per second, proving once again that “cloud native” is the right solution for Double 11.

The picture above shows the GMV of 2019’s Double 11 event

As an important piece of the infrastructure in the cloud-native domain of the Alibaba ecosystem and e-commerce machine, Alibaba Cloud’s Container Registry (ACR) satisfied all of Alibaba’s requirements in terms of large-scale distribution in the run-up to Double 11.

To meet these requirements, extensive planning was carried out and ACR was upgraded in advance to improve its performance, observability, and stability in large-scale distribution scenarios. In preparation for the 2019 Double 11 event, several petabytes of image data were also added to the registry, which serves hundreds of millions of image pulls each month. As a result, Alibaba Cloud Container Registry is now fully equipped to provide a cloud-native application delivery pipeline, along with several other features, that meets the demands of Alibaba and its customers in the cloud-native era.

In this article, we continue our look at Alibaba Cloud Container Registry (ACR). In particular, we discuss how this product addresses the new development needs and challenges that both Alibaba and its customers face in the cloud-native era, and how it can improve your workflow by distributing cloud-native applications across tens of thousands of nodes in minutes.

New Development Needs and Challenges

The evolution of cloud-native application delivery standards

In addition to the new delivery standards for cloud-native applications, users now also have higher requirements for delivery methods. More and more users want, and even require, cloud-native applications to be delivered in a more secure, process-based, and automated manner. As a result, distribution across tens of thousands of nodes in minutes has evolved into multi-stage collaborative distribution across tens of thousands of nodes in minutes. Moreover, as businesses globalize, global distribution has become a basic requirement on top of completing each stage in minutes. All of this places higher demands on platforms that support the distribution of cloud-native applications.

New Practices

  • Optimized container image sizes to reduce image transmission costs. Base images of frequently used applications and environments are reused to minimize the number of image layers and to limit the number of layers that change with each build. Application images are streamlined through multi-stage image builds, which separate intermediate products from final products during image creation.
  • Optimized the server-side processing performance to raise the request response rate. Servers now use multiple methods such as identifying hot images and caching popular data to handle concurrent pulls of large-scale image manifests.
  • Optimized how clients download image layers to reduce image transmission time. Clients use Dragonfly to download container images, which greatly reduces the download time for image layers through a P2P-based method.
Optimization policies for large-scale image distribution
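
The server-side idea mentioned above, identifying hot images and caching popular data, can be illustrated with a small sketch. This is not ACR's actual implementation: the class name HotManifestCache, the pull-count threshold, the capacity, and the fetch callback are all hypothetical names chosen for this example. The sketch only shows the general pattern of caching a manifest once a reference has been pulled often enough to count as "hot".

```python
from collections import Counter, OrderedDict


class HotManifestCache:
    """Illustrative cache: only manifests of frequently pulled images are kept.

    All names and thresholds here are assumptions for the example,
    not part of any ACR API.
    """

    def __init__(self, fetch, hot_threshold=3, capacity=2):
        self._fetch = fetch                  # hypothetical backend lookup: ref -> manifest
        self._hot_threshold = hot_threshold  # pulls needed before a ref counts as hot
        self._capacity = capacity            # maximum number of cached manifests
        self._pulls = Counter()
        self._cache = OrderedDict()          # least recently used entry comes first
        self.backend_calls = 0

    def get(self, ref):
        self._pulls[ref] += 1
        if ref in self._cache:
            # Cache hit: mark as most recently used and skip the backend.
            self._cache.move_to_end(ref)
            return self._cache[ref]
        self.backend_calls += 1
        manifest = self._fetch(ref)
        if self._pulls[ref] >= self._hot_threshold:
            self._cache[ref] = manifest
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict the least recently used entry
        return manifest


# Ten concurrent-style pulls of the same hot image reach the backend only
# until the reference crosses the threshold; later pulls come from the cache.
cache = HotManifestCache(fetch=lambda ref: {"ref": ref, "layers": []})
for _ in range(10):
    cache.get("app:v1")
print(cache.backend_calls)
```

With the default threshold of 3, the first three pulls go to the backend and the remaining seven are served from memory, which is the effect the optimization aims for when many nodes pull the same manifest at once.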

To enable enterprise users to enjoy these distribution capabilities, ACR officially launched ACR Enterprise Edition in March 2019. ACR Enterprise Edition provides enterprise-level cloud-native asset management, as well as the global and large-scale distribution of cloud-native applications. This service is suitable for enterprise-level customers who require a high level of security, need to deploy services across multiple regions, and have a large number of clusters and nodes. In addition, ACR Enterprise Edition further improves collaboration in the hosting, delivery, and distribution of cloud-native assets when distributing cloud-native applications across tens of thousands of nodes in minutes.

Cloud-native Application Hosting

  • The product provides independent network access control, which can control access policies for public and VPC networks in a fine-grained manner, allowing only compliant sources to access assets. This further ensures access security for cloud-native assets.
  • The product also provides a transparent pull plug-in that allows users to pull container images in a transparent manner. This ensures that businesses can quickly pull images in elastic scenarios, avoiding failed business updates or abnormal scaling caused by incorrect credential configurations.
Delivery of cloud-native applications by ACR Enterprise Edition
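
As a rough illustration of the fine-grained network access control described above, the following sketch checks a request's source address against separate allowlists for public and VPC networks. The AccessPolicy class, its field names, and the example CIDR ranges are assumptions made for this example and do not reflect how ACR configures or enforces its policies.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical allowlist policy with separate rules per network type."""

    public_cidrs: list = field(default_factory=list)
    vpc_cidrs: list = field(default_factory=list)

    def allows(self, network, source_ip):
        # Pick the allowlist that matches the network the request arrived on.
        cidrs = self.vpc_cidrs if network == "vpc" else self.public_cidrs
        addr = ipaddress.ip_address(source_ip)
        # An empty allowlist denies everything on that network.
        return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


policy = AccessPolicy(
    public_cidrs=["203.0.113.0/24"],  # documentation-only example range
    vpc_cidrs=["192.168.0.0/16"],
)

print(policy.allows("vpc", "192.168.3.7"))     # True: inside the VPC allowlist
print(policy.allows("public", "192.168.3.7"))  # False: not in the public allowlist
print(policy.allows("public", "203.0.113.9"))  # True: inside the public allowlist
```

Keeping the two allowlists separate means a source that is compliant on the VPC network is not automatically trusted when it arrives over the public network.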

The Delivery of Cloud-Native Applications

Creating a cloud-native application delivery pipeline in the console

In the cloud-native application delivery stage, you can automatically initiate static security scans and customize blocking policies. Once a static scan detects a high-risk vulnerability in an application, the service automatically blocks the subsequent deployment stages. You can then update and optimize the application based on the suggestions in the vulnerability report, build a new image version, and re-deliver the image.
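
The blocking behavior described here can be sketched as a simple gate in front of the later pipeline stages. This is a minimal sketch under stated assumptions: the Vulnerability record, the severity names, the delivery_gate function, and the finding identifiers are hypothetical and are not taken from the ACR console or API.

```python
from dataclasses import dataclass

# Assumed severity scale for the example; real scanners may use other levels.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Vulnerability:
    finding_id: str
    severity: str


def delivery_gate(findings, block_at="high"):
    """Return ("blocked", ids) if any finding meets the policy threshold,
    otherwise ("delivered", [])."""
    threshold = SEVERITY_RANK[block_at]
    blocking = [
        f.finding_id for f in findings if SEVERITY_RANK[f.severity] >= threshold
    ]
    if blocking:
        return "blocked", blocking
    return "delivered", []


scan_report = [
    Vulnerability("EXAMPLE-0001", "medium"),
    Vulnerability("EXAMPLE-0002", "high"),
]

print(delivery_gate(scan_report))                   # ('blocked', ['EXAMPLE-0002'])
print(delivery_gate(scan_report, block_at="low"))   # ('blocked', ['EXAMPLE-0001', 'EXAMPLE-0002'])
print(delivery_gate(scan_report[:1]))               # ('delivered', [])
```

The block_at parameter stands in for a customizable blocking policy: lowering it makes the gate stricter, while the returned identifiers indicate which findings need to be fixed before the image is rebuilt and re-delivered.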

The Distribution of Cloud-Native Applications

Global distribution of cloud-native applications

To implement large-scale peer-to-peer distribution, Dragonfly-based distribution solutions were repeatedly optimized for cloud environments. At Alibaba, we also incorporated multiple innovative technologies to resolve file distribution issues in scenarios such as large-scale file downloading and cross-network isolation, greatly improving large-scale container image distribution. On average, ACR distributes images at scale several times more efficiently than conventional methods, and it is suited to scenarios where an individual container cluster has 100 or more nodes.

The peer-to-peer-based distribution process
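
To see why peer-to-peer distribution scales better than pulling everything from the registry, consider the following toy model. It is not Dragonfly's algorithm (which works on file pieces and uses its own scheduling); it assumes whole-image transfers in fixed rounds, a registry that can serve a fixed number of nodes per round, and peers that can each serve one other node per round once they hold the image. The function names and parameters are hypothetical.

```python
import math


def rounds_registry_only(nodes, registry_slots):
    """Rounds needed when every node downloads directly from the registry."""
    return math.ceil(nodes / registry_slots)


def rounds_p2p(nodes, registry_slots, peer_slots=1):
    """Rounds needed when nodes that already hold the image also serve peers."""
    have = 0
    rounds = 0
    while have < nodes:
        # Each round, the registry serves registry_slots nodes and every node
        # that already has the image serves peer_slots additional nodes.
        have = min(nodes, have + registry_slots + have * peer_slots)
        rounds += 1
    return rounds


print(rounds_registry_only(10_000, registry_slots=10))  # 1000
print(rounds_p2p(10_000, registry_slots=10))            # 10
```

In this simplified model the number of nodes holding the image roughly doubles each round, so the round count grows logarithmically with cluster size instead of linearly. Real-world gains depend on bandwidth, topology, and piece scheduling, which the model ignores.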

In addition to large-scale peer-to-peer distribution, this product also supports large-scale distribution based on image snapshots to better meet the need for large-scale distribution in specific scenarios. The image snapshot-based distribution method can avoid or reduce image layer downloads, greatly accelerating the creation of container groups. When working with Container Service for Kubernetes (ACK) and Elastic Container Instance (ECI), ACR can pull images on 500 nodes in seconds, enabling rapid scaling in response to sudden business changes.

The image snapshot-based distribution process

New Platform

  • In terms of dependency management, the platform provides unified management for key stages and external dependencies in the cloud-native application delivery pipeline. This improves the overall delivery capability of the delivery pipeline and helps users identify hot repositories and track specific execution results of the delivery pipeline.
  • In terms of throttling and degradation, the platform analyzes and identifies primary and secondary business functions in the core stages of cloud-native application distribution. It gives priority to ensuring that the main business logic is completed, while the secondary business logic can be degraded and handled later.
  • In terms of capacity planning, the platform scales resources on demand based on upstream and downstream business changes to ensure the normal delivery of cloud-native applications.
Policies to ensure platform stability
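
The throttling and degradation policy above can be illustrated with a small scheduling sketch: when capacity is limited, primary business tasks are handled first and secondary tasks are deferred. The task fields, task names, and the schedule function are hypothetical and serve only to show the prioritization pattern, not the platform's actual mechanism.

```python
def schedule(tasks, capacity):
    """Split tasks into (run_now, deferred), giving primary tasks priority.

    Each task is a dict with a "name" and a boolean "primary" flag
    (an assumed shape for this example).
    """
    primary = [t for t in tasks if t["primary"]]
    secondary = [t for t in tasks if not t["primary"]]
    ordered = primary + secondary  # primary logic always goes first
    return ordered[:capacity], ordered[capacity:]


tasks = [
    {"name": "record-pull-statistics", "primary": False},
    {"name": "serve-image-manifest", "primary": True},
    {"name": "send-webhook-notification", "primary": False},
    {"name": "serve-image-layer", "primary": True},
]

run_now, deferred = schedule(tasks, capacity=2)
print([t["name"] for t in run_now])   # ['serve-image-manifest', 'serve-image-layer']
print([t["name"] for t in deferred])  # ['record-pull-statistics', 'send-webhook-notification']
```

Under load, the deferred list would be processed later once capacity frees up, so the main distribution path completes even when secondary work has to wait.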

Alibaba Cloud Ecosystem Integration

The process of the container application market

