The Network Architecture and Network Management System behind This Year’s Double 11

The GMV of 2019 Double 11

The gross merchandise volume (GMV) of this year’s Double 11 Shopping Festival reached 268.4 billion CNY, and all of Alibaba’s core systems ran on Alibaba Cloud. In other words, Alibaba Cloud not only withstood but also supported one of the world’s largest traffic peaks in the history of the Internet.

The Technology Stack for the Big Event

Simplified e-commerce system architecture

All the core business component systems were up in the cloud this year, running on Alibaba Cloud’s host of products, services, and solutions, including those specifically designed for computing, storage, networks, and databases.

Because of the huge processing capacity needed to run Alibaba’s e-commerce platforms, these business products, components, and modules are deployed in a distributed manner. This generates a massive volume of product-to-product, component-to-component, and even module-to-module communication requests, all of which are supported by Alibaba Cloud’s Cloud Network Management system, nicknamed Luoshen.

What Is the Cloud Network Management System?

The Architecture behind Alibaba Cloud’s Apsara Distributed Operating System

The kernel of Alibaba Cloud’s Apsara Distributed Operating System provides the most basic system services and virtualizes basic resources, in particular the computing, storage, and network resources of the system. Within this kernel, the Cloud Network Management system provides virtual network services, such as Virtual Private Cloud (VPC), Software-Defined Networking (SDN) controllers, and Server Load Balancer (SLB) network elements (NEs). In short, Cloud Network Management is the core kernel component of the Apsara Distributed Operating System that provides all the functions of the cloud computing network.

The Features of Cloud Network Management

Complete In-house Development by Alibaba Cloud

Alibaba Cloud’s cloud networks

Alibaba Cloud’s network products are developed on top of the Cloud Network Management system, with the core business code developed entirely in-house at Alibaba Cloud. So far, the code base has grown to millions of lines. The technical solutions and business logic of all the underlying software systems and hardware devices were likewise created by Alibaba Cloud. As a result, Alibaba Cloud’s Alibaba Virtual Switch (AVS) differs from Open vSwitch (OVS) in various aspects, including its proprietary forwarding entry design and its packet processing.

SDN

Alibaba Cloud’s SDN architecture

Forwarding NEs are programmable in both their software and hardware modes, and all related business logic is implemented in software. SDN controllers support custom channel communication protocols. Software and hardware are integrated, and both are fully scalable.

Massive Scale

How Does Cloud Network Management Support Double 11?

You may wonder what specific challenges the Cloud Network Management system faces during the Double 11 Shopping Festival, and how it tackles them.

Ultra-large Scale

At the logical level, network devices consist of control devices and data forwarding devices. At the control layer, centralized SDN controllers that use the traditional method deliver forwarding entries slowly. As a result, virtual instances launch slowly, which reduces business provisioning and switchover efficiency. The control system of Cloud Network Management therefore adopts a hierarchical cluster architecture. This improves the centralized control capability while allowing a large number of virtual instances to be brought online, and it greatly improves the performance of configuration management and table entry processing.
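As a rough illustration of why a hierarchical control layer helps, the following minimal sketch splits hosts across several sub-controller clusters so that no single controller has to push every forwarding entry to every host. The round-robin sharding, the function names, and the host and entry values are all hypothetical; the article does not describe how Cloud Network Management actually partitions its control clusters.

```python
from collections import defaultdict


def shard_hosts(hosts, num_clusters):
    """Assign hosts to sub-controller clusters round-robin (assumed policy)."""
    shards = defaultdict(list)
    for index, host in enumerate(sorted(hosts)):
        shards[index % num_clusters].append(host)
    return dict(shards)


def deliver(entries, shards):
    """Count the pushes each sub-controller cluster performs.

    The top-level controller hands the batch of entries to every cluster,
    and each cluster pushes the batch only to the hosts it owns.
    """
    return {cluster: len(members) * len(entries)
            for cluster, members in shards.items()}


# Hypothetical data: 12 hosts, 2 forwarding entries, 4 sub-controller clusters.
hosts = [f"host-{n:03d}" for n in range(12)]
entries = [("10.0.0.5", "host-003"), ("10.0.0.6", "host-007")]

shards = shard_hosts(hosts, 4)
pushes = deliver(entries, shards)

# A single flat controller would perform all of these pushes by itself.
flat_pushes = len(hosts) * len(entries)
print(pushes, flat_pushes)
```

The total work is unchanged, but the heaviest controller now handles only its own shard, which is the property that lets entry delivery scale as more virtual instances come online.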

The hierarchical control architecture of Cloud Network Management

At the data forwarding layer, Cloud Network Management provides a technical architecture that integrates software and hardware. The VSwitch is upgraded from the traditional DPDK architecture to support fast forwarding on programmable hardware.

Cloud Network Management VSwitch based on programmable hardware

Compared with traditional software VSwitches, programmable hardware-based VSwitches improve forwarding performance about tenfold and cut latency by more than half.

The rapid growth in public network and cross-domain bandwidth also poses a great challenge to the performance of Data Plane Development Kit (DPDK) virtual gateways. As bandwidth grows, the number of devices increases, which raises management complexity and supply costs. Moreover, the capacity of a single CPU core is limited, so it cannot handle burst traffic or high-bandwidth single-stream traffic, which affects normal communication.

The software-hardware integrated gateway of Alibaba Cloud’s Cloud Network Management

However, the technical architecture of the virtual gateway has been upgraded to support software-hardware integrated gateways. The business logic is implemented in the P4 programming language, and external interfaces remain compatible with software virtual gateways. Compared with the traditional 32-bit software architecture, the programmable hardware gateway improves forwarding performance by dozens of times and effectively prevents a high-bandwidth single stream from overwhelming a single CPU core. With the software-hardware integrated architecture of Cloud Network Management, the traffic peak during the Double 11 Shopping Festival is well dispersed.
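To see why one high-bandwidth stream can saturate a single CPU core in a software gateway, consider the sketch below. It assumes the gateway spreads packets across cores by hashing each flow's 5-tuple, as RSS-style distribution commonly does; the article does not state how the DPDK gateways distribute traffic, and CRC32 is used here only as a stand-in hash. The addresses and sizes are made up.

```python
import zlib
from collections import Counter


def core_for_flow(five_tuple, num_cores):
    """Map a flow to a core by hashing its 5-tuple (assumed distribution scheme)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_cores


def load_per_core(packets, num_cores):
    """Sum the bytes each core has to process for a list of (flow, size) packets."""
    load = Counter()
    for five_tuple, size in packets:
        load[core_for_flow(five_tuple, num_cores)] += size
    return load


# One hypothetical "elephant" flow: every packet carries the same 5-tuple.
elephant = ("10.0.0.1", "10.0.1.1", 40000, 443, "tcp")
packets = [(elephant, 1500)] * 1000

load = load_per_core(packets, num_cores=8)
print(load)  # only one of the eight cores receives any traffic
```

Because every packet of the flow hashes to the same core, adding cores does not help that flow. Moving forwarding into programmable hardware removes the per-core ceiling rather than trying to rebalance around it.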

High Stability

Cloud Network Management ensures stable network communication through its architecture. Businesses are deployed by zone, and gateways for public network and cross-domain access are deployed as clusters in different zones to prevent single points of failure (SPOFs). In addition, data is backed up across multiple zones.
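The effect of deploying gateway clusters in several zones can be sketched as a simple selection rule: prefer a healthy cluster in the local zone, and fall back to a healthy cluster elsewhere if the local one fails. This policy, the `pick_gateway` function, and the cluster records are assumptions for illustration only; the article does not describe the actual failover logic.

```python
def pick_gateway(clusters, local_zone):
    """Return the name of a healthy gateway cluster, preferring the local zone."""
    healthy = [cluster for cluster in clusters if cluster["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy gateway cluster in any zone")
    for cluster in healthy:
        if cluster["zone"] == local_zone:
            return cluster["name"]
    # Local zone is down: fail over to the first healthy cluster in another zone.
    return healthy[0]["name"]


# Hypothetical deployment: one gateway cluster per zone.
clusters = [
    {"name": "gw-a", "zone": "zone-a", "healthy": True},
    {"name": "gw-b", "zone": "zone-b", "healthy": True},
]

normal = pick_gateway(clusters, "zone-a")

clusters[0]["healthy"] = False  # simulate the failure of the zone-a cluster
failover = pick_gateway(clusters, "zone-a")
print(normal, failover)
```

With at least two zones, the loss of any one gateway cluster leaves a working path, which is the point of avoiding a single point of failure.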

Reliable deployment architecture of Cloud Network Management gateways

Complex Traffic Model

Take Alibaba Group’s online and offline businesses as an example. One major offline business is big data. Large-traffic big data consumers often hit traffic spikes that take up the system’s entire bandwidth, which in turn causes packet loss. Online businesses generally require less traffic but are more sensitive to latency and packet loss. They therefore require the cloud network to support traffic classification so that, when the network is congested, low-priority traffic is discarded to protect the main offline and online businesses.

Differentiated network requirements of businesses

Cloud Network Management supports different quality of service (QoS) levels for different business scenarios. For businesses that require high bandwidth but are not sensitive to packet loss, the priority of communication packets is set to low, so that high-priority packets are not discarded during traffic spikes and complex traffic models are well supported.
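The priority-based behavior described above can be sketched as a toy admission step for one congested interval: packets are admitted in priority order until the link capacity is used up, and whatever does not fit is dropped. The `admit` function, the packet names, the priority values, and the capacity are hypothetical; the real QoS mechanism in Cloud Network Management is not described in the article.

```python
def admit(packets, capacity):
    """Admit packets for one interval, highest priority first.

    packets: list of (name, priority, size); a lower number means higher priority.
    capacity: total bytes the link can carry in this interval.
    Returns (admitted_names, dropped_names).
    """
    admitted, dropped, used = [], [], 0
    # sorted() is stable, so packets of equal priority keep their arrival order.
    for name, _priority, size in sorted(packets, key=lambda packet: packet[1]):
        if used + size <= capacity:
            admitted.append(name)
            used += size
        else:
            dropped.append(name)
    return admitted, dropped


# Hypothetical mix: latency-sensitive online traffic (priority 0)
# and bandwidth-hungry big data traffic (priority 1).
packets = [
    ("bigdata-1", 1, 600),
    ("order-1", 0, 200),
    ("bigdata-2", 1, 600),
    ("pay-1", 0, 200),
]

admitted, dropped = admit(packets, capacity=1000)
print(admitted)  # ['order-1', 'pay-1', 'bigdata-1']
print(dropped)   # ['bigdata-2']
```

Both online packets get through even though the big data packets arrived first and are larger; only low-priority traffic is lost when the link is full.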

Efficient O&M

Apsara Network Intelligence, the O&M platform of Cloud Network Management, is a distributed, intelligent big data O&M system. It integrates the massive amounts of data on Alibaba Cloud and uses big data and AI analysis capabilities to help locate system faults and carry out emergency measures much faster.

Architecture of Alibaba Cloud’s Apsara Network Intelligence

Based on data streams, logs, and device statuses from the underlying network and the virtual network, the Blink-based big data analysis platform can quickly determine the status of the network and identify the root cause of a fault, so that automatic emergency measures are taken before the user even knows that a fault occurred. In addition, all typical, common faults are included in daily fault drills, which helps ensure even more efficient network O&M. The smart network is another powerful tool provided by Alibaba Cloud’s Cloud Network Management to support the Double 11 Shopping Festival.

From the above discussion, one thing is clear: the Cloud Network Management system is being continuously improved. With the evolution from DPDK NEs in version 1.0 to software-hardware integrated NEs in version 2.0, its network capability has improved significantly, which has allowed all the core businesses of Alibaba Group to run on the cloud. In the future, Cloud Network Management will strive to be even more elastic and open in order to provide users with an even better overall experience.
