Everything There Is to Know about Alibaba Cloud’s Sixth-Generation ECS Instances

Alibaba Cloud
Jan 14, 2020

Alibaba Cloud has released the newest generation of its enterprise-grade Elastic Compute Service (ECS) instances. This sixth generation uses Intel® Cascade Lake processors and is built on Alibaba Cloud’s proprietary X-Dragon computing platform. It also uses Alibaba Cloud’s next-generation Alibaba Virtual Switch (AVS), an in-house 25 Gbps network architecture, and proprietary Enhanced SSDs (ESSDs) with ultra-high IOPS. As a result, these next-generation ECS instances have been fully upgraded in terms of computing, network, and storage.

Overall Upgrades

The newest generation of ECS instances is equipped with second-generation Intel® Xeon® Scalable processors, which feature an all-core turbo frequency of up to 3.2 GHz and offer a 12% to 15% increase in single-core performance. Moreover, because these instances run on Alibaba Cloud’s proprietary X-Dragon computing platform, CPU resources are fully accessible to users. As a result, the computing power of the highest-specification sixth-generation ECS instances is 20% higher than that of the previous generation. When NUMA is enabled on the host, the memory access latency of an ECS instance is reduced by 30%.

In terms of storage, the next-generation enterprise-grade ECS instances fully support Alibaba Cloud’s proprietary ultra-high-performance ESSDs. This is the first Chinese product to provide quality of service (QoS) guarantees for storage IOPS and bandwidth at the instance level, which ensures stable and predictable storage performance for enterprise-level users. In addition, the performance of ultra disks has been upgraded. By default, the next-generation enterprise-grade ECS instances and ECS Bare Metal instances are equipped with these higher-performance ultra disks.

These instances provide a maximum network bandwidth of 25 Gbps and an increased packet forwarding rate of up to six million packets per second (PPS). The burstable bandwidth can be more than three times that of previous-generation instances. The number of elastic network queues has been increased to 32, and the number of elastic network interface (ENI) queues for small and medium instance specifications has doubled. The number of network connections is guaranteed by instance-level QoS, which ensures predictable network performance for enterprise-level users.
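To see how the two headline limits interact, the following back-of-the-envelope sketch computes the packet size at which the bandwidth limit and the packet-rate limit are reached at the same moment. It is only an illustration: the function name is hypothetical, Ethernet framing overhead is ignored, and it assumes both limits apply to the same traffic.

```python
# Figures quoted in the text above.
MAX_BANDWIDTH_BPS = 25e9   # 25 Gbps
MAX_PACKET_RATE = 6e6      # six million packets per second


def break_even_packet_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Packet size (bytes) at which both limits are reached together.

    Hypothetical helper: smaller packets hit the packet-rate limit first,
    larger packets hit the bandwidth limit first.
    """
    return bits_per_second / 8 / packets_per_second


size = break_even_packet_bytes(MAX_BANDWIDTH_BPS, MAX_PACKET_RATE)
print(f"Break-even packet size: about {size:.0f} bytes")
```

Under these assumptions, workloads dominated by small packets are bounded by the packet forwarding rate, while bulk transfers with large packets are bounded by the 25 Gbps bandwidth.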

Latest Generation Customized Processors

The sixth-generation ECS instances and ECS Bare Metal instances all come equipped with Intel® Xeon® Platinum 8269CY processors customized for Alibaba Cloud, which are built on the Cascade Lake microarchitecture used by the second-generation Intel® Xeon® Scalable Platinum processors. Compared with the previous Intel® Xeon® Platinum 8163 processors, which used the Skylake microarchitecture, 8269CY processors use Intel’s latest 14 nm++ process technology and therefore feature better integration and lower power consumption. In terms of microarchitecture, 8269CY processors represent a minor upgrade over the previous generation, providing 1 MiB of 16-way associative L2 cache (a 400% capacity increase) and 2,933 MHz RAM. This means higher memory bandwidth and lower memory latency. 8269CY and 8163 processors have the same base frequency. However, the all-core turbo frequency of 8269CY processors reaches 3.2 GHz, which is 18.5% higher than the 2.7 GHz of the previous generation, and the maximum turbo frequency is 3.8 GHz. Moreover, 8269CY processors add the Intel® AVX-512 VNNI instructions, which increase their deep learning inference performance by a factor of 11.
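The 18.5% figure follows directly from the two all-core turbo frequencies quoted above. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# All-core turbo frequencies quoted in the text, in GHz.
turbo_8163 = 2.7
turbo_8269cy = 3.2

gain = percent_increase(turbo_8163, turbo_8269cy)
print(f"All-core turbo gain: {gain:.1f}%")
```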

Alibaba Cloud’s Proprietary Computing Platform

The sixth-generation ECS instances and ECS Bare Metal instances are based on Alibaba Cloud’s proprietary X-Dragon computing platform. The X-Dragon computing platform consists of the MoC NIC, the X-Dragon software system, and the X-Dragon Hypervisor. With this architecture, the management software, network virtualization software, and storage virtualization software that run on physical servers in the traditional KVM virtualization solution are offloaded onto the MoC NIC. In this way, the resources of the servers are fully available to users.

The core of the MoC NIC is the X-Dragon chip. The X-Dragon software system runs on the X-Dragon chip to provide virtual private cloud (VPC) and EBS disk capabilities. It offers these capabilities to ECS instances and ECS Bare Metal instances through the standard VirtIO-net and VirtIO-blk interfaces. If you start an ECS instance on a physical server, the X-Dragon Hypervisor must run on this physical server to virtualize the CPU and memory. However, unlike the traditional KVM hypervisor, the X-Dragon Hypervisor is simplified to minimize its resource usage and its interference with your ECS instances. If you access a physical server through ECS Bare Metal Instance, no platform software needs to run on this physical server, so you can perform secondary virtualization on it.
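Because the network and disk capabilities are exposed through standard VirtIO interfaces, a Linux guest sees them as ordinary VirtIO devices. The sketch below lists them by reading the standard Linux sysfs directory for the VirtIO bus. The function name is hypothetical, and the device IDs (1 for network, 2 for block) are taken from the VirtIO specification rather than from this article.

```python
from pathlib import Path
from typing import Dict

# Device IDs from the VirtIO specification (not from the article).
VIRTIO_DEVICE_NAMES = {1: "virtio-net", 2: "virtio-blk"}


def list_virtio_devices(sysfs_root: str = "/sys/bus/virtio/devices") -> Dict[str, str]:
    """Map each VirtIO device found under sysfs_root to a readable type name."""
    root = Path(sysfs_root)
    found: Dict[str, str] = {}
    if not root.is_dir():
        return found  # not a Linux guest, or no VirtIO bus is present
    for dev in sorted(root.iterdir()):
        id_file = dev / "device"
        if not id_file.is_file():
            continue
        # The file holds the device ID as hexadecimal text, for example "0x0002".
        device_id = int(id_file.read_text().strip(), 16)
        found[dev.name] = VIRTIO_DEVICE_NAMES.get(device_id, f"other (id {device_id})")
    return found


if __name__ == "__main__":
    for name, kind in list_virtio_devices().items():
        print(name, kind)
```

On a machine without a VirtIO bus, the function simply returns an empty dictionary.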

Figure 5 The X-Dragon computing platform

Alibaba Cloud’s Proprietary Network Platform

The next-generation ECS instances and ECS Bare Metal instances use the Cloud Network Management platform developed by Alibaba Cloud to implement VPCs, Server Load Balancer (SLB), NAT gateways, and other NFV capabilities. These instances can also perform east-west and north-south traffic forwarding.

Compared with the previous generation, the next-generation instances offload Alibaba Cloud’s proprietary VSwitches, which are based on Cloud Network Management, from physical servers to the MoC NIC for the first time. The VPC and SLB capabilities are then provided through the VirtIO-net frontend driver. Previously, when a VSwitch ran on a physical server, its DPDK loop process had to run on a single CPU node. As a result, ECS instances running on this physical server could experience differences in capabilities such as the packet forwarding rate and network latency. The CPU turbo frequency and memory access bandwidth of ECS instances on the same CPU node as the DPDK loop process were also affected. After the VSwitch is offloaded to the MoC NIC, this process no longer interferes with your ECS instances. In addition, Cloud Network Management and the MoC NIC have been comprehensively optimized to provide better network performance for your ECS instances.

Alibaba Cloud’s Proprietary Storage Platform

The next-generation ECS instances and ECS Bare Metal instances are equipped with Alibaba Cloud’s proprietary ESSDs, which deliver ultra-high IOPS. A single ESSD can deliver one million IOPS. An ESSD consists of a VirtIO-blk frontend driver, a Kunpeng backend driver, and an Apsara Distributed File System cluster. The VirtIO-blk frontend driver runs on your ECS instance or ECS Bare Metal instance, the Kunpeng backend driver runs in the MoC NIC, and the Apsara Distributed File System cluster is deployed separately.

ESSDs enable the multi-queue capability of the Linux kernel block layer in the VirtIO-blk frontend driver. An ESSD can provide multiple hardware queues, which significantly improves performance when multiple I/O processes access the ESSD concurrently. The ESSD Kunpeng backend driver uses multiple technologies to improve storage performance:

  • It uses vhost-user as the VirtIO-blk backend and offloads the I/O data plane from QEMU. In this way, the I/O request lifecycle completely bypasses QEMU.
  • It uses the SPDK framework to interact with NVMe SSDs and adopts a user-space driver and a multi-queue adaptive polling mode.
  • The user-space driver uses coroutines and adds a run-to-completion mode. User-space drivers avoid the performance loss that occurs when the system must call into the kernel to handle each I/O request that is sent. Coroutines are implemented in user space to further reduce the kernel thread scheduling overhead.
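The multi-queue behavior on the frontend side can be observed from inside a Linux guest, because the multi-queue block layer exposes one numbered directory per hardware queue under the device's `mq` directory in sysfs. The following is a minimal sketch, not an Alibaba Cloud tool: the function name is hypothetical, and the device name `vda` is an assumption about how the ESSD appears in the guest.

```python
from pathlib import Path


def count_hw_queues(device: str, sysfs_block: str = "/sys/block") -> int:
    """Count the hardware queues the multi-queue block layer exposes for a device.

    Returns 0 if the device or its mq directory does not exist.
    """
    mq_dir = Path(sysfs_block) / device / "mq"
    if not mq_dir.is_dir():
        return 0
    # Each hardware queue appears as a directory named 0, 1, 2, ...
    return sum(1 for entry in mq_dir.iterdir() if entry.is_dir() and entry.name.isdigit())


if __name__ == "__main__":
    # "vda" is an assumed device name; adjust it to match your instance.
    print("hardware queues on vda:", count_hw_queues("vda"))
```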

Apsara Distributed File System clusters have also been upgraded to improve storage performance:

  • We have upgraded the storage servers and NVMe SSDs and now use two 25 Gbps NICs.
  • During volume replica chain synchronization, we use the high-performance Remote Direct Memory Access (RDMA) protocol, adopt our in-house switch chips and VSwitches, and have implemented chip-level optimization for the RDMA protocol. This reduces end-to-end communication latency across VSwitches to two microseconds (round-trip latency for a 4 KB payload plus a 128 B ACK).
  • We now write data in append-only mode: all write operations are recorded in sequence, and then the pointer is updated. This greatly improves the NVMe SSD throughput.

Figure 7 Apsara Distributed File System: ESSD
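The append-only idea from the list above can be illustrated with a toy in-memory model. This is a sketch of the general technique only, not of how Apsara Distributed File System is implemented: the class name, the block-number index, and the in-memory log are all assumptions made for illustration.

```python
from typing import Dict, Tuple


class AppendOnlyVolume:
    """Toy model: every write is appended to a log, then a pointer is updated."""

    def __init__(self) -> None:
        self._log = bytearray()                        # data is only ever appended
        self._index: Dict[int, Tuple[int, int]] = {}   # block -> (offset, length)

    def write(self, block: int, data: bytes) -> int:
        """Append data sequentially, then point the block at the new record."""
        offset = len(self._log)
        self._log += data
        self._index[block] = (offset, len(data))
        return offset

    def read(self, block: int) -> bytes:
        """Follow the pointer to the most recent record for the block."""
        offset, length = self._index[block]
        return bytes(self._log[offset:offset + length])


volume = AppendOnlyVolume()
volume.write(7, b"old")
volume.write(7, b"newer")   # overwriting appends; the old bytes stay in the log
print(volume.read(7))
```

In this model an overwrite never modifies existing bytes, so the underlying device only sees sequential writes. A real system would also need to reclaim the space held by stale records, which this sketch leaves out.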

Summary

Alibaba Cloud’s next-generation enterprise-grade ECS instances are equipped with the latest-generation Intel processors customized for Alibaba and Alibaba Cloud’s proprietary 25 Gbps MoC NIC. In terms of software, these instances use Alibaba Cloud’s proprietary X-Dragon Hypervisor, Cloud Network Management-based VSwitches, and ESSDs. The next-generation instances feature improved computing, network, and storage specifications, along with enhanced QoS capabilities for storage and networking. As a result, these instances provide enterprise-level users with more powerful and more stable computing services.
