Zero-Copy Optimization for Alibaba Cloud Smart NIC Solution

Project Background

In VPC deployments, the virtual switch encapsulates and decapsulates packets as they move between the overlay and underlay networks. It is also responsible for routing, QoS, rate limiting, and security groups at Layer 2 and Layer 3 on multi-tenant (virtual machine or container) hosts. Open vSwitch (OVS) is an open-source virtual switch that is widely accepted in the industry. At Alibaba Cloud, we have also developed Application Virtual Switch (AVS) to suit our own business. AVS provides functions similar to those of OVS and is likewise a pure software implementation. While using AVS, we ran into some problems, which led to the Smart NIC project.

Virtio or SR-IOV?

VF devices based on PCIe SR-IOV are passed through directly to the VM, which gives them high performance and rich functionality. However, pass-through also has obvious drawbacks: hot-plugging is inconvenient, the VF interfaces provided by different vendors are inconsistent, and the guest needs additional drivers. Virtio, a widely used virtual device interface, avoids these problems. Its drawback is that no hardware vendor yet fully supports devices with a Virtio interface, so Virtio devices are implemented through software emulation and their performance is unsatisfactory. Is there a solution that delivers high performance while still supporting Virtio?

Smart NIC Solution

The figure below illustrates the framework of our Smart NIC. It consists of a standard NIC ASIC and an SoC with its own built-in CPU and memory. AVS runs on the SoC and offloads the fast path to the ASIC, so it only handles slow-path processing. Separating the fast path from the slow path not only reduces the processing load on AVS but also greatly improves throughput.
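The split between the two paths can be illustrated with a minimal Python sketch. It assumes the common flow-caching pattern, in which the first packet of a flow misses in the hardware table, takes the slow path through AVS, and AVS then installs an exact-match rule so that later packets of the flow stay in the ASIC. The 5-tuple key, the action strings, and the names `HardwareFlowTable`, `SmartNic`, and `slow_path` are hypothetical and do not describe the real AVS or ASIC interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical flow key: (src_ip, dst_ip, ip_proto, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class HardwareFlowTable:
    """Stands in for the exact-match flow table in the NIC ASIC (fast path)."""
    entries: Dict[FlowKey, str] = field(default_factory=dict)

    def lookup(self, key: FlowKey) -> Optional[str]:
        return self.entries.get(key)

    def install(self, key: FlowKey, action: str) -> None:
        self.entries[key] = action


@dataclass
class SmartNic:
    hw: HardwareFlowTable
    # Full AVS pipeline (routing, QoS, security groups) running on the SoC.
    slow_path: Callable[[FlowKey], str]
    slow_path_hits: int = 0

    def receive(self, key: FlowKey) -> str:
        action = self.hw.lookup(key)
        if action is not None:
            return action              # fast path: handled by the ASIC alone
        self.slow_path_hits += 1
        action = self.slow_path(key)   # slow path: AVS on the SoC decides
        self.hw.install(key, action)   # offload so later packets hit the ASIC
        return action
```

In this model, a long-lived flow costs the SoC a single slow-path lookup; every subsequent packet of that flow is resolved by the hardware table alone.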

Packet Receiving Path

Before describing the packet receiving path, we need to explain how a receive queue is initialized in DPDK. First, the software allocates a buffer in memory to serve as the queue. Each entry in the queue is called a descriptor; descriptors are the interface between software and hardware or, for a para-virtualized emulated device such as virtio-net, between the frontend and backend drivers. The software also needs a memory pool containing fixed-size data buffers that hold the network packets passing through the NIC.
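To make these terms concrete, here is a toy Python model of a receive queue: a ring of descriptors, each armed at initialization with a fixed-size buffer taken from a memory pool. It only mirrors the concepts; real DPDK applications do this in C through calls such as `rte_pktmbuf_pool_create()` and `rte_eth_rx_queue_setup()`. The class names, the 2048-byte buffer size, and the descriptor fields are assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

BUF_SIZE = 2048  # hypothetical fixed size of each data buffer, in bytes


class Mempool:
    """Pool of fixed-size data buffers, loosely modelled on a DPDK mbuf pool."""

    def __init__(self, n_bufs: int, buf_size: int = BUF_SIZE):
        self.free = deque(bytearray(buf_size) for _ in range(n_bufs))

    def get(self) -> bytearray:
        return self.free.popleft()

    def put(self, buf: bytearray) -> None:
        self.free.append(buf)


@dataclass
class RxDescriptor:
    buf: Optional[bytearray] = None  # buffer the device is allowed to write into
    length: int = 0                  # bytes written by the device
    done: bool = False               # set by the device when a packet has landed


class RxQueue:
    def __init__(self, n_desc: int, pool: Mempool):
        # Initialization: allocate the ring and arm every descriptor with a buffer.
        self.ring: List[RxDescriptor] = [RxDescriptor(pool.get()) for _ in range(n_desc)]
        self.pool = pool
        self.hw_tail = 0  # next descriptor the simulated device fills
        self.sw_head = 0  # next descriptor software polls

    def device_write(self, packet: bytes) -> bool:
        """Simulates the NIC DMA-ing a packet into the next armed descriptor."""
        desc = self.ring[self.hw_tail]
        if desc.done or len(packet) > len(desc.buf):
            return False  # ring full (not yet polled) or packet too large
        desc.buf[: len(packet)] = packet
        desc.length, desc.done = len(packet), True
        self.hw_tail = (self.hw_tail + 1) % len(self.ring)
        return True

    def poll(self) -> Optional[bytes]:
        """Software side: take one completed packet and re-arm the descriptor."""
        desc = self.ring[self.sw_head]
        if not desc.done:
            return None
        buf, n = desc.buf, desc.length
        desc.buf, desc.length, desc.done = self.pool.get(), 0, False
        self.sw_head = (self.sw_head + 1) % len(self.ring)
        # A real PMD hands the mbuf itself to the application; for simplicity
        # we copy the bytes out and return the buffer to the pool right away.
        data = bytes(buf[:n])
        self.pool.put(buf)
        return data
```

The device can only write into descriptors that software has armed with a buffer, which is why the choice of where those buffers come from determines where received packets land in memory.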

Optimized Zero-Copy Receiving Method

As the process above shows, a packet sent to a virtual machine is copied twice along the path: first the hardware DMAs it into host memory, and then the software forwarding program copies it into the virtual machine's guest memory. The second copy is especially expensive because it consumes a lot of CPU time, and subsequent performance profiling also showed that memory copies account for a large share of processing time. Can we eliminate one of these copies?
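The cost of the conventional path can be made explicit by counting who performs each copy. This Python sketch is a simplified model, not the actual forwarding code: `nic_dma` stands in for the hardware DMA (copy 1) and `forward_to_guest` for the software forwarder's copy into guest memory (copy 2); all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CopyStats:
    dma_copies: int = 0  # copies performed by the NIC's DMA engine
    cpu_copies: int = 0  # copies performed by the software forwarder
    cpu_bytes: int = 0   # bytes the CPU had to move


def nic_dma(host_buf: bytearray, packet: bytes, stats: CopyStats) -> None:
    """Copy 1: the hardware writes the packet into a host-memory buffer."""
    host_buf[: len(packet)] = packet
    stats.dma_copies += 1


def forward_to_guest(host_buf: bytearray, length: int, guest_buf: bytearray,
                     stats: CopyStats) -> None:
    """Copy 2: the software forwarder copies the packet into guest memory."""
    guest_buf[:length] = host_buf[:length]
    stats.cpu_copies += 1
    stats.cpu_bytes += length


def receive_two_copy(packet: bytes, host_buf: bytearray, guest_buf: bytearray,
                     stats: CopyStats) -> None:
    """Conventional receive path: DMA into host memory, then CPU copy to guest."""
    nic_dma(host_buf, packet, stats)
    forward_to_guest(host_buf, len(packet), guest_buf, stats)
```

For 1,000 packets of 1,500 bytes, the CPU moves 1.5 MB in copy 2 alone; that is the term the zero-copy method aims to remove.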

Our zero-copy receiving method works in three steps:

  1. Establishing a 1:1 mapping between the descriptors of the frontend virtio-net receive queue and those of the SR-IOV VF receive queue.
  2. Monitoring changes to the frontend virtio-net receive queue descriptors and synchronizing them to the VF's receive queue. The VF's receive queue can then directly use the buffers posted by the virtual machine instead of buffers from a host memory pool, so packets received from the NIC no longer need to be copied into guest memory.
  3. Converting guest physical addresses (GPAs) to host physical addresses (HPAs). The guest's virtio-net driver fills the descriptors with GPAs. When no IOMMU is used, each GPA must be converted to an HPA so that the Smart NIC can DMA directly into the guest buffer.
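The three steps above can be sketched in Python as follows. This is a minimal model under stated assumptions, not Alibaba Cloud's implementation: guest memory is described by a hypothetical table of contiguous regions with known HPAs (in practice the hypervisor would have to provide such a mapping, and the guest pages would need to stay pinned), the caller is assumed to have already detected which virtio-net descriptors were newly posted, and `MemRegion`, `GpaTranslator`, and `sync_rx_descriptors` are illustrative names.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemRegion:
    """One contiguous chunk of guest RAM and its host location (hypothetical)."""
    gpa_start: int
    size: int
    hpa_start: int


class GpaTranslator:
    def __init__(self, regions: List[MemRegion]):
        self.regions = sorted(regions, key=lambda r: r.gpa_start)
        self.starts = [r.gpa_start for r in self.regions]

    def gpa_to_hpa(self, gpa: int, length: int) -> int:
        # Find the last region whose start is <= gpa.
        i = bisect.bisect_right(self.starts, gpa) - 1
        if i < 0:
            raise ValueError(f"GPA {gpa:#x} is not mapped")
        r = self.regions[i]
        # The whole buffer must fit in one region, or the DMA would cross a gap.
        if gpa + length > r.gpa_start + r.size:
            raise ValueError(f"GPA range {gpa:#x}+{length} is not mapped")
        return r.hpa_start + (gpa - r.gpa_start)


@dataclass
class VirtioRxDesc:
    gpa: int     # buffer address written by the guest virtio-net driver
    length: int


@dataclass
class VfRxDesc:
    hpa: int     # address the NIC will DMA into
    length: int


def sync_rx_descriptors(virtio_ring: List[VirtioRxDesc], vf_ring: List[VfRxDesc],
                        start: int, count: int, xlate: GpaTranslator) -> None:
    """Mirror newly posted virtio-net RX descriptors into the VF RX ring slot for
    slot (the 1:1 mapping), translating each GPA to an HPA so the NIC writes
    packets straight into guest buffers. Both rings must have the same size."""
    n = len(virtio_ring)
    for k in range(count):
        slot = (start + k) % n
        d = virtio_ring[slot]
        vf_ring[slot] = VfRxDesc(xlate.gpa_to_hpa(d.gpa, d.length), d.length)
```

Because each VF slot points at the same guest buffer as the matching virtio-net slot, the NIC's DMA is the only copy the packet undergoes on its way into the virtual machine.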

Summary

In this article, we introduced the motivation for the Smart NIC developed by Alibaba Cloud, its functional framework, and its software forwarding program, as well as the problems we found during software forwarding and the corresponding optimization. The zero-copy receiving optimization:

  1. Reduces memory copies at the receiver, cutting the time spent on copying.
  2. Reduces the memory footprint and avoids cache-coherency overhead.
  3. Improves overall performance by 40%.

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com