Using RDMA on Container Service for Kubernetes

What Is RDMA?

Remote direct memory access (RDMA) was developed to reduce the latency that server-side data processing adds to network transmission.

With RDMA, data is transferred directly from the memory of one computer to the memory of another, without involving the operating system or protocol stack on either side. Because communication bypasses the kernel, RDMA greatly lowers CPU usage, avoids memory copies in the kernel, and reduces context switches between user mode and kernel mode.

Common RDMA implementations include RDMA over Converged Ethernet (RoCE), InfiniBand, and iWARP.


Alibaba Cloud’s Support for RDMA

Alibaba Cloud Super Computing Cluster (SCC) instances support both RoCE and Virtual Private Cloud (VPC) networks. The RoCE network is dedicated to RDMA communication. SCC is mainly used in high-performance computing, artificial intelligence, machine learning, scientific computing, engineering computing, data analysis, audio and video processing, and similar scenarios.

RoCE provides network performance comparable to that of InfiniBand, and it also supports a wider range of Ethernet-based applications.

Learn more about Alibaba Cloud ECS Bare Metal Instance and Super Computing Clusters at https://www.alibabacloud.com/help/doc-detail/60576.htm

You can purchase SCC instances on a yearly or monthly subscription directly from the Elastic Compute Service (ECS) console. For more information, visit https://www.alibabacloud.com/help/doc-detail/61978.htm

Container Service’s Support for RDMA

Alibaba Cloud Container Service currently supports RDMA. You can add SCC ECS instances to a container cluster and deploy an RDMA device plug-in so that the scheduler is aware of RDMA devices.

To schedule a container onto an RDMA-capable ECS instance, declare the resource limit rdma/hca: 1 in the container's resources section.
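
A minimal sketch of where that limit sits in a pod manifest, built here as a Python dictionary and printed as JSON. The pod name, container name, and helper function are hypothetical; the image and the rdma/hca resource name are the ones used later in this article.

```python
import json

def rdma_pod_manifest(name, image, hca_count=1):
    """Build a pod manifest that requests RDMA devices through resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name + "-ctr",  # hypothetical container name
                "image": image,
                # The device plug-in advertises this resource on RDMA nodes,
                # so the scheduler places the pod only where it is available.
                "resources": {"limits": {"rdma/hca": hca_count}},
            }],
        },
    }

manifest = rdma_pod_manifest("rdma-demo", "mellanox/centos_7_4_mofed_4_2_1_2_0_0_60")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest.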

Create a Container Cluster

Log on to the Container Service console and create a Kubernetes cluster. Because SCC is currently supported only in Shanghai, select China East 2 (Shanghai) as the region for the cluster. Set the other parameters, create the cluster, and wait until creation completes.

Deploy an RDMA Device Plug-in

On the Container Service console, use a template to deploy a device plug-in that supports RDMA. Select the target cluster and namespace. The template is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
        "mode" : "hca"
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: rdma-device-plugin
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: rdma-sriov-dp-ds
    spec:
      hostNetwork: true
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: registry.cn-shanghai.aliyuncs.com/acs/rdma-device-plugin
        name: k8s-rdma-device-plugin
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: config
          mountPath: /k8s-rdma-sriov-dev-plugin
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: config
        configMap:
          name: rdma-devices
          items:
          - key: config.json
            path: config.json
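
Once the plug-in is running, nodes with RDMA hardware should advertise the rdma/hca resource. The following is a hedged sketch of how to check that from the output of kubectl get nodes -o json. The helper name and the sample node data are hypothetical, and it assumes the plug-in reports whole-number device counts.

```python
import json

def nodes_with_rdma(node_list_json, resource="rdma/hca"):
    """Return {node name: allocatable count} for nodes advertising the resource."""
    result = {}
    for node in json.loads(node_list_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports allocatable quantities as strings.
        count = int(allocatable.get(resource, "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result

# Made-up sample shaped like the NodeList that kubectl returns.
sample = json.dumps({"items": [
    {"metadata": {"name": "scc-node-1"},
     "status": {"allocatable": {"cpu": "96", "rdma/hca": "1"}}},
    {"metadata": {"name": "ecs-node-1"},
     "status": {"allocatable": {"cpu": "4"}}},
]})
print(nodes_with_rdma(sample))
```

A node that is missing from the result either has no RDMA device or is not yet running the plug-in.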

Manually Add an SCC ECS Instance to the Cluster


Deploy Two Test Images

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
---
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod-1
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000

Run ib_read_bw -q 30 in one container to start the server side of the test.


Run ib_read_bw -q 30 <IP address of the preceding container> in the other container to start the client side.


Test results show that data is transmitted between the two containers through RDMA. The measured bandwidth is about 5,500 MB/s, which is roughly 44 Gbit/s.
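
By default, ib_read_bw reports bandwidth in MB/sec, so the 44 Gbit/s figure comes from multiplying by 8 bits per byte. A quick check of that arithmetic, assuming decimal units (1 MB = 10^6 bytes); if the tool's megabyte is binary, the result is a few percent higher.

```python
def mbytes_per_s_to_gbits_per_s(mb_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000

print(mbytes_per_s_to_gbits_per_s(5500))  # 44.0
```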

Note: An RDMA connection is usually established through TCP or RDMA_CM. If an application uses RDMA_CM, the pod IP address assigned by the VPC network plug-in cannot be used as the RDMA_CM address. Instead, run the container on the host network and use the IP address of the bond0 interface as the RDMA_CM communication address.
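
For a container that runs on the host network, the bond0 address can be read from the output of ip -4 -o addr show dev bond0. The following sketch extracts it; the helper name is hypothetical and the sample line uses a made-up address.

```python
import re

def ipv4_of_interface(ip_addr_output):
    """Extract the first IPv4 address from `ip -4 -o addr show dev <iface>` output."""
    match = re.search(r"\binet (\d{1,3}(?:\.\d{1,3}){3})/", ip_addr_output)
    return match.group(1) if match else None

# Made-up sample of the one-line-per-address format printed by `ip -o`.
sample = ("6: bond0    inet 192.168.0.10/24 brd 192.168.0.255 "
          "scope global bond0\\       valid_lft forever preferred_lft forever")
print(ipv4_of_interface(sample))
```

The function returns None when the interface has no IPv4 address, which is worth checking before passing the value to an application as its RDMA_CM address.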

Reference: https://www.alibabacloud.com/blog/using-rdma-on-container-service-for-kubernetes_594462?spm=a2c41.12560487.0.0
