Building a High Performance Container Solution with Super Computing Cluster and Singularity

Alibaba Cloud Elastic High Performance Computing (E-HPC) service provides users with one-stop HPC services over the public cloud based on the Alibaba Cloud infrastructure. E-HPC automatically integrates the IaaS-layer hardware resources to provide users with on-cloud HPC clusters. In addition, E-HPC further supports the high availability of on-cloud HPC services and provides new features, such as CloudMetrics multidimensional performance monitoring and low-cost resumable computing, to allow users to use on-cloud HPC services in a better and more cost-effective manner.

This article introduces the elastic high performance container solution launched by Alibaba Cloud Super Computing Cluster and its applications in the Molecular Dynamics (MD) and AI fields.

High Performance Container: Singularity

Singularity is a container technology developed by the Lawrence Berkeley National Lab specifically for large-scale and cross-node HPC and DL workloads. Singularity features lightweight, fast deployment, and convenient migration. It supports conversion from Docker images to Singularity images. Singularity differs from Docker in the following aspects:

User Permissions

Singularity can be started by both root and non-root users. Before and after the startup of the container, the user context remains unchanged. Therefore, user permissions are the same both inside and outside the container.

Performance and Isolation

Singularity emphasizes the convenience, portability, and scalability of the container service, and weakens the high isolation of the container process. Therefore, Singularity is more lightweight, has a smaller kernel namespace, and results in less performance loss.

Image for post
Image for post

The following figure shows the HPL performance data measured when different containers are used on a single Shenlong bare metal server (ecs.ebmg5.24xlarge, Intel Xeon (Skylake) Platinum 8163, 2.5 GHz, 96 vCPUs, and 384 GB). As shown in the following figure, the HPL performance measured when the Singularity container is used is slightly better than that when the Docker container is used, and is equivalent to that of the host.

Image for post
Image for post

HPC-Optimized

Singularity is highly suitable for scenarios where HPC is used. It allows full utilization of host software and hardware resources, including the HPC scheduler (PBS and Slurm), cross-node communication library (IntelMPI and OpenMPI), network interconnection (Ethernet and InfiniBand), file systems, and accelerators (GPU). Users can use Singularity without having to perform extra adaptation to HPC.

Image for post
Image for post

E-HPC Elastic High Performance Container Solution

Alibaba Cloud E-HPC integrates the open source Singularity container technology. While supporting the rapid deployment and flexible migration of user software environments, E-HPC also ensures the high availability of on-cloud HPC services and compatibility with existing E-HPC components, delivering an efficient and easy-to-use elastic and high performance container solution to users.

Image for post
Image for post

Users only need to pack and upload their local software environments to the Docker Hub to complete the entire process on the E-HPC console: Create a cluster > Pull an image > Deploy a container application > Submit a job > Monitor the performance and query monitoring results. In this way, users can reduce the HPC costs and improve the scientific research efficiency and production efficiency.

Image for post
Image for post

Singularity Deployment Cases

Case 1: Run the NAMD Container Job on Multiple SCC Nodes

NAMD is a type of mainstream MD simulation software featuring good scalability and high parallel efficiency. It is often used to process large-scale molecular systems. In the following description, we assume that a Singularity image containing Intel MPI, NAMD, and inputfile is created based on the image docker.io/centos:7.2.1511. The PBS scheduler is used to submit the NAMD container job and a local job sequentially to four SCC nodes (ecs.scch5.16xlarge, Intel Xeon (Skylake) Gold 6149, 3.1 GHz, 32 physical cores, and 192 GB). The PBS job script is as follows:

#!/bin/sh
#PBS -l ncpus=32,mem=64gb
#PBS -l walltime=20:20:00
#PBS -o namd_local_pbs.log
#PBS -j oe

To differences in CPU usage, RoCE network bandwidth, and software execution efficiency between running NAMD in the container and running NAMD directly on the host, the performance monitoring tool CloudMetrics that comes with E-HPC is used to monitor the usage of SCC cluster resources. The functions of CloudMetrics have been introduced in previous articles, particularly GROMACS and WRF. The following figure shows the resource usage information for each node.

Image for post
Image for post

As shown in this figure, the cluster resource usage is basically the same no matter whether NAMD is executed in the container or directly on the host. That is, the CPU is fully loaded and the RoCE network bandwidth remains at about 1.3 Gbit/s on each of the four nodes. The job execution times are 1324s and 1308s, respectively. This result indicates that Singularity is not only highly adapted to the host scheduler, MPI parallel library, and RoCE network, but also can ensure efficient execution of container jobs. The performance loss is less than 2% if container jobs are executed in Singularity, instead of on the host.

Case 2: Run the TensorFlow Image Classification Container Job on an EGS Instance

CIFAR-10 is a classic dataset in the image recognition field. In the following description, it is assumed that a Singularity and a Docker container that contain the image classification model are created based on the image docker.io/tensorflow/tensorflow: latest-devel-gpu-py3. Based on these two containers, training is carried out on a single EGS node (ecs.gn5-c8g1.4xlarge, Intel Xeon E5–2682v4, 2.5 GHz, 16 vCPUs, 120 GB, and 2 P100s). The command lines are as follows:

# Run the job in the Singularity container
singularity exec --nv /opt/cifar10.sif python /cifar10/models/tutorials/image/cifar10/cifar10_multi_gpu_train.py --num_gpus=2

CloudMetrics is used to monitor the jobs. The following figure shows the resource usage information of the node.

Image for post
Image for post

As shown in the figure, no significant difference is found in the resource usage when training of the TensorFlow image classification model is conducted in the Singularity or Docker container. The CPU usage remains at 75%, and the usage of a single GPU ranges from 30% to 40%. Regarding the training efficiency, the training durations for 100,000 steps are 1432s and 1506s, respectively. This indicates that the Singularity container is not only highly adapted to the host GPU and CUDA, and but also has slightly higher job execution efficiency than the Docker container.

Conclusion

Alibaba Cloud Super Computing Cluster integrates the open source Singularity container technology to deliver an efficient and easy-to-use on-cloud elastic high performance container solution. This solution greatly reduces users’ cloud migration costs and improves their scientific research efficiency.

Reference:https://www.alibabacloud.com/blog/building-a-high-performance-container-solution-with-super-computing-cluster-and-singularity_594569?spm=a2c41.12663314.0.0

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store