Compare and Compute: High-Performance Computing with Alibaba Cloud


Weighing up the essentials of high-performance, compute-intensive, robust and reliable technology stacks with Alibaba Cloud’s world-class cloud compute platforms and solutions.

With China poised to reach exascale supercomputing capability this year[1], it may come as no surprise that Alibaba Cloud offers an impressive range of high-performance, next-generation cloud compute products and services. Our compute products support enterprise-scale, resource-intensive systems that run artificial intelligence (AI) and machine learning (ML) software, video and audio processing, and big data computation, simulation, modelling, and analysis.

Even the compute behemoths of old are facing up to the stiff competition coming from the likes of Alibaba Cloud[2], and some are now teaming up with Alibaba Cloud to strengthen and enhance their own compute product offerings[3].

It’s All about the Hardware

Alibaba Cloud’s products and services rely on state-of-the-art hardware, with special emphasis on the hardware behind the Alibaba Cloud compute product range. Alibaba Cloud’s research and development teams work constantly to improve cloud compute specifications. Last autumn, Alibaba Cloud’s DAMO team announced a new AI chip for Alibaba Cloud’s underlying hardware. The Hanguang 800[4] accelerates compute-intensive tasks while maintaining efficiency and resilience, and it is a welcome addition to the Alibaba Cloud infrastructure.

In this blog we’ll take a look at two of Alibaba Cloud’s high-end compute products: the Super Computing Cluster (SCC) and Elastic High Performance Computing (E-HPC), an all-in-one High Performance Computing as a Service (HPCaaS) product.

Super Computing Cluster (SCC)

Alibaba Cloud’s Super Computing Cluster (SCC) is the latest plug-and-play, high-performance cluster solution for compute-intensive tasks. SCC is made up of Alibaba Cloud Elastic Bare Metal (EBM) server instances. Each instance has a powerful specification, which makes scaling up straightforward. Depending on your requirements, EBM instances offer up to 96 vCPUs (Intel Skylake), 512 GB of memory, and up to eight Graphics Processing Unit (GPU) cards (Nvidia Pascal or Volta)[5].
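The per-node maxima above can be turned into a rough sizing estimate. The sketch below is hypothetical. It assumes every node offers the full 96 vCPUs, 512 GB, and 8 GPUs quoted here, which real instance types may not. The helper name `nodes_needed` is illustrative and is not part of any Alibaba Cloud API.

```python
import math

# Per-node maxima quoted in this article for an EBM instance. Actual
# instance types vary, so treat these as illustrative upper bounds.
MAX_VCPUS_PER_NODE = 96
MAX_MEMORY_GB_PER_NODE = 512
MAX_GPUS_PER_NODE = 8


def nodes_needed(vcpus: int, memory_gb: int, gpus: int = 0) -> int:
    """Return the smallest node count that satisfies every resource request.

    Hypothetical helper: each request is divided by its per-node maximum
    and rounded up. The largest count wins because the cluster must meet
    the tightest constraint. At least one node is always returned.
    """
    if min(vcpus, memory_gb, gpus) < 0:
        raise ValueError("resource requests must be non-negative")
    by_cpu = math.ceil(vcpus / MAX_VCPUS_PER_NODE)
    by_mem = math.ceil(memory_gb / MAX_MEMORY_GB_PER_NODE)
    by_gpu = math.ceil(gpus / MAX_GPUS_PER_NODE)
    return max(1, by_cpu, by_mem, by_gpu)


if __name__ == "__main__":
    # Example job: 1,000 vCPUs, 2 TB (2,048 GB) of memory and 20 GPUs.
    print(nodes_needed(vcpus=1000, memory_gb=2048, gpus=20))  # prints 11
```

In this example the vCPU request is the binding constraint. 1,000 / 96 rounds up to 11 nodes, whereas memory alone would need 4 nodes and the GPUs 3.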

EBM instances offer the same elasticity as virtual servers along with the high-performance features of physical servers, including physical isolation. They communicate over a high-speed RDMA over Converged Ethernet (RoCE)[6] network whose performance rivals InfiniBand[7]. Optional GPU accelerators add the parallel throughput that compute-intensive tasks demand, and combining CPUs with GPUs supports extraordinary compute requirements. With the GPUDirect RDMA option[8], Nvidia GPUs can exchange data directly over the network without first staging it in host memory.
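Why network latency matters so much in a cluster can be shown with the classic latency-bandwidth ("alpha-beta") model, in which sending a message costs a fixed latency plus its size divided by bandwidth. The link figures below are hypothetical, chosen only to illustrate the model. They are not measurements of RoCE, InfiniBand, or any Alibaba Cloud network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Latency-bandwidth ("alpha-beta") model of a network link."""
    name: str
    latency_s: float      # fixed per-message cost (alpha), in seconds
    bandwidth_bps: float  # sustained throughput, in bits per second

    def transfer_time(self, message_bytes: int) -> float:
        """Estimated time to deliver one message: latency + size / bandwidth."""
        return self.latency_s + (message_bytes * 8) / self.bandwidth_bps


# Hypothetical parameters for illustration only (not measured values).
rdma_like = Link("low-latency RDMA-style fabric", latency_s=2e-6, bandwidth_bps=100e9)
tcp_like = Link("conventional TCP path", latency_s=20e-6, bandwidth_bps=25e9)

if __name__ == "__main__":
    for size in (64, 1_000_000):  # a tiny message and a 1 MB message
        for link in (rdma_like, tcp_like):
            micros = link.transfer_time(size) * 1e6
            print(f"{link.name:>32}: {size:>9} B -> {micros:.2f} us")
```

Under these assumed numbers, the latency term dominates small messages (about 2 us versus 20 us for 64 bytes), while bandwidth dominates large ones (82 us versus 340 us for 1 MB). Tightly coupled simulations that exchange many small messages therefore gain most from a low-latency fabric.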

SCC eliminates network bottlenecks, which significantly speeds up workloads that run across the cluster. EBM cluster nodes can be deployed in minutes and scaled out just as quickly as your needs change.

Alibaba Cloud’s Super Computing Cluster

Alibaba Cloud’s SCC has flexible Pay-As-You-Go billing and is also highly secure.

Elastic-High Performance Computing (E-HPC)

SCC is E-HPC with bells on. In other words, if your workloads can tolerate a small loss of compute performance, Alibaba Cloud’s Elastic High Performance Computing (E-HPC) may be the better option.

E-HPC is an end-to-end public cloud service delivered as a High Performance Computing as a Service (HPCaaS) platform[9]. E-HPC includes the following features:

  1. Parallel scheduling with open source solutions PBS Pro and Slurm.
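For readers new to these schedulers, a job is usually submitted as a batch script whose header requests resources. The sketch below generates a minimal Slurm script. The `#SBATCH` options it writes (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`, `--gres`) are standard Slurm flags. The helper `render_sbatch` and the command `./my_solver input.cfg` are hypothetical placeholders, and the GPU request assumes the cluster defines a `gpu` generic resource. PBS Pro uses its own `#PBS` directives, which are not shown here.

```python
def render_sbatch(job_name: str, nodes: int, tasks_per_node: int,
                  gpus_per_node: int, time_limit: str, command: str) -> str:
    """Build the text of a minimal Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",  # wall-clock limit, e.g. HH:MM:SS
    ]
    if gpus_per_node > 0:
        # Generic resource request; assumes the cluster defines a "gpu" GRES.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    # srun launches the command on every allocated task.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder job: 4 nodes x 96 tasks running a hypothetical solver.
    print(render_sbatch("cfd-demo", 4, 96, 0, "02:00:00", "./my_solver input.cfg"))
```

Saved to a file such as `job.sh`, the output would be submitted with `sbatch job.sh`. The scheduler then queues the job until the requested nodes are free.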

Alibaba Cloud’s Super Computing Cluster and E-HPC solutions have similarly cutting-edge specifications, although E-HPC lacks the physical isolation of EBM instances. Even so, E-HPC performs comparably and is a good choice if you don’t need the extra assurance that isolated physical instances provide.

Like SCC, E-HPC is built on high-performance elastic instances (Intel Skylake CPUs), the RoCE v2 RDMA network, and Nvidia P100/V100 GPUDirect options. Together, these deliver high network and compute performance, high availability, low latency, and reliability.

Like SCC, E-HPC integrates with a range of Alibaba Cloud products and services to extend your technology stack. You can also combine SCC with E-HPC for maximum performance. At the 2018 Computing Conference in Hangzhou, Alibaba Cloud engineers and developers demonstrated this by comparing a physical automotive wind tunnel test with a digital wind tunnel simulation running on SCC and E-HPC[11].


How do SCC and E-HPC Stack Up against Regular Physical and Virtual Servers?

As the table shows, SCC and E-HPC compare well with regular physical and virtual servers. Remember that SCC runs on Elastic Bare Metal instances, which perform like physical machines, whereas E-HPC corresponds to the virtual machine column in this table.

Comparison of ECS Bare Metal Instances, physical machines, and virtual machines[12]

Exploring Manual Options for Building an E-HPC or SCC Solution

Building a similar physical infrastructure in-house would take considerable time and expense. Anything is possible, but the effort would be enormous, costly, and probably unnecessary.

Likewise, building a solution to match SCC or E-HPC from individual Alibaba Cloud products and services would not be without complications. Alibaba Cloud’s plug-and-play products remove much of the pain of building and managing infrastructure, including upgrades, security, and software license management.

Safe, Secure, Supported

Isolated physical servers, such as the Elastic Bare Metal instances in SCC, offer a higher level of security. Even so, Alibaba Cloud’s built-in security applies to all Alibaba Cloud products and services, including E-HPC and everything inside the cluster infrastructure, and it provides strong protection for most workloads.

You can also add enhanced security defenses from the Alibaba Cloud security product range and choose from a wide range of support options.

Documentation and Community

Alibaba Cloud publishes all its documentation for free online and the community site is a daily port of call for system administrators running Alibaba Cloud products and services.

Amongst Friends

If you are already using or have decided to implement your technology stack with Alibaba Cloud, regardless of your business requirements, you will be amongst friends. As well as startups like A.I. Nemo and research centers like the Chinese Institute of Public & Environmental Affairs (IPE), international corporate giants such as Philips, SAP, KPMG, and Ford are entrusting their business applications to Alibaba Cloud.

With many regional discounts and special offers, is now the time to move to Alibaba Cloud?

References

  1. https://spectrum.ieee.org/computing/hardware/will-china-attain-exascale-supercomputing-in-2020
