How to Install NVIDIA GPU Cloud Virtual Machine Image on Alibaba Cloud

By NVIDIA

NVIDIA makes available on the Alibaba Cloud platform a customized image optimized for NVIDIA Pascal™- and Volta™-based Tesla GPUs. Running NGC containers on this virtual machine (VM) instance provides optimum performance for deep learning jobs.

For those familiar with the Alibaba platform, the process of launching the instance is as simple as logging into Alibaba, selecting the “NVIDIA GPU Cloud Machine Image” and one of the supported NVIDIA GPU instance types, configuring settings as needed, then launching the VM. After launching the VM, you can SSH into it and start running deep learning jobs using framework containers from the NGC container registry.

This article provides step-by-step instructions for accomplishing this.

Prerequisites

These instructions assume the following:

  1. You have an Alibaba account — https://home-intl.console.aliyun.com/ with permissions to create resources.

Preliminary Setup

Perform these preliminary setup tasks to simplify the process of launching the NVIDIA GPU Cloud machine image.

Setting Up SSH Keys

If you do not already have an SSH key pair set up specifically for Alibaba, you will need to create one and store the private key on the machine you will use to SSH to the VM. In the examples, the key is named “alibaba-key”.

  1. From a browser, log in to the ECS console — https://ecs.console.aliyun.com/.
  • mv alibaba-key.pem ~/.ssh/
    chmod 400 ~/.ssh/alibaba-key.pem
  1. On Windows, the location will depend on the SSH client you use, so modify the path above and in the snippets or your SSH client configuration. See the Alibaba documentation for Creating an SSH key pair.
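SSH clients typically refuse a private key that other users can read, so it can help to confirm the result of the chmod step before trying to connect. The following is a minimal sketch for Linux or Mac OS X; the function name check_key_perms is hypothetical and is not part of any Alibaba or NVIDIA tooling.

```shell
# Hypothetical helper: succeeds only if the key file exists and is
# read-only for its owner (mode 400), as set by the chmod step above.
check_key_perms() {
    key_path="$1"
    [ -f "$key_path" ] || { echo "missing key: $key_path" >&2; return 1; }
    # The first 10 characters of `ls -l` are the file type and permission bits.
    perms=$(ls -l "$key_path" | cut -c1-10)
    if [ "$perms" != "-r--------" ]; then
        echo "unexpected permissions on $key_path: $perms" >&2
        return 1
    fi
}

# Example (assumes the key name used in this guide):
# check_key_perms ~/.ssh/alibaba-key.pem && echo "key looks OK"
```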

Setting Up a Security Group

In order to create instances, you need to put them in a Security Group.

  1. Log in to the ECS console — https://ecs.console.aliyun.com/.

(Optional) Installing Alibaba CLI

To use the Alibaba CLI, follow the Alibaba CLI Install Instructions and also install the ECS SDK.

  1. Install the ECS SDK.
  • sudo pip install aliyun-python-sdk-ecs
  1. Configure the CLI with your keys.
  • aliyuncli configure

Launching an NVIDIA GPU Cloud VM with the Alibaba Console

Creating the NVIDIA GPU Cloud VM Instance

  1. Log on to the Alibaba Elastic Compute Service (ECS) console at https://marketplace.alibabacloud.com.
  1. Near the bottom of the page, click ECS Advanced Purchase page.
  1. Make the following selections from the Basic Configurations page.
  1. For Region, select a region that contains the image (for example, US West 1 (Silicon Valley)), and then select a zone from the dropdown menu (for example, “US-West 1 Zone B”). Note that not all regions support GPU instances.
  1. Click Next: System Configurations.
  1. At the Activated message, click Console.
  1. Wait until your instance status is Running. You can then connect to the instance using SSH.

Connecting to the VM Instance with SSH

Once the instance has started, you can SSH into it as the root user using your SSH key. If you followed the setup in this tutorial, your key is in ~/.ssh/.

Command syntax:

ssh -i <KEYPATH> root@<IP>

Example:

ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188

Refer to Connect to a Linux Instance for more instructions on connecting to your instance.
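If you connect to the same VM often, an entry in your OpenSSH client configuration saves retyping the key path and IP address. The sketch below prints such an entry; the function name ssh_host_entry and the alias "ngc-vm" are assumptions made for illustration, while the IP address and key path are the example values from this guide.

```shell
# Hypothetical helper: prints an OpenSSH client config stanza for the VM.
# Arguments: host alias, public IP address, path to the private key.
ssh_host_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User root\n'
    printf '    IdentityFile %s\n' "$3"
}

# Example: append the stanza to your SSH config, then connect by alias.
# ssh_host_entry ngc-vm 47.89.248.188 ~/.ssh/alibaba-key.pem >> ~/.ssh/config
# ssh ngc-vm
```

Review the printed stanza before appending it, since the IP address changes if the instance is recreated.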

Launching an NVIDIA GPU Cloud Virtual Machine Image Using Alibaba CLI

Using Example Python Scripts

A comprehensive set of example Python scripts for automating the CLI is provided at https://github.com/nvidia/ngc-examples/tree/master/ncsp. You can download the scripts and modify them to meet your requirements. The code examples that follow use environment variables and a structure similar to those in the scripts.

Using the Instructions in This Chapter

This flow and the code snippets in this section are for Linux or Mac OS X. If you are using Windows, you can use the Windows Subsystem for Linux and use the bash shell (where you will be in Ubuntu Linux).

Many of these CLI commands can take a significant amount of time to complete.

For complete CLI documentation and sample scripts visit the Alibaba Documentation Center.

Getting the NGC VMI Image ID

You need to specify a source ImageID when creating an instance. Use this command to find the latest ImageID of the NVIDIA-GPU-Cloud-Machine-Image:

aliyuncli ecs DescribeImages --RegionId us-west-1 \
--ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
--output json --filter Images.Image[0].ImageId

The command outputs the image ID, for example “m-rj9iy0xjiod3ghkyhz4p”.
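To reuse the image ID in later commands, you can capture it in an environment variable. The sketch below assumes the command prints the ID as a quoted JSON string, as in the sample output; the names strip_json_quotes, get_ngc_image_id, NGC_REGION, and NGC_IMAGE_ID are hypothetical.

```shell
# Hypothetical helper: removes double quotes and whitespace from stdin,
# so a JSON string such as "m-abc" becomes m-abc.
strip_json_quotes() {
    tr -d '"[:space:]'
}

# Wraps the DescribeImages command from this guide; nothing runs until
# the function is called. NGC_REGION is an assumed variable that falls
# back to the region used in the examples.
get_ngc_image_id() {
    aliyuncli ecs DescribeImages --RegionId "${NGC_REGION:-us-west-1}" \
        --ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
        --output json --filter 'Images.Image[0].ImageId' | strip_json_quotes
}

# Example:
# NGC_IMAGE_ID=$(get_ngc_image_id)
# echo "$NGC_IMAGE_ID"
```

Quoting the filter expression keeps the shell from treating the square brackets as a glob pattern.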

Creating Your VM Instance

Creating an instance with the CLI is done using the aliyuncli ecs CreateInstance command.

Full syntax documentation — https://www.alibabacloud.com/help/doc-detail/25499.htm

Recommended Instance Options

  1. “--InternetMaxBandwidthOut 10” sets the peak outbound network bandwidth to 10 Mbps. The valid range is [1, 200].

Other Notable Create Instance Options

  1. The inbound network bandwidth defaults to 200 Mbps. Use “--InternetMaxBandwidthIn” to change this. The valid range is [1, 200].
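Because both bandwidth options accept only values in the range [1, 200], a script can check the value before calling the CLI rather than waiting for the request to be rejected. This is a minimal sketch; valid_bandwidth and BW_OUT are hypothetical names.

```shell
# Hypothetical helper: succeeds if the argument is an integer in [1, 200],
# the valid range this guide gives for the bandwidth options.
valid_bandwidth() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 200 ]
}

# Example:
# BW_OUT=10
# valid_bandwidth "$BW_OUT" || { echo "bandwidth must be 1-200 Mbps" >&2; exit 1; }
```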

Launch Example

Launch the instance and capture the resulting JSON:

aliyuncli ecs CreateInstance \
--RegionId us-west-1 \
--ImageId "m-rj9iy0xjiod3ghkyhz4p" \
--SecurityGroupId "sg-rj94krsusal2k5l6gnnz" \
--InstanceType ecs.gn5-c4g1.xlarge \
--InstanceName "my-instance" \
--InternetMaxBandwidthOut 10 \
--InstanceChargeType PostPaid \
--KeyPairName alibaba-key

The output shows the instance ID.

{
"InstanceId": "i-rj9a0iw25hryafj0fm4v",
"RequestId": "440ECC70-09F9-492C-AB9E-21AA9C4E0531"
}
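The later commands need the instance ID from this response, so a script would usually extract it instead of copying it by hand. The sketch below uses a plain sed match, which is enough for flat responses like the one shown but is not a general JSON parser; json_string_field and NGC_INSTANCE_ID are hypothetical names.

```shell
# Hypothetical helper: reads JSON on stdin and prints the value of the
# first string field with the given name. This is a simple text match,
# not a JSON parser, so it only suits flat responses like these.
json_string_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (CreateInstance options as in the launch example):
# NGC_INSTANCE_ID=$(aliyuncli ecs CreateInstance ... | json_string_field InstanceId)
# echo "$NGC_INSTANCE_ID"
```

The same helper works for other string fields in CLI responses, such as IpAddress.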

Assigning a Public IP Address

Instances created via CLI are not automatically given a public IP address.

To assign a public IP address to the instance you just created, run:

aliyuncli ecs AllocatePublicIpAddress --RegionId us-west-1 \
--InstanceId "i-rj9a0iw25hryafj0fm4v"

Successful completion of the command will return the IP address:

{
"IpAddress": "47.89.248.188",
"RequestId": "65EB59AE-FA75-446F-B5C7-2BA0F9A77CDC"
}

Starting the Instance

Instances created via CLI are not started automatically.

To start the instance you just created, run:

aliyuncli ecs StartInstance --InstanceId "i-rj9a0iw25hryafj0fm4v"
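The instance may need some time after StartInstance returns before it accepts SSH connections. One way to handle that in a script is a generic retry loop, sketched below. The function name retry_until is hypothetical, and the attempt count and delay in the example are arbitrary choices rather than recommended values.

```shell
# Hypothetical helper: runs a command repeatedly until it succeeds.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Sleep between attempts, but not after the final failure.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: try up to 30 times, 10 seconds apart, to run `true` over SSH.
# The first successful connection will ask you to confirm the host key.
# retry_until 30 10 ssh -i ~/.ssh/alibaba-key.pem \
#     -o ConnectTimeout=5 root@47.89.248.188 true
```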

Connecting to the VM Instance with SSH

Once the instance has started, you can SSH into it as the root user using your SSH key. If you followed the setup in this tutorial, your key is in ~/.ssh/.

Command syntax:

ssh -i <KEYPATH> root@<IP>

Example:

ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188

Refer to Connect to a Linux Instance for more instructions on connecting to your instance.

Starting, Stopping, or Deleting Your VM Instance

Once an instance is running, you can stop, (re)start, or delete your instance.

Stop:

aliyuncli ecs StopInstance --InstanceId INSTANCE_ID

Start or Restart:

aliyuncli ecs StartInstance --InstanceId INSTANCE_ID

Delete:

aliyuncli ecs DeleteInstance --InstanceId INSTANCE_ID
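Since deleting an instance cannot be undone, it can be useful to print the exact command for review before running it. The sketch below maps a short action name to one of the three commands above and only prints it; instance_cmd is a hypothetical name.

```shell
# Hypothetical helper: maps a short action name to the matching ECS CLI
# command from this guide and prints it, without running anything.
instance_cmd() {
    case "$1" in
        stop)   action=StopInstance ;;
        start)  action=StartInstance ;;
        delete) action=DeleteInstance ;;
        *) echo "usage: instance_cmd stop|start|delete INSTANCE_ID" >&2; return 2 ;;
    esac
    [ -n "$2" ] || { echo "missing INSTANCE_ID" >&2; return 2; }
    printf 'aliyuncli ecs %s --InstanceId %s\n' "$action" "$2"
}

# Example:
# instance_cmd stop i-rj9a0iw25hryafj0fm4v
```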

Source: https://docs.nvidia.com/ngc/ngc-alibaba-setup-guide/index.html

Reference: https://www.alibabacloud.com/blog/how-to-install-nvidia-gpu-cloud-virtual-machine-image-on-alibaba-cloud_594361?spm=a2c41.12499420.0.0
