How to Install NVIDIA GPU Cloud Virtual Machine Image on Alibaba Cloud

Prerequisites

  1. You have an Alibaba account (https://home-intl.console.aliyun.com/) with permissions to create resources.
  2. You have performed the following steps from the NGC website (see NGC Getting Started Guide):
     • Signed up for an NGC account at https://ngc.nvidia.com/signup.
     • Created an NGC API key for access to the NGC container registry.
     • Browsed the NGC website and identified an available NGC container and tag to run on the VMI.
  3. If you plan to use the CLI or Terraform, the Alibaba CLI must be installed, with at least the ECS SDK, and you must create SSH keys to use with Alibaba; see the setup instructions below.
  4. Windows users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and its bash shell (you will be in Ubuntu Linux).

Preliminary Setup

Setting Up SSH Keys

  1. From a browser, log in to the ECS console — https://ecs.console.aliyun.com/.
  2. Open the left navigation menu tab and then click Key Pairs from the Network & Security group.
  3. From the upper right of the screen, click Create Key Pair.
  4. Give it a name, such as “alibaba-key” and click OK. A .pem file will immediately download. This is the ONLY time you can download it.
  5. After downloading the .pem file, move it to the .ssh directory and make it readable only by you.
  • mv alibaba-key.pem ~/.ssh/
  • chmod 400 ~/.ssh/alibaba-key.pem
  6. On Windows, the location will depend on the SSH client you use, so modify the path above and in the snippets or your SSH client configuration. See the Alibaba documentation for Creating an SSH key pair.
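
The move and permission commands above can be wrapped in a small helper so that the key always ends up with the same restrictive permissions. This is only a minimal sketch: the function name install_key and its optional second argument (the target directory) are hypothetical and not part of any Alibaba tool.

```shell
# Hypothetical helper (not part of any Alibaba tool): move a downloaded
# key pair file into an SSH directory and make it read-only for the owner.
install_key() {
  key_file="$1"
  ssh_dir="${2:-$HOME/.ssh}"   # assumed default location
  { mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"; } || return 1
  mv "$key_file" "$ssh_dir/" || return 1
  chmod 400 "$ssh_dir/$(basename "$key_file")"
}

# Example: install_key ~/Downloads/alibaba-key.pem
```

SSH clients generally refuse private keys that other users can read, which is why the helper finishes with chmod 400.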

Setting Up a Security Group

  1. Log in to the ECS console — https://ecs.console.aliyun.com/.
  2. Open the left navigation menu tab and then click Security Groups from the Network & Security group.
  3. From the upper right of the screen, click Create Security Group.
  4. Give it a name and description, then click OK.
  5. Immediately set the rules using the section Quickly Create Rules.
  6. Check SSH and HTTPS.
  7. At Custom Port Range, select TCP and then enter 5000/5000.
  8. Set Authorization Object to 0.0.0.0/0, or to the IP address from which you will access the instance.
  9. Click OK.
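
If you would rather script these rules, the console steps above correspond roughly to one ECS AuthorizeSecurityGroup call per port. The sketch below is a dry run that only prints the commands. It assumes aliyuncli exposes the API parameters (SecurityGroupId, IpProtocol, PortRange, SourceCidrIp) as flags of the same name, the way the other commands in this article do, and it assumes the standard ports 22 for SSH and 443 for HTTPS. The function name print_sg_rules is hypothetical. Review the printed commands before running any of them.

```shell
# Dry-run sketch: print (do not execute) one AuthorizeSecurityGroup call
# per inbound rule. Assumes aliyuncli maps ECS API parameters to flags of
# the same name, as in the other commands in this article.
print_sg_rules() {
  sg_id="$1"
  region="$2"
  cidr="${3:-0.0.0.0/0}"          # source address range allowed to connect
  for port in 22 443 5000; do     # SSH, HTTPS, custom TCP port
    echo "aliyuncli ecs AuthorizeSecurityGroup --RegionId $region" \
         "--SecurityGroupId $sg_id --IpProtocol tcp" \
         "--PortRange $port/$port --SourceCidrIp $cidr"
  done
}

# The security group ID here is a placeholder.
print_sg_rules "sg-rj94krsusal2k5l6gnnz" us-west-1
```

Passing your own address as the third argument (for example, 203.0.113.7/32) restricts access instead of opening the ports to everyone.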

(Optional) Installing Alibaba CLI

  1. Install the ECS SDK.
  • sudo pip install aliyun-python-sdk-ecs
  2. Configure the CLI with your keys.
  • aliyuncli configure
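
Before running the CLI snippets later in this article, it can help to confirm that the command is actually on your PATH. The following is a small sketch; have_cmd is a hypothetical helper name.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd aliyuncli; then
  echo "aliyuncli found"
else
  echo "aliyuncli not found; install it before using the CLI snippets" >&2
fi
```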

Launching an NVIDIA GPU Cloud VM with the Alibaba Console

Creating the NVIDIA GPU Cloud VM Instance

  1. Log on to the Alibaba Elastic Compute Service (ECS) console (https://marketplace.alibabacloud.com).
  2. Search for NVIDIA GPU, then select the NVIDIA GPU Cloud Virtual Machine Image.
  3. Click Choose your plan from the NVIDIA GPU Cloud Virtual Machine Image product page.
  4. Near the bottom of the page, click ECS Advanced Purchase page.
  5. Make the following selections from the Basic Configurations page.
     • For Billing Method, select Pay-As-You-Go.
     • For Region, select one that contains the image (for example, “US-West 1 Zone B”), and then a region from the dropdown menu (for example, US West 1 (Silicon Valley)). Note that not all regions support GPU instances.
     • For Instance Type, click Heterogeneous Compute and then locate and select a GPU Compute Type gn5 instance according to your GPU and memory requirements.
     • For Image, select Marketplace Image, and then make sure the NVIDIA GPU Cloud Virtual Machine Image is selected.
     • For Storage, add a disk for dataset storage by clicking Add Disk under Data Disk, and then entering the storage size. The recommended minimum dataset storage size is 1 TB (1024 GB).
     • Click Next: Networking.
  6. Make the following selections from the Networking page.
     • For Security Group, click Select Security Group.
     • Select your security group from the list, then click Select. Make sure you created a security group as explained in the section “Setting Up a Security Group”.
     • Click Next: System Configurations.
  7. Make the following selections from the System Configurations page.
     • Make sure Key Pair is selected for Log On Credentials.
     • Click the Key Pair list arrow and then select your key pair.
     • Click Preview, then review your configuration.
  8. Select the Terms of Service check box to indicate your acceptance, and then click Create Instance.
  9. At the Activated message, click Console.
  10. Wait until your instance status is Running; you can then connect to the instance using SSH.

Connecting to the VM Instance with SSH

ssh -i <KEYPATH> root@<IP>
ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188
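
In the two commands above, the first line is the general form and the second is a concrete example. A small helper can fill in the general form and catch a missing key file before SSH does. This is a sketch only: ssh_command is a hypothetical name, and the root login simply follows the examples in this article.

```shell
# Hypothetical helper: print the SSH command for a given key and address
# after checking that the key file exists. Logging in as root follows the
# examples in this article.
ssh_command() {
  key_path="$1"
  ip="$2"
  if [ ! -f "$key_path" ]; then
    echo "key file not found: $key_path" >&2
    return 1
  fi
  echo "ssh -i $key_path root@$ip"
}

# Example: ssh_command ~/.ssh/alibaba-key.pem 47.89.248.188
```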

Launching an NVIDIA GPU Cloud Virtual Machine Image Using Alibaba CLI

Using Example Python Scripts

Using the Instructions in This Chapter

Getting the NGC VMI Image ID

aliyuncli ecs DescribeImages --RegionId us-west-1 \
--ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
--output json --filter Images.Image[0].ImageId
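
To reuse the image ID in later commands, you might capture it in a shell variable. The sketch below wraps the same DescribeImages call in a hypothetical function, get_ngc_image_id. It assumes the filtered JSON output is a single quoted string and strips the quotes and whitespace. It also quotes the filter expression so the shell does not treat the square brackets as a glob pattern.

```shell
# Sketch: look up the NGC image ID and strip the JSON quotes.
# Assumes the filtered output is a single quoted string.
get_ngc_image_id() {
  region="${1:-us-west-1}"
  aliyuncli ecs DescribeImages --RegionId "$region" \
    --ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
    --output json --filter 'Images.Image[0].ImageId' | tr -d '" \n'
}

# Example (requires a configured aliyuncli):
# IMAGE_ID=$(get_ngc_image_id us-west-1)
```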

Creating Your VM Instance

Recommended Instance Options

  1. “--InternetMaxBandwidthOut 10” sets the peak outbound network bandwidth to 10 Mbps. The valid range is [1, 200].
  2. “--InstanceChargeType PostPaid” sets the billing method to pay-as-you-go. Change this to “PrePaid” to use subscription billing.
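
If you script instance creation, it may be worth checking these two values before calling the CLI, so that a typo fails locally rather than as an API error. The following is a minimal sketch using the range and billing values stated above; in_range and valid_charge_type are hypothetical helper names.

```shell
# Succeed only if the value is an integer within [min, max].
in_range() {
  value="$1"; min="$2"; max="$3"
  case "$value" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
}

# Succeed only for the two billing methods named above.
valid_charge_type() {
  case "$1" in
    PostPaid|PrePaid) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check an outbound bandwidth of 10 Mbps against [1, 200].
if in_range 10 1 200 && valid_charge_type PostPaid; then
  echo "options look valid"
fi
```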

Other Notable Create Instance Options

  1. The inbound network bandwidth defaults to 200 Mbps. Use “--InternetMaxBandwidthIn” to change this. The valid range is [1, 200].
  2. To change the size of the system disk (default is 40 GB), use the “--SystemDiskSize” option. Valid values are [40, 500].
  3. To add a data disk (up to 16), use the “--DataDiskNSize” and “--DataDiskNCategory” options where “N” is [1, 16]. Valid values are:

Launch Example

aliyuncli ecs CreateInstance \
--RegionId us-west-1 \
--ImageId "m-rj9iy0xjiod3ghkyhz4p" \
--SecurityGroupId "sg-rj94krsusal2k5l6gnnz" \
--InstanceType ecs.gn5-c4g1.xlarge \
--InstanceName "my-instance" \
--InternetMaxBandwidthOut 10 \
--InstanceChargeType PostPaid \
--KeyPairName alibaba-key
{
"InstanceId": "i-rj9a0iw25hryafj0fm4v",
"RequestId": "440ECC70-09F9-492C-AB9E-21AA9C4E0531"
}
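
The InstanceId in this response is needed by the commands that follow. One way to pull it out without extra tools is a small sed filter. This is a sketch for flat responses like the one above only. It does not handle nested objects or escaped quotes, and json_string_field is a hypothetical name. A real JSON parser is the safer choice for anything more complex.

```shell
# Print the value of a string field from a small, flat JSON document on
# stdin. A sed-based sketch; it does not handle nesting or escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response copied from the launch example above.
response='{
"InstanceId": "i-rj9a0iw25hryafj0fm4v",
"RequestId": "440ECC70-09F9-492C-AB9E-21AA9C4E0531"
}'
INSTANCE_ID=$(printf '%s\n' "$response" | json_string_field InstanceId)
echo "$INSTANCE_ID"   # prints i-rj9a0iw25hryafj0fm4v
```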

Assigning a Public IP Address

aliyuncli ecs AllocatePublicIpAddress --RegionId us-west-1 \
--InstanceId "i-rj9a0iw25hryafj0fm4v"
{
"IpAddress": "47.89.248.188",
"RequestId": "65EB59AE-FA75-446F-B5C7-2BA0F9A77CDC"
}

Starting the Instance

aliyuncli ecs StartInstance --InstanceId "i-rj9a0iw25hryafj0fm4v"
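
The instance is not ready for SSH until its status is Running. A script could poll for that state instead of checking the console. The sketch below makes several assumptions: that the ECS DescribeInstanceAttribute call is available through aliyuncli with an --InstanceId flag, that its JSON response contains a "Status" field, and that 30 attempts spaced 10 seconds apart is a reasonable default. The function name wait_for_status is hypothetical.

```shell
# Poll until the instance reports the desired status or the attempts run out.
# Assumes DescribeInstanceAttribute returns JSON containing a "Status" field.
wait_for_status() {
  instance_id="$1"
  desired="${2:-Running}"
  tries="${3:-30}"      # hypothetical default: 30 attempts
  delay="${4:-10}"      # hypothetical default: 10 seconds between attempts
  while [ "$tries" -gt 0 ]; do
    status=$(aliyuncli ecs DescribeInstanceAttribute \
      --InstanceId "$instance_id" --output json |
      sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1)
    if [ "$status" = "$desired" ]; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (requires a configured aliyuncli):
# wait_for_status "i-rj9a0iw25hryafj0fm4v" Running && echo "ready for SSH"
```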

Connecting to the VM Instance with SSH

ssh -i <KEYPATH> root@<IP>
ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188

Starting, Stopping, or Deleting Your VM Instance

aliyuncli ecs StopInstance --InstanceId INSTANCE_ID
aliyuncli ecs StartInstance --InstanceId INSTANCE_ID
aliyuncli ecs DeleteInstance --InstanceId INSTANCE_ID
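
Since the three commands differ only in the API name, a thin wrapper can reduce typing and reject unknown actions. This is a sketch; instance_action is a hypothetical name, and it calls exactly the three commands shown above.

```shell
# Hypothetical wrapper: map a short action name to the matching ECS call.
instance_action() {
  action="$1"
  instance_id="$2"
  case "$action" in
    stop)   api=StopInstance ;;
    start)  api=StartInstance ;;
    delete) api=DeleteInstance ;;
    *)
      echo "usage: instance_action stop|start|delete INSTANCE_ID" >&2
      return 2
      ;;
  esac
  aliyuncli ecs "$api" --InstanceId "$instance_id"
}

# Example (requires a configured aliyuncli):
# instance_action stop "i-rj9a0iw25hryafj0fm4v"
```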
