GPU-Enabled Container Instances for Faster Performance

6 min readAug 26, 2019

By Xian Wei

Alibaba Cloud Container Service for Kubernetes (ACK), specifically Alibaba Cloud’s ACK Serverless service, has now added official support for GPU-enabled container instances. These instances, based on existing Alibaba Cloud Elastic Container Instances (ECI), allow for running AI-computing tasks on the cloud in a serverless mode more quickly. They also minimize the O&M burden of AI platform while also significantly improving overall computing efficiency.

It is now very much an industry consensus that AI-computing is synonymous with the computing power of GPUs. However, building a GPU-enabled cluster environment from scratch can be a difficult task. It involved the purchasing GPUs, preparing the relevant machines, installing drivers, as well as installing the relevant container environment.

The serverless delivery mode of GPU-enabled resources fully demonstrates the core advantages of going Serverless. It provides users with standardized and immediate resource supply capabilities so that you do not need to purchase machines or log on to the node to install the GPU Driver. This can in turn greatly reduce the deployment complexity of the AI platform, allowing you to focus on the development and maintenance of AI models and applications rather than the infrastructure involved.

More to it, this ultimately also makes using GPU/CPU resources as simple and convenient as turning on a faucet. With these applications, you can use resources using the pay-as-you-go billing method allowing you to avoid high costs, purchasing resources that you may not ultimately use.

So, let’s get a bit more into the nitty gritty. Creating a GPU-mounted pod in ACK Serverless is actually a relatively simple process. You just need to specify the type or specifications of the GPU you require for your services through annotation, and specify the number of GPUs in resource.limits. You can also specify the instance-type. Each pod occupies the GPU exclusively — in other words, know that vGPU is not supported for the time being. Charges that incur for GPU-enabled instances are the same as those for ECS GPU type instances. No additional charges are incurred. Currently, Alibaba Cloud ECI provides the following types of GPU:

Below is a screenshot of the console.

As you can see creating one of these instances is really quite simple.

Now in the reminder of this blog, we are going to take a quick look at an AI model that involves a simple image recognition algorithm. Through this example, you can learn how these new GPU-enabled instances can be powerful items in your tool set. They can quickly compute the deep learning tasks involved in these ACK Serverless services.

Use TensorFlow for an Image Recognition Algorithm

Recognizing what is shown in this image is a very easy task for you or me, but it is not necessarily an easy task for machines. To recognize the panda in the picture, a computer requires a large amount of data input for an appropriate algorithm model to be trained. These algorithms can be pretty intensive for a regular CPU instance. Let’s try to have the system recognize the panda in this image based on the existing TensorFlow model.

Here, we choose the example for getting started with TensorFlow.

The image registry-vpc.cn-hangzhou.aliyuncs.com/ack-serverless/tensorflow is built based on the TensorFlow official image tensorflow/tensorflow:1.13.1-gpu-py3, and the models repository required for the example has been downloaded from: https://github.com/tensorflow/models

In the Serverless cluster console, create the following yaml file based on the template, or use kubectl to deploy it. In the pod, set the GPU type to P4 and the number of GPUs to 1.

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
  annotations:
    k8s.aliyun.com/eci-gpu-type : "P4"
spec:
  containers:
  - image: registry-vpc.cn-hangzhou.aliyuncs.com/ack-serverless/tensorflow
    name: tensorflow
    command:
    - "sh"
    - "-c"
    - "python models/tutorials/image/imagenet/classify_image.py"
    resources:
      limits:
        nvidia.com/gpu: "1"
  restartPolicy: OnFailure

Create the pod and wait for the execution to complete. Check the pod log:

# kubectl get pod -a
NAME         READY     STATUS      RESTARTS   AGE
tensorflow   0/1       Completed   0          6m
# kubectl logs tensorflow
>> Downloading inception-2015-12-05.WARNING:tensorflow:From models/tutorials/image/imagenet/classify_image.py:141: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-05-05 09:43:30.591730: W tensorflow/core/framework/op_def_util.cc:355] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2019-05-05 09:43:30.806869: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-05 09:43:31.075142: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-05 09:43:31.075725: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4525ce0 executing computations on platform CUDA. Devices:
2019-05-05 09:43:31.075785: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P4, Compute Capability 6.1
2019-05-05 09:43:31.078667: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494220000 Hz
2019-05-05 09:43:31.078953: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4ad0660 executing computations on platform Host. Devices:
2019-05-05 09:43:31.078980: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-05 09:43:31.079294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:00:08.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2019-05-05 09:43:31.079327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-05 09:43:31.081074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-05 09:43:31.081104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-05-05 09:43:31.081116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-05-05 09:43:31.081379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7116 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:08.0, compute capability: 6.1)
2019-05-05 09:43:32.200163: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
>> Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)

According to the pod log, the model has detected a panda in the picture. In the entire machine learning and computing process, we only run a pod. When the pod becomes terminated, the task is completed. We did not prepare the ECS environment, did not buy any GPUs, did not install Nvidia GPU drivers, did not install docker software, and the computing, like water and electricity, was used on demand.

Conclusion

The virtual nodes in ACK also implement the support for GPUs based on ECI in the same way as ACK Serverless, but the pod must be specified and scheduled to the virtual node, or create the pod in the namespace with the virtual-node-affinity-injection=enabled label. The virtual node-based method can support various deep learning frameworks flexibly, such as Kubeflow, Arena, or other custom CRDs.

Consder the following example:

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
  annotations:
    k8s.aliyun.com/eci-gpu-type : "P4"
spec:
  containers:
  - image: registry-vpc.cn-hangzhou.aliyuncs.com/ack-serverless/tensorflow
    name: tensorflow
    command:
    - "sh"
    - "-c"
    - "python models/tutorials/image/imagenet/classify_image.py"
    resources:
      limits:
        nvidia.com/gpu: "1"
  restartPolicy: OnFailure
  nodeName: virtual-kubelet

GPU-Enabled Container Instances for Faster Performance

Use TensorFlow for an Image Recognition Algorithm

Conclusion

Original Source

GPU-Enabled Container Instances for Faster Performance

Alibaba Container Service July 29, 2019 173 By Xian Wei Alibaba Cloud Container Service for Kubernetes (ACK)…

Written by Alibaba Cloud

No responses yet