Experience RAPIDS Data Science Acceleration in Alibaba Cloud Container Service

By Biran


Deep learning is usually the first thing that comes to mind when NVIDIA GPUs are mentioned, whereas traditional machine learning and data analysis methods rarely use GPUs. However, NVIDIA has an excellent project called RAPIDS, a GPU-accelerated library suite for data science and machine learning. For more information about RAPIDS, visit the official RAPIDS website. The project aims to bring GPU acceleration to traditional algorithms while providing the same operations and user experience as Pandas and scikit-learn. RAPIDS has three modules: cuDF is the counterpart of Pandas, cuML is the counterpart of scikit-learn, and cuGraph processes graph data. Because of this compatibility, you can combine RAPIDS with a deep learning framework: use cuDF to accelerate data processing on a GPU, and then use a framework such as TensorFlow or PyTorch to run the model.

This article shows how to use TensorFlow and RAPIDS to perform image search on Alibaba Cloud Container Service, using Elastic Container Instance (ECI) to request GPU resources. GPU resources are ready in seconds and are released when no longer needed, so there is no need to prepare GPU instances in advance or to deal with the underlying Kubernetes infrastructure. Arena commands are all you need to build and run a GPU-enabled RAPIDS environment and to manage the GPU infrastructure.


Step 1: Prepare the Cluster

Use a managed Kubernetes cluster in Container Service. If you have already created one, select it.

Because system component containers need to run, the cluster must contain at least one worker node.

1) Install virtual nodes. For more information, see the Virtual Nodes documentation.

2) Configure virtual-kubelet-autoscaler. If the GPU resources in the cluster are insufficient, virtual-kubelet-autoscaler launches ECI container groups that provide GPUs.

Step 2: Run Arena to Create the RAPIDS Service

1) Install Arena.

$ wget http://kubeflow.oss-cn-beijing.aliyuncs.com/arena-installer-0.3.0-b556a36-linux-amd64.tar.gz
$ tar -xvf arena*.tar.gz
$ cd arena-installer
$ ./install.sh

2) Run an Arena command to view the cluster's GPU resources. As the following output shows, this cluster has one real node, which has no GPU resources, and one virtual node. The virtual node does not physically exist and therefore is not billed. It provides GPU resources that scale on demand (shown here as a total of 1000).

$ arena top node
cn-shanghai. <none> ready 0 0
virtual-kubelet agent ready 1000 0
Allocated/Total GPUs In Cluster:
0/1000 (0%)

3) Before submitting the RAPIDS task, complete two preparations that simplify access and speed up creation.

3.1 Set the access method to LoadBalancer. Note that this method is used here only for the sake of simplicity. In a production environment, we recommend that you do not allow access from external IP addresses.

$ find /charts/ -name "*.yaml" | xargs sed -i "s/NodePort/LoadBalancer/g"

3.2 Speed up startup with image caching.

3.2.1 GPU container images are usually very large. For example, the RAPIDS container image used in this experiment is 14.7 GB, and startup normally takes about 10 minutes. With the image caching function, startup can drop to about 20 seconds.

$ docker images | grep rapids
registry.cn-shanghai.aliyuncs.com/tensorflow-samples/rapids-samples 0.8.2-cuda10.0-runtime-ubuntu16.04 4597a0334d41 12 days ago 14.7GB

3.2.2 In serverless Kubernetes, create an ImageCache custom resource to use the image caching function directly.

$ cat > imagecache.yaml << EOF
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: imagecache-rapids
spec:
  images:
  - registry.cn-shanghai.aliyuncs.com/tensorflow-samples/rapids-samples:0.8.2-cuda10.0-runtime-ubuntu16.04
EOF
$ kubectl create -f imagecache.yaml

3.2.3 Wait a moment after submission, and then check the status of the ImageCache. The CACHEID in the output is the snapshot ID that you specify when you submit the task later.

$ kubectl get imagecache
imagecache-rapids 3d9h imc-uf6dxdji7txxxxx Ready 100%

4) Submit the RAPIDS development environment as shown below.

$ arena serve custom \
--name=rapids \
--selector=type=virtual-kubelet \
--toleration=all \
--annotation=k8s.aliyun.com/eci-image-snapshot-id=imc-uf6dxdji7txxxxx \
--annotation=k8s.aliyun.com/eci-instance-type=ecs.gn5i-c8g1.2xlarge \
--gpus=1 \
-e=PASSWORD=mypassw0rd \
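--restful-port=80 \
--image=registry.cn-shanghai.aliyuncs.com/tensorflow-samples/rapids-samples:0.8.2-cuda10.0-runtime-ubuntu16.04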
configmap/rapids-201912011815-custom-serving created
configmap/rapids-201912011815-custom-serving labeled
service/rapids-201912011815 created
deployment.extensions/rapids-201912011815-custom-serving created

Let’s take a quick look at the options used in the preceding command.

  • --selector=type=virtual-kubelet: Starts the pod on the virtual node.
  • --annotation=k8s.aliyun.com/eci-instance-type=ecs.gn5i-c8g1.2xlarge: Specifies the ECI container group type.
  • ecs.gn5i-c8g1.2xlarge: The Alibaba Cloud instance type equipped with a P4 GPU. For detailed specifications, see the relevant document.
  • --annotation=k8s.aliyun.com/eci-image-snapshot-id=imc-uf6dxdji7txxxxx: Specifies the CACHEID obtained in Step 3.2.3.
  • -e=PASSWORD=mypassw0rd: Sets the PASSWORD environment variable, which is the password for accessing the RAPIDS notebook.
  • --gpus=1: Specifies the number of GPUs requested.

5) View the access address, which is a combination of ENDPOINT_ADDRESS and PORTS. The output also shows that the task switches to the Running state within 32 seconds.

$ arena serve list
rapids CUSTOM 201911181827 1 1 restful:80
$ arena serve get rapids
NAME: rapids
NAMESPACE: default
VERSION: 201912011815
ENDPOINT PORTS: restful:80
AGE: 32s
rapids-201912011815-custom-serving-6b54d5cd-swcwz Running 32s 1/1 0 N/A

6) Check the GPU usage of the cluster again. Note that the GPU resources are already being used.

$ arena top node
cn-shanghai. <none> ready 0 0
virtual-kubelet agent ready 1000 1
Allocated/Total GPUs In Cluster:
1/1000 (0%)

7) To query the pods that use this GPU, append “-d” to the original command to view specific pod names.

$ arena top node -d
NAME: cn-shanghai.
ROLE: <none>
Total GPUs In Node cn-shanghai. 0
Allocated GPUs In Node cn-shanghai. 0 (0%)
NAME: virtual-kubelet
ROLE: agent
default rapids-201912011815-custom-serving-6b54d5cd-swcwz 1
Total GPUs In Node virtual-kubelet: 1000
Allocated GPUs In Node virtual-kubelet: 1 (0%)
Allocated/Total GPUs In Cluster: 1/1000 (0%)

8) Open the access address and port from Step 5 in a local browser, in the form http://{ENDPOINT ADDRESS}:{ENDPOINT PORT}.

Note: We recommend using the Chrome browser.

9) Enter the logon password set in the preceding command and then click on the Log in button. In this example, the password is mypassw0rd.

Step 3: Perform the Image Search Demo

Note: Each click runs one cell. Keep clicking until all cells in the demo have run. For detailed instructions, see the Demo Execution Process section below.

Demo Execution Process

1) Process the Dataset

1.1) Download and Decompress the Dataset

This demo uses the STL-10 dataset, which contains 100,000 unlabeled images with dimensions of 96 x 96 x 3. You can extract image features from other datasets as well, provided that all of their images are the same size.

Use the download_and_extract(data_dir) method to download and decompress the STL-10 dataset. In the RAPIDS image, the dataset is downloaded to the ./data directory, where download_and_extract() decompresses it.
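
The notebook's own implementation appears only as a screenshot, so the following is a minimal sketch of what such a helper might look like. The download URL and the archive file name are assumptions that are not given in this article.

```python
import os
import tarfile
import urllib.request

# Assumption: location of the STL-10 binary archive (not stated in the article).
STL10_URL = "http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz"


def download_and_extract(data_dir, url=STL10_URL):
    """Download the archive into data_dir if it is missing, then unpack it there."""
    os.makedirs(data_dir, exist_ok=True)
    archive_path = os.path.join(data_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        # Skipped when the archive is already present in data_dir.
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=data_dir)
    return archive_path
```

Calling download_and_extract("./data") would leave the unpacked files under ./data.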

1.2) Read the Images

The decompressed data is binary. Use the read_all_images(path_to_data) method to load the data and convert it to the NHWC format (batch, height, width, and channels). This format allows TensorFlow to extract image features.
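
As an illustration of the conversion, here is a sketch of a loader under the assumption that the file stores the images back to back as unsigned bytes, one channel at a time, with each channel in column-major order. The notebook's actual code is not shown in this article and may differ.

```python
import numpy as np


def read_all_images(path_to_data, height=96, width=96, channels=3):
    """Load raw uint8 image data and return it in NHWC layout."""
    raw = np.fromfile(path_to_data, dtype=np.uint8)
    # Assumed on-disk layout per image: channel, then column, then row.
    images = raw.reshape(-1, channels, width, height)
    # (N, C, W, H) -> (N, H, W, C)
    return np.transpose(images, (0, 3, 2, 1))
```

For the 100,000 images of 96 x 96 x 3 described above, the returned array would have the shape (100000, 96, 96, 3).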

1.3) Display an Image

Use the show_image(image) method to display a random image from the dataset.

1.4) Split the Dataset

Split the dataset into two parts at a 9:1 ratio. The larger part is used to build the image index, and the smaller part provides the query images for the search.
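
A 9:1 split can be sketched as follows. The function name and the choice to split in order, without shuffling, are assumptions made for illustration.

```python
import numpy as np


def split_dataset(images, index_parts=9, total_parts=10):
    """Return (index_images, query_images) split at index_parts:remaining."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    split_at = len(images) * index_parts // total_parts
    return images[:split_at], images[split_at:]
```

With 100,000 images, this yields 90,000 images for the index and 10,000 images for querying.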

2) Extract Image Features

Use TensorFlow and Keras to extract image features. The demo uses ResNet50 (notop), a model pre-trained on the ImageNet dataset.

2.1) Set TensorFlow Parameters

By default, TensorFlow allocates all available GPU memory. Because cuML also needs GPU memory, reserve some for it by using one of the following methods:

  • Method 1: Allocate the memory according to operational requirements.
config.gpu_options.allow_growth = True
  • Method 2: Set a ratio to determine the amount of memory that is used by TensorFlow.

This demo uses method 2 and sets the ratio to 0.3, which means that TensorFlow may use up to 30% of the GPU memory. Adjust the ratio as needed.

config.gpu_options.per_process_gpu_memory_fraction = 0.3
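
The two settings above act on a session configuration object. The sketch below shows one way to create that object and apply it to Keras. It assumes a TensorFlow version that provides the tf.compat.v1 namespace, and make_session is a hypothetical helper name, not part of the demo.

```python
def make_session(memory_fraction=0.3, allow_growth=False):
    """Create a TensorFlow session with limited GPU memory and register it with Keras."""
    # Imported inside the function so this file loads without TensorFlow.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    if allow_growth:
        # Method 1: allocate GPU memory as it is needed.
        config.gpu_options.allow_growth = True
    else:
        # Method 2: cap TensorFlow at a fixed share of GPU memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    session = tf.compat.v1.Session(config=config)
    tf.compat.v1.keras.backend.set_session(session)
    return session
```

Calling make_session(0.3) before building the model would correspond to the setting used in this demo.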

2.2) Download the Pre-trained Model ResNet50 (notop)

TensorFlow connects to the public network to download the model. The model is about 91 MB and is saved to the /root/.keras/models/ directory.

Now, run the model.summary() method to view the network structure of your model.

2.3) Extract Image Features

Call the model.predict() method on the split datasets to extract image features.
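
The following sketch shows the general shape of this step. The helper names, the batch size, and the flattening of each output into a single vector are assumptions, and any input preprocessing used in the notebook is not reproduced here.

```python
import numpy as np


def build_resnet50_notop(input_shape=(96, 96, 3)):
    """Load ResNet50 without its classification head (downloads ImageNet weights)."""
    # Imported inside the function so this file loads without TensorFlow.
    from tensorflow.keras.applications import ResNet50

    return ResNet50(weights="imagenet", include_top=False, input_shape=input_shape)


def extract_features(model, images, batch_size=64):
    """Run model.predict() on an NHWC batch and flatten each output to one vector."""
    batch = np.asarray(images, dtype=np.float32)
    features = model.predict(batch, batch_size=batch_size)
    return np.asarray(features).reshape(len(batch), -1)
```

The same function would be called once on the index images and once on the query images, so that both sets end up as rows of feature vectors.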

3) Search for Similar Images

3.1) Use cuML KNN to Search for Similar Images

Set K to 3 (k=3) to search for the three most similar images. Adjust the value of K as needed. Use the knn_cuml.fit() method to build the index and the knn_cuml.kneighbors() method to search for neighbors.

cuML KNN takes 791 milliseconds to retrieve the vectors.
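
To make concrete what a fit() followed by kneighbors() returns, here is a brute-force NumPy sketch of a K-nearest-neighbor search with Euclidean distance. It is for illustration only and is not how cuML implements the search; the function name is hypothetical.

```python
import numpy as np


def knn_brute_force(index_vectors, query_vectors, k=3):
    """Return (distances, indices) of the k nearest index vectors for each query."""
    x = np.asarray(index_vectors, dtype=np.float64)
    q = np.asarray(query_vectors, dtype=np.float64)
    # Squared distances from |q - x|^2 = |q|^2 - 2 q.x + |x|^2.
    squared = (
        np.sum(q ** 2, axis=1, keepdims=True)
        - 2.0 * (q @ x.T)
        + np.sum(x ** 2, axis=1)
    )
    squared = np.maximum(squared, 0.0)  # guard against tiny negative values
    indices = np.argsort(squared, axis=1)[:, :k]
    distances = np.sqrt(np.take_along_axis(squared, indices, axis=1))
    return distances, indices
```

Both returned arrays have one row per query image and k columns, ordered from nearest to farthest.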

3.2) Use scikit-learn KNN to Search for Similar Images

Set K to 3 (n_neighbors=3), and specify n_jobs=-1 to use all the CPUs to search for the nearest neighbors.

Note: The ecs.gn5i-c8g1.2xlarge instance type has 8 vCPUs.

scikit-learn KNN takes 7 minutes and 34 seconds to retrieve the vectors.

3.3) Compare the Search Results of cuML KNN and scikit-learn KNN

First, compare the vector retrieval speeds. GPU-accelerated cuML KNN takes only 791 milliseconds, whereas CPU-based scikit-learn KNN takes 7 minutes and 34 seconds (454 seconds). cuML KNN therefore retrieves the vectors roughly 570 times as fast as scikit-learn KNN.
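
The speedup follows directly from the two timings reported above, as this short calculation shows:

```python
cuml_seconds = 0.791            # 791 milliseconds
sklearn_seconds = 7 * 60 + 34   # 7 minutes and 34 seconds

speedup = sklearn_seconds / cuml_seconds
print(f"{sklearn_seconds} s / {cuml_seconds} s = {speedup:.0f}x")
# prints "454 s / 0.791 s = 574x"
```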

Check whether the search results of cuML KNN and scikit-learn KNN are the same. Compare the following output arrays:

  • Distance: The K smallest distance values for each query image. This demo searches 10,000 images with K set to 3. Therefore, distance.shape=(10000, 3).
  • Indices: The corresponding image indices. indices.shape=(10000, 3).

The dataset used in this demo contains identical images, and identical images may have different indices. Therefore, compare the results by distances rather than by indices. Small numerical deviations may exist, so the results are considered identical if, for each of the 10,000 images, the three smallest distance values from the two methods differ by less than 1.
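
This check can be sketched in a few lines. The function name and the per-row sort are assumptions; the tolerance of 1 comes from the text above.

```python
import numpy as np


def same_neighbors(distances_a, distances_b, tolerance=1.0):
    """Return True if two distance arrays agree element-wise within tolerance."""
    a = np.sort(np.asarray(distances_a, dtype=np.float64), axis=1)
    b = np.sort(np.asarray(distances_b, dtype=np.float64), axis=1)
    # Arrays of different shape cannot describe the same search result.
    return a.shape == b.shape and bool(np.all(np.abs(a - b) < tolerance))
```

Passing the distance arrays from cuML KNN and scikit-learn KNN to this function returns True only when every pair of corresponding distances differs by less than the tolerance.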

Image Search Results

The first column displays five original images. The second, third, and fourth columns display the similar images found for each original, in order of decreasing similarity. The title of each similar image is its calculated distance from the original. A larger value indicates lower similarity.

Step 4: Clean Up

$ arena serve delete rapids
service "rapids-201912011815" deleted
deployment.extensions "rapids-201912011815-custom-serving" deleted
configmap "rapids-201912011815-custom-serving" deleted
INFO[0000] The Serving job rapids with version 201912011815 has been deleted successfully

