Deploying Kubeflow Pipelines on Alibaba Cloud

By Bi Ran

Overview

Machine learning projects are complex for the same reasons any software development project is. In addition, they are data-driven, which creates further challenges, such as long workflows, inconsistent data versions, experiments that are hard to trace and reproduce, and high model iteration costs. To resolve these issues, many enterprises have built internal machine learning platforms to manage the machine learning lifecycle, such as the Google TensorFlow Extended platform, the Facebook FBLearner Flow platform, and the Uber Michelangelo platform.

What Is Kubeflow Pipelines

The Kubeflow Pipelines platform provides the following components and capabilities:

  • The workflow engine Argo for scheduling multi-step machine learning workflows.
  • An SDK for defining workflows. Currently, the SDK only supports Python (a brief installation note follows this list).
  • Easy experiment management: makes it easy for you to try numerous ideas and techniques and manage your experiments. Kubeflow Pipelines also makes the transition from experiments to production much easier.
  • Easy re-use: enables you to re-use components and pipelines to quickly create end-to-end solutions without the need to rebuild experiments each time.
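
For reference, the Python SDK is distributed on PyPI; a minimal install sketch, assuming the standard kfp package name:

pip install kfp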

Deploy Kubeflow Pipelines on Alibaba Cloud

You may want to get started with Kubeflow Pipelines after learning about its features. To use Kubeflow Pipelines on Alibaba Cloud, however, you must overcome a few challenges: the default manifests pull container images from gcr.io, which may not be reachable from your cluster, and the pipeline metadata and artifacts need reliable persistent storage. The deployment described below addresses both points with Kustomize overlays that switch the image registry to registry.aliyuncs.com and back the storage with Alibaba Cloud SSD cloud disks.

Prerequisites

Install kustomize, which is used later to build the deployment manifest:

opsys=linux  # or darwin, or windows
# Query GitHub for the latest kustomize release and download the binary for this OS
curl -s https://api.github.com/repos/kubernetes-sigs/kustomize/releases/latest |\
grep browser_download |\
grep $opsys |\
cut -d '"' -f 4 |\
xargs curl -O -L
# Move the binary into the PATH and make it executable
mv kustomize_*_${opsys}_amd64 /usr/bin/kustomize
chmod u+x /usr/bin/kustomize
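
You can confirm the installation with the following command (output format varies by kustomize release):

kustomize version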

Procedure

1. Connect to the Kubernetes cluster through SSH. For more information, see the Alibaba Cloud Container Service for Kubernetes documentation on connecting to a cluster.
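
Once logged in, you can verify that kubectl can reach the cluster (assuming kubectl is already configured on the node you connected to):

kubectl get nodes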

2. Install git and download the deployment source code:

yum install -y git
git clone --recursive https://github.com/aliyunContainerService/kubeflow-aliyun
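
If you want to see what was downloaded, the repository contains the Kustomize overlays used in the following steps (directory names inferred from the paths below):

ls kubeflow-aliyun/overlays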

3. Install openssl and generate a self-signed TLS certificate for the pipelines.kubeflow.org domain:

yum install -y openssl
domain="pipelines.kubeflow.org"
# Create a self-signed certificate and private key, valid for 365 days
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.key -out kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt -subj "/CN=$domain/O=$domain"
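
You can inspect the generated certificate to confirm its subject and validity period (a quick check with standard openssl tooling):

openssl x509 -in kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt -noout -subject -dates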

4. Install httpd-tools and create a username and password for accessing the Kubeflow Pipelines UI. The user created here is admin, and you are prompted to enter the password twice:

yum install -y httpd-tools
htpasswd -c kubeflow-aliyun/overlays/ack-auto-clouddisk/auth admin
New password:
Re-type new password:
Adding password for user admin
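
If needed, you can confirm that the auth file was written; it contains the username and the hashed password:

cat kubeflow-aliyun/overlays/ack-auto-clouddisk/auth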

5. Use kustomize to build the deployment manifest:

cd kubeflow-aliyun/
kustomize build overlays/ack-auto-clouddisk > /tmp/ack-auto-clouddisk.yaml

6. Adjust the manifest for your environment. The commands below change the region from cn-beijing to cn-hangzhou, the zone from cn-beijing-e to cn-hangzhou-g, the image registry from gcr.io to registry.aliyuncs.com, and the cloud disk size from 100Gi to 200Gi; substitute the values that match your own cluster:

# Set the region in which your cluster runs
sed -i.bak 's/regionid: cn-beijing/regionid: cn-hangzhou/g' \
/tmp/ack-auto-clouddisk.yaml
# Set the zone in which your cluster runs
sed -i.bak 's/zoneid: cn-beijing-e/zoneid: cn-hangzhou-g/g' \
/tmp/ack-auto-clouddisk.yaml
# Replace the gcr.io image registry with registry.aliyuncs.com, which is reachable from Alibaba Cloud
sed -i.bak 's/gcr.io/registry.aliyuncs.com/g' \
/tmp/ack-auto-clouddisk.yaml
# Optionally enlarge the persistent cloud disks
sed -i.bak 's/storage: 100Gi/storage: 200Gi/g' \
/tmp/ack-auto-clouddisk.yaml
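
You can confirm that the replacements took effect (a quick check; the matched lines depend on the manifest contents):

grep -E 'regionid|zoneid|storage:' /tmp/ack-auto-clouddisk.yaml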

7. Validate the manifest with a dry run, and then deploy it:

kubectl create --validate=true --dry-run=true -f /tmp/ack-auto-clouddisk.yaml
kubectl create -f /tmp/ack-auto-clouddisk.yaml
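
It can take a few minutes for all components to start. You can watch their status in the kubeflow namespace:

kubectl get pods -n kubeflow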

8. Query the ingress in the kubeflow namespace to find the public address of the Kubeflow Pipelines UI:

kubectl get ing -n kubeflow
NAME             HOSTS   ADDRESS           PORTS     AGE
ml-pipeline-ui   *       112.124.193.271   80, 443   11m

You can then open https://<ADDRESS> in a browser and log in with the username and password created earlier.

FAQ

1. Why are Alibaba Cloud SSD cloud disks used in this example?

Kubeflow Pipelines needs reliable persistent storage for its metadata database (MySQL) and its artifact store (MinIO). In this example, the ack-auto-clouddisk overlay automatically provisions Alibaba Cloud SSD cloud disks as persistent volumes for these components, so pipeline data is not lost when pods restart.

2. How do I remove the Kubeflow Pipelines deployment?

Delete the resources created from the generated manifest:

kubectl delete -f /tmp/ack-auto-clouddisk.yaml

Summary

This document introduces the background of Kubeflow Pipelines, the major issues that it resolves, and the procedure for using Kustomize to deploy Kubeflow Pipelines for machine learning on Alibaba Cloud. To learn more, see the document about how to use Kubeflow Pipelines to develop a machine learning workflow.

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com