Alibaba Cloud Machine Learning Platform for AI: Offline Scheduling Instructions

By Garvin Li

This article walks through an ad click-through rate (CTR) prediction scenario, a typical application in the advertising industry. A prediction model is trained on historical data and then applied to each day's incremental data to find and push the ads that meet the CTR standard.

The whole experiment uses Alibaba Cloud Machine Learning to perform data mining and uses DataWorks to perform scheduling and pushing. Here is the specific business scenario:

  1. Use historical data to perform model training on the Alibaba Cloud machine learning platform.
  2. Use DataWorks to perform scheduling for the model.
  3. Perform CTR prediction on ads at midnight every day to find and push ads that meet the standards.

Dataset Introduction

The detailed fields are as follows:

Image for post

Because the data shown in the following screenshot is randomly generated, this experiment does not evaluate the results. It mainly describes how to set up the experiment and how to schedule it with DataWorks. Historical data from 20160919 and 20160920 is used to predict the data for 20160921. The data is stored in a MaxCompute partitioned table.

Image for post
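
The relationship between the prediction day and the history partitions can be sketched in a few lines of Python. This is only an illustration: the helper names `partition_value` and `training_partitions` are hypothetical, and the two-day history window simply mirrors the dates used in this experiment.

```python
from datetime import date, timedelta

def partition_value(day: date) -> str:
    """Format a date as a yyyyMMdd partition value such as 20160921."""
    return day.strftime("%Y%m%d")

def training_partitions(prediction_day: date, history_days: int = 2) -> list:
    """Return the dt values of the days just before prediction_day, oldest first."""
    return [
        partition_value(prediction_day - timedelta(days=offset))
        for offset in range(history_days, 0, -1)
    ]

prediction_day = date(2016, 9, 21)
print(training_partitions(prediction_day))  # ['20160919', '20160920']
print(partition_value(prediction_day))      # 20160921
```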

Experiment Procedure

The following diagram shows the experiment process.

Image for post

The experiment can be roughly divided into four modules: data source importing (ads), data pre-processing (normalization), model training (binary logistic regression), and predicting (prediction).

1. Importing Data Source

  1. “ad-2” is the data source for training.
  2. “ad-1” is the data source for predicting.
  3. In the partition table, set the partition to dt=@@{yyyyMMdd} so that the prediction data is always the daily incremental data, as shown in the following screenshot. (For more information on using partitions, see https://help.aliyun.com/document_detail/30281.html?spm=5176.doc30276.6.126.3kX7OU)
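
To make the dt=@@{yyyyMMdd} setting concrete, here is a minimal Python sketch of how such a placeholder could expand into a partition value. It is not the platform's implementation: `resolve_partition` and the token mapping are hypothetical, and which date the scheduler actually substitutes at run time is not stated in this article, so the date is passed in explicitly.

```python
import re
from datetime import date

# Assumed mapping from the Java-style date letters to strftime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def resolve_partition(template: str, run_date: date) -> str:
    """Replace each @@{...} placeholder with run_date in the requested format."""
    def expand(match):
        pattern = match.group(1)
        for token, code in _TOKENS.items():
            pattern = pattern.replace(token, code)
        return run_date.strftime(pattern)
    return re.sub(r"@@\{([^}]*)\}", expand, template)

print(resolve_partition("dt=@@{yyyyMMdd}", date(2016, 9, 21)))  # dt=20160921
```

Because the value changes with the date, the same experiment reads a different partition on each scheduled run without being edited.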

2. Intermediate Processing

The intermediate processing includes two steps: data normalization and model training. Model training uses historical data to fit the prediction model. (For details on the underlying principles, see the Heart disease prediction case.)
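
The two steps can be illustrated outside the platform with a small, self-contained Python sketch. It assumes min-max scaling for the normalization step and plain batch gradient descent for the binary logistic regression. The platform's components may work differently, and the toy rows, labels, and function names below are made up for illustration.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias

# Made-up history rows (two numeric features) and click labels.
raw = [[18, 1], [25, 3], [40, 8], [52, 10]]
clicked = [0, 0, 1, 1]

normalized = min_max_normalize(raw)
weights, bias = train_logistic_regression(normalized, clicked)
print(weights, bias)
```

Normalizing first keeps features with large raw ranges from dominating the gradient updates, which is why it comes before training in the experiment.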

3. Data Prediction

The prediction results are stored in “ad_result-1”, as shown below.

Image for post
  1. prediction_result: Indicates whether an ad is predicted to be clicked. 1 means the ad is predicted to be clicked, and 0 means it is not.
  2. prediction_score: Indicates the probability of being clicked.
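
As a rough illustration of how the two output fields relate, the following Python sketch derives both from a logistic regression model. The weights, bias, and the 0.5 cut-off are assumptions made for the example. The article does not state the threshold the platform uses.

```python
import math

def predict(weights, bias, features, threshold=0.5):
    """Return the two output fields for one ad."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    score = 1.0 / (1.0 + math.exp(-z))  # probability of a click
    return {
        "prediction_result": 1 if score >= threshold else 0,
        "prediction_score": score,
    }

# Hypothetical model parameters and two already-normalized feature rows.
weights, bias = [2.0, -1.0], -0.5
print(predict(weights, bias, [1.0, 0.5]))  # score is about 0.73, result 1
print(predict(weights, bias, [0.0, 1.0]))  # score is about 0.18, result 0
```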

Module Scheduling

1. Go to the Workspace of DataWorks

Go to the homepage of the console and click DataWorks to access the Data IDE workspace.

Image for post

DataWorks and the machine learning platform share the same set of projects. Select the project that contains the experiment you want to schedule, and click Start Data Modeling.

Image for post

2. Create a New Node Scheduling Task

Click New and select New Task.

Image for post

In the configuration section of the created task, select Node Task for Task Type and Machine Learning for Type.

Image for post

3. Configure the Scheduling Task

After the node task has been created, select the machine learning task to schedule and set the scheduling time in the configuration bar on the right side. In this experiment, we choose to perform training and push information at 00:00 each day.

Image for post
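
For readers who want to reason about when a daily 00:00 schedule fires, here is a small Python sketch. It only models the timing. `next_daily_run` is a hypothetical helper, not part of DataWorks, and it ignores time zones.

```python
from datetime import datetime, time, timedelta

def next_daily_run(now: datetime, run_at: time = time(0, 0)) -> datetime:
    """Return the first occurrence of run_at that is strictly after now."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

print(next_daily_run(datetime(2016, 9, 20, 15, 30)))  # 2016-09-21 00:00:00
```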

Click Submit. Submitted jobs take effect the next day.

Image for post

4. Query Task Logs

After the scheduling task has been submitted, click Maintain to view the logs.

Image for post

To learn more about Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI), visit www.alibabacloud.com/product/machine-learning

Reference: https://www.alibabacloud.com/blog/alibaba-cloud-machine-learning-platform-for-ai-offline-scheduling-instructions_594399?spm=a2c41.12531986.0.0
