Build a Personalized Recommendation System on Alibaba Cloud in Three Steps

By Hanchao

Background

Personalized recommendations are everywhere on the internet: users constantly see links such as “You May Also Like” and “Related Products”. Personalized recommendation algorithms use user-generated data to track and predict users’ interests in real time and recommend content accordingly. These algorithms power many successful mobile apps, including news clients (ByteDance’s Toutiao, NetEase News, and Alibaba’s UC News) and e-commerce clients (Pinduoduo, Taobao, and Tmall). On Alibaba Cloud, the vector analysis feature of AnalyticDB for PostgreSQL helps implement personalized recommendation systems.

Overview

Take news reading as an example. A personalized news recommendation system uses a natural language processing (NLP) algorithm to extract keywords from the title and body of each piece of news. The system then uses the feature_extractor function in AnalyticDB for PostgreSQL to convert the keywords into news feature vectors and imports the vectors into the vector library of AnalyticDB for PostgreSQL for news recommendations. Figure 1 shows the implementation process.

Figure 1. The overall framework of recommendation algorithms

1) Use the vector library in AnalyticDB for PostgreSQL to obtain user feature vectors: The system analyzes the historical browsing data of users, constructs user profiles, and builds user preference models to obtain user feature vectors. Specifically, the system reads the news each user has viewed from the browsing logs and extracts keywords from each piece of news to establish the user profile. If a user reads several pieces of National Basketball Association (NBA) playoff news containing keywords such as NBA, basketball, superstar, and sports, this suggests that the user is an NBA fan. The system uses the feature_extractor function in AnalyticDB for PostgreSQL to convert these keywords into vectors and imports the vectors into the vector library of AnalyticDB for PostgreSQL to obtain the feature vector of the user.
2) Push news based on the vector library and the logistic regression prediction model of AnalyticDB for PostgreSQL: The system uses the vector library of AnalyticDB for PostgreSQL to retrieve the top 500 pieces of unread news that may interest the user. It then extracts the publication time and click-through rates (CTRs) of these 500 pieces of news and uses a logistic regression prediction model to decide which news to push to the user. The model is trained on the browsing records of the user.

The feature_extractor function in AnalyticDB for PostgreSQL uses the Bidirectional Encoder Representations from Transformers (BERT) model. The model is pretrained on a large corpus, so its vectors carry semantic information and give higher query precision than the term frequency-inverse document frequency (TF-IDF) algorithm.

Schema Design

Figure 2 shows the schema of the tables in AnalyticDB for PostgreSQL used by the personalized news recommendation system. The system uses three tables: news, person, and browses_history.

Figure 2. Schema of tables in AnalyticDB for PostgreSQL in a personalized recommendation system

Let’s look at these three tables.

The news table stores information about news, including the news ID (news_id), publication time (create_time), title (title), body (content), total number of clicks (click_times), and number of clicks within two hours (two_hour_click_times). The personalized news recommendation system extracts keywords from the news title and the body and converts the keywords into a vector (news_vector). When data is inserted into the news table, the system automatically converts keywords into vectors and inserts the vectors and other news information into the news table.

The browses_history table stores information about news read by users, including the news ID (news_id), user ID (person_id), and reading time of the news (browse_time).

The person table stores user information, including the user ID (person_id), age (age), and star class (star).
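The three tables described above can be mirrored in application code as a minimal sketch. The field names come from the text; the Python types, the default values, and the keywords field (suggested by the later insert step, but not listed in the column description) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class News:
    news_id: int
    create_time: datetime           # publication time
    title: str
    content: str                    # news body
    click_times: int = 0            # total number of clicks
    two_hour_click_times: int = 0   # clicks within two hours
    # Assumed field: the insert step mentions storing keywords with the news.
    keywords: List[str] = field(default_factory=list)
    # Feature vector derived from the keywords.
    news_vector: List[float] = field(default_factory=list)


@dataclass
class Person:
    person_id: int
    age: int
    star: str                       # "star class"; the type is an assumption


@dataclass
class BrowseHistory:
    person_id: int
    news_id: int
    browse_time: datetime           # when the user read the news
```

Each BrowseHistory row links one Person to one News row, which is what later lets the system collect the keywords of everything a user has read.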

Implementing a Personalized Recommendation System

1) Extract News Feature Vectors

The personalized news recommendation system uses the feature_extractor function in AnalyticDB for PostgreSQL to extract news feature vectors and imports the vectors into the news table. For example, a SELECT statement that applies feature_extractor to the text “ADB For PG is very good!” returns the feature vector corresponding to that text.

The following figure shows a piece of news. The personalized news recommendation system stores the news information in the news table in two steps:

1) Extract news keywords. AnalyticDB for PostgreSQL does not provide a keyword extraction function, so call jieba.analyse.extract_tags(title + content, 3) from the jieba NLP library to extract the top three keywords.
2) Execute the INSERT statement to store the news information, including the keywords and news feature vectors, in the news table.
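To illustrate what the keyword extraction step produces, here is a minimal sketch that uses only the Python standard library. It is a frequency-based stand-in, not jieba's TF-IDF-based extract_tags; the function name extract_keywords, the stop-word list, and the sample text are all hypothetical.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "for", "on", "at", "with"}


def extract_keywords(text: str, top_k: int = 3) -> List[str]:
    """Return the top_k most frequent non-stop-word tokens.

    This is a simplified stand-in for jieba.analyse.extract_tags,
    which ranks terms by TF-IDF rather than raw frequency.
    """
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


# Hypothetical news item, concatenated the same way as title + content.
title = "Heat beat Rockets in NBA finals"
content = "The Heat won the NBA finals. The NBA title is a first for the Heat."
keywords = extract_keywords(title + " " + content, 3)
print(keywords)
```

The resulting keyword list is what would be passed on to the vector conversion and stored alongside the news row.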

2) Extract User Feature Vectors

2.1) Extract User Browsing Keywords

The keywords a user has browsed can be obtained easily from the user’s news browsing logs. For example, a SELECT statement over the browsing history returns the browsing keywords of the user whose person_id is 9527.
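The lookup amounts to joining the browsing history with the news keywords. The sketch below shows that join over in-memory data; the sample rows, the dictionary layout, and the helper name browsing_keywords are hypothetical and only illustrate the logic, not the actual query.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the keywords stored with each news row.
news_keywords: Dict[int, List[str]] = {
    1: ["NBA", "finals", "Miami Heat"],
    2: ["NBA", "Houston Rockets"],
    3: ["stock market", "earnings"],
}

# Hypothetical stand-in for browses_history rows: (person_id, news_id).
browses_history: List[Tuple[int, int]] = [
    (9527, 1),
    (9527, 2),
    (1001, 3),
]


def browsing_keywords(person_id: int) -> List[str]:
    """Collect the keywords of all news a user has read, without duplicates."""
    seen: Dict[str, None] = {}  # dict keeps first-seen order
    for pid, news_id in browses_history:
        if pid == person_id:
            for keyword in news_keywords.get(news_id, []):
                seen.setdefault(keyword, None)
    return list(seen)


print(browsing_keywords(9527))
```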

2.2) Convert Browsing Keywords to User Feature Vectors

The system extracts all browsing keywords of users in the same way. For example, the user whose person_id is 9527 reads the news with the following keywords: NBA sports, finals, Miami Heat, and Houston Rockets. The system then uses the feature_extractor function to convert the keywords to vectors.
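The shape of this conversion can be sketched as follows: keywords go in, a fixed-length vector comes out. The function below is a toy hashing-based embedding, not the BERT-based feature_extractor of AnalyticDB for PostgreSQL, and it carries no semantic information; its name, the 16-dimension size, and the normalization are assumptions made for illustration.

```python
import hashlib
import math
from typing import List

DIM = 16  # toy dimension chosen for readability (assumption)


def toy_feature_extractor(text: str, dim: int = DIM) -> List[float]:
    """Map text to a unit-length vector with the hashing trick.

    Stand-in only: the real feature_extractor uses a BERT model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a hash that is stable across Python processes.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Keywords of the user whose person_id is 9527, as listed in the text.
keywords = ["NBA sports", "finals", "Miami Heat", "Houston Rockets"]
user_vector = toy_feature_extractor(" ".join(keywords))
print(len(user_vector))
```

Because news vectors and user vectors come from the same function, they live in the same space and can be compared by distance.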

3) Provide News Recommendations Based on User Feature Vectors

The system uses the user feature vectors to search for relevant news in the news table. For example, a SELECT statement retrieves the top 500 pieces of news that may interest the user. At the same time, the system filters out articles the user has already read. After the news recommendations are obtained, the application pushes them to the user.
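The retrieval logic can be sketched as a nearest-neighbor search that skips already-read news. This is a brute-force scan in plain Python, whereas the database would use its vector index; the Euclidean distance metric, the function names, and the sample vectors are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(
    user_vector: Sequence[float],
    news_vectors: Dict[int, List[float]],
    read_ids: Set[int],
    k: int = 500,
) -> List[Tuple[int, float]]:
    """Return up to k (news_id, distance) pairs, nearest first, excluding read news."""
    candidates = (
        (news_id, l2_distance(user_vector, vec))
        for news_id, vec in news_vectors.items()
        if news_id not in read_ids
    )
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[1])


# Hypothetical two-dimensional vectors for four news items.
news_vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [0.8, 0.2]}
top = recommend([1.0, 0.0], news_vectors, read_ids={1}, k=2)
print([news_id for news_id, _ in top])
```

News item 1 is the closest match but is excluded because the user has already read it, so the nearest unread items are returned instead.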

The parameters used to rank the candidate news are described as follows:

  • ann_distance: indicates the degree of correlation between the user and the news.
  • create_time: indicates the time when the news was published.
  • click_times/(now()-create_time): indicates the overall CTR of the news since it was published.
  • two_hour_click_times/(now()-create_time): indicates the CTR of the news within the last two hours.
  • w1, w2, w3, and w4: indicate the weights of the four attributes above, as learned by the logistic regression model.
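One way these parameters could combine into a single ranking score is sketched below. The original ranking statement is not reproduced in the text, so the pairing of w1 to w4 with the attributes, the use of hours as the time unit, the bias term, and all numeric values are assumptions.

```python
import math
from datetime import datetime, timedelta
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(
    ann_distance: float,
    create_time: datetime,
    click_times: int,
    two_hour_click_times: int,
    now: datetime,
    weights: Sequence[float],  # (w1, w2, w3, w4), assumed attribute order
    bias: float = 0.0,
) -> float:
    """Logistic-regression style score in (0, 1); higher means push first."""
    # Age of the news in hours; the small floor avoids division by zero.
    age_hours = max((now - create_time).total_seconds() / 3600.0, 1e-6)
    features = (
        ann_distance,                      # user/news correlation
        age_hours,                         # derived from create_time
        click_times / age_hours,           # overall click rate
        two_hour_click_times / age_hours,  # recent click rate
    )
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical weights: distance and age lower the score, clicks raise it.
weights = (-2.0, -0.05, 0.01, 0.02)
now = datetime(2020, 11, 1, 12, 0)
fresh = score(0.2, now - timedelta(hours=1), 100, 100, now, weights)
stale = score(0.2, now - timedelta(hours=10), 100, 10, now, weights)
print(fresh > stale)
```

With these illustrative weights, a recent article with many clicks outranks an older one at the same distance from the user.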
