Recommendation System: Matching Algorithms and Architecture

By GarvinLi

In this article, Alibaba technical expert Aohai introduces matching algorithms and matching architecture in a recommender system. It covers the role of the matching module, common matching algorithms, collaborative filtering, and the vector matching architecture.

1) The Matching Module in a Recommender System

The first article in this series on building an enterprise-level recommender system introduced the recommender system architecture, its modules, and the cloud services used in each module. This article focuses on the matching algorithms in a recommender system and on how to build a matching architecture. First, a review of the matching module. The matching module performs preliminary filtering: when user A visits a platform, it selects the items that user A may like from a huge pool of items. For example, if the platform has 100,000 items, the matching module might narrow them down to 500 candidates for user A. The ranking module then ranks these candidates based on user A’s preferences.
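The two-stage flow described above can be sketched in a few lines. This is only an illustration: the tag-overlap score used for matching and the Jaccard score used for ranking are assumptions chosen to keep the example small, and the names match, rank, and items are hypothetical. A real system would use the matching algorithms discussed below and a trained ranking model.

```python
import heapq

def match(user_tags, items, k):
    """Matching stage: cheaply keep the k items with the largest tag overlap."""
    return heapq.nlargest(k, items, key=lambda item: len(user_tags & item["tags"]))

def rank(user_tags, candidates):
    """Ranking stage: a finer score, applied only to the small candidate set."""
    def score(item):
        # Jaccard similarity between the user's tags and the item's tags.
        return len(user_tags & item["tags"]) / len(user_tags | item["tags"])
    return sorted(candidates, key=score, reverse=True)

# Hypothetical catalog; a real platform would hold e.g. 100,000 items.
items = [
    {"id": "i1", "tags": {"sports", "shoes"}},
    {"id": "i2", "tags": {"sports", "shoes", "running"}},
    {"id": "i3", "tags": {"kitchen"}},
    {"id": "i4", "tags": {"sports", "outdoor", "tent", "camping"}},
]
user_tags = {"sports", "shoes", "running"}

candidates = match(user_tags, items, k=3)   # preliminary filtering
ranked_ids = [item["id"] for item in rank(user_tags, candidates)]
print(ranked_ids)
```

The point of the split is cost: the matching stage touches every item and therefore has to be cheap, while the ranking stage can afford a more expensive score because it only sees the few hundred candidates that survive matching.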

Image for post

2) Matching Algorithms in Recommendation Scenarios

This section describes the algorithms that the matching module may use. The following figure shows four popular algorithms: the rightmost is collaborative filtering, and the three on the left are vector matching algorithms.

Collaborative filtering is a statistics-based approach. It finds users with similar interests or items that are frequently purchased together. A well-known example is the finding, drawn from a large volume of purchase data, that beer and diapers are often bought together in supermarkets.

Vector matching algorithms are machine learning models that represent users and items as embedding vectors. Alternating Least Squares (ALS) is a typical matrix factorization method: it generates a user embedding table and an item embedding table from a behavior data table, and it is the basic method for vector matching. Factorization Machine (FM) follows similar logic and uses inner products between feature embeddings to enhance feature representation.

The third algorithm is GraphSAGE, a graph neural network (GNN) algorithm that can be used for matching. It is not yet widely used across the Internet industry, but some large Internet companies, such as Taobao, use it frequently in recommendation scenarios. GraphSAGE is a graph algorithm built on a deep learning framework. It generates user embeddings and item embeddings from the features and behavior of users and items, and it is often used in e-commerce matching scenarios.
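To make the matrix factorization idea behind ALS more concrete, here is a minimal pure-Python sketch. It assumes explicit ratings and plain L2 regularization, which is only one of several ALS variants, and it is not the implementation used on any particular platform. The function names (solve, update, als, loss) and the toy ratings are hypothetical.

```python
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def update(fixed, observed, k, reg):
    """Regularized least-squares update of one embedding, other side fixed."""
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for idx, rating in observed:
        v = fixed[idx]
        for i in range(k):
            b[i] += rating * v[i]
            for j in range(k):
                A[i][j] += v[i] * v[j]
    return solve(A, b)

def als(ratings, n_users, n_items, k=2, reg=0.1, iters=20, seed=0):
    """Alternate between updating user embeddings U and item embeddings V."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[(i, r) for u, i, r in ratings if u == uid] for uid in range(n_users)]
    by_item = [[(u, r) for u, i, r in ratings if i == iid] for iid in range(n_items)]
    for _ in range(iters):
        U = [update(V, by_user[uid], k, reg) for uid in range(n_users)]
        V = [update(U, by_item[iid], k, reg) for iid in range(n_items)]
    return U, V

def loss(ratings, U, V, reg=0.1):
    """Squared error on observed ratings plus the L2 penalty."""
    err = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return err + reg * sum(x * x for row in U + V for x in row)

# Hypothetical behavior table: (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (1, 0, 5), (1, 2, 1), (2, 0, 1), (2, 2, 5)]
U, V = als(ratings, n_users=3, n_items=3)
```

U and V play the role of the user embedding table and item embedding table mentioned above. The predicted preference of user u for item i is the inner product of U[u] and V[i], including for pairs that never appear in the behavior table.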

Image for post

3) Collaborative Filtering

This section describes the collaborative filtering algorithm, which is easy to understand. The following figure shows the preferences of users A, B, and C. Users A and C have similar tastes: both like rice and milk. User A also likes lamb, while user C has not yet shown a preference for it. Because their tastes are similar, we assume that user C will also like lamb and return lamb as a matching result for user C. This is standard collaborative filtering based on data statistics.
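The rice, milk, and lamb example can be written as a small user-based collaborative filtering sketch. The preferences of users A and C follow the text. User B's preferences are not given in the text, so the values below are an assumption, as are the Jaccard similarity measure and the function names.

```python
def jaccard(a, b):
    """Similarity of two users: shared liked items over all liked items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes, top_n=1):
    """Score items the target has not liked by the similarity of users who did."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

likes = {
    "A": {"rice", "milk", "lamb"},
    "B": {"milk", "noodles"},   # assumed; the article does not list B's items
    "C": {"rice", "milk"},
}

print(recommend("C", likes))           # ['lamb']
print(recommend("C", likes, top_n=2))  # ['lamb', 'noodles']
```

User A shares two of three items with user C, so A's extra item (lamb) outscores the item contributed by the less similar user B.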

Image for post

4) Vector Matching Architecture

This section describes how to use the three vector matching algorithms introduced above. Their input data consists of user IDs, item IDs, and behavior data. The following figure shows a user behavior data table. From this table, a vector matching algorithm produces two vector tables: a user vector table and an item vector table. Both tables consist of key-value pairs in which each user ID or item ID maps to a vector, so they can be cached in Redis. For online retrieval, the item vectors also need to be loaded into a Faiss service. Faiss is an open-source vector retrieval library developed by the Facebook AI team. It provides multiple retrieval modes and can search millions of vectors within milliseconds, which is why it is commonly used for recommendation matching. For example, to recommend items to a user, we look up the user’s vector by user ID and ask the Faiss engine which item vectors have the smallest Euclidean distance to it. The top 10 item vectors, for instance, are then returned as the matching result for that user. That is the overall vector matching architecture, in which both Redis and Faiss are used.
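The lookup-then-search flow can be illustrated without any external services. In the sketch below, plain dictionaries stand in for the key-value tables cached in Redis, and a brute-force scan stands in for the Faiss search. An exact L2 index in Faiss, such as IndexFlatL2, performs the same nearest-neighbor search at much larger scale. The vectors, IDs, and the function name top_k_items are hypothetical.

```python
import heapq
import math

# Stand-ins for the two key-value vector tables (cached in Redis in the article).
user_vectors = {"user_a": [0.9, 0.1]}
item_vectors = {
    "item_1": [1.0, 0.0],
    "item_2": [0.0, 1.0],
    "item_3": [0.8, 0.3],
    "item_4": [-1.0, 0.0],
}

def top_k_items(user_id, k):
    """Return the k item IDs whose vectors are closest to the user's vector."""
    query = user_vectors[user_id]            # key-value lookup by user ID
    return heapq.nsmallest(                  # brute-force Euclidean search
        k,
        item_vectors,
        key=lambda item_id: math.dist(query, item_vectors[item_id]),
    )

print(top_k_items("user_a", k=2))  # ['item_1', 'item_3']
```

A brute-force scan costs time proportional to the number of items for every request. That is acceptable for a toy table but not for millions of items, which is the gap a dedicated vector retrieval engine fills.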

Image for post

Learn more about Alibaba Cloud Machine Learning Platform for AI (PAI) at https://www.alibabacloud.com/product/machine-learning

The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.
