In this article, Alibaba technical expert Aohai introduces matching algorithms and architecture for recommender systems. It covers the matching module in a recommender system, common matching algorithms, collaborative filtering, and the vector matching architecture.
1) The Matching Module in a Recommender System
In the first article of this series on building an enterprise-level recommender system, we introduced the recommender system architecture, its modules, and the application of cloud services in each module. This article focuses on the matching algorithms in a recommender system and how you can build a matching architecture.
First, let’s review the matching module in a recommender system. The matching module performs preliminary filtering: when user A visits a platform, it selects the items that user A may like from a huge pool of candidates. For example, if the platform has 100,000 items, the matching module might select 500 items that user A may like. The ranking module then ranks these 500 items based on user A’s preferences.
2) Matching Algorithms in Recommendation Scenarios
This section describes the algorithms that the matching module may use. The following figure shows four popular algorithms. The rightmost one is collaborative filtering, and the three on the left are related to vector matching.
Collaborative filtering is close to a statistics-based approach. It finds users with the same interests, or items that are frequently purchased together. For example, statistics over a large amount of purchase data may show that beer and diapers are often bought together in supermarkets.
Vector matching algorithms are machine learning models that learn vector representations (embeddings) of users and items. Alternating Least Squares (ALS) is a typical matrix factorization method: from a behavior data table, it generates a user embedding table and an item embedding table. This is the basic method for vector matching. Factorization Machine (FM) follows similar logic and uses inner products between feature vectors to enhance feature representation.
The last algorithm is GraphSage, a matching algorithm based on graph neural networks (GNNs). It is not yet widely used across the Internet industry, but some large Internet companies, such as Taobao, use it frequently in recommendation scenarios. GraphSage is a graph algorithm built on a deep learning framework. It generates user embeddings and item embeddings from the features and behavior of users and items, and it is often used in e-commerce matching scenarios.
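To make the matrix factorization idea more concrete, the following is a minimal NumPy sketch of ALS, not the implementation used on any particular platform. The behavior matrix, the rank, the regularization strength `reg`, and the function name `als` are illustrative assumptions, and a score of 0 is treated as "no observed behavior".

```python
import numpy as np


def als(ratings, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Approximate ratings with U @ V.T, fitting only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user embedding table
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item embedding table
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix the item vectors and solve a small ridge regression per user.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, mask[u]])
        # Fix the user vectors and solve a small ridge regression per item.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[mask[:, i], i])
    return U, V


# Hypothetical behavior table: rows are users, columns are items, 0 = unobserved.
ratings = np.array(
    [
        [5, 3, 0, 1, 4],
        [4, 0, 0, 1, 5],
        [1, 1, 5, 0, 2],
        [0, 1, 4, 5, 0],
    ],
    dtype=float,
)
mask = ratings > 0

U, V = als(ratings, mask)
predicted = U @ V.T  # inner products give a score for every user-item pair
print(np.round(predicted, 2))
```

The two returned arrays play the role of the user embedding table and the item embedding table. The alternating structure is the point of the sketch: with one side fixed, the other side has a closed-form least squares solution.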
3) Collaborative Filtering
This section describes the collaborative filtering algorithm, which is easy to understand with the help of an example. The following figure shows the preferences of users A, B, and C. Users A and C have similar tastes: both like rice and milk. User A also likes lamb, but user C has not shown a preference for it yet. Because their tastes are similar, we assume that user C will also like lamb and return lamb as a matching result for user C. This is standard collaborative filtering based on data statistics.
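The rice, milk, and lamb example can be sketched as a small user-based collaborative filter. This is only an illustration: the article does not state what user B likes, so B's preferences here are made up, and Jaccard similarity is one assumed choice among several possible similarity measures.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=5):
    """Suggest items liked by similar users that the target has not liked yet."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        if sim == 0:
            continue  # ignore users with no taste in common
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


likes = {
    "A": {"rice", "milk", "lamb"},
    "B": {"bread", "tea"},  # hypothetical, not given in the article
    "C": {"rice", "milk"},
}

print(recommend("C", likes))  # ['lamb']
```

User A shares two of three items with user C, so lamb is carried over to C. User B shares nothing with C and contributes no candidates.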
4) Vector Matching Architecture
This section describes how you can use the preceding three vector matching algorithms. The input data of these algorithms consists of user IDs, item IDs, and behavior data. The following figure shows a user behavior data table. Given this table, a vector matching algorithm produces two vector tables, one for users and one for items. These vector tables contain key-value pairs: each ID corresponds to a vector, and the key-value pairs can be cached in Redis.
In actual use, you also need to load the vectors into a Faiss server. Faiss is an open-source engine developed by the Facebook AI team for vector retrieval. It provides multiple vector retrieval modes and can return results from a collection of millions of vectors in about one millisecond. Because of this performance, it is commonly used for recommendation matching.
For example, when we want to recommend items to a user, we look up the user’s vector by user ID and ask the Faiss engine which item vectors have the smallest Euclidean distance to it. We can then take, say, the top 10 item vectors as the matching result for that user. That is the overall vector matching architecture, in which both Redis and Faiss are used.
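The lookup-then-search flow can be sketched with in-memory stand-ins. This is a simplified illustration under stated assumptions, not the production setup: a Python dictionary stands in for Redis, a brute-force NumPy search stands in for Faiss, and all IDs and vectors are made up. In a real deployment, the dictionary lookup would be a Redis read and the distance computation would be handled by a Faiss index (for example, `faiss.IndexFlatL2`, which performs exact L2 search).

```python
import numpy as np

# Stand-in for the Redis cache: user ID -> user vector (hypothetical values).
user_vectors = {
    "user_a": np.array([0.9, 0.1], dtype=np.float32),
    "user_b": np.array([0.1, 0.9], dtype=np.float32),
}

# Stand-in for the Faiss index: item vectors stored row by row,
# with a parallel list that maps row numbers back to item IDs.
item_ids = ["item_1", "item_2", "item_3", "item_4"]
item_matrix = np.array(
    [
        [1.0, 0.0],
        [0.7, 0.3],
        [0.0, 1.0],
        [0.5, 0.5],
    ],
    dtype=np.float32,
)


def match(user_id, k=2):
    """Return the IDs of the k items closest to the user's vector."""
    query = user_vectors[user_id]  # key-value lookup (Redis in production)
    # Squared Euclidean distance from the query to every item vector.
    dists = np.sum((item_matrix - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]  # row numbers of the k smallest distances
    return [item_ids[i] for i in nearest]


print(match("user_a"))  # ['item_1', 'item_2']
print(match("user_b"))  # ['item_3', 'item_4']
```

Brute force is enough to show the data flow, but it scans every item on each request. A dedicated vector retrieval engine is what keeps the search fast once the item table grows to millions of rows.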
Learn more about Alibaba Cloud Machine Learning Platform for AI (PAI) at https://www.alibabacloud.com/product/machine-learning
The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.