Collaborative Filtering for Product Recommendation
Machine learning is the science of using statistical algorithms to give computers the ability to learn from a large amount of historical data, create analytical models, and then use analytical models to support business development. Machine learning can currently be applied to the following scenarios:
- Marketing scenarios, such as product recommendations, user profiling, and precise marketing.
- Finance scenarios, such as bank loan prediction, financial risk control, stock trend prediction, and gold price prediction.
- Data mining in social networking sites (SNS), such as Twitter opinion leader (influential) analysis and social relationship chain analysis.
- Text scenarios, such as news categorization, keyword extraction, document summarization, and text analysis.
- Unstructured data processing scenarios, such as image categorization and optical character recognition (OCR).
- Other prediction scenarios, such as rain prediction and soccer game prediction.
Machine learning is typically divided into three categories:
- Supervised learning. Each sample has an expected value in supervised learning. Supervised learning is a machine learning task that maps the input (feature vectors) to expected values using modeling. Supervised learning is used in regression and classification.
- Unsupervised learning. Unsupervised learning is a machine learning task that draws potential inferences from samples without expected values, for example by clustering.
- Reinforcement learning. Reinforcement learning is about how agents take actions to interact with an environment to maximize the cumulative reward. Examples of reinforcement learning include AlphaGo Zero and autonomous driving.
You can implement your own machine learning algorithm for the above scenarios with Alibaba Cloud’s Machine Learning Platform for AI. Built on the Alibaba Cloud MaxCompute (ODPS) platform, Machine Learning Platform for AI integrates data processing, modeling, and online and offline prediction. It applies the proven technology of Alibaba Group to give machine learning users simpler operations and easier access to Artificial Intelligence (AI).
Using Alibaba Cloud Machine Learning Platform for AI
Note: The data in this section is created for testing only.
The parable of beer and diapers is a classic case of data mining. When diapers and beer are placed next to each other on shelves, the sales of both items increase. The problem is how to find the hidden correlation between two seemingly unrelated products in order to increase their sales. To resolve this problem, you can use data mining algorithms such as collaborative filtering. This algorithm enables you to find hidden correlations between customers or between products.
Collaborative filtering is a correlation rule-based algorithm. The following example shows how collaborative filtering predicts the interests of customers A and B in products X, Y, and Z. If both customers A and B have purchased products X and Y, collaborative filtering determines that customers A and B have similar shopping interests. Collaborative filtering then recommends product Z to customer B because customer A has purchased product Z. This is a classic example of using the features of users as a correlation.
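The customer A and B example can be sketched in a few lines of Python. This is only an illustration, not the algorithm used by Machine Learning Platform for AI: the purchase data is made up to mirror the example, and Jaccard similarity is an assumed choice of similarity measure.

```python
# Hypothetical purchase history mirroring the example in the text.
purchases = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
}


def jaccard(a, b):
    """Share of products two customers have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases):
    """Suggest products bought by the most similar other customer."""
    others = [c for c in purchases if c != target]
    if not others:
        return set()
    nearest = max(others, key=lambda c: jaccard(purchases[target], purchases[c]))
    # Recommend only what the target customer has not bought yet.
    return purchases[nearest] - purchases[target]


recommendations_for_b = recommend("B", purchases)
print(recommendations_for_b)  # {'Z'}
```

Customer A is the nearest neighbor of customer B because the two share products X and Y, so the only product left to recommend to B is Z.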
You can use collaborative filtering to make product recommendations, as follows:
This experiment uses the customer shopping behavior recorded before July to find the correlations between products. The information is then used to recommend relevant products to customers and make an assessment of the recommendation results. For example, customer A purchased product X before July. Product X is strongly correlated with product Y. The system then recommends product Y to customer A after July and calculates the probability of customer A purchasing product Y.
This experiment uses data collected from TIANCHI challenges. The data is divided into two parts: shopping behavior before July and shopping behavior after July.
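As a rough sketch of this split, the snippet below partitions purchase records at a cutoff date. The record layout (user, item, date), the sample rows, the year, and the choice to place July purchases in the second part are all assumptions made for illustration; they are not taken from the TIANCHI data.

```python
from datetime import date

# Hypothetical records: (user_id, item_id, purchase_date).
records = [
    ("u1", "a", date(2014, 5, 3)),
    ("u2", "a", date(2014, 6, 21)),
    ("u1", "b", date(2014, 7, 9)),
    ("u2", "c", date(2014, 8, 1)),
]

# Assumed boundary: purchases on or after July 1 go into the second part.
CUTOFF = date(2014, 7, 1)

before_july = [r for r in records if r[2] < CUTOFF]
after_july = [r for r in records if r[2] >= CUTOFF]

print(len(before_july), len(after_july))  # 2 2
```

The first part is used to find product correlations, and the second part is held back to check whether the recommended products were actually purchased.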
The attributes are as follows:
The following figure shows the data.
Data Exploring Procedure
The experiment flowchart is as follows:
- Generate a product recommendation list based on correlation rules.
- Actual shopping behavior after July.
- Number of recommended products and hit rate.
1. Generate a Recommendation List
Load the shopping behavior data recorded before July, use SQL scripts to extract the shopping behavior, and then import the data to the collaborative filtering component. Set the TopN attribute of the collaborative filtering component to 1. The component then finds the single most similar item for each input item and calculates its weight. Analyze which products are most likely to be purchased by the same customer, as shown in the following figure:
The collaborative filtering result shows the correlation between products. The itemid field indicates the target product. The similarity field lists the product most strongly correlated with the target product together with its correlation coefficient, separated by a colon (:).
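To make the TopN setting and the colon-separated output more concrete, here is a minimal item-to-item sketch. It assumes Jaccard similarity over the sets of buyers, which may differ from the measure the platform component uses; the function name `top_n_similar` and the sample pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user_id, item_id) pairs standing in for pre-July purchases.
pairs = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "a"), ("u3", "c"),
]


def top_n_similar(pairs, top_n=1):
    """Map each item to its top_n most similar items as 'item:coefficient'."""
    buyers = defaultdict(set)
    for user, item in pairs:
        buyers[item].add(user)

    result = {}
    for item in buyers:
        scored = []
        for other in buyers:
            if other == item:
                continue
            shared = len(buyers[item] & buyers[other])
            if shared == 0:
                continue  # no common buyer, so no correlation to report
            total = len(buyers[item] | buyers[other])
            scored.append((shared / total, other))
        # Highest coefficient first; ties broken by item id for stable output.
        scored.sort(key=lambda s: (-s[0], s[1]))
        result[item] = [f"{other}:{score:.4f}" for score, other in scored[:top_n]]
    return result


similarity = top_n_similar(pairs, top_n=1)
for itemid, similar in similarity.items():
    print(itemid, similar)
```

With `top_n=1`, each target item keeps only its single strongest neighbor, which matches the role of the TopN attribute described above.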
2. Make Recommendations
Step 1 shows how to list all strongly correlated products. The following procedure uses the product similarity list to recommend product b to customer A after customer A purchases product a, and then calculates the hit rate.
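The recommendation step amounts to a join between earlier purchases and the similarity list, followed by a comparison with later purchases. The sketch below shows that idea under stated assumptions: the dictionary `similar_item`, the sample customers, and the sample purchases are all hypothetical, and the real experiment performs these steps with platform components rather than Python.

```python
# Hypothetical similarity list: target item -> most strongly correlated item.
similar_item = {"a": "b", "b": "a", "c": "a"}

# Hypothetical (customer, item) purchases before and after July.
bought_before = [("A", "a"), ("B", "c"), ("C", "b"), ("A", "a")]
bought_after = {("A", "b"), ("C", "d")}

# Join earlier purchases with the similarity list. Using a set removes
# duplicate (customer, recommended item) rows.
recommendations = set()
for customer, item in bought_before:
    recommended = similar_item.get(item)
    if recommended is not None:
        recommendations.add((customer, recommended))

# A hit is a recommendation the customer actually purchased afterwards.
hits = recommendations & bought_after

print(sorted(recommendations))  # [('A', 'b'), ('B', 'a'), ('C', 'a')]
print(sorted(hits))             # [('A', 'b')]
```

Here customer A bought product a before July, received product b as a recommendation, and did buy product b afterwards, so that recommendation counts as a hit.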
3. Display Results Statistics
This figure shows the statistics components. Full table scan component 1 shows the recommendation list created based on the shopping behavior before July. After duplicate rows are removed, the final list contains 18,065 entries. Full table scan component 2 shows the number of products in the recommendation list that were actually purchased by the customers. In this experiment, 90 recommended products were purchased.
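For reference, the two figures above translate into a hit rate as follows. This assumes the hit rate is defined as purchased recommendations divided by total recommendations, which the article does not state explicitly.

```python
recommended_count = 18065  # entries in the deduplicated recommendation list
hit_count = 90             # recommended products that were actually purchased

# Assumed definition: hits divided by total recommendations.
hit_rate = hit_count / recommended_count

print(f"{hit_rate:.2%}")  # 0.50%
```

Roughly one in every 200 recommendations led to a purchase.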
Judging by the recommendation results, the experiment did not meet our expectations. The reasons include the following:
- This experiment only introduces how to use collaborative filtering to make recommendations. Key components for shopping behavior-based recommendations, such as time series, are not processed in this experiment. How recent the shopping behavior is matters: data collected across several months may not deliver the expected results.
- This experiment only focuses on the correlation between products. The attributes of recommended products, such as purchase frequency, are not considered. For example, mobile phones are products with a low purchase frequency. If customer A purchased a mobile phone last month, customer A may not purchase another mobile phone this month.
- To increase the accuracy of the prediction, machine learning algorithms must be used to train models. The method of product correlation-based recommendations should only be used to supplement other methods.
To learn more about machine learning on Alibaba Cloud, visit www.alibabacloud.com/product/machine-learning