Real-Time Personalized Recommendation System

Image for post
Image for post

Introduction

A real-time system is a system that processes input data within milliseconds so that the processed data is available almost immediately for feedback. Real-timeliness in the video recommendation system is mainly reflected in three layers:

  • Real-Time Changes of Candidate Sets: In this recommendation system, the definition of the candidate sets is the recommendation of different types of video libraries for the user. A user cannot view all the candidate sets but can view only a part of the candidate set after the matching algorithm processes the candidate sets. The update interval of the candidate set directly affects the real-timeliness of videos that the user can see. Several candidate sets exist, each tailored for different scenarios. For example, by combining the latest candidate set and the most popular candidate set in the past N hours, we can achieve a recommendation effect similar to that of toutiao.com. The generation of new content candidate sets is in real time, while the popular video candidate set in the past N hours may be updated every several minutes. As another example, synergy can achieve the recommendation of related videos, which can further shortlist the user-favorite content from popular candidate sets that attract the common interest of users.
  • Real-Time Presentation of Recommendation Performance Metrics: After the product is online, key metrics such as the click conversion rate can be updated once every several minutes. The recommendation system has a special feature in which the performance is not measured by subjective opinions but by specific metrics, such as the click conversion rate.

User Profiles and Video Profiles

Reflection of user profiles in the interest model is a common occurrence. By building users’ long-term and short-term interest models, businesses can satisfy users’ interests and demands. There are various ways to provide recommendations, such as synergy and a variety of small tricks. However, user profile-based and video profile-based recommendations are difficult in the initial phase. In the long run, these user profiles can promote the team’s understanding of users’ video consumption habits and support businesses other than recommendations.

Asynchronous

A user’s refresh actions triggers the recommendation calculation. Once refreshed, the user information is sent to Kafka asynchronously, and the Spark Streaming program will analyze the data and match candidate sets with users. The user’s private queue in Redis receives the calculation result. The interface service is only responsible for getting the recommendation data and sending the user refresh actions. The private queue of a new user or a user who has not accessed the service in a long time may have expired. In this case, the asynchronous operation will cause problems. Once the front-end interface discovers this issue, it will perform one of the following actions to resolve the problem:

  • Obtains the user interest tags and tries to determine the synergy according to certain rules. Then it searches for the data in ES, populates the data onto the private queue, and quickly provides the result (the solution we are adopting).

Impact of Streaming Technologies on Recommendation System

In 2014, the concept of stream computing concept did not exist and the reuse of existing technical system was not possible. As a result, our recommendation system was overly complicated and difficult to be productized. To make matters worse, the recommendation effects were only visible the next day, resulting in a prolonged cycle of effect improvement. During that time, the entire development cycle exceeded one month.

  • User profile construction, such as short-term interest model
  • New and popular data candidate sets
  • Short-term synergy
  • Recommendation performance metrics, such as click conversion rate
  • Reuse of Existing Big Data Infrastructure: Throughout the entire recommendation system, only the provision of API services requires a separate deployment and all other calculations run on the Hadoop cluster using Spark.
  • Adjustment of Calculation Cycles: All calculation cycles and computing resources can be adjusted conveniently or even dynamically (Spark Dynamic Resource Allocation). This is vital because it allows us to sacrifice specific real-timeliness to save resources or spare more resources for offline tasks.

Recommendation System Architecture

The figure below shows the structure of the entire recommendation system:

Image for post
Image for post
  • New video tagging
  • Short-term interest model calculation
  • User recommendation
  • Calculation of candidate sets, such as the latest, the most popular sets (during any time period)
  1. HBase (user profile and video profile)
  2. Parquet (HDFS) (archived data)
  3. ElasticSearch (copy of HBase)
Image for post
Image for post
  • Flume (Collect nginx logs)
  • Kafka (Receive Flume reports)

Personalized Recommendation

Image for post
Image for post

Conclusion

This article explores the usage of stream computing for the personalized video recommendation system. In this approach, a tag system is designed and then applied users and videos. Multiple algorithms, including LDA and Bayesian, are combined to gain a wholesome and useful experience.

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store