Yidui — Challenges for Serverless in Livestreaming Scenarios
By Lenny Zeng, Middleware Architecture
Yidui is a video matchmaking app founded in 2015 as a sub-brand of Beijing Milian Technology & Trade Co., Ltd, a national high-tech enterprise. By combining video, live broadcasts, and online matchmaking, the app helps users make friends and set up blind dates. It opens an independent channel for the online dating community and offers single people a new social experience. After more than five years of rapid development, the app has more than 100 million registered users and hosts about 10 million online blind dates each month, making Yidui one of the most influential brands in the online dating industry.
As Yidui's business grows, the scale and complexity of its core applications are changing significantly. The technical team works to keep the overall system architecture technologically current, using emerging technologies to support business needs and reduce IT costs. Since its inception, the core system architecture of Yidui has undergone multiple major upgrades involving microservices, containerization, distributed databases, and AIOps. The team has also invested heavily in taking full advantage of the rapid elastic scaling of resources in the cloud computing era, especially by exploring Serverless technology.
In Yidui's business scenarios, live broadcasts are the most important link. Online matchmakers and other innovative business patterns are built on live broadcasts, which places high demands on content security. Both the AI-based intelligent analysis of live broadcast content and the supervision requirements call for frames to be captured from each stream at a fixed frequency. Frame capture starts when each live broadcast begins, and the resulting images are processed by a unified audit service.
For this requirement, the frame capture service carries key responsibilities. It runs the FFmpeg command on each livestream to capture frames, saves the generated images in Object Storage Service (OSS), and writes the capture information to Kafka. The downstream review service pulls the capture information from Kafka to obtain the image addresses in OSS and then reviews the images. Kafka was introduced into this architecture so that asynchronous processing eases the load on the review service during peak business hours.
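The capture step described above can be sketched in Go. This is a minimal illustration under stated assumptions, not Yidui's actual code: the function names, the stream URL, the output path, and the message fields are all hypothetical. Only the FFmpeg `fps` filter and `image2` output format are standard FFmpeg options. The sketch builds the FFmpeg arguments for fixed-rate sampling and encodes the record that would be written to Kafka after an image is uploaded to OSS. The upload and the Kafka write themselves are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// CaptureMessage is a hypothetical record written to Kafka for each frame.
type CaptureMessage struct {
	StreamID string `json:"stream_id"`
	ImageKey string `json:"image_key"` // object key of the frame in OSS (assumed layout)
	FrameSeq int    `json:"frame_seq"`
}

// ffmpegArgs builds FFmpeg arguments that sample a livestream at a fixed
// number of frames per second and write numbered image files.
func ffmpegArgs(streamURL string, framesPerSecond int, outPattern string) []string {
	return []string{
		"-i", streamURL,
		"-vf", "fps=" + strconv.Itoa(framesPerSecond),
		"-f", "image2",
		outPattern,
	}
}

// encodeMessage serializes a capture record as JSON for the message queue.
func encodeMessage(m CaptureMessage) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	// Placeholder stream URL and output pattern, for illustration only.
	args := ffmpegArgs("rtmp://example.invalid/live/room42", 1, "/tmp/room42_%05d.jpg")
	fmt.Println("ffmpeg", args)

	msg, err := encodeMessage(CaptureMessage{
		StreamID: "room42",
		ImageKey: "frames/room42/00001.jpg",
		FrameSeq: 1,
	})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Println(msg)
}
```

In a real service the argument list would be passed to `exec.Command("ffmpeg", args...)`, and each generated image would be uploaded before its message is published, so that the review service never sees an address that does not exist yet.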
Capturing frames with the FFmpeg command is simple, but it requires strong CPU computing power. According to tests by the Yidui Technical Team, deploying the capture service on 16-core ECS instances is a relatively cost-effective choice. At a fixed frequency of one frame capture per second, one ECS instance can support about 200 concurrent livestream captures. To guarantee reserve capacity during peak business hours, Yidui deployed thousands of ECS instances for the frame capture service. Like most Internet applications, Yidui's load has clear peaks and troughs: it peaks every evening and declines after midnight. These fluctuations pose a significant challenge to Yidui's overall resource planning. Deploying the frame capture service on an ECS cluster of fixed size has two drawbacks:
- To support peak business, the cluster must be sized for the number of users at peak times, which results in huge waste during business troughs.
- In some scenarios, such as celebrity endorsements, business volume surges and the cluster may need to be expanded temporarily. In this case, expansion often lags behind the growth in traffic, which degrades some services.
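The first drawback can be made concrete with a small capacity calculation. The sketch below is an illustration only: the figure of 200 streams per instance comes from the text, while the peak and trough stream counts are hypothetical numbers chosen for the example and are not Yidui's real traffic.

```go
package main

import "fmt"

// instancesNeeded returns how many fixed-size instances are required to
// cover the given number of concurrent streams (ceiling division).
func instancesNeeded(streams, streamsPerInstance int) int {
	if streams <= 0 || streamsPerInstance <= 0 {
		return 0
	}
	return (streams + streamsPerInstance - 1) / streamsPerInstance
}

// idleFraction returns the share of a peak-sized cluster that sits idle
// when only currentStreams are live.
func idleFraction(peakStreams, currentStreams, streamsPerInstance int) float64 {
	provisioned := instancesNeeded(peakStreams, streamsPerInstance)
	if provisioned == 0 {
		return 0
	}
	needed := instancesNeeded(currentStreams, streamsPerInstance)
	return float64(provisioned-needed) / float64(provisioned)
}

func main() {
	const perInstance = 200 // about 200 streams per 16-core instance (from the text)
	peak, trough := 200000, 20000 // hypothetical loads, for illustration only

	fmt.Println("instances at peak:  ", instancesNeeded(peak, perInstance))
	fmt.Println("instances at trough:", instancesNeeded(trough, perInstance))
	fmt.Printf("idle share at trough: %.0f%%\n", 100*idleFraction(peak, trough, perInstance))
}
```

With these assumed numbers, a cluster sized for the peak needs 1000 instances, while the trough needs only 100, so most of the fixed cluster is idle for part of every day.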
To save on resource costs, the Yidui Team explored various auto scaling policies. For example, applications can be deployed in containers on elastic ECS instances so that the cluster size follows changes in real business volume. However, these strategies are complex to implement, and the resulting scalability is weak. In the traditional service architecture, an application runs for a long time once it is started and handles many business requests concurrently while it runs. The computing power it occupies doesn't change substantially, regardless of how the business volume changes.
Is there a straightforward way to pull up computing power for frame capture while a livestream is running and release it automatically when the livestream finishes? Such an approach requires no permanently running application instances, so computing resources are truly allocated on demand, and no additional mechanism is needed to adjust the cluster size of the frame capture service dynamically. This is the ideal solution. Alibaba Cloud Function Compute (FC), a representative of cloud-native Serverless technology, embodies this idea.
FC is a fully managed, event-driven computing service. With FC, users can focus on writing and uploading code without purchasing or managing infrastructure such as servers. FC prepares computing resources automatically and runs code elastically and reliably. It also provides features such as log query, performance monitoring, and alerting. With FC, applications and services can be created quickly, and users pay only for the resources their tasks consume.
FC provides an event-driven computing model in which functions are triggered by events. A function execution can be triggered by the user of the function or by another event source. Users create a trigger on a specified function to describe a set of rules, and when an event satisfies those rules, the event source triggers the corresponding function. For example, with an HTTP trigger, a user's HTTP request triggers the function. With an OSS trigger, adding or modifying a file on OSS triggers the function. In Yidui's frame capture scenario, the business program only needs to trigger the capture function before each livestream begins. Therefore, with some adjustments to the architecture of the previous capture service, it can be migrated to the FC platform to enjoy the value of Serverless.
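As a rough illustration of the HTTP-trigger case, the following Go sketch shows how a business program might build the request that starts a capture for one livestream. Everything specific here is an assumption: the trigger URL is a placeholder, and the event fields and function names are hypothetical and not part of the FC API. The sketch only constructs the request so that it can run without network access.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// StartCaptureEvent is a hypothetical payload sent when a livestream starts.
type StartCaptureEvent struct {
	StreamID  string `json:"stream_id"`
	StreamURL string `json:"stream_url"`
}

// newStartCaptureRequest builds the POST request for an HTTP-triggered
// capture function. triggerURL is whatever endpoint the trigger exposes.
func newStartCaptureRequest(triggerURL string, ev StartCaptureEvent) (*http.Request, error) {
	body, err := json.Marshal(ev)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, triggerURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder endpoint and stream address, for illustration only.
	req, err := newStartCaptureRequest("https://fc.example.invalid/capture", StartCaptureEvent{
		StreamID:  "room42",
		StreamURL: "rtmp://example.invalid/live/room42",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// A real caller would now send the request, for example with
	// http.DefaultClient.Do(req), and check the response status.
}
```

The point of the design is that the caller holds no capture capacity of its own: one request per livestream is enough, and the platform decides where the function instance runs.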
When the Yidui Technical Team first discussed the Serverless scheme with Alibaba Cloud, Alibaba Cloud's technical personnel recommended Python for implementing the frame capture function. FC provides native runtime environments for Node.js, Python, PHP, Java, and other languages, and with a scripting language such as Python, the scheduling code can be modified directly on the FC platform. However, Yidui developers are more familiar with Golang, so they chose Golang for the frame capture service in the subsequent development. FC places no restriction on the development language and can support any mainstream language. With the Custom Runtime provided by FC, a customized runtime environment can be created for any language. A Custom Runtime is essentially an HTTP server that takes over all of the requests from the FC system, including event calls and HTTP function calls.
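Because a Custom Runtime is an HTTP server, the capture function can be pictured as an ordinary Go HTTP handler. The sketch below is a minimal illustration, not the FC contract: the `/invoke` path, the event fields, and the handler names are assumptions made for the example and should be checked against the FC Custom Runtime documentation. To stay runnable anywhere, `main` exercises the handler in-process with `net/http/httptest` instead of listening on a port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// CaptureEvent is a hypothetical event payload for one livestream.
type CaptureEvent struct {
	StreamID  string `json:"stream_id"`
	StreamURL string `json:"stream_url"`
}

// parseCaptureEvent decodes and validates the event body.
func parseCaptureEvent(payload []byte) (CaptureEvent, error) {
	var ev CaptureEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return CaptureEvent{}, err
	}
	if ev.StreamID == "" || ev.StreamURL == "" {
		return CaptureEvent{}, errors.New("stream_id and stream_url are required")
	}
	return ev, nil
}

// invokeHandler receives one invocation and would start the frame capture.
func invokeHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST required", http.StatusMethodNotAllowed)
		return
	}
	payload, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	ev, err := parseCaptureEvent(payload)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// The FFmpeg capture loop for ev.StreamURL would run here.
	fmt.Fprintf(w, "capture accepted for %s\n", ev.StreamID)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/invoke", invokeHandler) // path is an assumption

	// In a deployed runtime this would be http.ListenAndServe on the port
	// the platform expects; here the handler is called in-process.
	body := `{"stream_id":"room42","stream_url":"rtmp://example.invalid/live/room42"}`
	req := httptest.NewRequest(http.MethodPost, "/invoke", strings.NewReader(body))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```

Keeping the parsing in a separate function makes the event format testable without a server, which is useful when the same code has to run both locally and on the platform.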
Under the Serverless architecture, each livestream pulls up its own computing resources for frame capture, so high-specification ECS instances that process many capture tasks concurrently are no longer needed. After repeated tests, Yidui adopted the lowest-specification FC instance, with 128 MB of memory, to capture the frames of each livestream.
FC has made many optimizations to speed up the initialization of computing resources. Backed by a cloud-based resource pool, it can schedule a large number of computing instances within 100 milliseconds to absorb sudden traffic surges under special circumstances. To fit Yidui's business scenarios further, the Alibaba Cloud FC Team also provides scheduled preheating, which largely safeguards cold start performance during peak business periods. This extreme elastic scaling capability is the specialty of Serverless. Elastic scaling in traditional application architectures depends on scheduling the underlying computing resources, and with its complex initialization, it falls far short of Serverless in the startup speed of computing instances.
Under normal circumstances, the maximum runtime of a common elastic instance on FC is ten minutes. Performance instances are provided to cope with higher resource demands, and their maximum runtime is extended to several hours. In Yidui's frame capture scenario, a single instance does not need high performance, but it must run as long as the livestream does. Therefore, Alibaba Cloud raised the runtime limit of elastic instances for Yidui to one hour. Live broadcasts longer than an hour are also supported: when a function instance is about to reach the runtime limit, a new function instance is pulled up to continue the frame capture. This has no impact on the normal operation of the frame capture.
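The handoff between instances can be sketched as a simple schedule. This is an illustration under stated assumptions rather than a description of how FC or Yidui implements it: the one-hour limit comes from the text, while the five-minute safety margin and the function name are hypothetical.

```go
package main

import "fmt"

// handoffStarts returns the minute offsets at which a new function instance
// is started so that a stream of the given length is always covered. Each
// instance is replaced marginMinutes before it would hit the runtime limit.
func handoffStarts(streamMinutes, limitMinutes, marginMinutes int) []int {
	step := limitMinutes - marginMinutes
	if streamMinutes <= 0 || step <= 0 {
		return nil
	}
	var starts []int
	for t := 0; t < streamMinutes; t += step {
		starts = append(starts, t)
	}
	return starts
}

func main() {
	// A 150-minute livestream, a 60-minute runtime limit (from the text),
	// and an assumed 5-minute margin before the limit.
	fmt.Println(handoffStarts(150, 60, 5)) // prints [0 55 110]
}
```

A livestream shorter than the limit yields a single start at minute 0, so the common case needs no handoff at all.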
In live broadcast scenarios, the value of Serverless technology to the company is clear. FC's innovative instance scheduling engine brings the advantages of cloud computing in efficiency, performance, cost, openness, and other aspects. After the company decided to pilot Serverless technology for the capture business, it took only one week to complete pre-research, development, debugging, testing, and launch. With a negligible amount of code transformation, the team began to enjoy the dividends of Serverless technology in the cloud computing era.
FC schedules computing resources on demand and bills on a pay-as-you-go basis. It further reduces costs through reserved instances. According to a preliminary evaluation, the FC-based Serverless architecture can reduce resource costs by more than 20% in the frame capture scenario.
With FC, there is no need to reserve computing resources or maintain the underlying hardware and software, which reduces operating costs. It also lets the Yidui Technical Team focus on implementing complex business logic. This is one of the great values that Serverless technology brings to enterprises and developers.
After the success of frame capture on Serverless technology, Yidui continues to explore suitable scenarios for Serverless in more business areas. In a recent transformation, the company also migrated short video transcoding to the FC platform, where hundreds of thousands of short videos are transcoded every day. The FFmpeg command is very efficient at transcoding videos, taking 1.6 seconds on average. The gain in resource utilization from Serverless is even clearer here: in this business scenario, resource costs can be reduced by at least 60%. In the future, Yidui will continue to explore Serverless architecture based on its technical characteristics. By embracing new technologies, Yidui also enjoys the dividends of cloud computing.