By Fengming from Xianyu Technology
In the northern ocean there is a fish called the k’un. This k’un changes into a bird called the p’eng. So wrote Zhuang Zi in “A Happy Excursion”. Inspired by the k’un and the p’eng, Alibaba Cloud named one of its systems Kunpeng, in the hope that Kunpeng will take you around the world.
Xianyu is growing rapidly. High-traffic scenarios such as the Xianyu homepage feeds, the Xianyu search results page, the Guess What You Like recommendation feeds, and the key position section on the Xianyu homepage have become traffic entry points that the product and operations teams of every business line compete for. As delivery requests flooded in to the technical team, the drawbacks of the old development and operation mode surfaced. When multiple business lines compete in the same scenario, the existing technical capabilities cannot guarantee both that each delivery reaches its page view (PV) target and that the click-through rate (CTR) and conversion rate (CVR) stay high. To achieve a high CVR, the business side has to apply for more product positions to deliver its materials. This leads to a vicious cycle that reduces the exposure efficiency of other business materials and ordinary products. As delivery requirements continue to grow, the old R&D system makes it difficult to develop collaboratively or to improve the efficiency of development and delivery, and the business side has complained about this. We therefore urgently needed a new set of technical solutions to these problems.
The fundamental problem that we need to solve is how to control the overall traffic from a global perspective and maximize the global traffic conversion efficiency when the total traffic is fixed.
To solve this problem, we start from the following perspectives:
- Ensure that the traffic targets delivered by multiple parties are achieved in a single scenario (such as the homepage feeds or search results page) and improve conversion efficiency while meeting the delivery targets.
- Connect multiple scenarios to improve the global conversion efficiency while meeting the requirements of all data indicators of a single scenario.
- Empower the business side through engineering methods to speed up the launch of new businesses, adjust delivery strategies promptly, and reduce trial and error costs.
Based on these considerations, we decided to build the Dujiangyan algorithm platform in collaboration with the algorithm team of Alibaba DAMO Academy. We wanted every business party to benefit from algorithms at a very low cost, achieve its business targets, and improve its conversion efficiency. At the same time, we designed a new delivery system around the concept of components: operational capabilities can be rapidly extended and accumulated as reusable components, while special material delivery requirements are converted into technical requirements and implemented in parallel, giving the business side more opportunities.
Building of the Kunpeng System
To make operational capabilities reusable and manageable and to keep delivery policies flexible, we abstracted a three-level structure of activity, scenario, and material. We also allow multiple templates to be defined in each scenario to manage materials.
“Scenario” is the most fundamental concept in the Kunpeng system and is the stage for delivering materials. Scenarios must be defined before the targeted business development and the preparation and delivery of materials. Take the search results page as an example. Five scenarios are defined on the page: giraffe operation, search result feeds, poplayer, query word intervention, and background and atmosphere, as shown in the figure below. Only after these scenarios are defined can the effective scope of business materials be determined.
When scenarios are in place, we can create personalized material templates for different business lines according to their different delivery requirements. Then, the operations team can create materials based on these templates. The operations team can use all available operational capabilities, such as filtering by unique visitor (UV) fatigue, filtering by search term, filtering by platform, and filtering by version, to configure materials and select or enter information on the page. After materials are created, the operations team only needs to create an activity, specify the target user group, set the effective time, and then deliver the materials to finish the delivery task.
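The relationship between scenarios, templates, materials, and activities can be sketched as a small data model. This is only an illustration under assumptions: the class names, field names, and validation rules below are hypothetical and are not taken from Kunpeng's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Template:
    """Describes the fields a material of this kind must fill in (hypothetical)."""
    name: str
    required_fields: tuple


@dataclass
class Scenario:
    """A stage for delivering materials, e.g. the search result feeds."""
    name: str
    templates: dict = field(default_factory=dict)

    def add_template(self, template: Template) -> None:
        self.templates[template.name] = template


@dataclass
class Material:
    """Content created by the operations team from one template of a scenario."""
    scenario: Scenario
    template_name: str
    content: dict

    def __post_init__(self) -> None:
        # A material is only valid inside a scenario that defines its template.
        template = self.scenario.templates.get(self.template_name)
        if template is None:
            raise ValueError(f"unknown template: {self.template_name}")
        missing = [f for f in template.required_fields if f not in self.content]
        if missing:
            raise ValueError(f"missing fields: {missing}")


@dataclass
class Activity:
    """Binds materials to a target user group and an effective time window."""
    name: str
    target_group: str
    start: datetime
    end: datetime
    materials: list = field(default_factory=list)

    def is_active(self, now: datetime) -> bool:
        return self.start <= now < self.end
```

In this sketch, a scenario must exist and define a template before any material can be created for it, which mirrors the order described above: scenario first, then template, then material, then activity.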
Every business has expected goals when making a delivery plan. The operations team only needs to enter the business delivery goals in the console during the delivery process. The Kunpeng system will automatically collaborate with the Dujiangyan algorithm platform to achieve these goals and improve conversion efficiency.
The delivery target (such as one million exposure PVs in three days), delivery strategy (such as prompt, smooth, and free delivery), and materials configured by the operations team in the Kunpeng console will flow back to the algorithm platform through Kunpeng’s offline data link (T+1). Based on the target, strategy, and materials, the shuffling algorithm model can make a global delivery plan for the next day. The exposure and click data of each material contain a unique business identifier issued by Kunpeng. The data flows back to the Dujiangyan algorithm platform and will be used for continuous model optimizations and iterations. Data provided by each vertical algorithm and exposed after shuffling flows through Kunpeng’s general exposure and filtering data link back to the exposed data table to solve the problem of repeated exposure of vertical data. Each vertical algorithm is a provider of algorithm data customized for business, such as the vertical algorithm for live streaming and the vertical algorithm for purchase.
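To make the idea of a delivery target and a delivery strategy concrete, the following sketch computes a quota for the next day from the T+1 data. It is not the shuffling algorithm itself, which the article does not describe. The function name and the interpretation of the strategies are assumptions: "smooth" is taken to mean spreading the remaining target evenly over the remaining days, and "prompt" is taken to mean delivering the remainder as early as possible.

```python
import math


def next_day_quota(target_pv: int, delivered_pv: int, days_left: int,
                   strategy: str = "smooth") -> int:
    """Return a hypothetical exposure quota for the next day.

    target_pv:    total exposure PVs requested for the activity
    delivered_pv: PVs already delivered according to the T+1 data
    days_left:    number of delivery days remaining, including the next day
    """
    remaining = max(target_pv - delivered_pv, 0)
    if days_left <= 0 or remaining == 0:
        return 0
    if strategy == "prompt":
        # Assumed meaning: try to finish the remaining target right away.
        return remaining
    if strategy == "smooth":
        # Assumed meaning: spread the remaining target evenly; round up so
        # the target is not missed because of integer truncation.
        return math.ceil(remaining / days_left)
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")
```

For the example target of one million exposure PVs in three days, `next_day_quota(1_000_000, 0, 3)` gives 333,334 for the first day. If 400,000 PVs were actually delivered on that day, the quota for the second day becomes `next_day_quota(1_000_000, 400_000, 2)`, which is 300,000.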
The following figure shows how a user request is processed after the trunk code of different scenarios connects to the Kunpeng platform.
As shown in the preceding figure, data processing involves many modules. During the design of the Kunpeng system, we tried to implement different templates by using an extensible architecture. We also built the functions customized for different business lines into public components to achieve fast iteration for more business lines.
The following figure shows the architecture design of the system.
DataFetcher Extension Point Subsystem
Kunpeng’s DataFetcher extension point subsystem is one of the keys to low-cost, parallel development by multiple people. The delivery requirements of a business line, such as recommending associated shopping guide information in real time according to a user’s search terms, may involve acquiring the to-be-delivered materials from a remote service in real time. In that case, the developers call the remote service in the callback method of a DataFetcher subclass to obtain the data and write the logic of DO conversion. This is all that is needed to complete business development and delivery. At the underlying layer of the Kunpeng system, we have encapsulated functions such as RPC concurrency, resource isolation, and metric monitoring for multiple DataFetchers. These functions are completely transparent to the business side, so business developers only need to focus on the business itself. Before the Kunpeng system was launched, developers had to be very familiar with the trunk code of the scenario that a delivery requirement targeted. As a result, only the technology owners of scenarios such as the search results page and homepage feeds were capable of development and delivery in those scenarios. Kunpeng has now eliminated this single-point resource bottleneck in development.
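The division of work between a business developer and the platform can be sketched as follows. This is a minimal illustration, assuming a base class with a fetch callback and a conversion callback. The method names, the `run_fetchers` helper, and the `ShoppingGuideFetcher` example are hypothetical, and the remote call is faked with local data.

```python
import time
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class DataFetcher(ABC):
    """Extension point: business developers implement only these callbacks."""
    name = "unnamed"

    @abstractmethod
    def fetch(self, request: dict) -> list:
        """Call the remote service and return raw records."""

    @abstractmethod
    def convert(self, raw: dict) -> dict:
        """Convert one raw record into the data object used for delivery."""


class ShoppingGuideFetcher(DataFetcher):
    """Hypothetical business fetcher; a real one would issue an RPC in fetch()."""
    name = "shopping_guide"

    def fetch(self, request):
        return [{"id": 1, "title": f"Guide for {request['query']}"}]

    def convert(self, raw):
        return {"material_id": raw["id"], "text": raw["title"]}


def _run_one(fetcher, request):
    started = time.perf_counter()
    items = [fetcher.convert(raw) for raw in fetcher.fetch(request)]
    return items, time.perf_counter() - started


def run_fetchers(fetchers, request):
    """Platform side: run all fetchers concurrently, isolate failures, and
    record a simple per-fetcher metric."""
    results, metrics = {}, {}
    if not fetchers:
        return results, metrics
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {f.name: pool.submit(_run_one, f, request) for f in fetchers}
    for name, future in futures.items():
        try:
            results[name], elapsed = future.result()
            metrics[name] = {"ok": True, "seconds": elapsed}
        except Exception:
            # One failing fetcher must not break the whole request.
            results[name] = []
            metrics[name] = {"ok": False}
    return results, metrics
```

In this sketch, the business developer writes only the subclass. Concurrency, failure isolation, and metrics live in `run_fetchers`, so the scenario's trunk code does not change when a new fetcher is added. A production system would also need timeouts and per-fetcher resource limits, which are omitted here.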
The DataFetcher subsystem is an important implementation of Kunpeng’s modular feature. For example, a batch of handpicked products recommended in real-time needs to be delivered to the Xianyu homepage feeds scenario for new users. A DataFetcher is used to implement this. If the operations team needs to deliver similar handpicked products on the search results page or the Guess What You Like page, the team can directly reuse this DataFetcher by registering it to the corresponding scenario through the Kunpeng console. This method effectively saves development resources and improves business launch efficiency.
The business side has an increasingly higher demand for refined operations, so we built a modular filter subsystem on the Kunpeng platform and made this suite of filters into basic components available for all business lines.
In addition to the well-known group-oriented targeted delivery, we also provide various prefilters, such as filtering by platform (iOS or Android), filtering by version, filtering by canary traffic ratio, strict search term matching, fuzzy search term matching, filtering by page number, and filtering by UV fatigue. If the business side has requirements for targeted delivery, it can choose the corresponding filters in the Kunpeng console and use them directly.
If the business side has special requirements for targeted delivery that the existing filters cannot meet, developers can easily implement the special business logic by inheriting from the MatFilter base class. The developers can then make it a general filter component that all business lines can reuse. In this way, the capabilities of Kunpeng’s filter subsystem keep accumulating and are shared among business lines.
For example, the game operations team has a requirement that when users search for “Honor of Kings” and “running karts” on Xianyu, the bamboo dragonfly game is exposed to certain game enthusiasts on Android 6.6.7 to 6.7.1 clients with a 15% canary traffic ratio and a maximum of 10 exposures every three days for each user. This requirement involves multiple filters. With Kunpeng, these filters can be reused directly without any development. Before Kunpeng, a similar requirement would take effect only after product requirement document (PRD) review, development, joint debugging, testing, and release.
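The game operations example can be expressed as a chain of reusable filters. The sketch below assumes a `MatFilter` base class with a single `accept` method. Only the name MatFilter comes from the text above; the method signature, the concrete filter classes, the context keys, and the hash-based canary bucketing are assumptions made for illustration. It also assumes that matching either search term is enough, and it omits the user-group filter.

```python
import hashlib
from abc import ABC, abstractmethod


class MatFilter(ABC):
    """Base class for material filters (the interface is assumed)."""

    @abstractmethod
    def accept(self, ctx: dict) -> bool:
        """Return True if the material may be shown for this request."""


def parse_version(text: str) -> tuple:
    return tuple(int(part) for part in text.split("."))


class SearchTermFilter(MatFilter):
    """Strict search term matching, ignoring case and surrounding spaces."""
    def __init__(self, terms):
        self.terms = {t.lower() for t in terms}

    def accept(self, ctx):
        return ctx["query"].strip().lower() in self.terms


class PlatformFilter(MatFilter):
    def __init__(self, platform):
        self.platform = platform

    def accept(self, ctx):
        return ctx["platform"] == self.platform


class VersionRangeFilter(MatFilter):
    """Accepts client versions in the inclusive range [low, high]."""
    def __init__(self, low, high):
        self.low, self.high = parse_version(low), parse_version(high)

    def accept(self, ctx):
        return self.low <= parse_version(ctx["version"]) <= self.high


class CanaryFilter(MatFilter):
    """Admits a stable percentage of users by hashing the user ID."""
    def __init__(self, percent):
        self.percent = percent

    def accept(self, ctx):
        digest = hashlib.sha256(ctx["user_id"].encode()).hexdigest()
        return int(digest, 16) % 100 < self.percent


class FatigueFilter(MatFilter):
    """Rejects users who already reached the exposure cap in the window."""
    def __init__(self, max_exposures):
        self.max_exposures = max_exposures

    def accept(self, ctx):
        return ctx["recent_exposures"] < self.max_exposures


def passes(filters, ctx) -> bool:
    return all(f.accept(ctx) for f in filters)


# Configuration for the game operations example; "recent_exposures" is assumed
# to hold the user's exposure count over the last three days.
game_filters = [
    SearchTermFilter(["Honor of Kings", "running karts"]),
    PlatformFilter("android"),
    VersionRangeFilter("6.6.7", "6.7.1"),
    CanaryFilter(15),
    FatigueFilter(10),
]
```

Because each filter is independent and configured only through its constructor, a new requirement becomes a new list of filter instances rather than new code, which is the reuse that the console exposes to the operations team.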
Connection with the Dujiangyan Algorithm Platform
An important goal of Kunpeng is to help achieve business goals and optimize global traffic allocation by leveraging the algorithm capability of Alibaba DAMO Academy. We built the Dujiangyan algorithm platform with Alibaba DAMO Academy and implemented a set of general shuffling algorithms to connect offline data paths with online service processes from the engineering perspective. This allows business lines to reuse this algorithm capability at a low cost. During delivery in the Xianyu homepage feeds and the key position section on the Xianyu homepage, multiple business lines have used the general shuffling algorithm capability provided by the Kunpeng system. This capability has helped the business side achieve its PV targets while raising the PV CTR by 60% to 100%. As the algorithm model continues to improve in later iterations, we expect the PV CTR to rise further.
Group Management and Approval Flow
The Kunpeng system is used at multiple large-traffic interfaces such as the homepage, search results page, and Guess What You Like. Each interface is used for delivery by operators from multiple business lines. Therefore, appropriate group-based permission management is necessary to avoid mistaken modification of other business materials.
The Kunpeng system is built with a group management subsystem. Each operator of the business side must belong to a business group, and members of a business group can see materials only in this group. Also, materials and activities created for a business line must belong to the same group. As such, people and materials are effectively managed by groups.
When the configurations of a material or activity have changed, an approval ticket is automatically generated and flows to the auditor of the corresponding group. This approval process avoids the risk of human error.
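The group isolation and approval flow described above can be sketched in a few lines. All names here (`GroupManager`, `Ticket`, the dictionary layout) are hypothetical. The sketch only shows the two rules from the text: operators see and change only materials of their own group, and a change takes effect only after the group's auditor approves the generated ticket.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Approval ticket generated for a configuration change."""
    group: str
    auditor: str
    material_id: str
    changes: dict
    status: str = "pending"


@dataclass
class GroupManager:
    operator_groups: dict   # operator name -> business group
    auditors: dict          # business group -> auditor name
    # material ID -> {"group": str, "config": dict}
    materials: dict = field(default_factory=dict)

    def visible_materials(self, operator: str) -> list:
        """Operators can see only the materials of their own group."""
        group = self.operator_groups[operator]
        return [mid for mid, m in self.materials.items() if m["group"] == group]

    def request_change(self, operator: str, material_id: str, changes: dict) -> Ticket:
        """Create a pending ticket instead of applying the change directly."""
        group = self.operator_groups[operator]
        if self.materials[material_id]["group"] != group:
            raise PermissionError("material belongs to another business group")
        return Ticket(group, self.auditors[group], material_id, changes)

    def approve(self, auditor: str, ticket: Ticket) -> None:
        """Apply the change only when the assigned auditor approves it."""
        if auditor != ticket.auditor or ticket.status != "pending":
            raise PermissionError("ticket cannot be approved by this user")
        self.materials[ticket.material_id]["config"].update(ticket.changes)
        ticket.status = "approved"
```

Keeping the change inside the ticket until approval means that a mistaken edit never reaches live traffic by itself, which is the risk the approval process is meant to remove.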
Division of Roles after Connection with Kunpeng
In the old development mode, there was a blurred zone between operations and development. Because the delivery platforms lacked capabilities, many operational configuration changes had to be made by using Switch on the server side or by modifying the code.
After the connection with Kunpeng, the division of roles between development and operations is clear:
- The scenario development owner is responsible for connecting the scenario to the Kunpeng system.
- The business development team is responsible for implementing business requirements at Kunpeng’s extension points.
- The operations team is responsible for configuring materials in the Kunpeng console, delivering the implementations that the developers have achieved at extension points, and choosing appropriate basic components for effective delivery.
After the development team implements business requirements, the operations team will independently complete all the subsequent changes in operations in the console. If the delivery does not meet expectations, the operations team can modify or cancel the delivery without the development team’s participation throughout the process.
Currently, the Kunpeng system is used in many scenarios, such as the Xianyu homepage feeds, the key position section on the Xianyu homepage, the Xianyu search results page, Guess What You Like, and search by buzz word in the search bar on the Xianyu homepage. The system has helped the business side achieve its goals, increasing the PV CVR by 60% to 100% and reducing the UV conversion cost by about 40%, and there is still plenty of room for optimization. Multiple basic components for operations are reusable without any development, and the DataFetcher extension point mechanism enables business lines to invest their development resources concurrently. As a result, the overall time required for delivery has been reduced by more than 50%.
The algorithm empowerment, parallel development of extension points, material management and delivery, and modular operational capabilities provided by the Kunpeng system have greatly enhanced delivery effects and R&D efficiency. In the future, we will continue to explore the general capabilities of vertical algorithms, and adopt a more flexible development mode for extension points. We will also connect more scenarios horizontally and optimize the global throttling effect to bring more growth points to business lines.