How to Sustain a Growing Platform and Gain Online Users

By Cheng Zhe, nicknamed Lanhao at Alibaba.

In this article, we look at experiments at Alibaba that were intended to help increase the number of online users on its second-hand buy-and-sell platform, Xianyu, whose name literally translates to “Idle Fish.” Before we get ahead of ourselves, let’s discuss some of the dynamics of the Xianyu platform:

  • As a buy-and-sell platform, sellers on the Xianyu platform generally are individual consumer sellers, rather than commercial store sellers. Therefore, it is difficult to organize sellers into marketing promotions in a unified manner.
  • At present, the number of daily active users, or DAU, of Xianyu exceeds 20 million, so appropriately supporting such a large number of users is a major test for the operation personnel of the platform.

In early 2019, the Xianyu team at Alibaba conducted multiple experiments on user growth, including the following two experiments, shown in the figures below:

[Figures: the two user growth experiments]

The team conducted these two experiments with the aim of keeping users on Xianyu longer. The longer users spend browsing Xianyu, the more likely they are to discover interesting content, including products and posts in our various curated item groupings, which are referred to in Chinese as “fish ponds.” As a result, users may be drawn back to Xianyu later, and Xianyu can achieve greater user growth. Most of the experiments we conducted produced good business results. However, we also found two problems with these experiments:

  • Long research and development period: In the beginning, our team used the fastest implementation available in order to quickly verify the effectiveness of rule policies. We did not build a large, comprehensive design, but met each requirement by writing code case by case. As a result, the period from development to launch could be as long as three weeks, mainly because this is the time window for releasing new versions of the client.
  • Low operation efficiency: Due to a slow launch, it can be a long time before we can analyze performance after obtaining business data, and it can take even longer to make adjustments based on the data. Given this, only a few rule policies can be implemented in a year.

One Solution, a Rule Engine Based on Event Streams

To improve R&D efficiency and operation efficiency, we engineered a business abstraction layer. Our first solution was a rule engine based on event streams. We treated user behavior as a sequential stream of behavior events. A complete rule can then be defined by writing a simple event description in a domain-specific language (DSL) and adding input and output definitions.
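
To make the idea concrete, here is a minimal Python sketch of a rule defined as an ordered event description plus an output action, evaluated against a per-user event stream. It is not Xianyu's DSL, whose syntax is not reproduced in this text. The names `Event`, `Rule`, `matches`, and `run_rule`, as well as the event names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Event:
    user_id: str
    name: str  # hypothetical event names, e.g. "browse_item", "favorite_item"


@dataclass
class Rule:
    """A rule: an ordered event description plus an output action."""
    sequence: Sequence[str]        # event names that must occur in this order
    action: Callable[[str], str]   # output: what to do for a matching user


def matches(rule: Rule, stream: List[Event]) -> bool:
    """Return True if the rule's events occur in order (not necessarily adjacent)."""
    wanted = iter(rule.sequence)
    target = next(wanted, None)
    for event in stream:
        if target is None:
            break
        if event.name == target:
            target = next(wanted, None)
    return target is None


def run_rule(rule: Rule, stream: List[Event]) -> List[str]:
    """Group the stream by user and fire the action for each user who matches."""
    by_user: Dict[str, List[Event]] = {}
    for event in stream:
        by_user.setdefault(event.user_id, []).append(event)
    return [rule.action(uid) for uid, events in by_user.items() if matches(rule, events)]


# Usage: only u1 browses twice and then favorites an item.
rule = Rule(["browse_item", "browse_item", "favorite_item"], lambda uid: f"push:{uid}")
stream = [
    Event("u1", "browse_item"),
    Event("u2", "browse_item"),
    Event("u1", "browse_item"),
    Event("u1", "favorite_item"),
    Event("u2", "favorite_item"),
]
print(run_rule(rule, stream))  # ['push:u1']
```

Note that the matcher only tracks which event name it is waiting for. It keeps no other details of the events it has already matched.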

Let’s look at the second experiment in user growth as an example. This example can be briefly expressed in DSL as shown in the following figure.

[Figure: DSL expression of the second user growth experiment]

Limitations of the Rule Engine

In the C2C security service, a rule is also an abstraction obtained from a series of user behaviors.

Despite our best efforts, these security rules could not be expressed in the rule engine. Consider this example rule: if a user is blacklisted twice within one minute, the user is marked with a high-risk tag. When the first blacklisting event occurs, the rule engine matches it. When the second blacklisting event occurs, the rule engine matches that one as well. From the perspective of the rule engine, the rule is then met and subsequent operations can be performed. However, one important requirement is that the two blacklistings must be performed by two different users, to prevent one user from maliciously blacklisting another by using multiple devices.

However, this is difficult for the rule engine to discover, as the rule engine only knows that two blacklisting events are matched and the rule is met. This is because the rule engine can match only stateless events and cannot trace back the details of these events for further aggregate computing purposes.
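
The limitation can be illustrated with a small Python sketch, assuming a matcher that only checks whether two matching events fall within the time limit. The type `Blacklisting` and the function `stateless_high_risk` are hypothetical and are not the rule engine's actual code.

```python
from typing import List, NamedTuple


class Blacklisting(NamedTuple):
    target: str  # user who was blacklisted
    actor: str   # user who performed the blacklisting
    ts: float    # event time in seconds


def stateless_high_risk(events: List[Blacklisting], target: str, window: float = 60.0) -> bool:
    """Fire when any two consecutive blacklistings of `target` are within `window` seconds.

    The actor field is never inspected, which mirrors a matcher that cannot
    look back into the details of the events it has matched.
    """
    times = sorted(e.ts for e in events if e.target == target)
    return any(later - earlier <= window for earlier, later in zip(times, times[1:]))


# The same actor blacklists the victim twice, for example from two devices.
events = [
    Blacklisting("victim", "mallory", 0.0),
    Blacklisting("victim", "mallory", 10.0),
]
print(stateless_high_risk(events, "victim"))  # True, although only one actor was involved
```

The rule fires on a single malicious actor, which is exactly the false positive described above.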

A New Proposed Solution

For the new solution, we chose SQL as the basis of the DSL, for the following reasons:

  • SQL is a semantically complete language, so no additional syntax design is required.
  • SQL is a simple language that can be easily learned.
  • The operation personnel of Xianyu are proficient in SQL, which can improve the launch efficiency.

Compared with the previous rule engine, the new DSL solution has the following strengths:

  • Addition of conditional expressions: Richer and more complex event descriptions are supported, which covers more business scenarios.
  • Addition of time expressions: The WITHIN keyword is used to define a time window. When we use keywords such as DISTINCT following HAVING, aggregate computing can be performed for events in the time window. Our new solution can solve the preceding problem of rule description for the C2C business.
  • Enhanced scalability: Our new solution complies with industry standards and is unrelated to the input and output of specific businesses, which facilitates promotion.
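
To show what a time window combined with a distinct aggregate achieves, the following Python sketch emulates the semantics for the blacklisting rule. It is only an approximation of the behavior: the actual DSL statement is not reproduced in this text, and the function `high_risk_targets`, its parameters, and the event layout are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

# (target_user, acting_user, timestamp_seconds); this field layout is an assumption.
Event = Tuple[str, str, float]


def high_risk_targets(events: Iterable[Event], within: float = 60.0, min_distinct: int = 2) -> List[str]:
    """Tag a target once `min_distinct` different actors blacklist it inside `within` seconds."""
    windows: Dict[str, Deque[Tuple[str, float]]] = {}
    tagged: List[str] = []
    for target, actor, ts in sorted(events, key=lambda e: e[2]):
        window = windows.setdefault(target, deque())
        window.append((actor, ts))
        # Evict events that have fallen out of the time window (the WITHIN part).
        while window and ts - window[0][1] > within:
            window.popleft()
        # Aggregate over what remains (the HAVING with DISTINCT part).
        distinct_actors = {a for a, _ in window}
        if len(distinct_actors) >= min_distinct and target not in tagged:
            tagged.append(target)
    return tagged


same_actor = [("victim", "mallory", 0.0), ("victim", "mallory", 10.0)]
two_actors = [("victim", "alice", 0.0), ("victim", "bob", 30.0)]
print(high_risk_targets(same_actor))  # []
print(high_risk_targets(two_actors))  # ['victim']
```

Because the window keeps the events themselves rather than just a match count, the rule can distinguish one actor acting twice from two different actors.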

The example below shows how our new solution resolves the problem we discussed in the previous section.

[Figure: the blacklisting rule expressed in the new DSL]

Overall Layered Architecture

The layered architecture comprises the following layers from the top down:

  • Business application: This layer is the business end of the entire system and has been implemented in multiple business scenarios.
  • Task delivery: This layer provides DSL statement and delivery capabilities for the business application layer and can be used to select target users and associate with the user outreach module.
  • User outreach: This module is used to receive computing results from the EPL engine and implement associated actions. This module can also independently provide services for business applications. Each business application can have its own logic and perform user outreach by using the user outreach module.
  • EPL engine: Currently, the EPL engine is already able to implement cloud parsing and computing. It can receive the DSL statements in task delivery and then parse and run the DSL statements on Blink.
  • Event collection: This module collects behavior events from the server logs and behavior tracking and then outputs the events to the EPL engine in a normalized manner.
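
The flow between these layers can be sketched in a few lines of Python. This is a toy, in-process stand-in under stated assumptions: the real engine parses DSL statements and runs them on Blink, whereas here a rule is a plain Python predicate. The names `collect`, `Engine`, `deliver`, and `process`, and the raw field names, are all hypothetical.

```python
from typing import Callable, Dict

NormalizedEvent = Dict[str, str]


def collect(raw: Dict[str, str], source: str) -> NormalizedEvent:
    """Event collection: map server-log and tracking records onto one schema."""
    if source == "server_log":
        return {"user_id": raw["uid"], "event": raw["action"]}
    if source == "tracking":
        return {"user_id": raw["userId"], "event": raw["eventName"]}
    raise ValueError(f"unknown source: {source}")


class Engine:
    """Stand-in for the EPL engine: holds delivered rules and evaluates each event."""

    def __init__(self, outreach: Callable[[str, str], None]) -> None:
        self.rules: Dict[str, Callable[[NormalizedEvent], bool]] = {}
        self.outreach = outreach

    def deliver(self, rule_id: str, predicate: Callable[[NormalizedEvent], bool]) -> None:
        """Task delivery: register a rule (a predicate here instead of a DSL statement)."""
        self.rules[rule_id] = predicate

    def process(self, event: NormalizedEvent) -> None:
        for rule_id, predicate in self.rules.items():
            if predicate(event):
                # Hand the computing result to the user outreach module.
                self.outreach(event["user_id"], rule_id)


# Business application: deliver one rule and record outreach calls.
sent = []
engine = Engine(outreach=lambda user_id, rule_id: sent.append((user_id, rule_id)))
engine.deliver("favorite_rule", lambda e: e["event"] == "favorite_item")
engine.process(collect({"uid": "u1", "action": "favorite_item"}, "server_log"))
engine.process(collect({"userId": "u2", "eventName": "browse_item"}, "tracking"))
print(sent)  # [('u1', 'favorite_rule')]
```

Keeping outreach behind a callback reflects the point that the user outreach module can also serve business applications independently of the engine.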

Event Collection

[Figure: event collection]

EPL Implementation

[Figure: EPL implementation]

User Outreach

[Figure: user outreach]

Use Case

[Figures: use case examples, including Fish Ponds and house rental]

From the Fish Ponds example above, we can see that this solution somewhat resembles algorithmic recommendation. In the rental example, the full rule is too complex to express in the DSL. Therefore, the rule is configured only to collect four browses of different rental houses. After the rule is triggered, the collected data is handed to the business team that developed the house rental application. This is also the boundary we found during implementation.
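
The division of labor in the rental case might look like the following Python sketch: the rule only collects browses of different houses and, once four are seen, hands the data to a callback owned by the business application. The class `RentalBrowseCollector` and its `handoff` callback are hypothetical names, and resetting the state after each trigger is an assumption.

```python
from typing import Callable, Dict, List


class RentalBrowseCollector:
    """Collect distinct rental listings per user and hand them off once enough are seen."""

    def __init__(self, handoff: Callable[[str, List[str]], None], threshold: int = 4) -> None:
        self.handoff = handoff
        self.threshold = threshold
        self.seen: Dict[str, List[str]] = {}

    def on_browse(self, user_id: str, house_id: str) -> None:
        houses = self.seen.setdefault(user_id, [])
        if house_id in houses:
            return  # repeat views of the same house do not count
        houses.append(house_id)
        if len(houses) == self.threshold:
            # The rule stops here; any further logic lives in the business application.
            self.handoff(user_id, list(houses))
            del self.seen[user_id]  # assumption: start fresh after each trigger


received = []
collector = RentalBrowseCollector(handoff=lambda uid, houses: received.append((uid, houses)))
for house in ["h1", "h1", "h2", "h3", "h4"]:
    collector.on_browse("u1", house)
print(received)  # [('u1', ['h1', 'h2', 'h3', 'h4'])]
```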

Summary

The solution has the following characteristics:

  • High performance: The end-to-end computing process takes five seconds on average.
  • High reliability: Based on the high reliability of Blink, this solution supported hundreds of millions of operations per second during the Double 11 Shopping Festival.

By implementing this solution in multiple businesses, we found its appropriate boundaries. This solution is applicable to businesses that:

  • Have high real-time requirements.
  • Have rules formulated by a strong operations team.
  • Can be expressed in SQL.

Future Plans

The current solution still has the following limitations:

  • The end-to-end computing process takes five seconds on average. This cannot meet the needs of game task scenarios that have high real-time requirements. For instance, assume that a task today is to browse the details of 10 items. When a user browses the 10th item, the user must wait five seconds to receive a response, which is unacceptable in terms of user experience. Therefore, we need to further improve the overall performance and provide a response within milliseconds.
  • Xianyu’s business has maintained high growth for successive years. In the future, Xianyu may face user traffic that is three times our current level. If all computing is still completed on the cloud, this will pose a significant challenge to the computing power of the cloud.
  • The current design does not include algorithm access, but is simply used for selecting target users. To more accurately deliver rules and improve the effectiveness of the rules on users, we need to combine the solution with algorithms.

Therefore, in the future, we will focus on exploring real-time computing capabilities on the client and the integration of algorithm capabilities.
