By Wan Xiaoyong, nicknamed Wubai at Alibaba.
Xianyu is a popular second-hand buy-and-sell platform started by Alibaba. As the platform has grown, we have seen a trend: as seller activity increases, the ratio of successful transactions also increases, which in turn grows the number of users on the platform as a whole. The key to all of this is transaction efficiency. However, given the nature of the second-hand market, individual sellers make up the majority of sellers, and, as one would expect, individual sellers tend to have much lower transaction efficiency than professional sellers. This makes sense, as most individual sellers probably do not consider selling their used items to be their main source of income. Therefore, as the platform's creators at Alibaba, we want to help sellers improve their transaction efficiency and to quickly add more scenarios for sellers.
By analyzing online data, we found some interesting phenomena. First, users, who are, strategically speaking, potential sources of traffic, are more likely to access the homepages or items of sellers that use real profile pictures. Second, sellers that actively respond are more likely to complete transactions. Last, items that are closer to users or that have more detailed descriptions are more likely to be sold.
Closed Loop of Seller Behavior
The activity of sellers is closely related to whether the sellers will actually complete transactions. Therefore, we need to increase the online activity of sellers based on the number of transactions they make. Our observations have shown that sellers’ behaviors may have major or minor impacts on transactions. Therefore, our work needs to take these different behaviors into account.
Efforts Based on Seller Behavior
The aim of sellers, naturally, is to complete transactions. Therefore, with the goal of completing transactions in mind, we perform a round of simulation based on an online algorithm model by using two key metrics.
- Online status: The current online status of a user and the time elapsed since the user was last online.
- Statistics on inquiry replies: The seller's replies to inquiries from potential buyers in the last half hour.
For reasons of confidentiality, this algorithm is only described in simplified terms. In this article, we will focus on the engineering side of this algorithm. We define four elements of sellers’ behaviors on Xianyu: the when, where, what, and who of the seller’s behaviors. As you would expect, the when and where define the temporal and spatial dimensions of a seller, that is, where the seller is located in China, when he or she listed a posting on the platform, and so on. Next, what describes the content of the seller’s behavior, including what the seller is actually buying, and, last, who indicates the entity, that is, the seller himself or herself, who conducts the behavior. Information related to the seller can be found based on his or her profile information on the platform.
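To make the four elements concrete, here is a minimal sketch of how one behavior record could be represented. The class name, field types, and sample values are assumptions made for illustration; the article does not disclose the actual schema used at Xianyu.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SellerBehavior:
    """One seller behavior described by the four elements (hypothetical schema)."""
    who: str        # the seller who conducts the behavior
    what: str       # the content of the behavior, e.g. "reply"
    when: datetime  # the time at which the behavior occurred
    where: str      # the location associated with the seller


# Hypothetical sample record; all values are invented for illustration.
event = SellerBehavior(
    who="seller-001",
    what="reply",
    when=datetime(2019, 1, 1, 12, 0),
    where="Hangzhou",
)
```

Making the record immutable (`frozen=True`) is a design choice of this sketch: a behavior that has been observed should not change as it flows through later processing steps.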
Thinking relatively broadly, it is generally quite easy to describe the behaviors of a seller. However, it is not so easy to correctly identify the behaviors of the seller. What do we mean by this? Well, consider this. The complete reply behavior of a seller can be split as follows:
- A potential buyer on a platform starts a conversation with the seller based on an item that the seller has posted.
- In the new conversation window, the potential buyer sends a message to the seller, such as a question or an automatic message that indicates that the buyer is interested in the item that is posted.
- The seller replies to the buyer in the conversation window.
The three behaviors above must meet the following constraints:
- The time sequence is appropriate for the behaviors.
- Behavior 2 from above can be repeated.
- Interference such as automatic replies and security reminders may occur among behaviors 1, 2, and 3.
To this end, we use complex event processing (CEP) to match a complex event model. After comparing Siddhi and Flink CEP based on size, flexibility, and cost, we selected Siddhi as our CEP engine in the early phase.
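The pattern described above (a conversation start, one or more buyer messages, then a seller reply, with interference ignored) can be illustrated with a small hand-written state machine. This sketch is not Siddhi and does not use its query language; it only shows the kind of matching a CEP engine performs. The event type names and the `Event` structure are assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple, Optional

# Interference that may appear anywhere in the sequence (hypothetical names).
NOISE = {"auto_reply", "security_reminder"}


class Event(NamedTuple):
    kind: str       # hypothetical event type name
    timestamp: int  # seconds since an arbitrary epoch


def match_reply(events: Iterable[Event]) -> Optional[List[Event]]:
    """Return the events forming one complete reply behavior, or None.

    Pattern: start_conversation -> buyer_message (one or more) -> seller_reply.
    """
    state = "WAIT_START"
    matched: List[Event] = []
    # Process events in time order so that the sequence constraint holds.
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind in NOISE:
            continue  # interference does not break the pattern
        if state == "WAIT_START":
            if ev.kind == "start_conversation":
                matched = [ev]
                state = "WAIT_MESSAGE"
        elif state == "WAIT_MESSAGE":
            if ev.kind == "buyer_message":
                matched.append(ev)
                state = "WAIT_REPLY"
        elif state == "WAIT_REPLY":
            if ev.kind == "buyer_message":
                matched.append(ev)  # behavior 2 may repeat
            elif ev.kind == "seller_reply":
                matched.append(ev)
                return matched
    return None
```

A dedicated CEP engine expresses the same idea declaratively, which is where the readability and flexibility come from; hand-written matchers like this one become hard to maintain as the patterns grow.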
The above figure shows the simulation results of the algorithm based on the two metrics. Data is desensitized for security reasons. The larger the number on the vertical axis, the more transactions are completed. The horizontal axis shows the multi-dimensional characteristics of the seller based on the seller's online status and reply behaviors. From the simulation results, we can see that sellers who are online and actively respond are more likely to complete transactions. This also shows that sellers’ behaviors do have a potential impact on transaction efficiency.
Building a Complete Closed Loop
The results of the preceding analysis indicate that we are going in the right direction, but it is far from enough. For example, consider these problems. How can sellers identify behaviors that are helpful for completing transactions? And how can sellers perceive the positive benefits of these behaviors?
Here we will focus on encouraging sellers to complete transactions. Therefore, the core idea is to build the preceding closed loop to cultivate attentive and stable sellers and help them rapidly complete transactions.
Our core path is guiding sellers > seller behavior > benefits > perception of benefits. Based on this core path, we designed the activity, task, and data collection modules. For this, an activity is associated with several tasks and serves to guide sellers. A task is used to standardize seller behaviors. Last, data is used to identify seller behaviors.
Moreover, we will comply with the following principles in the entire implementation process. First, the modules are decoupled from each other. For this, the implementation depends on many external systems, with clear boundaries and minimal coupling. Second, each module assumes a single role. The core of the entire system is data streaming. The simpler the system, the more stable it is. We want to reduce maintenance costs and risks.
With this, we hope to achieve two goals. The first goal is to help sellers improve their transaction efficiency, and the second one is to quickly add new scenarios to our buy-sell platform of Xianyu.
The following figure shows the overall architecture used.
The activity module helps a seller to perform corresponding actions and provides corresponding interest points. It is used to:
- Manage activity metadata.
- Provide the query service for the upper layer.
- Synchronize activity information from the lower-layer data synchronization channel.
- Perform disaster recovery calculations.
Although tasks can be synchronized from the lower-layer synchronization channel to the activity module, a mechanism is required to calculate task-level activity data in real time when the channel is abnormal. In fact, the overall system performance degrades when disaster recovery calculation is started.
The task module is used to define encouraged seller behaviors in order to achieve the following:
- Manage metadata, including task definitions and callback after a task is completed.
- Define a task, including the basic elements of the task.
- Perform callback, which synchronizes the task execution result after the task is completed.
- Initialize a task. All data is generated by third-party systems that can be classified into the following parts: the output of the algorithm model, the data analysis, the buyer’s feedback.
- Use a third-party system, which uses a standard interaction protocol for decoupling.
- Ensure data consistency. The task module involves reads and writes by multiple parties. Therefore, eventual data consistency must be ensured.
- Task initialization, update, and query are combined into one-way data streams, which use conditional updates to ensure data consistency. The effect is similar to that of optimistic locking, but the overall system will not encounter performance bottlenecks due to concurrent writes.
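The conditional update mentioned in the list above can be sketched as follows. This is a minimal in-memory illustration, not the actual task module: the `TaskStore` class, the status values `INIT` and `DONE`, and the method names are all assumptions. In a relational database, the same effect is commonly achieved with an `UPDATE ... WHERE status = ...` statement and a check of the affected row count.

```python
import threading
from typing import Dict, Optional


class TaskStore:
    """In-memory stand-in for the task storage (hypothetical)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        # The lock emulates the atomicity a database gives a single statement.
        self._lock = threading.Lock()

    def init_task(self, task_id: str) -> bool:
        """Create the task only if it does not exist yet."""
        with self._lock:
            if task_id in self._status:
                return False
            self._status[task_id] = "INIT"
            return True

    def update_if(self, task_id: str, expected: str, new: str) -> bool:
        """Set the status to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._status.get(task_id) != expected:
                return False  # condition not met: the write is a no-op
            self._status[task_id] = new
            return True

    def get(self, task_id: str) -> Optional[str]:
        with self._lock:
            return self._status.get(task_id)
```

Because a write whose condition no longer holds simply does nothing, a retried or duplicated write from another party cannot move a task backward, and writers do not need to hold a lock across a read-modify-write cycle.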
Data Synchronization Channel
The data synchronization channel synchronizes data from the task module to the activity module based on activity metadata. Two synchronization modes are supported: full and incremental synchronization.
- Incremental synchronization: In this mode, task data is synchronized in real time, including task initialization and tasks completed by sellers.
- Full synchronization: This mode is generally triggered when activities are added or activity rules are changed. In this mode, tasks are delivered and the synchronization is completed in distributed mode.
Besides this, synchronization operations are idempotent, which means that they can be repeated without affecting the final result. Full synchronization and incremental synchronization are isolated and do not affect online services.
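One common way to make a synchronization step idempotent is to record which tasks are complete rather than incrementing a counter. The sketch below assumes this approach; the `ActivityProgress` class and its identifiers are hypothetical and are not taken from the actual system.

```python
from typing import Set, Tuple


class ActivityProgress:
    """Hypothetical activity-side view fed by the synchronization channel."""

    def __init__(self) -> None:
        # Each entry is (activity_id, seller_id, task_id).
        self._completed: Set[Tuple[str, str, str]] = set()

    def apply(self, activity_id: str, seller_id: str, task_id: str) -> None:
        """Record a completed task; replaying the same record changes nothing."""
        self._completed.add((activity_id, seller_id, task_id))

    def completed_count(self, activity_id: str, seller_id: str) -> int:
        """Count the distinct tasks the seller has completed in the activity."""
        return sum(
            1 for a, s, _ in self._completed
            if a == activity_id and s == seller_id
        )
```

With this design, a record delivered by both a full and an incremental synchronization, or delivered twice after a retry, leaves the count unchanged.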
The data layer needs to converge the behavior data of the platform sellers. Differences in business complexity lead to diversified data sources. First, standard data is derived from the Omega system developed in-house for Xianyu. Second, differentiated data is derived from diversified data sources, including log collection, Message Queue (MQ) messages, and persistent storage facilities.
Data is converged based on the four elements of seller behavior in Xianyu we mentioned earlier. After that, the data needs to be normalized and mapped to standard behavior data. Due to the differences in data magnitude and complexity, we use different methods to identify data based on the different complexities of behavioral data.
- Real-time stream processing: Real-time stream processing is used to match simple events. For this, statistical operations such as data convergence and association are used. Real-time stream processing uses fewer resources, but is not flexible. It can handle complex events, but doing so involves highly complex code, which is difficult to maintain.
- CEP engine: The CEP engine can be used for cross-event pattern matching for complex events. It provides flexible features and high readability, but uses more resources.
- Persistence: Stateful data must be persistent so that the data can be used for subsequent match calculation.
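As an illustration of the simple statistics that suit real-time stream processing, the sketch below counts a seller's replies within the last half hour using a sliding window. The class name and the integer timestamps are assumptions for this example, and the sketch assumes that replies arrive in non-decreasing time order.

```python
from collections import deque
from typing import Deque

WINDOW_SECONDS = 30 * 60  # "the last half hour"


class ReplyWindow:
    """Sliding-window counter of one seller's replies (hypothetical)."""

    def __init__(self, window: int = WINDOW_SECONDS) -> None:
        self._window = window
        self._timestamps: Deque[int] = deque()

    def add_reply(self, timestamp: int) -> None:
        """Record a reply; timestamps are assumed to be non-decreasing."""
        self._timestamps.append(timestamp)

    def count(self, now: int) -> int:
        """Return the number of replies within the window ending at `now`."""
        # Drop replies that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self._window:
            self._timestamps.popleft()
        return len(self._timestamps)
```

The counter only keeps the timestamps that are still inside the window, so its memory use is bounded by the seller's reply rate rather than by the total history.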
In addition to the three modules mentioned above, we have also added the following features:
- Benefit perception: To enhance the seller’s perception of interest points, we added a peripheral benefit perception module. Data is loaded from third-party online systems to display the activity effect and stimulate seller behaviors. We think that this will encourage sellers to pay attention to potential buyers that have started conversations with them.
- Group selection: This module is used to deliver activity information to sellers.
- Service layer: This supports standard client protocols, and provides features such as rendering templates, activity queries, and action configurations.
So far, we have implemented similar-item association, a traffic distribution model based on seller behavior (specifically, the online status metric and reply statistics), and real profile pictures.
The test data on bucket-based traffic distribution based on seller behavior shows that transaction conversion has increased by 3%. At present, developers are not required when a new scenario is added. A new scenario can be quickly implemented in the system as long as the operations personnel explore the key behavior data.
The closed loop of seller behavior means constant feedback. The logic of this closed loop requires us to better understand sellers and learn more about how to describe their behaviors. Behavior data allows us to better understand sellers. But in the future we need better seller behavior data and feature vectors to serve as the underlying infrastructure.
Moreover, with a complete seller behavior feature library, we need to find behavior points that are more valuable to sellers and help us provide better guidance to them. Different sellers will benefit from different information, so we must take an individualized approach.
The current benefit feedback mechanism is simple. Increasing a seller’s perception of behavior interest points helps cultivate more attentive and stable sellers. However, understanding sellers is only the first step. In the e-commerce field, both customer-to-customer and customer-to-business scenarios are focusing on shopping guide and personalization features. Such an increased level of personalization helps maximize traffic value, although it is also full of uncertainties. As such, a definite understanding of sellers is very helpful for matching traffic and performing seller operations.