A Deep Dive into How the Search Engines Alibaba Developed Work


By Qing Gang.

Over the past decade or so, Alibaba has developed and deployed several search engines across its e-commerce platforms, and these engines embody considerable technological and commercial value accumulated over that time. Many of the search capabilities and scenarios of 1688.com, Alibaba’s largest online wholesale marketplace in China, rely on the search middle end running in the background. There is a lot to say about the middle end and the other technologies that are not visible on the surface.

In this article, we’re going to explore the entire process behind a search made with the primary search engine of 1688.com, including all the related technologies.

The Architecture behind 1688.com’s Search Engine

Let’s Talk about TisPlus

Data sources may occasionally fail to be produced during routine maintenance. This is most often because access permissions for the data source table have expired or because of ZooKeeper (zk) data jitter. In terms of performance, dumping takes less time now that the Alibaba search middle-end team has introduced the Blink Batch model. The following table lists the specific indicators, using the Buyoffer engine as an example.

On the TisPlus platform, you can start offline dumping as instructed in the following figure.

The following figure shows an example of the directed acyclic graph (DAG) of data sources.

The following sections explore the data source processing steps for offline dumping, which involve Bahamut, MaaT, and the data output.

Bahamut for Data Source Graph Processing

  1. Data input: datasource (Taobao Distributed Data Layer (TDDL) and MaxCompute are supported)
  2. KV input: HbaseKV (HBase data table)
  3. Data processing: Rename (data field renaming), DimTrans (one-to-many data aggregation), Functions (simple field processing), Selector (field selection), UDTF (data logic processing), Merge (data source aggregation), and Join (left join).
  4. Data output: HA3 (HDFS or Swift)
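
To make the node types above more concrete, here is a minimal Python sketch of a data source graph and one valid processing order for it. It is not Bahamut's implementation. The node names (`offer_table`, `seller_kv`, and so on) and the shape of the graph are assumptions made up for illustration. Only the node types in the comments come from the list above.

```python
from graphlib import TopologicalSorter

# Hypothetical data source graph: each key is a node and each value is the
# set of upstream nodes it reads from. All node names are illustrative.
graph = {
    "offer_table": set(),                            # datasource (TDDL or MaxCompute)
    "seller_kv": set(),                              # HbaseKV
    "rename_fields": {"offer_table"},                # Rename
    "join_seller": {"rename_fields", "seller_kv"},   # Join (left join)
    "select_fields": {"join_seller"},                # Selector
    "ha3_output": {"select_fields"},                 # HA3 (HDFS or Swift)
}


def execution_order(dag):
    """Return one valid order in which every node comes after its inputs."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(graph)
print(order)
```

Because every other node is upstream of `ha3_output`, the output node always comes last in the computed order, which matches the idea that data output is the final stage of the graph.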

The following figure shows how data source processing works.

Next, what happens between the Bahamut and Blink stages is demonstrated in the following figure.

To make this clear, here is what happens. Bahamut splits the task and hands it to the JobManager to convert logical nodes into physical nodes. After several nodes have been produced, Bahamut merges them into a complete SQL statement. For example, Kratos_SQL in the preceding figure is a complete SQL statement for an incremental Join, and it is submitted together with resource files by using BayesSDK. In addition, the platform provides a limited per-task customization function that lets users control parameters such as the concurrency, node memory capacity, and allocated CPUs of a specific task.
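
The article does not show how the merged SQL is generated, so the following Python sketch only illustrates the general idea of merging several converted nodes into one SQL script and attaching per-task resource settings. Everything here is an assumption: the `PhysicalNode` and `TaskConfig` classes, their field names, the table names, and the SQL dialect are all hypothetical and are not part of Bahamut, Blink, or BayesSDK.

```python
from dataclasses import dataclass


@dataclass
class TaskConfig:
    """Per-task resource overrides (hypothetical field names)."""
    parallelism: int = 1
    memory_mb: int = 1024
    cpu_cores: float = 1.0


@dataclass
class PhysicalNode:
    """One converted node: a named intermediate view and its SELECT body."""
    name: str
    select_sql: str


def merge_nodes(nodes, sink_table):
    """Chain the nodes as views, then write the last view to the sink table."""
    if not nodes:
        raise ValueError("at least one node is required")
    statements = [f"CREATE VIEW {n.name} AS {n.select_sql};" for n in nodes]
    statements.append(f"INSERT INTO {sink_table} SELECT * FROM {nodes[-1].name};")
    return "\n".join(statements)


nodes = [
    PhysicalNode("renamed", "SELECT id AS offer_id, title FROM offer_source"),
    PhysicalNode(
        "joined",
        "SELECT r.offer_id, r.title, s.seller_name "
        "FROM renamed r LEFT JOIN seller_kv s ON r.offer_id = s.offer_id",
    ),
]
sql = merge_nodes(nodes, "ha3_sink")
config = TaskConfig(parallelism=8, memory_mb=4096, cpu_cores=2.0)
print(sql)
print(config)
```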

The MaaT Distributed Process Scheduling System

A comparison between Airflow (which MaaT is based on) and other workflow systems is shown in the following table.

Next, the following figure shows the MaaT scheduling page for the Feed engine.

When a task fails, you can “set specified steps to fail” on this page and re-execute all tasks, or find the cause of the task failure by viewing the log entries for a specific step.

HA3 Doc for Data Output

"1649992010": [
  {
    "data": "hdfs://xxx/search4test_st3_7u/full", // HDFS path
    "swift_start_timestamp": "1531271322", // start time of today's incremental data
    "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
    "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
    "table_name": "search4test_st3_7u", // HA3 table name, currently the same as the application name
    "version": "20190920090800" // time when the data was produced
  }
]

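As a small illustration of how a consumer might read the two time fields in the sample descriptor above, the following Python sketch converts them to `datetime` objects. The field values are copied from the sample. The interpretation of `swift_start_timestamp` as Unix seconds and of `version` as a `yyyymmddHHMMSS` string is an assumption based on how the values look, and `parse_times` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Values copied from the sample descriptor above.
descriptor = {
    "data": "hdfs://xxx/search4test_st3_7u/full",
    "swift_start_timestamp": "1531271322",
    "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
    "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
    "table_name": "search4test_st3_7u",
    "version": "20190920090800",
}


def parse_times(doc):
    """Return (incremental start time, data production time)."""
    # Assumption: swift_start_timestamp is expressed in Unix seconds.
    start = datetime.fromtimestamp(int(doc["swift_start_timestamp"]), tz=timezone.utc)
    # Assumption: version is a yyyymmddHHMMSS string without a time zone.
    produced = datetime.strptime(doc["version"], "%Y%m%d%H%M%S")
    return start, produced


start, produced = parse_times(descriptor)
print(start.isoformat(), produced.isoformat())
```
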
Let’s Talk about Suez

The following figure shows the logic for building an offline table on Suez.

Next, the following figure shows the logic for providing online services on Suez.

Finally, the following sections describe the offline service (Build Service) and the online service (HA3).

Build Service for Indexing

  • admin: This role controls the overall building process, switches between the full state and the incremental state, initiates regular tasks, and responds to control requests from users.
  • processor: This role processes data and converts users’ original documents into lightweight buildable documents.
  • builder: This role builds indexes.
  • merger: This role consolidates indexes.
  • rtBuilder: This role builds online indexes in real time.

The roles admin, processor, builder, and merger run on Hippo as binary programs, and the rtBuilder role is provided for online services as a library.

A generationid is generated for each complete full or incremental process. The generation goes through the following steps in order: process full, builder full, merger full, process inc, builder inc, and merger inc. After the inc phase begins, builder inc and merger inc alternate. Before the HA3 upgrade at 1688.com, a “build too slow” error could occur when faulty nodes were allocated, or the engine could get stuck at builder inc or merger inc.
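
The step sequence of a generation can be sketched as a small Python generator. This is an illustration of the ordering only, not Build Service code. It assumes that the first step is the processor's full pass (written here as `process full`) and that the builder inc and merger inc steps then repeat indefinitely.

```python
from itertools import cycle, islice

# Assumption: the first step is the processor's full pass.
FULL_STEPS = ["process full", "builder full", "merger full"]
INC_START = ["process inc"]
INC_LOOP = ["builder inc", "merger inc"]


def generation_steps():
    """Yield the steps of one generation in order, looping over the inc phase."""
    yield from FULL_STEPS
    yield from INC_START
    # After the incremental phase begins, builder and merger alternate.
    yield from cycle(INC_LOOP)


first_eight = list(islice(generation_steps(), 8))
print(first_eight)
```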

HA3 for Online Search Service

The primary search engine in 1688.com consists of QRS, Searcher, and Summary.

  • QRS parses and verifies input queries, forwards the verified queries to the corresponding Searcher, collects and merges the results returned by Searcher, processes the results, and returns the processed results to the user. You can also intervene in the merging rule by developing a merger plug-in.
  • Searcher can be a document recall service (Searcher), a document scoring and ranking service (Ranker), or a document summary service (Summary).
  • Searcher and Summary are separated in the primary search engine in 1688.com. The Summary cluster provides only the service for obtaining product details.
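
The collect-and-merge role of QRS can be illustrated with a short Python sketch. It assumes that each Searcher returns a list of `(doc_id, score)` pairs already sorted by descending score, and that merging simply keeps the globally highest-scoring hits. The function name `qrs_merge`, the document IDs, and the scores are hypothetical, and the real merging rule in HA3 may differ (for example, when a merger plug-in is installed).

```python
import heapq
from itertools import islice


def qrs_merge(partition_results, top_k):
    """Merge per-Searcher hit lists (each sorted by descending score)."""
    merged = heapq.merge(*partition_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, top_k))


# Hypothetical results from two Searcher partitions.
searcher_a = [("offer-1", 0.92), ("offer-4", 0.40)]
searcher_b = [("offer-7", 0.88), ("offer-2", 0.75)]

top = qrs_merge([searcher_a, searcher_b], top_k=3)
print(top)
```

Using a lazy merge means QRS only needs to consume as many hits as it returns, which is why each partition's list must already be sorted.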

Devices such as QRS, Searcher, and Summary are mounted to CM2 (a name discovery server) to provide services. For example, QRS supports an external CM2 and can provide services to callers such as SP. Searcher and Summary support an internal CM2 and can receive requests from QRS and provide services such as recall, sorting, and detail extraction.

A query from a caller goes through the following steps in order: QRS, query analysis, seek, filter, rank (rough ranking), Agg, ReRank (refined ranking), ExtraRank (final ranking), merger, and Summary (obtaining details). The following figure shows the process.
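
The order of these stages can be illustrated with a toy, single-node Python pipeline over an in-memory list of documents. It is a sketch of the flow only and not HA3 code: the documents, field names, and scoring formulas are invented, and the ExtraRank and cross-Searcher merger stages are omitted for brevity.

```python
from collections import Counter

# Toy in-memory index; field names and values are illustrative only.
DOCS = [
    {"id": 1, "title": "red cotton shirt", "category": "apparel", "price": 12.0, "sales": 300},
    {"id": 2, "title": "blue cotton shirt", "category": "apparel", "price": 80.0, "sales": 900},
    {"id": 3, "title": "cotton towel", "category": "home", "price": 5.0, "sales": 200},
    {"id": 4, "title": "steel pipe", "category": "industrial", "price": 30.0, "sales": 10},
]


def seek(term):
    """Recall every document whose title contains the term."""
    return [d for d in DOCS if term in d["title"].split()]


def filter_docs(docs, max_price):
    """Drop documents that violate the filter condition."""
    return [d for d in docs if d["price"] <= max_price]


def rough_rank(docs, keep):
    """Cheap first-pass score (sales only); keep the best candidates."""
    return sorted(docs, key=lambda d: d["sales"], reverse=True)[:keep]


def aggregate(docs):
    """Count candidates per category (a simple Agg)."""
    return Counter(d["category"] for d in docs)


def rerank(docs):
    """More refined score applied to the few remaining candidates."""
    return sorted(docs, key=lambda d: d["sales"] / d["price"], reverse=True)


def summary(docs):
    """Fetch the display details for the final results."""
    return [{"id": d["id"], "title": d["title"]} for d in docs]


def search(term, max_price, keep=2):
    candidates = rough_rank(filter_docs(seek(term), max_price), keep)
    facets = aggregate(candidates)
    return summary(rerank(candidates)), facets


results, facets = search("cotton", max_price=50.0)
print(results, dict(facets))
```

The point of the staging is cost control: the cheap rough ranking trims the candidate set so that the more expensive refined ranking runs on only a few documents.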

ReRank and ExtraRank are implemented by the Hobbit plug-in and the Hobbit-based war horse plug-in. The service provider can re-develop the features of the war horse plug-in as required and specify a weight for each feature to obtain the final product score.
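
One simple way to read "a weight for each feature" is as a weighted sum, which the following Python sketch shows. This is an assumption: the article does not state the formula the plug-in uses, and the feature names, weights, and the `final_score` function are hypothetical.

```python
def final_score(features, weights):
    """Weighted sum of feature values; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical feature weights and feature values for one product.
weights = {"relevance": 0.5, "sales": 0.25, "seller_rating": 0.25}
offer = {"relevance": 0.8, "sales": 0.4, "seller_rating": 1.0}

score = final_score(offer, weights)
print(round(score, 4))
```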

Let’s Talk about Drogo

The following figure shows the deployment of major service platforms on the search link in 1688.com.
