Real-Time Marketing Data Analysis Based on Hologres + Flink


During the 2020 Double 11 Global Shopping Festival, Alibaba’s cloud-native real-time data warehouse was first implemented in core data scenarios and reached a new record for the big data platform. This data warehouse is built based on Hologres and Realtime Compute for Apache Flink, and has set a new record for the big data platform. This article focuses on the best practices of Hologres in Taobao’s marketing analysis scenarios. The article also discusses the technical challenges behind the first-time implementation of Apache Flink + Hologres stream-batch unification on the marketing analysis screen during Double 11.

1. Real-Time Marketing Data Analysis

Big promotions are crucial for business operations and user number growth for Taobao business operations. Marketing analysis services are the core data services used for decision-making and operation instructions during big promotions. They cover the analysis works of the whole process before, during, and after the promotions. Different requirements for data timeliness and flexibility must be met in different phases.

  • High Maintenance Costs: As the business volume increases, the original database system cannot support complex and changing application scenarios quickly and flexibly. Conventional Apache HBase, MySQL, and AnalyticDB databases cannot meet all requirements for supporting massive-data, high-concurrency point queries, and OLAP queries. Therefore, when dealing with extremely complex services, multiple databases are needed. As a result, the costs of overall maintenance and dependency become very high.
  • Poor Extensibility: Products with FW framework are constructed with high logical complexity and poor extensibility, and the maintenance cost during operation is high.
  1. A more powerful data warehouse is required to support highly-concurrent writes, and query large amounts of data with quick response time.
  2. The existing product construction logic needs to be simplified, and the product implementation complexity needs to be reduced.

2. Stream-Batch Unification Solution

After deeply analyzing business requirements for data and exploring data models multi-dimensionally, the overall technical framework for marketing analysis reconstruction is finally determined, as shown in the figure below. The core points of the framework include:

  • Data storage-query unification is achieved through Hologres.
  • With the capabilities of FBI, the construction cost is reduced, and the high flexibility of the business, as well as the needs for reports of different roles, are all met.

1) Stream-Batch Unified Technical Framework

The following figure shows the traditional architecture of data warehouses. The core issues of the traditional data warehouse architecture are listed below:

  • Streaming and batch processing logics cannot be reused. Different SQL standards and computing engines lead to separated development for real-time and offline processing. The logics are similar in many cases, but cannot be flexibly converted, resulting in duplicated work.
  • Compute-layer clusters are separated, and there are different peak use durations of real-time and offline resources. This causes low resource utilization, clear peaks and troughs.
  • Secondly, a development platform (Dataphin stream-batch unified development platform) is needed to support logical table-based code development, the personalized configuration of streaming and batch computing modes, and different scheduling policies. Thus, a convenient development-O&M unified system can be constructed.
  • Finally, the storage layer unification based on data modeling specifications is achieved not only in model specifications but also in the storage media. They are seamlessly connected.

2) Implementation of Hologres Stream-Batch Unification

The entire data layer is unified through the stream-batch unification data architecture. Another product is also needed to unify the entire storage layer. This product needs to support high-concurrency writes, real-time queries, and OLAP analysis.

  • The OLAP real-time query of Hologres is used to handle complex and changing marketing methods.
  1. Minimize the Count Distinct Times: The SQL is converted through the multi-layer GROUP BY operation to reduce the cost of COUNT DISTINCT.
  2. Shard Prunning: In some scenarios, some of the primary keys of a table will be targeted during the query. If the key combination in these scenarios is set as the distribution key, shards hit by this query can be determined when the query is processed. Thus, the number of RPC requests is reduced, which is crucial for scenarios with high QPS.
  3. Generate the Optimal Plan: Marketing analysis includes point query or range query based on summarized data, OLAP query based on raw data, and TopN query after single-table aggregation. For different query types, Hologres can generate optimal execution plan based on the collected statistics to ensure the QPS and latency of queries.
  4. Optimize the Writing: The writing operations of marketing analysis are based on the UPDATE operations on column storage tables. In Hologres, the UPDATE operations find the corresponding uniqueid based on the specified primary keys (PK). Then, they find the corresponding record mark for deletion based on the uniqueid and finally insert a new record. In this case, if an incremental segment key can be set, the file can be located quickly based on the segment key during querying. This improves the record locating speed based on the PK, as well as writing performance. The peak writing traffic in the system stress testing of marketing analysis can reach 8 million records per second based on UPDATE operations.
  5. Compact Small Files: Some tables are written infrequently because the key updates during a fixed period of time is relatively constant. In such a case, the memory table would flush small files sometime. However, the default compaction strategy of Hologres does not always work on these files, resulting in a relatively large number of small files. Through in-depth optimization of the compaction parameters and increasing the compaction frequency, small file number is reduced, improving the query performance significantly.

3) FBI Analysis Big Screen

FBI is the preferred data visualization platform in the Alibaba ecosystem. It can quickly create reports for data analysis and support fast access and expansion of multiple data sets. It also supports the construction of various analytical data products through the product construction feature.

  1. Plug-In Exclusive Logic: Some customized capabilities, such as activity parameters, result displaying and hiding, and result sorting, are used for blocks. These capabilities can be transformed into data plug-ins.
  2. FBI’s high security system has been developed and tested. The release control, monitoring and alert, and change notification features have been upgraded with support for 1–5–10.

3. Test-Side Ensuring

To further ensure the quality of the marketing analysis service, the test side strictly compares and verifies data at the DWD layer, the DWS layer, and the product layer. A full range of monitoring is also carried out on the core data of the big promotion.

4. Business Feedback and Value

During 2020 Double 11, the stream-batch unified marketing analysis service based on Apache Flink + Hologres supported an average of hundreds of PV access per capita for thousands of Tmall listings. The service also achieved zero P1/P2 failure. At the same time, several advantages of the service were shown during the activity compared with previous years:

  • Stability: Based on the contiguous high-stability capability of Hologres, both real-time data writing and data reading were highly stable during Double 11. The engineering side also monitored user access and data response efficiency to analyze and resolve business problems in real-time. Moreover, product inspection covered the core data of the service to further ensure stability.
  • High Efficiency: By applying the streaming and batch processing, and the unified connection of Hologres, the service greatly improved demand access efficiency during Double 11. The overall demand capacity during 2020 Double 11 was three times higher than 2019. In addition, the overall timeliness of problem feedback and resolution was improved three to four times overall.

5. Future Prospects

Even though it was tested by 2020 Double 11, it is still necessary to continuously improve the technologies to deal with more complex business scenarios:

  1. The Hologres resource separation, and isolation of read and write resources will be improved to better ensure query SLA. Hologres and MaxCompute interconnection can support metadata interconnection to have high guarantee on the product metadata. Dynamic scaling can be applied to flexibly respond to peak and daily business needs.
  2. FBI tools can improve the version management function of products. The same page supports multi-person editing and enables more efficient product construction.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.