OPPO’s Use of Flink-based Real-time Data Warehouses

1) Evolution of the OPPO Real-time Data Warehouse

1.1 OPPO’s Businesses and Data Scale

1.2 OPPO Data Mid-End

  • The bottom layer is a unified tool system that covers the entire data pipeline: ingestion, governance, development, and consumption.
  • The data warehouse is built on top of the tool system and follows a typical data warehouse architecture, divided into the raw, detail, summary, and application layers.
  • Above the warehouse is the panoramic data system, which integrates all business data into unified data assets such as ID-Mapping and user tags.
  • Finally, scenario-driven data products and services sit at the top and put the data to use in the business.

1.3 Construction of OPPO Offline Data Warehouses

1.4 Demand for Real-time Data Warehouses

1.5 Smooth Migration from Offline Data Warehouses to Real-time Data Warehouses

1.6 Construction of OPPO Real-time Data Warehouses

2) Flink SQL-based Extension

2.1 Why Use Flink SQL?

2.2 Web-based Development IDE

2.3 AthenaX: RESTful SQL Manager

  • For SQL job submission, AthenaX provides a job abstraction that encapsulates information such as the SQL statements to be executed and the job's resources. All jobs are hosted by a JobStore, which periodically compares them against the applications running in YARN. If a job has no matching application, the JobStore submits that job to YARN.
  • The core of metadata management is injecting external databases and tables into Flink so that they can be referenced in SQL. Flink provides extension points for connecting to external metadata in the form of the ExternalCatalog and ExternalCatalogTable abstractions. On top of these, AthenaX encapsulates a TableCatalog and extends it at the API layer. When a SQL job is submitted, AthenaX automatically registers the TableCatalog with Flink, calls the Flink SQL API to compile the SQL into a JobGraph (the unit that Flink executes), and submits it to YARN to create a new application.
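
The JobStore behavior described above is essentially a reconciliation loop: compare the set of hosted jobs with the set of running applications and submit whatever is missing. The following is a minimal Python sketch of that idea only, not AthenaX's actual implementation. `JobDefinition`, `FakeYarn`, `reconcile`, and the job names and SQL strings are all hypothetical, and YARN is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobDefinition:
    """Hypothetical job abstraction: the SQL to run plus its resources."""
    name: str
    sql: str
    memory_mb: int = 1024


@dataclass
class FakeYarn:
    """In-memory stand-in for YARN (assumption; not a real YARN client)."""
    running: set = field(default_factory=set)

    def submit(self, job: JobDefinition) -> None:
        # A real submission would compile the SQL and launch an application.
        self.running.add(job.name)


def reconcile(job_store: dict, yarn: FakeYarn) -> list:
    """Submit every hosted job that has no matching running application.

    Returns the names of the jobs that were submitted, in sorted order.
    """
    missing = sorted(name for name in job_store if name not in yarn.running)
    for name in missing:
        yarn.submit(job_store[name])
    return missing


# Two hosted jobs, only one of which is currently running.
job_store = {
    "etl_split": JobDefinition("etl_split", "INSERT INTO sink_a SELECT * FROM src"),
    "metrics": JobDefinition("metrics", "INSERT INTO sink_b SELECT COUNT(*) FROM src"),
}
yarn = FakeYarn(running={"metrics"})
submitted = reconcile(job_store, yarn)
```

Running `reconcile` on a schedule makes the loop idempotent: a second pass finds nothing missing and submits nothing, while a job whose application has died is picked up again on the next pass.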

2.4 Registering Databases and Tables in Flink SQL

2.5 Connection Between Flink SQL and External Data Sources

2.6 Real-time Tables — Dimension Table Association

2.7 UDF-based Dimension Table Association

2.8 SQL Conversion-based Dimension Table Association

3) Cases of Real-time Data Warehouse Creation

3.1 Real-time ETL Splitting

3.2 Real-time Metric Statistics

3.3 Real-time Tag Import

4) Thoughts and Prospects for the Future

4.1 End-to-end Real-time Stream Processing

4.2 Lineage Analysis for Real-time Streams

4.3 Integration of Offline and Real-time Data Warehouses

Original Source:

Alibaba Cloud
