Yangqing Jia Launches the New-Generation Cloud-Native Data Warehouse


Relive all product launches in the Alibaba Cloud Summit 2020 at https://www.alibabacloud.com/campaign/summit-live-2020/live-streaming/product-launches

At the Alibaba Cloud Summit on June 9, 2020, Jia Yangqing, Vice President of Alibaba and Senior Researcher of the Alibaba Cloud Computing Platform Division, announced the launch of the new generation of cloud-native data warehouse. Built on Hologres and its innovative technical architecture, the new-generation cloud-native data warehouse supports PB-level data correlation analysis and real-time queries, integrating offline data, real-time data, analytics, and serving.


The following article is the full text of Jia Yangqing’s speech.

Most of today’s economic activities, data analysis, and services are inseparable from the industrialization of digital technologies and the digitalization of various industries. However, we must take these changes step by step. Most enterprises advance cautiously when upgrading their business and technology. When we encounter a new data analysis or service requirement, we look for a single-point system to solve that single-point problem. In this process, data silos appear behind a seemingly complete system. Data connectivity and real-time data transmission among the silos have become a major problem.


From the perspective of an enterprise, the business side has difficulty gaining insights from data, while the system side bears the cost of fragmented data. Because a data warehouse is so important for enterprises, I think we need a top-level design to reconstruct it. Today, we will introduce the concept of real-time as a service, which integrates offline data, real-time data, analytics, and serving based on Hologres, MaxCompute, and Realtime Compute.

If we go back to the most essential requirements of a data warehouse, the core issue is not complex. A data warehouse needs to integrate data from multiple sources and bring it into one set of storage systems in real time. It then performs offline, real-time, or interactive analysis, displays the results, and provides services.

We used to hear about a concept called Hybrid Transactional/Analytical Processing (HTAP). Transactions focus on more specific aspects of data. For example, a database is measured by indicators such as read and write performance and security. Today, we see analytics and serving becoming more integrated. Analytics means gaining insights into the patterns in massive data and then providing services based on those insights. Both data dashboards and operation analysis are processes of displaying service data. To eliminate data silos, we must integrate analysis with serving more closely. We call this mode Hybrid Serving Analytical Processing (HSAP).

With data warehouses based on Hologres and MaxCompute, we can connect Hologres with MaxCompute and implement high-performance, low-latency data analysis through Hologres. In addition, we can perform large-scale, low-cost offline computing through MaxCompute. On this basis, we can push the results of data analysis and the data accumulated in real time to different services, such as data dashboards and operation dashboards.
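The HSAP idea described above, one store answering both analytical queries and serving lookups without copying data between systems, can be illustrated with a minimal sketch. This is not Hologres code: Python's built-in `sqlite3` module is used purely as a stand-in for the shared store, and the `orders` table, its columns, and all function names are hypothetical.

```python
import sqlite3


def build_store():
    """Create the single shared store (SQLite is only a stand-in here)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for illustration.
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, shop TEXT, amount REAL)"
    )
    return conn


def ingest(conn, events):
    """Write incoming events into the shared store as they arrive."""
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", events)
    conn.commit()


def analytics_total_by_shop(conn):
    """Analytical query: aggregate over every row in the store."""
    cursor = conn.execute(
        "SELECT shop, SUM(amount) FROM orders GROUP BY shop ORDER BY shop"
    )
    return dict(cursor.fetchall())


def serving_lookup(conn, order_id):
    """Serving query: point lookup by key against the same store."""
    cursor = conn.execute(
        "SELECT shop, amount FROM orders WHERE order_id = ?", (order_id,)
    )
    return cursor.fetchone()


conn = build_store()
ingest(conn, [(1, "a", 10.0), (2, "b", 5.0)])
ingest(conn, [(3, "a", 2.5)])  # a later batch is visible to both query paths
print(analytics_total_by_shop(conn))  # {'a': 12.5, 'b': 5.0}
print(serving_lookup(conn, 3))        # ('a', 2.5)
```

Both query paths read the same table, so newly ingested rows are visible to dashboards and to lookups at once. A real deployment would rely on a distributed engine rather than an in-memory database.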


Within Alibaba Group, the biggest demand for data comes during the Double 11 Global Shopping Festival, when a large amount of data is transferred and business decisions are complex. In 2019, we upgraded the business support system through data warehouses based on Hologres and MaxCompute. On the day of Double 11, a single set of systems handled 145 million online queries, supporting complex business analysis and decision-making. These analyses were also backed by 130 million real-time records. With the correct top-level design, performance is not a problem.

An entire data warehouse system based on MaxCompute, Realtime Compute, and Hologres can deal with data silos. Without data redundancy, we can simplify the system, reduce costs, and improve data analysis efficiency.

We also think open source, community, and ecosystem are crucial in building a data warehouse. When we built Hologres, we made it fully compatible with the open-source PostgreSQL ecosystem. Data engineers and upper-layer BI tools can easily connect their existing systems to Hologres and MaxCompute and seamlessly migrate their analysis and serving workloads.


With real-time as a service and HSAP, we can greatly simplify the design of the data warehouse and build a system integrating offline data, real-time data, analytics, and serving in the entire data lifecycle.

Today, we launched the new generation of cloud-native data warehouse. The system design of this solution features three elements: one set of storage systems to eliminate data silos, multiple computing concepts, and real-time as a service. We also provide DataWorks, Machine Learning Platform for Artificial Intelligence (PAI), and other platforms on Alibaba Cloud. With a complete set of data products, we can provide digital and intelligent applications. We believe that every enterprise will build a data warehouse solution on the cloud to solve numerous and complicated data problems.

