Integrating Data Banks with SaaS-based Cloud Data Warehouses
By Zhiqiang Long
This article explains how MaxCompute, a SaaS-based cloud data warehouse, powers the SaaS-based cloud strategy of data banks and all-in-one open data scenarios.
Cloud Data Warehouses
This part describes the benefits and solutions of cloud data warehouses such as MaxCompute.
SaaS-based enterprise-level cloud data warehouses are applicable to scenarios such as:
- Advertising for user tag computing and analysis
- Business operations for business metric computing and query
- Construction of data warehouses for various industries and scalable big data computing and storage on the cloud.
Advantages of Cloud Data Warehouses
- Cloud-native elasticity: A cloud-native design with a serverless architecture scales automatically within seconds to handle large-scale elastic loads.
- Easy-to-use, multifunctional computing: Multiple preset computing models and data channels are ready to use.
- Enterprise-level platform service: An open ecosystem with enterprise-level security management capabilities is supported.
- Seamless integration: MaxCompute integrates seamlessly with the various big data services of Alibaba Cloud.
- Security: Effective security controls are provided in multi-tenant environments.
- Proven at scale: Large-scale clusters with high performance and stable end-to-end processes have been verified during Double 11.
Two product combinations are recommended. For BI analysis scenarios, combine MaxCompute, Hologres, Flink, DataWorks, and Quick BI. For machine learning scenarios, combine MaxCompute, PAI, and DataWorks.
The following figure shows the purchasing options for MaxCompute computing resources. The first option is a subscription. It meets routine business demands and keeps expenses stable. It also supports job prioritization so that key tasks produce output reliably, as well as the purchase of storage and computing resource packages.
The second option is on-demand usage. It adopts a serverless architecture with large-scale storage and scalability, automatically adapting to business requirements and keeping pace with fast-changing workloads in a pay-as-you-go manner.
The third option combines multiple types of computing resources. Pairing a subscription with on-demand elastic resources gives a better balance between cost and performance.
The fourth option is the preemption of idle resources. These are non-reserved computing resources that preempt and use the service's free computing capacity at a price 74% lower than the standard subscription.
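The pricing relationships among these options can be sketched in a few lines of Python. This is only an illustrative calculation: the 74% figure comes from the text above, while the function names, the unit prices, and the idea of pricing per "compute unit" are assumptions made for the example, not MaxCompute's actual billing model.

```python
# 74% below the standard subscription price, as stated in the article.
PREEMPTIBLE_DISCOUNT = 0.74


def preemptible_price(subscription_price: float) -> float:
    """Price of idle (preemptible) compute, given the standard subscription price."""
    return subscription_price * (1 - PREEMPTIBLE_DISCOUNT)


def hybrid_cost(reserved_units: float, burst_units: float,
                subscription_price: float, on_demand_price: float) -> float:
    """Cost of a reserved baseline plus on-demand burst capacity.

    All four inputs are hypothetical; real prices depend on region and contract.
    """
    return reserved_units * subscription_price + burst_units * on_demand_price


if __name__ == "__main__":
    # Hypothetical subscription price of 100 per compute unit.
    print(round(preemptible_price(100.0), 2))
    # Hypothetical mix: 10 reserved units at 100 each, 2 burst units at 150 each.
    print(hybrid_cost(10, 2, 100.0, 150.0))
```

A model like this makes the trade-off concrete: reserved capacity fixes the baseline cost, and burst or preemptible capacity is added only for the part of the load that varies.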
So the question is: how can cloud-based big data services effectively secure the data and services of enterprises in the face of frequent security issues? MaxCompute provides comprehensive, multi-layer security management capabilities for continuous data and service protection on the cloud. These capabilities consist of three parts: the MaxCompute security ecosystem, MaxCompute system security, and infrastructure security.
The Apsara big data platform solution applies to data-driven operations in Internet industries such as e-commerce, games, and social networks. It can be used in scenarios such as intelligent recommendation, log collection and analysis, user profiling, data governance, business dashboards, and search.
The Apsara big data platform is the best practice of Alibaba's own big data platform. Its advantages include advanced technologies, cost reduction and efficiency improvement, and high value-added business benefits. The platform covers products such as Log Service (SLS), Data Transmission Service (DTS), DataHub, Realtime Compute for Apache Flink, Hologres, MaxCompute, DataWorks, Quick BI, DataV, Elasticsearch, and Platform of Artificial Intelligence (PAI).
This part describes the definition and industry applications of data banks.
Data banks aggregate internal and external data and then integrate and share it, which puts data assets to active use, enables data transactions, and releases the value of data. In this way, data banks build a trading platform on which enterprises, industries, the ecosystem, and society can realize the value of their data assets. Data banks are designed to maximize the commercial value of data through data integration, sharing, and transactions.
The service scope includes data transactions and data value enhancement. Data transaction services provide data asset display, API transmission, and transaction services that connect supply and demand so that the value of data can be realized. Data value enhancement enriches the data and improves its value through the integration and in-depth mining of internal and external data.
Data integration, transaction realization, and in-depth mining are the features of a data bank. They maximize the value of data and empower industrial development. Specifically, a data bank provides three major data services: data asset revitalization, data value improvement, and industry development empowerment, as shown in the following figure.
Umeng is an industry application of this architecture. The specific architecture is shown in the following figure.
MaxCompute and Umeng Data Bank
This part describes how MaxCompute is integrated with the Umeng data bank in practice.
Typical thematic data packages and data sources fall into three categories: statistical analysis, developer tools, and marketing growth.
Umeng offers a “one-click” mode in its product functions and values. The integrated consumption experience consists of three parts. The first is thematic data packages: large amounts of data are collected and processed daily with high performance, and thematic data packages are produced automatically for applications, websites, mini programs, advertisements, and push. The second is one-click data subscription, which connects seamlessly to MaxCompute and DataWorks. The third is thematic analysis templates and self-service analysis: with the preset analysis templates and pull-type self-service analysis provided by Umeng, business staff can complete analysis independently.
Umeng brings a better user experience by integrating with MaxCompute, as shown in the following figure. From account login to application configuration, it is now more intelligent and convenient than before.
Multi-terminal, multi-topic detailed data and metric data help developers build a private domain data system. Through metric data, Umeng shares more than 9 years of industry experience with developers, including dashboards that display real-time metric data and multi-dimensional analysis and monitoring of metric data. Detailed data is also available, which helps developers integrate it with business data and perform self-service analysis. Specific services include real-time channel return on investment (ROI) analysis; advertisement release, use, and conversion; tiered user operations; and real-time recommendation.
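As an illustration of the channel ROI analysis mentioned above, here is a minimal Python sketch. The record layout and the function name channel_roi are assumptions made for this example and are not part of Umeng or MaxCompute. ROI is taken here as (revenue - spend) / spend, which is one common definition.

```python
from collections import defaultdict


def channel_roi(records):
    """Aggregate spend and revenue per channel and return {channel: ROI}.

    `records` is an iterable of (channel, spend, revenue) tuples, a
    hypothetical layout chosen for this sketch. Channels with zero
    total spend are skipped to avoid division by zero.
    """
    spend = defaultdict(float)
    revenue = defaultdict(float)
    for channel, cost, income in records:
        spend[channel] += cost
        revenue[channel] += income
    return {
        channel: (revenue[channel] - spend[channel]) / spend[channel]
        for channel in spend
        if spend[channel] > 0
    }


if __name__ == "__main__":
    sample = [
        ("search_ads", 100.0, 150.0),
        ("search_ads", 100.0, 250.0),
        ("video_ads", 50.0, 25.0),
    ]
    print(channel_roi(sample))
```

In a real pipeline the same aggregation would more likely be expressed as a grouped query in the data warehouse, with the metric then surfaced on a dashboard.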
The Umeng data bank connects seamlessly to cloud data warehouses, giving developers an open, one-click experience of the data model system. For developers, cloud data warehouses provide cost-effective interactive query services and are compatible with heterogeneous data sources for query and analysis. They therefore offer a fast, fully managed solution for petabyte-level data warehouses and economical, efficient batch analysis of large amounts of data.
This part describes two successful cases of the integration of MaxCompute with Umeng.
Case Study 1: A local lifestyle industry customer requires data-based business and visual data.
The customer is a smart community service platform in the local life industry.
Its pain points include a low level of data-based operations, scattered data, and a long turnaround time for staff data requests.
The solution has three steps. The first step is standardized multi-terminal data collection, which uses a tracking solution based on business requirements to collect data from applications, HTML5 pages, and mini programs. The second step is the subscription-based return of real-time and offline data: the collected data is delivered from the Umeng unified extract-transform-load (ETL) service to the customer's real-time SLS and offline DLA. The last step is the design and development of data reports. In addition to the four preset versions, offline data is automatically connected to Quick BI (QBI) to build reports for the specific needs of business analysis and monitoring.
As a result, the business becomes data-based: behavior data collected from multiple terminals can be added to the data warehouse system. Another result is visual data. A dashboard for daily data monitoring allows business personnel to quickly view the results of product iterations and operational actions.
Case Study 2: A gaming industry customer requires multi-source data integration.
The customer is an independent game studio.
Its main pain point is that application behavior data and backend business data are kept separately.
The solution also consists of three steps. The first is to use the game industry tracking solution to collect data from the application for multiple types of user identification IDs. Then, data such as user payments and advertising revenue stored with other cloud vendors is migrated to Alibaba Cloud. Finally, the collected behavior data is delivered to the Alibaba Cloud database with one click and integrated with the other data through the unique user identification IDs.
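The final integration step amounts to joining two data sets on a shared user ID. The following Python sketch shows the idea on in-memory records. The field names (user_id, amount) and the function name are hypothetical and chosen only for illustration; in practice this join would run inside the data warehouse.

```python
def integrate_by_user_id(behavior_rows, revenue_rows):
    """Merge behavior events and revenue records into one profile per user.

    Both inputs are lists of dicts keyed by a shared "user_id" field
    (an assumed schema). Users that appear in only one source are kept,
    which corresponds to a full outer join.
    """
    merged = {}
    for row in behavior_rows:
        profile = merged.setdefault(row["user_id"], {"events": 0, "revenue": 0.0})
        profile["events"] += 1
    for row in revenue_rows:
        profile = merged.setdefault(row["user_id"], {"events": 0, "revenue": 0.0})
        profile["revenue"] += row["amount"]
    return merged


if __name__ == "__main__":
    behavior = [{"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u2"}]
    revenue = [{"user_id": "u1", "amount": 4.5}, {"user_id": "u3", "amount": 1.0}]
    print(integrate_by_user_id(behavior, revenue))
```

Keeping users that appear in only one source makes gaps visible, for example paying users with no tracked behavior, which is often the first thing to check after such a migration.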
As a result, integrated data analysis is achieved. By combining retained-user behavior data with revenue data, the customer can calculate the lifetime value of users and determine the payback period and the preferred advertisement placements.
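To make these two metrics concrete, here is a minimal Python sketch under simplifying assumptions: revenue is given as an average per user per day for one cohort, lifetime value is the sum of that series, and the payback period is the first day on which cumulative revenue covers the acquisition cost per user. The function names and the input format are hypothetical; the article does not specify how the customer computes these values.

```python
from itertools import accumulate


def lifetime_value(daily_revenue_per_user):
    """Lifetime value of a cohort: total revenue per user over the observed days."""
    return sum(daily_revenue_per_user)


def payback_day(daily_revenue_per_user, acquisition_cost_per_user):
    """Return the 1-based day on which cumulative revenue first covers the
    acquisition cost, or None if it is not recovered in the observed window."""
    cumulative = accumulate(daily_revenue_per_user)
    for day, total in enumerate(cumulative, start=1):
        if total >= acquisition_cost_per_user:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical cohort: average revenue per user on days 1 to 4.
    cohort = [1.0, 0.5, 0.5, 0.25]
    print(lifetime_value(cohort))
    print(payback_day(cohort, acquisition_cost_per_user=2.0))
```

Comparing the payback day across advertising channels is one simple way to rank placements: channels that recover their cost sooner free up budget earlier.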