Integrating BI with SaaS-based Cloud Data Warehouses
This article, written by Haiqing, an Alibaba Cloud technical expert, introduces SaaS-based cloud data warehouses integrated with business intelligence (BI). It covers an overview of cloud data warehouses, BI scenarios and trends, the features of MaxCompute-based cloud data warehouses and BI, and best practices.
Overview of Cloud Data Warehouses
This article aims to explore the integration of SaaS-based cloud data warehouses with business intelligence (BI).
Let’s jump right in with an overview of cloud data warehouses. It is predicted that by 2025, global data volume will increase to 175 ZB and China’s data volume to 48.6 ZB. As data volume rises sharply, the BI market is growing as well. It is predicted that through 2023, the BI software market in China will grow at a compound annual growth rate of 32%. Cloud computing is accelerating this growth: in the fourth quarter of 2019, the Chinese cloud computing market grew by 66.9%.
With cloud data warehouses such as MaxCompute, enterprises can create and use data warehouse services in minutes. This lets them focus on their business and gain insights quickly by processing, mining, and analyzing large amounts of data at a lower cost. A cloud data warehouse has four major features: large-scale data analysis, high performance, elastic scaling, and low cost.
Scenarios and Trends of BI
BI is an information system that provides operational data for decision analysis. As society develops and data volumes explode, enterprises hope to extract more value from their data to make better decisions. BI benefits enterprises in areas such as refined operations, customer relationship maintenance, and cost control.
A BI information system is established in four main stages. The first is data access, which integrates various internal and external enterprise data. Next comes data preparation, which performs extraction, transformation, and loading (ETL). The third stage analyzes the data, and the final stage submits the results for decision-making. These results can be used to refine operations, maintain customer relationships, and control costs.
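The four stages above can be sketched as a small pipeline. This is a minimal illustration only, using the Python standard library; the sample records, field names, and function names are hypothetical and do not come from any real BI product.

```python
from collections import defaultdict

# Hypothetical raw records from two sources (internal CRM, external shop).
CRM_ROWS = [
    {"customer": "a01", "region": "East ", "amount": "120.50"},
    {"customer": "a02", "region": "west", "amount": "80"},
]
SHOP_ROWS = [
    {"customer": "a03", "region": "EAST", "amount": "19.50"},
    {"customer": "a04", "region": "West", "amount": "n/a"},  # unparseable value
]


def access():
    """Stage 1, data access: combine internal and external sources."""
    return CRM_ROWS + SHOP_ROWS


def prepare(rows):
    """Stage 2, ETL: normalize fields and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append({"region": row["region"].strip().lower(), "amount": amount})
    return clean


def analyze(rows):
    """Stage 3, analysis: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def report(totals):
    """Stage 4, decision support: regions ranked by total, largest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{region}: {total:.2f}" for region, total in ranked]


if __name__ == "__main__":
    for line in report(analyze(prepare(access()))):
        print(line)
```

Keeping each stage as a separate function mirrors the process described above: any one stage can be swapped (for example, replacing the in-memory lists with a warehouse query) without touching the others.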
The surge in data volume accompanies rapid business growth and a wide range of analysis demands. In addition to diversified analysis, enterprises need real-time analysis, such as instant queries that return within seconds. Large data volumes also make data security and compliance more important. It is therefore necessary to integrate data from multiple systems quickly to achieve information transparency, and to build a unified, easy-to-use visual analysis platform to improve reporting efficiency. These demands have become the new trends in BI systems.
Features of MaxCompute-based Cloud Data Warehouse and BI
MaxCompute, previously known as Open Data Processing Service (ODPS), is a big data computing service. It provides flexible, fully-managed, high-performance, cost-effective and secure solutions for petabyte-level data warehouses to help process data economically and efficiently.
The following figure shows the basic architecture of a MaxCompute-based cloud data warehouse. The underlying cluster is built and managed by MaxCompute itself and is transparent to users. Above the cluster sit multiple computing engines. Above the engines are various APIs and DataWorks, an all-in-one intelligent cloud R&D platform for big data that is deeply integrated with MaxCompute. With this cloud data warehouse system, data can be prepared and then cleansed, processed, and analyzed before consumption.
In summary, the MaxCompute-based cloud data warehouse has five main characteristics. First, it is a ready-to-use online service that requires no platform O&M and has a low total cost. Second, its elasticity handles rapid changes in business scale without capacity planning.
Third, it is easy to use and provides multi-functional computing services, supporting multiple computing models, various data channels, and federated computing with external data sources. Fourth, it offers enterprise-level security: a multi-tenant security mechanism, fine-grained authorization, data encryption and data masking, and data backup and restoration. Fifth, its ecosystem integration supports multiple data sources, ecosystem tools, and standards.
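To make the data masking item concrete, here is a minimal sketch of what masking a sensitive field can look like before a value reaches a report. It is an assumption-laden illustration, not MaxCompute's masking feature: the function name, its parameters, and the sample value are all hypothetical.

```python
def mask_middle(value, keep_start=3, keep_end=4, fill="*"):
    """Hide the middle of a string, keeping a few leading and trailing characters.

    Values too short to keep both ends are masked entirely.
    """
    if len(value) <= keep_start + keep_end:
        return fill * len(value)
    hidden = len(value) - keep_start - keep_end
    # Slice from an explicit index so that keep_end=0 keeps no trailing characters.
    tail = value[len(value) - keep_end:]
    return value[:keep_start] + fill * hidden + tail


if __name__ == "__main__":
    # Hypothetical 11-digit phone number.
    print(mask_middle("13812345678"))
```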
So how do Alibaba Cloud products connect to BI tools through a MaxCompute-based cloud data warehouse? MaxCompute is a storage and computing service that, together with DataWorks, a data development platform, forms an offline cloud data warehouse. On top of this, Alibaba Cloud’s Quick BI, a report analysis tool, is integrated and connects directly to MaxCompute data tables.
Third-party tools such as FineReport and Tableau are also supported, as is Java Database Connectivity (JDBC). Some enterprises and customers have more diversified and personalized BI demands that the existing connection tools may not support. In such cases, they can connect through an SDK to build a BI information platform that integrates with a MaxCompute-based cloud data warehouse.
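As a rough sketch of the custom-connection idea, a home-grown BI platform might put a thin query layer between its reports and the warehouse connection. The example below assumes a Python DB-API 2.0 style connection, which plays a role similar to JDBC in Java. It does not use the MaxCompute SDK or JDBC driver: an in-memory sqlite3 database stands in for the warehouse, and the `sales` table and both function names are hypothetical.

```python
import sqlite3


def run_report_query(conn, sql, params=()):
    """Run a query on a DB-API 2.0 connection and return rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description holds one 7-item tuple per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def demo_connection():
    """Build an in-memory stand-in for a warehouse table (assumption: sqlite3)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 140.0), ("west", 80.0), ("east", 10.0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = demo_connection()
    query = (
        "SELECT region, SUM(amount) AS total "
        "FROM sales GROUP BY region ORDER BY region"
    )
    for record in run_report_query(connection, query):
        print(record)
    connection.close()
```

Because the report layer depends only on the connection interface, the stand-in could in principle be replaced by a real warehouse driver. Note that parameter placeholder syntax (`?` here) differs between drivers.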
Moreover, the MaxCompute-based offline data warehouse can deliver high-performance, low-latency analytical queries. Queries can read the offline data warehouse directly, and simple queries, complex queries, point queries, and federated queries are all supported. Interactive analytics can also be achieved by integrating MaxCompute and Hologres over multiple data sources at the underlying layer of the warehouse. Within the big data ecosystem, the warehouse then interconnects seamlessly with report analysis tools such as Quick BI, Tableau, and FineReport. With this combination, an enterprise can build its information platform quickly.
Before we conclude, let’s take a look at several best practices below.
In the new retail industry, business demands were served by the open-source Hadoop ecosystem. Under this ecosystem, software and hardware maintenance costs were high and stability issues persisted, seriously affecting business operation analysis. With online business surging and a serious backlog of demands, the sector needed an overall solution that could quickly and flexibly support the technical extensions required for business development.
To meet these requirements, Alibaba Cloud’s Quick BI is applied directly to achieve rapid, intelligent data transformation for the new retail sector and to reduce the total cost of ownership (TCO). Moreover, relying on the cloud ecosystem closes the loop on data assets. As a result, integration with MaxCompute and DataWorks has improved the development efficiency of data business in the new retail sector.
Another practice comes from new finance. Processing financial business data calls for strong security control and a comprehensive security management system that accommodates personalized security requirements. In addition, rapid business development requires a data middle platform that can be built quickly, costs little, and scales in seconds. Ready-to-use MaxCompute applications meet the data security requirements of a security audit, shorten demand response time, and satisfy personalized data security requirements.