Analyses in this article are based on Now Tech: Cloud Data Warehouse, Q1 2018 (Published by Noel Yuhanna, March 13, 2018). The views and opinions expressed herein are those of the author.
On March 13, 2018, Forrester issued the Now Tech: Cloud Data Warehouse, Q1 2018 report. In this report, Forrester assessed Cloud Data Warehouses (CDWs) across dimensions such as main features, regional performance, market segmentation, and customers.
Alibaba Cloud, AWS, Google, and Microsoft were named as the four global first-tier CDW service providers. Alibaba Cloud DataWorks and MaxCompute are the only products from a Chinese company recognized in the report.
In this report, Forrester highlighted four core CDW features:
- Flexible deployment
CDWs are expected to offer several flexible deployment modes. For small enterprises, CDWs should provide an online multi-tenant mode so that these customers can quickly mobilize computing resources and deploy a data warehouse in just a few minutes. For medium and large enterprises, CDWs should support an exclusive or local deployment mode that provides robust computing performance and strong security while hiding highly complex technical details.
- Efficient data migration to cloud
For customers that have not yet migrated their data warehouses to the cloud, or that run hybrid online and offline architectures, CDWs should provide a fast, low-cost way to collect data.
- Diverse analysis methods
CDWs should support multiple techniques so that users get the data processing capabilities they need in different business scenarios.
- Excellent security
CDWs should provide security in various aspects, including data encryption, auditing, data desensitization and access control.
Before analyzing DataWorks, we will first take a quick look at its role in the Alibaba Cloud CDW service system and its product architecture.
Among Alibaba Cloud products, DataWorks and MaxCompute make up the core of the CDW service capabilities. As the storage and computing engine, MaxCompute supports the IaaS layer and provides users with large-scale, reliable big data table storage and SQL execution. However, MaxCompute alone cannot meet all data processing requirements. Data development, data integration, and other CDW services are also needed for customers to make use of big data. DataWorks provides a relatively complete solution for this purpose.
Specifically, DataWorks includes eight major modules:
- Data integration: Integrate heterogeneous data by collecting large volumes of data from various source systems onto the big data cloud platform
- Data development: Data warehouse design and ETL development
- O&M monitoring: O&M monitoring over jobs in the ETL process
- Real-time analytics: Real-time data exploration and analysis
- Data asset management: Metadata management, data map, data lineage, data asset graph, etc.
- Data quality: The system for data quality control, monitoring, verification and assessment
- Data security: Data permission management, classified data marking, data desensitization and data audits
- Data service: Data sharing, data switching and data API services
Flexible Deployment
The Forrester report explains at length why multiple deployment modes are necessary and compares the CDWs of several service providers. DataWorks is one of the first-tier products that provide multiple deployment modes.
Serving as the core of the Alibaba Group’s data middleware system, DataWorks has been used to support business operations in enterprises like Alibaba Group, Ant Financial, and Cainiao since 2009. If you’ve used data services provided by Taobao, Tmall, Ant Financial, and other companies, you may have indirectly used the computing service provided by DataWorks.
DataWorks also supports private cloud. As an important way of delivering big data capabilities, DataWorks is used in Alibaba Cloud’s private cloud solutions, including Apsara Enterprise. Since 2015, DataWorks has supported important enterprise and government projects, including the Alibaba Cloud ET City Brain and “Easy municipal service access”.
With flexible deployment modes, DataWorks can meet a wide variety of customers’ needs. For small enterprises, public cloud solutions can be used flexibly to provide services and support; for medium and large enterprises, private cloud or hybrid cloud solutions can fully meet customers’ needs.
Efficient Data Migration to the Cloud
Efficient data integration methods significantly ease the migration of enterprise data to the cloud. During the initial migration stage, enterprises need to move their data assets to the cloud quickly and securely; during continuous business operations, they need to load various kinds of data into the CDW and then deliver processed data from the CDW to individual business units.
The Data Integration feature of DataWorks reads from and writes to multiple kinds of data sources, including relational databases, NoSQL databases, big data stores, and text storage (FTP). It can uniformly inspect the data resources in those sources and synchronize heterogeneous data sources across complex network environments. For offline data, import tasks support batch, full, and incremental synchronization, and users can schedule synchronization by minute, hour, day, week, or month.
In addition, Data Integration provides flow control over synchronization jobs: users can manage how dirty data is handled, the data transfer rate, and the number of concurrent threads, which helps reduce cost and keep operations lean.
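The difference between full and incremental synchronization can be illustrated with a minimal Python sketch. This is not the DataWorks API: the in-memory "tables", the `modified_at` column, and the function names are all assumptions made for illustration. It uses a watermark (the latest modification time already copied) to decide which rows an incremental run needs to transfer.

```python
from datetime import datetime


def full_sync(source_rows):
    """Full synchronization: copy every source row into a fresh target."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_sync(source_rows, target, watermark):
    """Incremental synchronization: upsert only rows modified after `watermark`.

    Returns the new watermark (the latest modification time seen).
    """
    new_watermark = watermark
    for row in source_rows:
        if row["modified_at"] > watermark:
            target[row["id"]] = dict(row)
            new_watermark = max(new_watermark, row["modified_at"])
    return new_watermark


# Hypothetical source table with a modification timestamp per row.
source = [
    {"id": 1, "name": "a", "modified_at": datetime(2018, 3, 1)},
    {"id": 2, "name": "b", "modified_at": datetime(2018, 3, 2)},
]

# Initial load: copy everything and remember how far we got.
target = full_sync(source)
watermark = max(row["modified_at"] for row in source)

# Later, one row changes and one row is added in the source.
source[0] = {"id": 1, "name": "a2", "modified_at": datetime(2018, 3, 5)}
source.append({"id": 3, "name": "c", "modified_at": datetime(2018, 3, 6)})

# Only the two changed rows are transferred on the incremental run.
watermark = incremental_sync(source, target, watermark)
```

A watermark-based approach like this does not detect rows deleted from the source; a periodic full synchronization is one simple way to reconcile them.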
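As a rough illustration of these three controls, the following Python sketch runs a synchronization step with a dirty-record limit, a cap on the submission rate, and a fixed number of worker threads. The function names, parameters, and record layout are hypothetical and do not reflect how DataWorks configures these settings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class DirtyDataLimitExceeded(Exception):
    """Raised when more dirty records are seen than the job tolerates."""


def parse_record(raw):
    """Convert a raw (id, amount) pair; raises ValueError/TypeError on dirty data."""
    record_id, amount = raw
    return {"id": int(record_id), "amount": float(amount)}


def run_sync(raw_records, max_dirty=0, max_workers=2, max_records_per_sec=None):
    """Parse records concurrently, throttling submission and bounding dirty data."""
    clean, dirty = [], []
    min_interval = 1.0 / max_records_per_sec if max_records_per_sec else 0.0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for raw in raw_records:
            futures.append((raw, pool.submit(parse_record, raw)))
            if min_interval:
                time.sleep(min_interval)  # crude throttle on the submission rate
        for raw, future in futures:
            try:
                clean.append(future.result())
            except (ValueError, TypeError):
                dirty.append(raw)
                if len(dirty) > max_dirty:
                    raise DirtyDataLimitExceeded(
                        f"{len(dirty)} dirty records exceed the limit of {max_dirty}"
                    )
    return clean, dirty


records = [("1", "10.5"), ("2", "oops"), ("3", "7")]

# Tolerate one dirty record: the job completes and reports what it skipped.
clean, dirty = run_sync(records, max_dirty=1, max_workers=2)
```

With `max_dirty=0`, the same input would abort the job with `DirtyDataLimitExceeded`, which is the stricter behavior one would typically want for critical tables.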
Diverse Analysis Methods
DataWorks provides powerful data development IDEs and supports visual editing of SQL code, integration tasks, and business flow DAGs. Multi-user online collaboration and version management for task scripts meet the practical needs of enterprise-level data development. In addition to offline task processing, DataWorks provides the lightweight “Analytics Workbench” tool, which uses the computing capacity of MaxCompute to meet users’ ad hoc data analysis needs.
The drag-and-drop business flow editing feature in DataWorks has reportedly been updated recently to further improve the user experience of the data development IDE.
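A business flow DAG of the kind mentioned above is, in essence, a set of tasks with dependencies that determine a valid run order. The sketch below uses Python’s standard `graphlib` module (Python 3.9 or later) to derive such an order. The task names and the flow itself are invented for illustration and say nothing about how DataWorks represents flows internally.

```python
from graphlib import TopologicalSorter

# Hypothetical business flow: each task maps to the set of tasks it depends on.
business_flow = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}


def execution_order(flow):
    """Return one valid run order; raises graphlib.CycleError if `flow` has a cycle."""
    return list(TopologicalSorter(flow).static_order())


order = execution_order(business_flow)
```

Any order returned this way runs each task only after all of its upstream tasks, which is the guarantee a scheduler needs before it can dispatch jobs.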
Excellent Security
Protecting sensitive data requires close compliance with industry standards and with data privacy laws and regulations. Security is the top priority of DataWorks. DataWorks provides data security modules and protects data through the following means:
- Multi-tenant isolation
DataWorks has its own multi-tenant permission model. Tenants can apply for resource quotas on demand and manage their own resources; tenants can also manage their own data, permissions, users and roles independently from each other to ensure data security.
- Data security level setting
Data security levels allow users to discover and locate sensitive data and to see how it is distributed across data resource platforms. DataWorks auto-discovers sensitive data based on specified sensitive data types and classifies it. Appropriate security rules are then applied based on secrecy levels such as Top Secret, Confidential and General.
- Data access audit
DataWorks strictly audits privileged users’ access, including access time, executed operations and execution order. Recording and auditing this access helps ensure that privileged users perform appropriate operations at the proper time and makes it possible to detect abnormal operations, further improving the security of data systems.
- Data desensitization
When it is not possible to decide whether particular users, access addresses, or even fields can be trusted, DataWorks focuses on the data content itself: it identifies sensitive information and dynamically blocks access to it to keep the data secure.
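The ideas of security levels and content-based desensitization can be sketched together in a few lines of Python. Everything here is an assumption for illustration: the regular expressions, the mapping of data types to levels, and the masking format are hypothetical, and masking is only one common way to desensitize a value. Only the level names (Top Secret, Confidential, General) come from the description above.

```python
import re

# Hypothetical detection rules: (data type, pattern, security level).
RULES = [
    ("id_card", re.compile(r"^\d{17}[\dXx]$"), "Top Secret"),
    ("phone", re.compile(r"^1\d{10}$"), "Confidential"),
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), "Confidential"),
]


def classify(value):
    """Return (data_type, level) for a value, inspecting only its content."""
    for data_type, pattern, level in RULES:
        if pattern.match(value):
            return data_type, level
    return None, "General"


def mask(value):
    """Mask values classified above General; leave other values unchanged."""
    data_type, level = classify(value)
    if level == "General":
        return value
    if data_type == "email":
        local, domain = value.split("@", 1)
        return local[0] + "***@" + domain
    # For numeric identifiers, keep the first 3 and last 2 characters.
    return value[:3] + "*" * (len(value) - 5) + value[-2:]


masked_email = mask("alice@example.com")
masked_phone = mask("13812345678")
untouched = mask("hello")
```

Because the decision is based on the value rather than on who is asking or which column it came from, the same rule applies even when sensitive data shows up in an unexpected field.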
DataWorks has received a third-level information security certificate issued by the Ministry of Public Security.
With “Internet Plus” further applied in different industries, there is an increasing need for enterprises to manage, process and employ their data assets. Internet companies can quickly use their big data processing capability to meet other enterprises’ needs. That also explains why these four cloud service providers, instead of long-established data warehouse companies like Oracle and IBM, are listed in the Forrester report as first-tier CDW providers.
Thanks to years of large-scale data practice at Alibaba, DataWorks can fully meet enterprise-level requirements for deployment modes, data integration, analysis methods, and data security.
DataWorks is expected to keep introducing more advanced data management capabilities, including real-time data integration and data asset analysis. DataWorks combines cloud computing with data warehouse management methodology to keep innovating and to create “platforms most suitable for big data warehouse development”. That is another reason why DataWorks is listed in this Forrester CDW report.
To learn more about the Big Data capabilities of Alibaba Cloud, read the Forrester report on MaxCompute.