Quick Implementation of Data Lake House Based on MaxCompute

  1. Trend analysis of Lake House
  2. Alibaba Cloud Lake House
  3. Case studies
  4. Lake House demo

A Brief History of Big Data and the Motivation behind Lake House

Data Lakes vs. Data Warehouses

Total Cost of Ownership (TCO) Comparison

The Unification of Data Lakes and Data Warehouses

Alibaba Cloud Lake House

Support for Cloud Native Data Lakes

Better Performance for Data Lake Queries

Ecological Openness

Data Development Platform Unified by DataWorks

Applicable Scenarios of Alibaba Cloud Lake House

Scenario 1: The Migration of Hadoop Clusters to the Cloud through Used Resources

Scenario 2: ETL and Ad-hoc Acceleration in a Data Lake

Scenario 3: Enterprise-level and Cross-platform Unified Big Data Platform

Case Study

Case 1: Introduction to the Unification of MaxCompute Data Warehouses and Hadoop Data Lakes

  • Large volumes of data are synchronized manually, which creates a heavy workload;
  • The volume of training data is large, so training on it in real time takes too long;
  • Existing Hive SQL queries cannot be reused when writing new SQL data processing queries.
  • The lake house avoids data migration and job migration, so production jobs can be scheduled seamlessly and flexibly to MaxCompute clusters and EMR clusters, with improved performance;
  • An AI computing middle platform is encapsulated and built, greatly improving the team's ability to support the business.

Case 2: Introduction to the Unification of MaxCompute Data Warehouse and OSS Data Lake

  • The algorithm team wants to focus on the business and algorithms, which requires a highly self-service and all-in-one machine learning platform;
  • Hadoop clusters are shared by multiple teams. Because the clusters are strictly controlled, they cannot support innovative businesses with large workloads at short notice.
  • The new business platform is connected with the original data platform through the lake house. Machine Learning Platform for AI on MaxCompute + DataWorks provides agile, all-in-one machine learning model development, training, and release for the customer’s innovative businesses, along with large-scale computing capability and the EAS model release process;
  • The solution sets a good example and is quickly replicated in other business lines, efficiently supporting the rapid growth of the customer’s business.

Case 3: Introduction to the Unification of MaxCompute Data Warehouse and OSS Data Lake

  • EMR metadata is integrated into DLF, with OSS used for unified storage at the underlying layer. In addition, the EMR data lake and the MaxCompute data warehouse are integrated through the lake house so that data and computing can flow freely between the lake and the warehouse;
  • Tiered storage of lake house data is implemented. The data middle platform stores the intermediate table data generated from dimensional modeling of the data lake on MaxCompute, and the modeling result tables are placed in the data lake for EMR or other engines to consume.
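The tiered storage rule in the last bullet can be illustrated with a minimal Python sketch. It is not the customer's implementation and does not call any MaxCompute, EMR, or OSS API; the `Table`, `TableKind`, `StorageTier`, `choose_tier`, and `plan_placement` names, as well as the example table names, are hypothetical and exist only to show the placement decision described above.

```python
from dataclasses import dataclass
from enum import Enum


class TableKind(Enum):
    INTERMEDIATE = "intermediate"  # produced during dimensional modeling
    RESULT = "result"              # final modeling output


class StorageTier(Enum):
    WAREHOUSE = "maxcompute"       # warehouse side (label is an assumption)
    LAKE = "oss_data_lake"         # lake side, readable by EMR or other engines


@dataclass(frozen=True)
class Table:
    name: str
    kind: TableKind


def choose_tier(table: Table) -> StorageTier:
    """Apply the rule from the case study: intermediate tables stay in the
    warehouse, modeling result tables go to the data lake."""
    if table.kind is TableKind.INTERMEDIATE:
        return StorageTier.WAREHOUSE
    return StorageTier.LAKE


def plan_placement(tables):
    """Return a mapping of table name to the storage tier label."""
    return {t.name: choose_tier(t).value for t in tables}


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    tables = [
        Table("dwd_orders_tmp", TableKind.INTERMEDIATE),
        Table("ads_sales_summary", TableKind.RESULT),
    ]
    for name, tier in plan_placement(tables).items():
        print(f"{name} -> {tier}")
```

Keeping the rule in one small function makes the policy easy to audit and to change if, for example, some result tables later need to stay in the warehouse.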

Demo of Lake House on Alibaba Cloud

Concluding Remarks

Alibaba Cloud
