All You Need to Know About MaxCompute

Key Concepts


MaxCompute Features

Data Storage

  • MaxCompute supports large-scale storage and computing, from the TB level up to the EB level. A single MaxCompute project can scale with data requirements ranging from those of a startup team to those of a unicorn.
  • MaxCompute stores data in a distributed manner with multi-copy redundancy. Only table operation interfaces are exposed for data storage; file system access interfaces are not provided.
  • MaxCompute uses a self-developed, column-oriented storage structure for table data, and data is highly compressed by default. Compatibility with Ali-ORC, an ORC-based storage format, is planned for a later release.
  • Foreign tables are supported. Data stored in OSS and Table Store can be mapped into foreign tables.
  • Storage partitions and buckets are supported.
  • The underlying layer is the Apsara Distributed File System, developed by Alibaba, rather than HDFS. However, HDFS is a useful mental model for understanding the file structure under a specific table and the task concurrency mechanism.
  • Storage and computing are decoupled. You do not need to add computing resources simply to meet storage requirements.
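
As a rough illustration of the partition and bucket bullets above, the following Python sketch routes rows into partitions by a column value and into buckets by a hash of a key. It is a toy in-memory model, not MaxCompute's storage engine: the function names, the CRC32 hash, and the bucket count are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this example


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket with a deterministic hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_rows(rows, partition_col, bucket_col):
    """Group rows as table[partition_value][bucket_id] -> list of rows."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition_value = row[partition_col]
        bucket_id = bucket_of(str(row[bucket_col]))
        table[partition_value][bucket_id].append(row)
    return table


def scan_partition(table, partition_value):
    """Read one partition only; rows in other partitions are never touched."""
    partition = table.get(partition_value, {})
    return [row for bucket in partition.values() for row in bucket]


sample_rows = [
    {"ds": "20240101", "user_id": "u1", "amount": 10},
    {"ds": "20240101", "user_id": "u2", "amount": 5},
    {"ds": "20240102", "user_id": "u1", "amount": 7},
]
sample_table = write_rows(sample_rows, partition_col="ds", bucket_col="user_id")
print(scan_partition(sample_table, "20240101"))
```

The point of the sketch is that a filter on the partition column narrows the scan to one partition, and that rows with the same bucket key always land in the same bucket.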

Multiple Computational Models

  • MaxCompute SQL uses a self-developed compiler, which allows more flexible development of language features, faster iteration, and more flexible and efficient syntax and semantic checks.
  • MaxCompute SQL uses a cost-based optimizer, which is more intelligent, more powerful, and better suited to complex queries.
  • LLVM-based code generation makes the execution process more efficient.
  • MaxCompute SQL supports complex data types (array, map, struct).
  • MaxCompute SQL supports UDFs, UDAFs, and UDTFs written in Java and Python.
  • Supported syntax includes VALUES, CTE, SEMI JOIN, FROM inversion, subquery operations, set operations (UNION/INTERSECT/MINUS), SELECT TRANSFORM, user-defined types, GROUPING SETS (CUBE/ROLLUP/GROUPING SETS), script mode, and parameterized views.
  • Foreign tables are supported (foreign data sources, with StorageHandler support for unstructured data).
  • MapReduce programming interfaces are supported. MaxCompute provides both its own optimized and enhanced MapReduce and a MapReduce version that is highly compatible with Hadoop.
  • The file system is not exposed; all input and output are tables.
  • Jobs are submitted by using the MaxCompute client tool or DataWorks.
  • MaxCompute Graph is a processing framework designed for iterative graph computing. Graph computing jobs use graphs to build models. Graphs are composed of vertices and edges with values.
  • MaxCompute Graph iteratively edits and evolves graphs to obtain analysis results.
  • Typical applications include PageRank, the single-source shortest path algorithm, and the K-means clustering algorithm.
  • Use the Java SDK interface provided by MaxCompute Graph to write graph computing applications and submit tasks by using the jar command in the MaxCompute client tool.
  • PyODPS provides access to ODPS objects such as tables, resources, and functions.
  • PyODPS submits SQL statements through run_sql or execute_sql.
  • PyODPS allows uploading and downloading data by using open_writer, open_reader, or the native Tunnel APIs.
  • PyODPS provides the DataFrame API, which offers Pandas-like interfaces and runs DataFrame computations on MaxCompute so that they can fully use its computing capability.
  • PyODPS DataFrame provides many Pandas-like interfaces and also extends their syntax. For example, a MapReduce API is provided to adapt to the big data environment.
  • map, apply, and map_reduce make it convenient to write functions on the client and apply them to data. These functions can call third-party libraries such as Pandas, SciPy, scikit-learn, and NLTK.
  • Multiple versions of native Spark jobs: Both Spark 1.x and Spark 2.x jobs are supported.
  • Experience consistent with open-source systems: Jobs are submitted with spark-submit (spark-shell and spark-sql interactive modes are currently not supported), and the native Spark web UI is available for viewing job information.
  • Access to external data sources such as OSS, Table Store, and databases enables more complex ETL, including processing unstructured data in OSS.
  • Spark can be used for machine learning on data inside and outside MaxCompute, which expands the range of application scenarios.
  • Compatibility with PostgreSQL: MaxCompute Lightning provides JDBC and ODBC interfaces that are compatible with the PostgreSQL protocol. Tools or applications based on PostgreSQL databases can connect to MaxCompute projects by using the default driver. Mainstream BI and SQL client tools such as Tableau, FineBI, Navicat, and SQL Workbench/J can connect to and access MaxCompute.
  • Significantly improved query performance: MaxCompute Lightning improves query performance for data up to a certain scale, with query results available in seconds. It supports scenarios such as BI analysis, ad hoc queries, and online services.
  • MaxCompute provides hundreds of built-in machine learning algorithms. Currently, the machine learning capability of MaxCompute is delivered through PAI, which also provides elastic prediction services and supports deep learning frameworks, notebook development environments, GPU computing resources, and online model deployment. PAI is seamlessly integrated with MaxCompute at the project and data levels.
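
The MaxCompute Graph bullets above describe iterative computation over vertices and edges and name PageRank as a typical application. The sketch below shows that iteration pattern in plain Python so that it can run anywhere. It does not use the MaxCompute Graph Java SDK: the function name, the dict-based graph representation, and the damping and superstep defaults are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(
    edges: Dict[str, List[str]],
    damping: float = 0.85,   # assumed default, common in PageRank examples
    supersteps: int = 50,    # assumed fixed number of iterations
) -> Dict[str, float]:
    """Vertex-centric PageRank over a graph given as vertex -> out-neighbors."""
    vertices = sorted(set(edges) | {d for dsts in edges.values() for d in dsts})
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}  # every vertex starts with equal rank

    for _ in range(supersteps):
        inbox = {v: 0.0 for v in vertices}  # messages received in this superstep
        dangling = 0.0                      # rank held by vertices with no out-edges
        for v in vertices:
            out = edges.get(v, [])
            if out:
                share = rank[v] / len(out)
                for dst in out:
                    inbox[dst] += share     # send an equal share along each edge
            else:
                dangling += rank[v]
        # Each vertex updates its value from the messages it received;
        # dangling rank is spread evenly so the ranks keep summing to 1.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for vertex, value in sorted(pagerank(example_graph).items()):
    print(vertex, round(value, 4))
```

Each pass of the outer loop plays the role of one iteration: vertices send values along their edges, then update their own value from what they received. A real Graph job would distribute this work across workers rather than loop in a single process.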

Table of Comparison

Frequently Asked Questions





Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:
