All You Need to Know About MaxCompute

Key Concepts

Architecture

MaxCompute Features

Data Storage

  • MaxCompute supports large-scale storage and computing, from the terabyte (TB) level up to the exabyte (EB) level. A single MaxCompute project can meet the data scale requirements of organizations ranging from startup teams to unicorns.
  • MaxCompute uses distributed data storage with multi-replica redundancy. Data is accessed only through table operation interfaces; file system access interfaces are not provided.
  • MaxCompute uses a self-developed storage structure with column-oriented table storage, and data is highly compressed by default. Compatibility with the ORC format through Ali-ORC is planned for a later release.
  • Foreign tables are supported. Data stored in OSS and Table Store can be mapped into foreign tables.
  • Storage partitions and buckets are supported.
  • The underlying storage layer is Apsara Distributed File System, developed in-house by Alibaba, rather than HDFS. Even so, HDFS is a useful mental model for understanding how the files under a given table are organized and how task concurrency works.
  • Storage and computing are decoupled, so you do not need to add computing resources just to meet storage requirements.
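Partitions and buckets both reduce how much data a query has to read. The following is a minimal, local Python sketch of that idea, assuming a table partitioned by a date column `ds` and bucketed by a hash of `user_id`. The column names, the bucket count, the helper names (`bucket_of`, `build_layout`, `scan`), and the use of CRC32 as the hash are all illustrative assumptions; this is not MaxCompute's actual storage format, hash function, or API.

```python
import zlib
from collections import defaultdict

# Hypothetical bucket count for illustration only; MaxCompute's real
# storage layout and bucketing hash function are not modeled here.
NUM_BUCKETS = 4


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket with a deterministic hash (CRC32 in this sketch)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def build_layout(rows, partition_col, bucket_col):
    """Group rows first by partition value, then by bucket inside each partition."""
    layout = defaultdict(lambda: defaultdict(list))
    for row in rows:
        layout[row[partition_col]][bucket_of(row[bucket_col])].append(row)
    return layout


def scan(layout, partition_value, key_col, key=None):
    """Return (matching rows, rows examined).

    Partition pruning: only the requested partition is touched.
    Bucket pruning: if a key is given, only that key's bucket is read.
    """
    partition = layout.get(partition_value, {})
    if key is None:
        rows = [row for bucket in partition.values() for row in bucket]
        return rows, len(rows)
    bucket = partition.get(bucket_of(key), [])
    return [row for row in bucket if row[key_col] == key], len(bucket)


rows = [
    {"ds": "20190101", "user_id": "u1", "amount": 10},
    {"ds": "20190101", "user_id": "u2", "amount": 20},
    {"ds": "20190102", "user_id": "u1", "amount": 30},
    {"ds": "20190102", "user_id": "u3", "amount": 40},
]
layout = build_layout(rows, partition_col="ds", bucket_col="user_id")
# Only partition 20190102, and within it only u1's bucket, is read.
hits, examined = scan(layout, "20190102", "user_id", key="u1")
print(hits, examined)
```

The point of the sketch is the access pattern: a filter on the partition column skips whole partitions, and an equality filter on the bucket column narrows the read to a single bucket inside that partition.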

Multiple Computational Models

  • MaxCompute SQL uses a self-developed compiler that allows more flexible language feature development, faster iteration, and more flexible and efficient syntax and semantic checks.
  • Its cost-based optimizer is more intelligent and powerful, and better suited to complex queries.
  • LLVM-based code generation makes execution more efficient.
  • Complex data types (ARRAY, MAP, and STRUCT) are supported.
  • User-defined functions (UDFs, UDAFs, and UDTFs) can be written in Java and Python.
  • Supported syntax includes VALUES, CTEs, SEMI JOIN, FROM-first statements (FROM inversion), subqueries, set operations (UNION/INTERSECT/MINUS), SELECT TRANSFORM, user-defined types, grouping sets (CUBE/ROLLUP/GROUPING SETS), script mode, and parameterized views.
  • Foreign tables are supported, and StorageHandlers allow unstructured data from foreign data sources to be processed.
  • MapReduce programming interfaces are supported. MaxCompute provides both an optimized and enhanced MapReduce native to MaxCompute and a MapReduce version that is highly compatible with Hadoop.
  • The file system is not exposed; all inputs and outputs are tables.
  • Jobs are submitted through the MaxCompute client tool or DataWorks.
  • MaxCompute Graph is a processing framework designed for iterative graph computing. Graph computing jobs model data as graphs, which consist of vertices and edges that carry values.
  • MaxCompute Graph obtains analysis results by iteratively editing and evolving the graph.
  • Typical applications include PageRank, the single-source shortest path algorithm, and the K-means clustering algorithm.
  • Graph computing applications are written with the Java SDK provided by MaxCompute Graph and submitted with the jar command in the MaxCompute client tool.
  • PyODPS, the Python SDK for MaxCompute (formerly known as ODPS), provides access to ODPS objects such as tables, resources, and functions.
  • SQL is submitted through run_sql (asynchronous) or execute_sql (synchronous, waits for the job to finish).
  • Data can be uploaded and downloaded with open_writer and open_reader, or with the native Tunnel APIs.
  • The PyODPS DataFrame API provides pandas-like interfaces while running DataFrame computations on MaxCompute, so it can make full use of MaxCompute's computing capacity.
  • PyODPS DataFrame also extends the pandas syntax to suit big data environments; for example, it provides a MapReduce API.
  • With map, apply, and map_reduce, you can conveniently write custom functions on the client and apply them to your data. Users can invoke third-party libraries such as pandas, SciPy, scikit-learn, and NLTK in these functions.
  • Multiple versions of native Spark jobs: both Spark 1.x and Spark 2.x jobs are supported.
  • Open-source experience: spark-submit is provided (interactive spark-shell and spark-sql are currently not supported), and the native Spark Web UI lets users view job information.
  • Access to external data sources such as OSS, Table Store, and databases enables more complex ETL, including processing of unstructured data stored in OSS.
  • Spark can perform machine learning on data both inside and outside MaxCompute, which broadens its application scenarios.
  • Compatibility with PostgreSQL: MaxCompute Lightning provides JDBC and ODBC interfaces compatible with the PostgreSQL protocol, so tools and applications built for PostgreSQL databases can connect to MaxCompute projects using their default drivers. Mainstream BI and SQL client tools such as Tableau, FineBI, Navicat, and SQL Workbench/J are supported.
  • Significantly improved query performance: for data up to a certain scale, MaxCompute Lightning returns query results in seconds, supporting scenarios such as BI analysis, ad hoc queries, and online services.
  • MaxCompute provides hundreds of built-in machine learning algorithms. This machine learning capability is currently delivered through PAI, which also offers elastic prediction services, support for deep learning frameworks, Notebook development environments, GPU computing resources, and online model deployment. PAI is seamlessly integrated with MaxCompute at the project and data level.
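Because MapReduce jobs in MaxCompute read from and write to tables rather than files, it can help to picture a job as a function from one table to another. The following is a minimal local Python simulation of the map, shuffle, and reduce flow using word count. Tables are modeled as lists of dicts, and `mapper`, `reducer`, and `run_job` are hypothetical names for illustration; this is not the MaxCompute MapReduce Java API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(record):
    """Emit (word, 1) for every word in the record's text column."""
    for word in record["text"].lower().split():
        yield word, 1


def reducer(key, values):
    """Sum the counts for one key and emit one output record."""
    return {"word": key, "count": sum(values)}


def run_job(input_table, mapper, reducer):
    """Run map -> shuffle -> reduce; both input and output are 'tables'."""
    # Map phase: apply the mapper to every input record.
    pairs = [kv for record in input_table for kv in mapper(record)]
    # Shuffle phase: sort by key so that equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key; results form the output table.
    return [
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


input_table = [{"text": "big data on MaxCompute"}, {"text": "big tables"}]
output_table = run_job(input_table, mapper, reducer)
print(output_table)
```

In a real distributed job the map and reduce calls run in parallel on many workers and the shuffle moves data between them; the sketch keeps everything in one process so that only the data flow is visible.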
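The MaxCompute Graph bullets describe an iterative model in which vertices and edges carry values and the graph evolves over successive iterations, with PageRank named as a typical application. The following is a minimal single-process Python sketch of that vertex-centric pattern applied to PageRank. It is not the MaxCompute Graph Java SDK; the function name `pagerank`, the fixed number of supersteps (50), and the damping factor (0.85, the value commonly used for PageRank) are illustrative assumptions.

```python
def pagerank(edges, num_supersteps=50, damping=0.85):
    """Vertex-centric PageRank over `edges`, a dict of vertex -> out-neighbors."""
    vertices = set(edges) | {u for outs in edges.values() for u in outs}
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}  # initial vertex values
    for _ in range(num_supersteps):
        # Phase 1: every vertex sends rank / out-degree along each out-edge.
        inbox = {v: 0.0 for v in vertices}
        dangling = 0.0  # rank held by vertices with no out-edges
        for v in vertices:
            outs = edges.get(v, [])
            if outs:
                share = rank[v] / len(outs)
                for u in outs:
                    inbox[u] += share
            else:
                dangling += rank[v]
        # Phase 2: every vertex recomputes its value from the messages it received;
        # dangling rank is spread evenly so that the ranks still sum to 1.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```

Within each phase, a vertex only reads its own value and its incoming messages, which is what lets a graph framework spread the per-vertex work across many workers between synchronization points.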

Table of Comparison

Frequently Asked Questions

Conclusion

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com

Alibaba Cloud