Meet MaxCompute: The AI Platform Bringing Big Data Analysis to the Masses

Data is the new corporate currency, but many businesses fail to analyze and capitalize on their petabytes of information because, quite frankly, they don’t know where to start.

This is where Alibaba Cloud’s MaxCompute[1] can help. It’s an AI-enabled big data processing platform that helps enterprises unlock the immense value of their data.

The platform offers a combination of data intelligence services, mainly for batch storage and processing of structured data. It’s cheap to use and can process 100 PB of data in six hours. That’s roughly the same amount of data as 100 million HD movies, or one-third of Facebook’s entire data warehouse.
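To put those figures in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 PB = 10^15 bytes), which the article does not specify, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 bytes.
PB = 10**15
TB = 10**12
GB = 10**9

data_bytes = 100 * PB      # 100 PB processed
hours = 6                  # in six hours
movies = 100_000_000       # compared to 100 million HD movies

# Average throughput needed to get through 100 PB in six hours.
throughput_tb_per_s = data_bytes / (hours * 3600) / TB

# Size per movie implied by the "100 million HD movies" comparison.
movie_size_gb = data_bytes / movies / GB

print(f"Sustained throughput: {throughput_tb_per_s:.2f} TB/s")
print(f"Implied size per HD movie: {movie_size_gb:.0f} GB")
```

In other words, the claim works out to an average of roughly 4.6 TB processed every second, and the movie comparison assumes about 1 GB per film.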

Let’s look at an example. What do you do with all the data captured from your social media streams? With MaxCompute, you could upload every Facebook like or retweet in a matter of minutes. Then, using its machine learning tools, you could gain insights into how the market responds to your promotions and products.

You could break this information down by campaign or date, or even mine user characteristics and spending habits to further optimize and personalize your social media streams.

The Benefits of MaxCompute

MaxCompute is an incredibly low-cost service. At just US$1.44 to sort 1 TB of data, the platform set a new low-price record in the 2016 CloudSort Sort Benchmark competition[2].

Project owners, data analysts and developers can all work concurrently in MaxCompute, so you can build an ever-expanding ecosystem around your data. The platform also provides powerful security services and disaster recovery to protect that data.

A single MaxCompute cluster can scale up to 10,000 servers. Your data analysts do not need to learn a distributed computing model themselves to get past the limited processing capacity of a single server. That’s because MaxCompute handles the distribution for you, so you can analyze your data without worrying about the service requirements or the underlying model.

With this level of usability and scalability, MaxCompute is bringing big data analysis to the masses.

Alibaba Cloud launched MaxCompute in Mainland China and Singapore at the start of 2017. In China, the platform has already been used to help ease traffic congestion, diagnose diseases using medical imagery and predict the winner of a singing talent competition.

The MaxCompute service is now available in Hong Kong, Europe and Australia through the Internet, a classic network or VPC. If you’re not located in those regions you can still connect to the service over the Internet.

How Does MaxCompute Work?

MaxCompute is incredibly easy to learn as it is based on traditional SQL syntax and uses a Java programming interface. It uses a relational DBMS as its primary database model with a simple additional key-value store.

There are three core components: the proprietary MaxCompute TUNNEL function for data uploads and downloads; a combination of MaxCompute SQL, Google’s MapReduce data processing model and a graph function for computing and analysis; and an SDK for developers.

MaxCompute does not collect data; it only processes it. You can upload offline data into the system, or download it from MaxCompute, using the TUNNEL data channel. Data can only be uploaded and downloaded in table form.

MaxCompute SQL looks and feels just like a traditional piece of database software, except that you are now querying and analyzing terabytes or petabytes of data. It supports data definition language (DDL), so you can use the ALTER, CREATE and DROP commands to manage tables and partitions, as well as your traditional SELECT, JOIN, GROUP BY and WHERE clauses.
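As a rough illustration of that familiar SQL shape, here is a minimal sketch that uses Python’s built-in SQLite engine as a stand-in. It does not connect to MaxCompute, the dialects differ in places (SQLite has no partitions, for example), and the table names, column names and rows are all hypothetical.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database, NOT MaxCompute.
# All table names, column names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE two tables, then ALTER one of them.
cur.execute("CREATE TABLE campaigns (campaign_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE likes (user_id INTEGER, campaign_id INTEGER, like_date TEXT)")
cur.execute("ALTER TABLE likes ADD COLUMN source TEXT")

# Sample rows so the query has something to work on.
cur.executemany("INSERT INTO campaigns VALUES (?, ?)",
                [(1, "spring_sale"), (2, "summer_launch")])
cur.executemany("INSERT INTO likes VALUES (?, ?, ?, ?)", [
    (101, 1, "2017-03-01", "facebook"),
    (102, 1, "2017-03-02", "facebook"),
    (103, 2, "2017-03-02", "twitter"),
    (101, 2, "2017-02-20", "facebook"),
])

# SELECT with JOIN, WHERE and GROUP BY: likes per campaign since 1 March.
rows = cur.execute("""
    SELECT c.name, COUNT(*) AS like_count
    FROM likes AS l
    JOIN campaigns AS c ON c.campaign_id = l.campaign_id
    WHERE l.like_date >= '2017-03-01'
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)

# DDL: DROP the table once it is no longer needed.
cur.execute("DROP TABLE likes")
conn.close()
```

The point is only that the statements read the same way an analyst would expect from any relational database; the scale and the execution engine are what change.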

The MaxCompute SQL syntax is intuitive if you are familiar with standard database operations, though there are small differences. For example, MaxCompute SQL does not support transactions, indexes, or UPDATE/DELETE operations.

Using the MapReduce programming interface, you can process your data efficiently by splitting the input dataset into independent chunks. These are then processed by the map tasks in a completely parallel manner, and the reduce tasks combine their outputs into the final result. Graph is a framework for iterative graph computing and effective data modeling.
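The idea can be sketched in a few lines of plain Python. This is not the MaxCompute MapReduce interface (which the article describes as Java-based); it is a toy version of the pattern, with hypothetical sample data and helper names, and a thread pool standing in for a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sample data: (campaign, event type) pairs.
events = [
    ("spring_sale", "like"), ("spring_sale", "retweet"),
    ("summer_launch", "like"), ("spring_sale", "like"),
    ("summer_launch", "like"), ("summer_launch", "retweet"),
]

def split(records, chunk_size):
    """Split the input into independent chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def map_chunk(chunk):
    """Map task: count likes per campaign within one chunk."""
    return Counter(campaign for campaign, event in chunk if event == "like")

def reduce_counts(left, right):
    """Reduce task: merge two partial counts."""
    return left + right

chunks = split(events, 2)

# Each chunk is handled independently; threads stand in for worker nodes.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_chunk, chunks))

totals = reduce(reduce_counts, partials, Counter())
print(dict(totals))
```

Because no map task depends on another chunk, the same logic can be spread across as many workers as there are chunks, which is what makes the model scale.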

Eclipse plugins are available for developers, and DataHub Services let you publish and subscribe to real-time data.

This is a powerful tool. Going back to our social media example, imagine if you could optimize your products, prices and promotions for every user on the fly?

Your conversion rates would skyrocket!

[1] MaxCompute is available from Alibaba Cloud at https://www.alibabacloud.com/product/maxcompute

[2] See the top results in the “Cloud” section of the latest Sort Benchmark results, published at http://sortbenchmark.org. The record was a team effort involving Nanjing University and Databricks Inc., running on 394 Alibaba Cloud ECS ecs.n1.large nodes with 8 GB memory, 40 GB Ultra Cloud Disk and 4x135 GB SSD Cloud Disk.

