Working with Big Data on Alibaba Cloud

Jan 19, 2018

You probably know that Alibaba Cloud can be used to deploy applications. You may be less familiar with its Big Data storage and management options.

In fact, Alibaba offers a range of Big Data solutions. This article outlines them and explains which Alibaba Cloud Big Data services suit which workloads.

Data Storage

Let’s start with storage, since that is the most fundamental requirement of Big Data. OSS (Object Storage Service) is Alibaba’s high-volume, cloud-based data storage service. It can store extremely large quantities of data of any type and from any source.

OSS can be used for data that must be accessed frequently (such as multimedia files), as well as for archival and other low-use purposes. It includes tools for migrating large quantities of data to and from OSS, along with an SDK and a REST API.

OSS SDK

The SDK provides interfaces for the major front-end and back-end web languages, as well as for Android and iOS. Its commands cover a wide range of functions, including object upload, download, and management; image processing and manipulation; and web-oriented features such as static website hosting and access management.
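As a rough illustration of the upload and download calls mentioned above, here is a minimal Python sketch. It assumes a bucket object exposing `put_object(key, data)` and `get_object(key)`, which is the shape of the `Bucket` class in Alibaba's `oss2` Python SDK. The `InMemoryBucket` stand-in, the helper names, and the object key are hypothetical and exist only so the example runs without credentials.

```python
import io


class InMemoryBucket:
    """Hypothetical stand-in that mimics the two bucket calls used below."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, data):
        # Accept str or bytes and store the content as bytes.
        if isinstance(data, str):
            data = data.encode("utf-8")
        self._objects[key] = bytes(data)

    def get_object(self, key):
        # Return a file-like object so callers can use .read() on the result.
        return io.BytesIO(self._objects[key])


def upload_text(bucket, key, text):
    """Store a text object under the given key."""
    bucket.put_object(key, text)


def download_text(bucket, key):
    """Fetch an object and decode it as UTF-8 text."""
    return bucket.get_object(key).read().decode("utf-8")


# With the real SDK, the bucket would be built roughly like this (assumption:
# oss2 is installed and you have valid credentials, an endpoint, and a bucket):
#   import oss2
#   auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
#   bucket = oss2.Bucket(auth, "<endpoint>", "<bucket-name>")
bucket = InMemoryBucket()
upload_text(bucket, "reports/hello.txt", "hello, OSS")
print(download_text(bucket, "reports/hello.txt"))
```

Keeping the helpers independent of the concrete bucket class means the same code can be exercised locally and then pointed at a real bucket.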

Multimedia and Image Files

OSS is particularly well suited to handling high volumes of multimedia and image files. It can be used with both websites and apps for storage, streaming and other forms of serving, transcoding, and image format conversion. OSS can also be used to provide large volumes of data for rapid download.

OSS, however, is just one part of Alibaba Cloud’s rich Big Data infrastructure. Storage may be fundamental, but it is what you can do with the stored data that makes all the difference.

Data IDE and MaxCompute

Data IDE is Alibaba Cloud’s overall framework for managing Big Data, and for taking care of such basic functions as scheduling, monitoring, and control of access permissions. It handles much of the underlying architecture, as well as many basic management tasks, allowing you to concentrate on the development and operation of large, data-oriented projects.

Data Processing Tools

Data IDE works closely with MaxCompute, Alibaba’s platform for processing Big Data. MaxCompute includes a variety of tools for analyzing and processing very large volumes of data, including its own version of SQL, graph-processing and MapReduce functions, and concurrent upload and download functions. It also includes an extensive SDK and a full set of security features.

Working together, Data IDE and MaxCompute allow you to manage, process, and query large amounts of data. Because they simplify many of the processes involved in handling Big Data, they can significantly reduce the time required to launch a large, complex, data-intensive website. They can also help to reduce the volume and cost of storage and data processing, and they provide a solid basis for in-depth analytics.
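To make the MapReduce model concrete, the following is a small, pure-Python sketch of a map, shuffle, and reduce word count. It does not use the MaxCompute SDK or its API; the function names and sample records are made up for illustration, and a real job would run distributed over data stored in the platform.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical sample input.
sample = ["big data on the cloud", "big data tools", "cloud storage"]
counts = word_count(sample)
print(counts["big"], counts["cloud"], counts["storage"])
```

In a managed platform the map and reduce functions are the parts you write; the grouping step in between is handled for you.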

E-MapReduce

Alibaba Cloud also offers E-MapReduce, a rich framework for managing and processing Big Data, based on Apache Hadoop and Apache Spark. Hadoop and Spark cluster services form the core of E-MapReduce. Its advantage is that it takes care of many of the low-level tasks required for cluster creation and provisioning, while also providing an integrated framework for managing and using clusters.

Because E-MapReduce is built on Hadoop and Spark clusters, you can treat the storage and compute capacity it provides as a self-contained system running on its own hosts, rather than as generic cloud storage.

E-MapReduce Architecture

Architecturally, E-MapReduce consists of an agent layer at the base, with the HDFS and Tachyon file systems sitting directly above it. Above those sits the full Hadoop ecosystem, along with Spark and a wide variety of Apache tools. The top layer is the web-based user-administration interface, which makes it easy to use and manage the underlying tools and systems.

Full Hadoop/Spark Capabilities — The Easy Way

This means that if you can do something with Hadoop, Apache Spark, or their associated tools, you can do it in E-MapReduce, and far more easily than if you had to set up and provision Hadoop or Spark from scratch.

E-MapReduce also integrates easily with other Big Data-oriented elements of Alibaba Cloud. It can work with applications running on Alibaba's Elastic Compute Service (ECS), and it can process data stored in OSS. It can also send data to MaxCompute and take MaxCompute output for further processing.

E-MapReduce can be used to process and serve massive amounts of data. Its Spark-based features make it particularly suitable for streaming large volumes of data.
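Spark's classic streaming model treats a stream as a series of small batches. The sketch below imitates that idea in plain Python so it can run anywhere; it is not Spark code and does not touch E-MapReduce. The function names, the batch size, and the sample event values are all hypothetical.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def running_totals(stream, batch_size):
    """Process each batch as it arrives and record the running total."""
    total = 0
    totals = []
    for batch in micro_batches(stream, batch_size):
        total += sum(batch)
        totals.append(total)
    return totals


# Hypothetical event sizes (for example, bytes per message) arriving in order.
events = [5, 3, 8, 1, 4, 2, 7]
print(running_totals(events, batch_size=3))  # prints [16, 23, 30]
```

Because each batch is processed independently, results are available after every batch rather than only once the whole input has been read.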

The Big Data Picture

What can you do with Alibaba’s Big Data tools and services? E-MapReduce and MaxCompute both provide a very wide range of tools for performing such fundamental Big Data-oriented tasks as rapidly sorting, searching, and analyzing extremely large volumes of data.

You can use Alibaba Cloud’s Big Data features to set up and manage backend services for high-volume, data-intensive websites that provide streaming services, generate large amounts of user upload and download traffic, or rapidly return search results from massive quantities of data.

You can also use the same features to process and manage large media files, to efficiently handle extremely large databases in situations where rapid retrieval is important, or to deal with the processing and storage requirements of unique or industry-specific streams of high-volume data.

What does Alibaba Cloud do for you when it comes to Big Data? It can give you the tools, the storage, and the services you need to get your Big Data operation up and running exactly the way that you want it to run — quickly, easily, and with a minimum of overhead in terms of time, effort, or expense.

Michael Churchman

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ’90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.

Reference:

https://www.alibabacloud.com/blog/Working-with-Big-Data-on-Alibaba-Cloud_p253218?spm=a2c41.11162091.0.0