Drilling into Big Data — A Gold Mine of Information (1)

By Priyankaa Arunachalam, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

The volume of data generated every day is hard to pin down because it keeps growing at a rapid rate. Although data is everywhere, what matters more is the intelligence we can glean from it. These large volumes of data are what we call “Big Data”. Organizations generate and gather huge volumes of data, believing that it might help them advance their products and improve their services. For example, a shop may hold customer information, stock details, purchase history, and website visit records.

Organizations often store this data for regular business activities but fail to use it for further analytics or to strengthen business relationships. Data that is left unanalyzed and unused is what we call “Dark Data”.

“Big Data is indeed a buzzword, but it is one that is frankly under-hyped,” Ginni Rometty

The problem of untangling insights from data obtained from multiple sources has been around since software applications were first built. The process is normally time consuming, and because data moves so fast, the results are often obsolete before they can inform any decision. The main aim of this blog series is to show how to make effective use of big data and extend business intelligence to decipher insights quickly and accurately from raw enterprise data on Alibaba Cloud.

What Is Big Data?

In the simplest terms, when the data you have is too large to be stored and analyzed by traditional databases and processing tools, it is “Big Data”. If you have heard of the 3Vs of big data (volume, velocity, and variety), this definition will already be familiar.

Big Data and Analytics

Every individual and organization has data in one form or another, which they have traditionally managed using spreadsheets, Word documents, and databases. With emerging technologies, the size and variety of data are increasing day by day, and it is no longer possible to analyze the data through traditional means.

The most important aspect of big data analytics is understanding your data. A good way to do this is to ask yourself these questions:

Before exploring Alibaba Cloud’s E-MapReduce, this article aims to answer the questions listed above to get you started with big data.

Data Sources and Types

Data is typically generated when a user interacts with a physical device, software, or system. These interactions can be classified into three types:

For most enterprises, data can be categorized into the following types.

Big Data Ecosystem


Whenever we talk about big data, it is common to hear the term Hadoop.

Hadoop is an open source framework that manages distributed storage and data processing for big data applications running in clusters. It is mainly used for batch processing. The core parts of Apache Hadoop are

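Since the data is large, Hadoop splits files into blocks and distributes them across the nodes in a cluster. Each node stores only a subset of the blocks, and each block is replicated on several nodes (three by default) so that the data survives the failure of a node.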
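
The block splitting described above can be illustrated with a small simulation. This is only a sketch: the 128 MB block size and replication factor of 3 are the HDFS defaults in recent Hadoop versions, but the node names, the `place_blocks` helper, and the round-robin placement are assumptions made for illustration. Real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3       # HDFS default replication factor


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Map each block index to the nodes holding a replica of it.

    Uses a toy round-robin policy (hypothetical, not the HDFS algorithm).
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Choose `replication` distinct nodes, starting at a rotating offset.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


# Hypothetical five-node cluster storing a 500 MB file.
nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_blocks(file_size_mb=500, nodes=nodes)
for block, replicas in placement.items():
    print(f"block {block}: {replicas}")
```

In this sketch the 500 MB file becomes four blocks, and each block lives on three of the five nodes, so no single node holds the whole file.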

How to Get Data into a Big Data Environment?

Where to Store the Data?

How to Process the Data?

Data Analytics and Business Intelligence Tools

Now that we have figured out how to collect, store, and process the data, we need tools for visualizing it to make business intelligence possible. Various business intelligence tools can add value to big data, such as Alibaba Cloud’s DataV and QuickBI.

Resource Management and Scheduling

Apart from this main cycle, we will also be focusing on some Resource Management tools like

We will also look at scheduling tools such as Oozie, Azkaban, Cron, and Luigi, which play a major role in scheduling Hadoop and Sqoop jobs when you have a large number of tasks to run.
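
To give a rough idea of what job scheduling involves, here is a minimal sketch of dependency ordering, which is one part of what workflow schedulers do. It uses Python's standard `graphlib` module rather than any of the tools named above, and the task names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "sqoop_import": set(),
    "clean_data": {"sqoop_import"},
    "aggregate": {"clean_data"},
    "export_report": {"aggregate"},
}


def run_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(pipeline)
print(order)
```

A real scheduler adds triggers, retries, and monitoring on top of this kind of ordering.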

Big Data in Today’s Business

At the end of the day, it’s up to organizations to use all this data to create valuable insights and transform their businesses. Every organization has its own data in huge volumes; the more efficiently the data is used, the more potential the company has to grow. Organizations can use the business insights produced by this entire process to increase their efficiency, make better decisions, and outsmart their peers and competitors in the market.

In the next article, we will show you how to build a big data environment on Alibaba Cloud with Object Storage Service and E-MapReduce.

