Deciphering Data to Uncover Hidden Insights — Data Modeling

9 min readOct 23, 2018

By Ranjith Udayakumar, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

This multi-part article talks about how to collect data, wrangle the data, ingest the data, model the data, and visualize the data from three viewpoints (conceptual, practical, and best practice).

In the first article of this series, we have seen how to understand data conceptually through an example from the Banking, Financial services and Insurance (BFSI) domain. In the second article, we have learned how to wrangle the data (i.e. cleaning the data) according to your business scenario to Alibaba Cloud Quick BI.

In this article, I will share how to model the data according to your business scenario in Quick BI. We will be using Quick BI for this tutorial, so please ensure that you have registered for an Alibaba Cloud account. If you haven’t, sign up for a free account through this link.

Model the Data (Conceptual)

A data model for one line of business is hardly appropriate for another line of business. Often times, the needs of different businesses, or even departments, do not closely align together. Because of this, understanding and analyzing how an organization should collect, update, and store data becomes a critical problem.

In the previous articles, we have seen how to collect, cleanse, and organize our data. However, we’re still missing an essential part of big data analysis — data modeling. Data modeling has become a critical skill for the business analysts who are involved with discovering, analyzing, and specifying changes to how software systems create and maintain information. To better understand data modeling, let’s quickly recap the concepts of OLTP and OLAP.

What Is OLTP?

OLTP stands for Online Transactional Processing and is designed to serve as a persistent state store for front-end applications. They can quickly look up transactional procedures like INSERT, UPDATE, or DELETE.

Some common tasks asked of OLTP systems include:

Finding the age of the current user when given a user ID.
Update the billing addresses for a set of clients.
Delete the users who are not accessing the page more than 1 year.

These types of problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTPs is that they support fast writes. A typical workload for OLTP is both frequent reads and writes, but the reads tend to be more of looking up a specific value rather than scanning all values to compute an aggregate.

Common OLTP systems are:

MySQL
PostgreSQL

Alibaba Cloud provides the ApsaraDB for RDS service for OLTP systems, including both MySQL and PostgreSQL databases.

What Is OLAP?

In contrast to an OLTP database, an OLAP database is designed to process large datasets quickly to answer questions about data.

Common use cases for an OLAP database are:

What’s the average cost spent on maintenance?
What’s the user’s overall active hours?

An OLAP database is optimized for scanning and performing computations across many rows of data for one or multiple columns. Since OLAP is optimized for analyzing data, basic transactional procedures like writes or updates tend to be done in infrequent batches, typically once a day or an hour. OLAP shines when it comes to reads and analytical calculations like aggregation. Several well-known OLAP systems are:

AnalyticDB
Teradata

Alibaba Cloud provides OLAP services with HybridDB and DataWorks.

What is Data Modeling

The term data model refers to two very different but related concepts: a description of data structure and the way data are organized.

Types of Data Modeling

Conceptual Data Models: High-level, static business structures and concepts.
Logical Data Models (LDMs): Entity types, data attributes and relationships between entities.
Physical Data Models (PDMs): The internal schema database design.

Data Modeling in Business Intelligence

Data modeling is the key to success in Business Intelligence (BI). It is paramount that the process is business-centered. Data modeling in BI is a way of mapping out all the different data sources and how these sources of data will fit together and flow into one another. We need to get the data from the places where a software or application stores the information into a centralized place (Data Warehouse) for querying and analyzing the data to derive business values.

What Is Data Warehousing

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making.

Facts: A fact table consists of facts of a business process. A fact table record captures a measurement or a metric.

Dimensions: A dimension table contains the textual descriptor of the business. The fields of dimension table are designed to satisfy these two important requirements:

Query constraining / grouping / filtering.
Report labeling

For example, if we were to examine “Sales Revenue by Product”, we would most probably see this structure: Sales Revenue (Fact) and Product (Dimension).

Schemas in Data Warehousing

Schema is a logical description of the entire database. It includes the name and description of records of all record types including all associated data-items and aggregates. Like a database, a data warehouse also requires maintaining a schema. A database uses relational model, while a data warehouse uses Star and Snowflake schema.

Star Schema: In Star Schema, the dimension is represented with only one-dimension table. Each dimension table is connected to only one Fact Table in star like structure.

Snowflake Schema: Unlike Star schema, the dimensions table in a snowflake schema are normalized. The normalization splits up the data into additional tables.

Model the Data (Practical)

Before entering the practical session of modelling the data, we need to know about the process flow of Quick BI.

The basic work flow of Quick BI is as follows:

Create a data source.
Create a dataset.
Create a workbook, worksheet, or dashboard.
Building a data portal.
Subscribe for timely reports.

Note: Quick BI supports OLAP Data modelling, semantics such as dimensions (dates and locations), measures, star schema, and snowflake schema. Allows you to define calculated fields. The product allows you to edit the existing SQL script of the data source to customize your dimensions and measures.

UseCase-1: ATM Analytics

We will look at our first use case, ATM Dataset. We already ingested the data to the Quick BI. We are going to model the data and for that you need to save the data as dataset.

Procedure for creating data set from file:

Select the Cube Icon
Enter the name and specify the location
Click Ok

Dataset is created

Modeling the data:

We are going to model the data by editing the dataset accordingly to meet the requirements of business scenario, such as joining the two tables or creating a column.

Select the Dataset
Click the Edit Icon

The dataset management page is displayed.

Now we are going to model the data. I will explain about various things like changing the data type, creating a calculated column, and creating hierarchy.

Changing the Dimension or Measure Name:

Click the Gear Icon
Click Edit to Change the Name and to add notes

Click Ok
Name is Changed

Basic Functions:

Click the Gear Icon

Duplicate Dimension or Measure (It will create a copy of the column)
Hide in Analysis (It will hide the column in report interface)
Cancel all hidden (It will remove all the hidden columns and make it available for reporting)
Delete (It will delete the column)

Creating a Calculated Column (Dimension)

Click the Gear Icon -> Create a Calculated Column or + icon

Search the function on the right-side pane for reference
Name the Calculated column
Write the expression (you can add column by simply double clicking it in the left side pane)
Click ok

Joining the dataset:

Let’s look at the ATM_Master Details

Ingest the ATM Master Details data into Quick BI
Click the Join Icon
Select the Dataset Field From ATM
Select the Other Dataset ATM_Master Details
Select the Join on Field
Click Ok

Changing the Dimension Type:

Click the Gear Icon
Select the Change Dimension Type
Select Location
Click the Country

Creating a Calculated Column (Measure):

Click the Gear Icon -> Create a Calculated Column or + icon

Search the function on the right-side pane for reference
Name the Calculated column
Write the expression (you can add column by simply double clicking it in the left side pane)
Click ok

Creating the Hierarchy:

Click the Gear Icon
Select Create Level

Hierarchy Level is Created
To add other column in hierarchy, Select the column Gear icon
Click to Move to

Column is added to Hierarchy.

Changing the Aggregation Type:

Click the Gear Icon
Select Aggregations
Select Any one from the list (e.g. Average)

Changing the Number Format:

Click the Gear Icon
Select Number Format
Select Any one from the list (e.g. #,###0)

UseCase-2: Customer 360

We can apply the same concepts to our next use case, Customer360. The steps are similar, we just need to replace the ATM data to perform all the basic tasks for data modelling. With this understanding I think we can model the data to our business scenario.

Model the Data (Best Practices)

Validate the data types.
Create Calculated columns whenever necessary.
Change the Locations as Location Type in QuickBI otherwise it will not detect it.
Numbers should be in integer.
Format string as per the business scenario.
Format the date as per the business scenario.
Star Schema is preferable than snowflake schema.
Use custom SQL to model the data.

Conclusion

If you followed the steps correctly, you should have successfully modeled your data in Quick BI as per the demands of your businesses. In my opinion, this is the toughest part in the process of deciphering data to uncover hidden insights. We will explore how we can visualize our data and narrate data stories in future articles of this multi-part series.

Reference:https://www.alibabacloud.com/blog/deciphering-data-to-uncover-hidden-insights-%E2%80%93-data-modeling_594073?spm=a2c41.12144458.0.0