Analyzing Data on Tableau with Data Lake Analytics

By Sai Sarath Chandra, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

In this tutorial, we will analyze raw data with Tableau using Data Lake Analytics. We will analyze the files that are available in Object Storage Service (OSS).

We will be performing the following steps in sequence:

  1. Activate Data Lake Analytics
  2. Upload data files to OSS
  3. Create relevant schema and tables
  4. Connect Tableau with Data Lake Analytics
  5. Analyze data using Tableau

Activate Data Lake Analytics

Firstly, you have to activate Alibaba Cloud Data Lake Analytics using the Console or through the product page. After successful activation, you will receive a mail containing the credentials for accessing Data Lake Analytics.

Upload Data Files to Object Storage Service (OSS)

You have to create a bucket in OSS console/OSS Browser in the same region. So assuming if you are using Data Lake Analytics outside China, you can use the Singapore region. This means that the data has to be stored in the OSS Bucket in Singapore. The choice of Availability Zone does not matter.

Grant permissions to Data Lake Analytics to access OSS in the RAM Console or by clicking the below link:

https://ram.console.aliyun.com/?spm=a2c63.p38356.a3.4.655c55b1YWtGFm#/role/authorize?request=%7B%22Requests%22:%20%7B%22request1%22:%20%7B%22RoleName%22:%20%22AliyunOpenAnalyticsAccessingOSSRole%22,%20%22TemplateId%22:%20%22OSSRole%22%7D%7D,%20%22ReturnUrl%22:%20%22https:%2F%2Fwww.alibabacloud.com%2Fhelp%2Fdoc-detail%2F70386.htm%22,%20%22Service%22:%20%22OpenAnalytics%22%7D

Upload the respective files into the bucket, either by console or browser.

Image for post
Image for post

The data which I used for this article is available in the GitHub link below:

https://github.com/saichandu415/Public-Datasets

Create Relevant Schema and Tables

We use Data Lake Analytics SQL to create the schema and tables. All the information related to the SQL are found here: https://www.alibabacloud.com/help/doc-detail/72005.htm

We will create a new schema named ‘ecomm’

The schema creation is different, you need to specify the LOCATION of where data files are located or in which bucket they are in. Make sure It should be ending with ‘/’ and don’t forget to mention the catalog section. By default we can create 100 schemas, 4096 tables under each schema, and 990 columns in each table

We will import the file(geolocation_olist_public_dataset.csv) data contains all the locations of all the orders, by creating a table under the schema by using the following command

All the SQL data types are available and will soon be updated in the documentation, since all the .csv files are updated with the Header data in the first entry of data. We can ignore that by using the detail by using the TBLPROPERTIES with skip.header.line.count

You can query the data by running following command and verify the data

Image for post
Image for post

We also have file(olist_public_dataset_v2.csv) that consists of all the order details with all the below columns.

TBLPROPRTIES("skip.header.line.count") can also be used to skip multiple lines, the logic of the skip.header.line.count is as below:

Suppose x is the value of skip.header.line.count; n is the number of lines in the data file:

  1. If x<=0, then NO lines will be skipped
  2. If 0
  3. If x>=n, all the lines will be skipped

You can verify the data from by executing,

Image for post
Image for post

We need to import seller order list by using the following command

You can verify the data by running a select command

Image for post
Image for post

Connect Tableau with Data Lake Analytics

Install latest version of Tableau.

Data Lake Analytics currently support only JDBC Connection to other analysis tools, you can select the ‘JDBC’

Image for post
Image for post

The only database connection string is below, since Data Lake Analytics is only supported in Singapore.

You need to create a JDBC connection in Tableau. Once after connection you will be able to see the below dashboard

Image for post
Image for post

For some reason the Data Lake Analytics Data source, double clicking the table didn’t show the data. This is a simple fix. You can copy the SQL statement from the error and can update by adding “ecomm”

Image for post
Image for post

After copying, you need to add the table name highlighted above. Switch to sheet and select map type.

Image for post
Image for post

You need to choose “Longitude” to the columns and “Latitude” to the rows and select the “Number of Records” & “City” to the marks section.

Now we analyze the orderlist, using the prod_ordr_lst. We analyze the data using the “packed bubbles” type. Both the products value and the city

Image for post
Image for post

We can do some interesting things like representing the data by sorting we can use ascending format.

Image for post
Image for post

We can also analyze the seller order list(slr_ordr_lst) and find out how many products are being sold from each state as below, the custom SQL query is

Image for post
Image for post

Important Observations and Tips:

  1. The current release version doesn’t support UPDATE and DELETE Statements but they are in pipeline.
  2. Data Lake Analytics performs fetches data in much faster speed compared to traditional systems due to high-performance engine Fully leveraging Massive Parallel Processing (MPP) architecture, providing vectorized execution optimization, operator pipelined execution optimization, multi-tenancy resource allocation and priority scheduling.
  3. With Data Lake Analytics you don’t need to worry about maintenance, it is powered by Serverless Architecture, Elastic scalability and pay-as-you-go mode. You only get charged for the queries you execute.
  4. Data Lake Analytics connects seamlessly with other Alibaba Cloud products, Like QuickBI and DataV. You can also store the output onto the OSS bucket. DLA helps to use heterogenous data sources unlike the data warehousing where you see only structured data. The typical use cases will be like below
Image for post
Image for post

To learn more about Alibaba Cloud Data Lake Analytics, visit www.alibabacloud.com/products/data-lake-analytics

Reference:https://www.alibabacloud.com/blog/analyzing-data-on-tableau-with-data-lake-analytics_594166?spm=a2c41.12287098.0.0

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store