Written by: Bima Putra Pratama, Data Scientist, DANA Indonesia
This article is part of a series. Please follow the links to see all the steps.
Imagine we have a retail store that sells various products. To make the business more successful, we have to understand our customers well, especially in today’s competitive world, so that we can answer questions such as:
- Who are our best customers?
- Who are our potential customers?
- Which customers need to be targeted and retained?
- What are the characteristics of our customers?
One way to understand our customers is to conduct customer segmentation. Segmentation is the process of categorizing customers into groups based on common characteristics. We can use many variables to segment our customers: demographic, geographic, psychographic, technographic, and behavioral information are often used as differentiators.
With customer segmentation in place, we can personalize our strategy to suit each segment’s characteristics, so that customer retention can be maximized, customer experience and ad performance can be improved, and marketing costs can be minimized.
So, how can we do this customer segmentation?
We will apply unsupervised machine learning techniques to segment customers in a retail dataset. We will use Recency, Frequency, and Monetary (RFM) values, which have proven to be useful indicators of customer transaction behavior.
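To make the RFM idea concrete, here is a minimal sketch in plain Python. It assumes the common definitions: recency is the number of days since a customer's last purchase, frequency is the number of purchases, and monetary is the total amount spent. The sample transactions, the `compute_rfm` helper, and the snapshot date are hypothetical and only serve as an illustration; they are not taken from the dataset used in this series.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2020, 1, 5), 120.0),
    ("C1", date(2020, 3, 20), 80.0),
    ("C2", date(2020, 2, 11), 35.5),
    ("C3", date(2020, 3, 28), 300.0),
    ("C3", date(2020, 3, 30), 150.0),
    ("C3", date(2020, 1, 15), 50.0),
]


def compute_rfm(rows, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in rows:
        # Track the most recent purchase date per customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # number of purchases
        monetary[customer_id] += amount  # total amount spent
    return {
        cid: ((snapshot_date - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# The snapshot date is an assumed reference point for measuring recency.
rfm = compute_rfm(transactions, snapshot_date=date(2020, 4, 1))
for cid, (recency, freq, money) in sorted(rfm.items()):
    print(cid, recency, freq, money)
```

Each customer ends up with three numeric features, which is the kind of input an unsupervised clustering algorithm can group into segments.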
We will leverage the following products to build this use case:
- Object Storage Service (OSS). OSS is an encrypted, secure, cost-effective, and easy-to-use object storage service that enables you to store, back up, and archive large amounts of data in the cloud, with guaranteed durability.
- MaxCompute (previously known as ODPS). It is a general-purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. MaxCompute supports various data importing solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.
- DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features. Also, it offers all-around services, including Data Integration, DataStudio, Data Map, Data Quality, and DataService Studio.
- Machine Learning Platform for AI (PAI) provides end-to-end machine learning services, including data processing, feature engineering, model training, model prediction, and model evaluation. Machine Learning Platform for AI combines all of these services to make AI more accessible than ever.
- Data Lake Analytics (DLA) is an interactive analytics service that utilizes serverless architecture. DLA uses SQL interfaces to interact with user service clients, which means it complies with standard SQL syntax and provides a variety of similar functions. DLA allows you to retrieve and analyze data from multiple data sources or locations, such as OSS and Table Store, for optimal data processing, analytics, and visualization, giving better insights and ultimately guiding better decision making.
We will start by preparing our data, then train the model, and finally create a pipeline for serving it.