Bitmap-Based Data Processing in MaxCompute


By Qu Ning.

Bitmaps are commonly used by data developers to encode and compress user data. Because AND, OR, and NOT operations on bitmaps are fast, developers can filter users by attributes such as profile tags and analyze metrics such as weekly activity.
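To make the set operations concrete, here is a minimal sketch. It uses `java.util.BitSet` from the standard library as a stand-in for a compressed bitmap, so it runs without extra dependencies. The class name `BitmapBasics`, the tag names, and the user IDs are made up for illustration.

```java
import java.util.BitSet;

// Illustration only: java.util.BitSet stands in for a compressed bitmap.
class BitmapBasics {
    // Build a bitmap with one bit set per user ID.
    static BitSet of(int... userIds) {
        BitSet bits = new BitSet();
        for (int id : userIds) {
            bits.set(id);
        }
        return bits;
    }

    // Users present in both bitmaps (AND).
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Users present in either bitmap (OR).
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Users in a that are not in b (AND NOT).
    static BitSet andNot(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet premium = of(1, 2, 3, 5);      // hypothetical profile tag
        BitSet activeThisWeek = of(2, 3, 4);  // hypothetical activity set

        System.out.println(and(premium, activeThisWeek));    // {2, 3}
        System.out.println(or(premium, activeThisWeek));     // {1, 2, 3, 4, 5}
        System.out.println(andNot(premium, activeThisWeek)); // {1, 5}
    }
}
```

Each operation works on whole machine words rather than individual IDs, which is why filtering by tag or by activity stays cheap even for large user sets.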

This article uses an example to illustrate how you can encode and compute bitmaps of active user IDs from different dates using the MapReduce module of MaxCompute. We hope this example is helpful to you and other developers.

Consider the code example below, which relies on the following imports:

import com.aliyun.odps.OdpsException;
import com.aliyun.odps.mapred.JobClient;
import com.aliyun.odps.mapred.MapperBase;
import com.aliyun.odps.mapred.ReducerBase;
import com.aliyun.odps.mapred.conf.JobConf;
import com.aliyun.odps.mapred.utils.InputUtils;
import com.aliyun.odps.mapred.utils.OutputUtils;
import com.aliyun.odps.mapred.utils.SchemaUtils;
import org.roaringbitmap.RoaringBitmap;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;

Now let's walk through this code. After packaging the Java application and uploading the package to a MaxCompute project, developers can call the MapReduce job given above in MaxCompute. For the data in the input table, the date is used as the key and the user IDs are encoded into bitmaps. An OR operation is then performed on the bitmap-encoded user IDs of the same date. Alternatively, an AND operation can be performed as required, for example to compute retention. Finally, the processed data is written to the target structured table for further processing.
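The flow described above can be sketched outside MaxCompute as follows. This is not the job itself, only a plain-Java approximation under several assumptions: `java.util.BitSet` replaces RoaringBitmap, an in-memory map replaces the map and reduce phases, and Base64 text is assumed as the storage format for the bitmap column. The class name `DailyActiveBitmaps`, its methods, and the sample dates are hypothetical.

```java
import java.util.Base64;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java approximation of the job's logic; not MaxCompute API code.
class DailyActiveBitmaps {
    // One bitmap of active user IDs per date, keyed by date.
    private final Map<String, BitSet> byDate = new TreeMap<>();

    // Map-like step: key the record by date and set the user's bit.
    void add(String date, int userId) {
        byDate.computeIfAbsent(date, d -> new BitSet()).set(userId);
    }

    // Reduce-like step: OR a partial bitmap into the bitmap of the same date.
    void merge(String date, BitSet partial) {
        byDate.computeIfAbsent(date, d -> new BitSet()).or(partial);
    }

    // Number of distinct active users on a date.
    int activeCount(String date) {
        BitSet bits = byDate.get(date);
        return bits == null ? 0 : bits.cardinality();
    }

    // Retention: users active on both dates (AND).
    BitSet retained(String firstDate, String secondDate) {
        BitSet result = (BitSet) byDate.getOrDefault(firstDate, new BitSet()).clone();
        result.and(byDate.getOrDefault(secondDate, new BitSet()));
        return result;
    }

    // Assumed output format: the bitmap's bytes as Base64 text for a string column.
    String encode(String date) {
        BitSet bits = byDate.getOrDefault(date, new BitSet());
        return Base64.getEncoder().encodeToString(bits.toByteArray());
    }

    // Inverse of encode, for downstream processing.
    static BitSet decode(String text) {
        return BitSet.valueOf(Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        DailyActiveBitmaps job = new DailyActiveBitmaps();
        int[] dayOne = {1, 2, 3, 1};  // user 1 appears twice; the bitmap deduplicates
        int[] dayTwo = {2, 3, 4};
        for (int id : dayOne) job.add("2019-01-01", id);
        for (int id : dayTwo) job.add("2019-01-02", id);

        System.out.println(job.activeCount("2019-01-01"));            // 3
        System.out.println(job.retained("2019-01-01", "2019-01-02")); // {2, 3}
        System.out.println(decode(job.encode("2019-01-02")));         // {2, 3, 4}
    }
}
```

Because OR is associative and commutative, partial bitmaps for the same date can be merged in any order, which is what makes this step a natural fit for a reducer.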
