Supporting 200 Billion Data Records in a Single RDS PostgreSQL Instance

By Digoal

Alibaba Cloud ApsaraDB for PostgreSQL is capable of handling large amounts of records at a time. How much? Well, let’s look at an example to find out. Let’s assume we have 2 billion users, each with 1000 tags, and we need to perform user tagging and pivoting based on any combination of tags (the business requirement is to compute a combination of up to 100 tags at a time).

This is equivalent to handling 200 billion records at a time, each requiring real-time response.

You may think that this would need at least a hundred machines. But in fact, this amount of data only requires one ApsaraDB for RDS PostgreSQL instance. This article discusses the cutting-edge RDS PG technology that helps us solve this sort of business requirements while consuming minimum resources.

Optimization Solution to Improve Response Speed

  1. bitmap segmentation
  2. Use parallel computing to determine the USER COUNT which meets the tag conditions (using dblink asynchronous calls)
  3. Return streaming cursors when determining user IDs.

Sample Demo

Install required plug-ins

Create a tag table, and split it into segments, for example a table of 2 billion users may be split into 400 segments, with each segment having 50 million user BITs.

Create an index (constraint)

Create BITMAP data for 1000 tags, with 400 pieces of data for each tag, and each piece of data having a length of 50 million bits.

Create a function that generates dblink connections (does not report errors for repeated creations).

Parallel computing function for AND tag combinations (dblink asynchronous parallelism) returns the USERID bitcount.

Parallel computing function for OR tag combinations (dblink asynchronous parallelism) returns the USERID bitcount.

Parallel computing function for AND/OR tag combinations (dblink asynchronous parallelism) returns the USERID bitcount.

Performance of bitcount pivoting is as follows: only 1.5 seconds for a 50-tag combination, and 2.6 seconds for a 100-tag combination:

It only takes 2.6 seconds to count the combination of 200 billion user_tags (one record for each user, and 1000 tags for each record)!

The performance of the AND/OR combination is as follows, and is also very good:

The AND function that determines USERID is as follows. In order to achieve high speed response, we return cursors.

The OR function that determines USERID is as follows. In order to achieve high speed response, we return cursors.

The AND_OR function that determines USERID is as follows. In order to achieve high speed response, we return cursors.

The following example very quickly determines the USERID in only 88 ms.

It takes only 692 ms to determine the cursor values for 50 million IDs:

If bit position translation is done on the client, then we only need to get the resulting bitmap, which is much faster. It only takes 224 ms to return a bitmap for 50 million bits. This can be made concurrent, so that each client can get different ofids.

We can also use the BIT operation to get USERIDs for users that contain a certain tag but do not contain another.

Example:

Summary

varbitx, an extension provided by Alibaba Cloud ApsaraDB for RDS PostgreSQL, enables real-time tagging for trillions of USER_TAGs using a single RDS PG.

We can use BITMAP segmentation, DBLINK asynchronous query, cursor, and other technologies to improve performance.

Performance indicators:

  1. Response speed of 2.6 seconds for determining the count of users that meet the tag conditions for 200 billion (2 billion users, a 100-tag combination) USER_IDs.
  2. It only takes 692 ms to determine USERID details, and return BIT positions of 50 million user IDs.
  3. It only takes 224 ms to determine USERID details and return a 50 million bit BITMAP.

Reference:https://www.alibabacloud.com/blog/supporting-200-billion-data-records-in-a-single-rds-postgresql-instance_594279?spm=a2c41.%2012435726.0.0

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store