Implementing a Highly-Compressed Data Storage

1. MySQL Big Data Storage — Compression up to 10 times smaller

Alibaba Cloud ApsaraDB for RDS for MySQL supports the TokuDB engine to store data that is compressed to 5 to 10 times smaller than its original size. It also supports highly concurrent writes through the caching of intermediate nodes.

TokuDB is an optional storage engine for RDS for MySQL. With a disk-optimized index structure Fractal Tree, TokuDB’s intermediate nodes can cache data processing requests (insert/update/delete/on-line add index/on-line add column), improving the performance of high-concurrency writes by three to nine times. The node size is 4 MB (configurable), and data can be compressed by 5 to 10 times through a variety of compression algorithms such as zlib/quicklz/lzma/zstd/snappy. TokuDB also supports multiple versions of MVCC and the four isolation levels UR, RC, RR, and Serializable.

In addition to features found in the Community Edition, the source code team for RDS for MySQL also implemented a number of customized optimizations to respond to common use cases:

• Hot backup
TokuDB Community Edition does not provide hot backup. RDS for MySQL enables a hot backup solution based on the TokuDB internal checkpoint mechanism by copying the binlog, redo log, and data files. RDS for MySQL also obtains the TokuDB checkpoint lock to prevent the sharp checkpoint issue during the hot backup process.

• Improve query response on the client
The source code version provided by RDS limits TokuDB to performing sharp checkpoint once every second to avoid using too much disk bandwidth, which can result in query response failures on the client.

• Set buffer pool ratio
RDS provides a parameter tokudb_buffer_pool_ratio for you to set the percentage of memory occupied by the TokuDB engine buffer pool within the range [0,100]. The minimum TokuDB buffer pool size is 64 MB, and the minimum InnoDB buffer pool size is 64 MB (V5.6) and 128 MB (V5.7) to meet the InnoDB/TokuDB initialization requirements.

Switch RDS for MySQL to the TokuDB engine in three steps

1.Set the “loose_tokudb_buffer_pool_ratio”, that is, the proportion of the TokuDB and InnoDB shared cache occupied by TokuDB.

select sum(data_length) into @all_size from information_schema.tables where engine=’innodb’;

select sum(data_length) into @change_size from information_schema.tables where engine=’innodb’ and concat(table_schema, ‘.’, table_name) in (‘XX.XXXX’, ‘XX.XXXX’, ‘XX.XXXX’);

select round(@change_size/@all_size*100);

2.Restart the instance.

3.Alter the storage engine.


Specifically, “XXX.XXXX” refers to the name of the database and table to be altered to the TokuDB storage engine.

2. MySQL MaxCompute

Alibaba Cloud provides the MaxCompute service for storage and calculation of batch structured data. The service offers mass data warehouse solutions and analytical modeling services for big data. You can import RDS data into MaxCompute through the Data Integration service and simple settings on the interface to achieve large-scale data computing.

Image for post
Image for post

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store