Alibaba Cloud Tablestore: A Case Study on Backing up a Massive Volume of Structured Data

Requirements

  • Data Reliability of Storage Systems: Alibaba Cloud Tablestore is a serverless NoSQL multi-model database developed independently by Alibaba Cloud for storing massive volumes of structured data. It provides 99.9999999% data reliability, a very high standard in the industry.
  • Data Restoration After a Misoperation: Misoperations are inevitable, and a data backup makes it possible to restore data quickly when one occurs. Two backup solutions are available. The first is to deploy local disaster recovery or geo-disaster recovery; this is costly and primarily used for backing up basic social or financial information. The second is to back up data to another, inexpensive system from which it can be restored after a misoperation. For the latter option, users generally choose a file storage system such as Alibaba Cloud Object Storage Service (OSS).

Tablestore Backup and Restoration Solution

This case study walks through the following three steps:

  • Determine a backup plan and policy.
  • Use the Tunnel Service SDK to write code.
  • Execute the backup policy and monitor the backup process.

Determining a Backup Plan and Policy

Common backup types include the following:

  • Full Backup: This copies all the files, folders, or data on a hard disk or in a database at once.
  • Incremental Backup: This backs up the data updated after the last full or incremental backup.
  • Differential Backup: This backs up the files changed after a full backup.
  • Selective Backup: This backs up a part of the system data.
  • Cold Backup: This backs up data while the system is stopped or under maintenance. The backup data and the system data from this period are guaranteed to be identical.
  • Hot Backup: This backs up data while the system is running properly. Because the system data may be updated at any time, the backup data may lag behind the actual system data.
The backup plan for Tablestore in this case study is as follows:

  • Backup Content: The data tables in Tablestore.
  • Backup Time: This includes the following:
  1. A full backup is performed periodically. The interval is adjustable, and the default value is one week.
  2. Incremental backups are also performed periodically based on the configuration. Tunnel Service of Tablestore guarantees strict data ordering, so each incremental file can be appended to the stream and consumed at any time.
  • Backup Methods: Full backup, incremental backup, and hot backup are combined; a policy sketch follows this list.
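
As an illustration, such a policy could be captured in a small configuration class. This is a minimal sketch only; every name in it is hypothetical and not part of the Tablestore or Tunnel Service SDK.

public class BackupPolicy {
    // Hypothetical illustration of the policy above, not an SDK type.
    private final String tableName;               // Backup content: the Tablestore data table
    private int fullBackupIntervalDays = 7;       // Periodic full backup; the default interval is one week
    private int incrementalIntervalMinutes = 10;  // Incremental backup cadence (adjustable; example value)
    private boolean hotBackup = true;             // Back up while the system keeps serving requests

    public BackupPolicy(String tableName) {
        this.tableName = tableName;
    }

    public String getTableName() {
        return tableName;
    }
}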

Using the Tunnel Service SDK to Write Code

First, create a tunnel of the BaseAndStream type, which covers both the full (base) data and the incremental (stream) data:

private static void createTunnel(TunnelClient client, String tunnelName) {
    // A BaseAndStream tunnel consumes the existing (base) data first, then incremental (stream) data.
    CreateTunnelRequest request = new CreateTunnelRequest(TableName, tunnelName, TunnelType.BaseAndStream);
    CreateTunnelResponse resp = client.createTunnel(request);
    System.out.println("RequestId: " + resp.getRequestId());
    System.out.println("TunnelId: " + resp.getTunnelId());
}
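
A quick usage sketch (the endpoint, credentials, and instance name are placeholders, and the tunnel name is an arbitrary example):

TunnelClient client = new TunnelClient("<endpoint>", "<accessId>", "<accessKey>", "<instanceName>");
createTunnel(client, "backupTunnel");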
When persisting the consumed data, keep the following three points in mind:
  1. Precision loss may occur when Gson deserializes the Long type into the Number type. Several methods can resolve this problem; one effective method is to serialize the Long type as a String.
  2. Binary data is Base64-encoded for serialization and Base64-decoded when read back.
  3. Write the data directly to OSS, which avoids the resource consumption of persisting it locally first.
this.gson = new GsonBuilder()
        .registerTypeHierarchyAdapter(byte[].class, new ByteArrayToBase64TypeAdapter())
        .setLongSerializationPolicy(LongSerializationPolicy.STRING)
        .create();
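
The ByteArrayToBase64TypeAdapter registered above is not shown in the original snippet. A minimal sketch of what it might look like, assuming Gson's JsonSerializer/JsonDeserializer interfaces, java.lang.reflect.Type, and java.util.Base64:

private static class ByteArrayToBase64TypeAdapter
        implements JsonSerializer<byte[]>, JsonDeserializer<byte[]> {
    // Serialize byte[] as a Base64 string.
    @Override
    public JsonElement serialize(byte[] src, Type typeOfSrc, JsonSerializationContext context) {
        return new JsonPrimitive(Base64.getEncoder().encodeToString(src));
    }

    // Decode the Base64 string back into byte[].
    @Override
    public byte[] deserialize(JsonElement json, Type typeOfT, JsonDeserializationContext context)
            throws JsonParseException {
        return Base64.getDecoder().decode(json.getAsString());
    }
}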
// Converting a ByteArrayOutputStream to a ByteArrayInputStream incurs one extra
// array copy; consider using a pipe or an NIO channel instead (see the sketch below).
public void streamRecordsToOSS(List<StreamRecord> records, String bucketName, String filename, boolean isNewFile) {
    if (records.size() == 0) {
        LOG.info("No stream records, skip it!");
        return;
    }
    try {
        CsvWriterSettings settings = new CsvWriterSettings();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CsvWriter writer = new CsvWriter(out, settings);
        if (isNewFile) {
            LOG.info("Write csv header, filename {}", filename);
            List<String> headers = Arrays.asList(RECORD_TIMESTAMP, RECORD_TYPE, PRIMARY_KEY, RECORD_COLUMNS);
            writer.writeHeaders(headers);
            System.out.println(writer.getRecordCount());
        }
        List<String[]> totalRows = new ArrayList<String[]>();
        LOG.info("Write stream records, num: {}", records.size());
        for (StreamRecord record : records) {
            String timestamp = String.valueOf(record.getSequenceInfo().getTimestamp());
            String recordType = record.getRecordType().name();
            String primaryKey = gson.toJson(
                    TunnelPrimaryKeyColumn.genColumns(record.getPrimaryKey().getPrimaryKeyColumns()));
            String columns = gson.toJson(TunnelRecordColumn.genColumns(record.getColumns()));
            totalRows.add(new String[] {timestamp, recordType, primaryKey, columns});
        }
        writer.writeStringRowsAndClose(totalRows);
        // Write the CSV bytes to the OSS file.
        ossClient.putObject(bucketName, filename, new ByteArrayInputStream(out.toByteArray()));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
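
As the comment above the method notes, the ByteArrayOutputStream-to-ByteArrayInputStream conversion costs one extra array copy. One way to avoid it is to stream the CSV bytes straight into the OSS upload through a pipe. A minimal sketch, with thread and error handling simplified, reusing the totalRows list from the method above:

// Produce the CSV on a separate thread; closing the writer closes the pipe,
// so the OSS upload below sees end-of-stream when the CSV is complete.
PipedInputStream pipeIn = new PipedInputStream();
PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);
new Thread(() -> {
    CsvWriter pipedWriter = new CsvWriter(pipeOut, new CsvWriterSettings());
    pipedWriter.writeStringRowsAndClose(totalRows);
}).start();
ossClient.putObject(bucketName, filename, pipeIn);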

Executing the Backup Policy and Monitoring the Backup Process

public class TunnelBackup {
    private final ConfigHelper config;
    private final SyncClient syncClient;
    private final CsvHelper csvHelper;
    private final OSSClient ossClient;

    public TunnelBackup(ConfigHelper config) {
        this.config = config;
        syncClient = new SyncClient(config.getEndpoint(), config.getAccessId(), config.getAccessKey(),
                config.getInstanceName());
        ossClient = new OSSClient(config.getOssEndpoint(), config.getAccessId(), config.getAccessKey());
        csvHelper = new CsvHelper(syncClient, ossClient);
    }

    public void working() {
        TunnelClient client = new TunnelClient(config.getEndpoint(), config.getAccessId(), config.getAccessKey(),
                config.getInstanceName());
        OtsReaderConfig readerConfig = new OtsReaderConfig();
        TunnelWorkerConfig workerConfig = new TunnelWorkerConfig(
                new OtsReaderProcessor(csvHelper, config.getOssBucket(), readerConfig));
        TunnelWorker worker = new TunnelWorker(config.getTunnelId(), client, workerConfig);
        try {
            worker.connectAndWorking();
        } catch (Exception e) {
            e.printStackTrace();
            worker.shutdown();
            client.shutdown();
        }
    }

    public static void main(String[] args) {
        TunnelBackup tunnelBackup = new TunnelBackup(new ConfigHelper());
        tunnelBackup.working();
    }
}

Restoring the Files

  • Download data from Alibaba Cloud OSS in streaming or multipart mode, increasing the number of concurrent downloads when necessary.
  • Write the data to Tablestore through BatchWrite. The following operations are performed in sequence (a hedged sketch of steps 2 and 3 appears right after this list, followed by the restore instance code, which excludes a few details):
  1. Compute the name of the file to download based on the backup policy.
  2. Read the file from OSS through streaming download.
  3. Write the file contents to the restore table in Tablestore through BatchWrite.
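As a rough illustration of steps 2 and 3 only, and not part of the original code: the sketch below streams a backup file from OSS and batch-writes the rows back to Tablestore. parseRowFromCsvLine is a hypothetical helper, and error handling is omitted.

public void restoreFromOss(String bucketName, String filename, String tableName) throws IOException {
    // Step 2: stream the backup file from OSS instead of materializing it locally.
    OSSObject object = ossClient.getObject(bucketName, filename);
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(object.getObjectContent(), StandardCharsets.UTF_8))) {
        BatchWriteRowRequest batchRequest = new BatchWriteRowRequest();
        int rowsInBatch = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Step 3: rebuild a RowPutChange from one CSV row of the backup file.
            // parseRowFromCsvLine is a hypothetical helper, not part of any SDK.
            batchRequest.addRowChange(parseRowFromCsvLine(tableName, line));
            if (++rowsInBatch == 200) { // BatchWriteRow accepts at most 200 rows per request
                syncClient.batchWriteRow(batchRequest);
                batchRequest = new BatchWriteRowRequest();
                rowsInBatch = 0;
            }
        }
        if (rowsInBatch > 0) {
            syncClient.batchWriteRow(batchRequest);
        }
    }
}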
public class TunnelRestore {
    private final ConfigHelper config;
    private final SyncClient syncClient;
    private final CsvHelper csvHelper;
    private final OSSClient ossClient;

    public TunnelRestore(ConfigHelper config) {
        this.config = config;
        syncClient = new SyncClient(config.getEndpoint(), config.getAccessId(), config.getAccessKey(),
                config.getInstanceName());
        ossClient = new OSSClient(config.getOssEndpoint(), config.getAccessId(), config.getAccessKey());
        csvHelper = new CsvHelper(syncClient, ossClient);
    }

    public void restore(String filename, String tableName) {
        csvHelper.parseStreamRecordsFromCSV(filename, tableName);
    }

    public static void main(String[] args) {
        TunnelRestore restore = new TunnelRestore(new ConfigHelper());
        restore.restore("FullData-1551767131130.csv", "testRestore");
    }
}

Summary

This article covered the following:

  1. The requirements that drive backing up data stored in Tablestore.
  2. The principles of data backup and restoration through the Tunnel Service of Tablestore.
  3. A step-by-step process for developing a data backup and restoration solution based on Tablestore, using real code snippets.

