How Table Store Implements Cross-Region Disaster Tolerance

What Is Table Store

Disaster Tolerance Implementation Methods and Scenarios

  1. Use a distributed system and spread multiple copies of data across different regions to implement disaster tolerance. A distributed system generally applies a consensus algorithm such as Paxos to keep the data copies consistent, so this method can achieve a recovery point objective (RPO) of zero. When a failure occurs in a specific data center or region, affected services can automatically recover as long as most of the clusters are still functional. This works like failover.
  2. Use two (or more) separate systems, establish synchronization between them, and switch to a functioning system when one fails. With this model, the systems are independent of each other, and data synchronization alone keeps their data nearly consistent. Since data synchronization is usually asynchronous, the RPO is not quite zero in the event of a real failure, which means that a small amount of data may be lost. This architecture has two access models. In the first, only one system is active for daily access and the other acts as the standby system. In the second, both systems carry a portion of the traffic and synchronize data bidirectionally; when one system fails, its traffic is switched to the other functional system. Simply put, the first model uses an active-standby mechanism, while the second uses an active-active mechanism.
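The majority condition in the first method can be illustrated with a small sketch. It is not Paxos and not how Table Store is implemented; it only shows, under simplified assumptions, why a write acknowledged by a majority of copies survives the loss of one region. The region names and the `quorum_write` helper are hypothetical.

```python
def has_majority(total_replicas: int, healthy_replicas: int) -> bool:
    """Return True if a strict majority of replicas is still reachable."""
    return healthy_replicas > total_replicas // 2


def quorum_write(replicas: list, key: str, value: str) -> bool:
    """Apply a write only if a majority of replicas can acknowledge it."""
    reachable = [r for r in replicas if r["up"]]
    if not has_majority(len(replicas), len(reachable)):
        return False  # too many copies are down: refuse the write
    for replica in reachable:
        replica["data"][key] = value
    return True


# Three copies in three hypothetical regions; one region is down.
replicas = [
    {"region": "region-a", "up": True, "data": {}},
    {"region": "region-b", "up": True, "data": {}},
    {"region": "region-c", "up": False, "data": {}},
]
committed = quorum_write(replicas, "row1", "v1")
```

With two of three copies reachable the write is accepted; with only one reachable it would be refused, which is what keeps the acknowledged data from being lost.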

Implementing Disaster Tolerance Based on Multiple Copies of Data in a Distributed System

Implementing Disaster Tolerance through Data Synchronization and Switching among Multiple Systems

Scenario 1: Active-Standby Model

  1. The (previously) active cluster holds all the data written before the switch, including the data that had not yet been synchronized to the standby cluster.
  2. The new active cluster (the former standby cluster) holds all the data written to it after the switch, but is missing the data that had not been synchronized before the switch.
  1. Use the data in the standby cluster as the baseline and rebuild the active-standby relationship from it. This means that we synchronize the full data set again from the standby cluster and then perform real-time incremental synchronization. In this case, the data in the active cluster that was never synchronized is discarded and is no longer visible.
  2. Generally, a failure of the active cluster does not cause table data to be lost. Therefore, we only need to synchronize back to the active cluster, incrementally, the data that was written to the standby cluster after the switch. In this case, we can quickly re-establish the active-standby relationship because only incremental data, which is relatively small, needs to be synchronized. When traffic is switched back to the active cluster, the data that was never synchronized to the standby cluster becomes visible again, or is overwritten by the newer data from the standby cluster if it is in the same row.
  3. We can extract the data that was not synchronized from the active cluster to the standby cluster, add it to the standby cluster or apply some service-specific processing, and then perform incremental synchronization. This is an optimization of the second option and can solve the inconsistency problem.
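The difference between the second and third options can be sketched with plain dictionaries standing in for the two clusters. This is a toy model, not Table Store code: the row keys, the helper names, and the rule "skip an unsynchronized row if the same row was rewritten after the switch" are assumptions made for illustration.

```python
def switch_back_incremental(old_active: dict, post_switch_writes: dict) -> dict:
    """Option 2: replay only the post-switch writes onto the recovered cluster."""
    merged = dict(old_active)
    merged.update(post_switch_writes)  # same-row writes override older data
    return merged


def backfill_unsynced(standby: dict, unsynced: dict, post_switch_writes: dict) -> dict:
    """Option 3: add unsynchronized rows to the standby cluster first.

    Assumed policy: a row rewritten after the switch keeps the newer value.
    """
    merged = dict(standby)
    for key, value in unsynced.items():
        if key not in post_switch_writes:
            merged[key] = value
    return merged


# Rows "b" and "c" were written to the active cluster but never synchronized.
old_active = {"a": 1, "b": 2, "c": 3}
unsynced = {"b": 2, "c": 3}
# After the switch, the standby cluster received new writes.
post_switch_writes = {"c": 30, "d": 4}
standby = {"a": 1, "c": 30, "d": 4}

recovered_active = switch_back_incremental(old_active, post_switch_writes)
repaired_standby = backfill_unsynced(standby, unsynced, post_switch_writes)
```

In this model, option 2 alone leaves row "b" visible on the recovered cluster but absent from the standby cluster; backfilling it as in option 3 brings the two sides back to the same contents.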
  1. Because users access the Table Store service through instance domains, the CNAME records of the instance domains can be changed on the server side to point to the standby cluster. This method is only suitable for the “dual clusters in the same city” plan. The active instance and the standby instance have the same name, while the back-end clusters use the active-standby mechanism.
  2. The application adds a proxy layer in its own code. This proxy layer has an internal “switch” and supports connections to both the active and standby systems. When the “switch” is turned on, requests are automatically routed to the other system. In this case, the active instance and the standby instance can be different, and the application only needs to prepare a separate configuration for each of them.
  3. Similar to the second method, we can use configuration push middleware to change the access configuration dynamically.
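The application-side proxy described in the second method might look roughly like the following. It is a minimal sketch: `InMemoryClient` is a hypothetical stand-in for a real Table Store client, and the method names `put_row` and `get_row` are used here only as illustrative names, not as the signatures of any SDK.

```python
class InMemoryClient:
    """Hypothetical stand-in for a client bound to one instance."""

    def __init__(self, instance_name: str):
        self.instance_name = instance_name
        self.rows = {}

    def put_row(self, key, value):
        self.rows[key] = value

    def get_row(self, key):
        return self.rows.get(key)


class SwitchableProxy:
    """Routes every call to the active or the standby client."""

    def __init__(self, active_client, standby_client):
        self._active = active_client
        self._standby = standby_client
        self.use_standby = False  # the internal "switch"

    def _current(self):
        return self._standby if self.use_standby else self._active

    def put_row(self, key, value):
        return self._current().put_row(key, value)

    def get_row(self, key):
        return self._current().get_row(key)


active = InMemoryClient("active-instance")
standby = InMemoryClient("standby-instance")
proxy = SwitchableProxy(active, standby)

proxy.put_row("k", "v1")   # served by the active instance
proxy.use_standby = True   # flip the switch during a failure
proxy.put_row("k", "v2")   # now served by the standby instance
```

The third method fits the same shape: instead of flipping `use_standby` in code, a configuration push system would set it.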

Scenario 2: Active-Active/Multi-Active Model

  1. In the active-active model, both sets of services accept traffic. Specifically, both Table Store instances in North China 2 and East China 1 in the preceding diagram allow read and write access, whereas in the active-standby model the standby instance does not accept access.
  2. In the active-active model, data synchronization is bidirectional, while in the active-standby model it is unidirectional.
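Bidirectional synchronization raises two practical questions that a small sketch can make concrete: how to stop replicated writes from bouncing back, and what to do when both sides write the same row. The model below is an assumption-heavy illustration, not a description of Table Store: it ships only locally originated changes to the peer, and it resolves same-row conflicts by keeping the write with the larger timestamp.

```python
class Instance:
    """Toy model of one instance in an active-active pair."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}    # key -> (timestamp, value)
        self.outbox = []  # locally originated changes not yet shipped

    def write(self, key, value, ts):
        """Client write: apply locally and queue it for replication."""
        self._apply(key, value, ts)
        self.outbox.append((key, value, ts))

    def _apply(self, key, value, ts):
        """Keep the change only if it is newer than what is stored."""
        current = self.rows.get(key)
        if current is None or ts > current[0]:
            self.rows[key] = (ts, value)

    def ship_to(self, peer):
        """Send queued changes; the peer applies them without re-queueing,
        so a replicated write is never sent back to its origin."""
        for key, value, ts in self.outbox:
            peer._apply(key, value, ts)
        self.outbox.clear()


north = Instance("North China 2")
east = Instance("East China 1")

north.write("user1", "from-north", ts=1)
east.write("user2", "from-east", ts=2)
east.write("user1", "from-east-newer", ts=3)

north.ship_to(east)
east.ship_to(north)
```

After one round of shipping in each direction, both instances hold the same rows. Real deployments often avoid same-row conflicts altogether by partitioning traffic so that each row is written in only one region.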

How Table Store Implements Incremental Data Sync

Stream in Table Store: Servitization of Incremental Data Synchronization





Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:
