Minimizing Business Downtime with Alibaba Cloud Hybrid Disaster Recovery Solution

Alibaba Cloud
Nov 27, 2018

By Alibaba Cloud Storage Team

At The Computing Conference 2018 in Hangzhou, Alibaba Cloud presented its Hybrid Disaster Recovery (HDR) and Hybrid Backup Recovery (HBR) solutions by demonstrating a full disaster recovery process on stage. The five-minute HDR demonstration showed the end-to-end process of enterprise application disaster recovery, with a recovery point objective (RPO) measured in seconds and a recovery time objective (RTO) measured in minutes, and covered the core steps of a typical cloud disaster recovery scenario.

Downtime of key business systems causes great losses to enterprises. Traditional self-built disaster recovery solutions are costly and require complex operation and maintenance (O&M), so high-performance cloud disaster recovery services are becoming the preferred choice for enterprises that need to ensure business continuity. The HDR demonstration showed an accounting system running on a local server being replicated to Alibaba Cloud in real time, and the business being quickly recovered on Alibaba Cloud after the server went down.

The entire demonstration was divided into three phases:

  1. Disaster recovery replication: When the accounting system was running, the engineer started disaster recovery replication with one click to quickly and fully replicate all the data on the disk, including the operating system (OS), applications, and files, to a cloud disk on Alibaba Cloud. The replication speed was limited to 35 Mbit/s to ensure that the business was not affected. After the full replication was completed, the system entered the real-time replication status and the RPO reached about 5 seconds.
  2. Business downtime: The field engineer pulled out the server hard disk, causing the server to go down and business to be interrupted. Within a few seconds, the monitoring system detected that services were unavailable and the client no longer performed any reimbursement tasks.
  3. Disaster recovery: The engineer initiated on-cloud disaster recovery. HDR created an Elastic Compute Service (ECS) instance on Alibaba Cloud with the same configuration as the off-cloud server, attached the cloud disk that contained the replicated disk data of the local server to the ECS instance, and started the instance. When HDR detected that services on the ECS instance had started, it switched Domain Name System (DNS) resolution over to the cloud. The business then resumed operation on the cloud and the client continued to perform reimbursement tasks. In the actual drill, the RTO was about 90 seconds.
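
To get a feel for what the 35 Mbit/s limit in the replication phase means for the initial full copy, the arithmetic can be sketched as follows. The disk size used here (100 GiB) is an assumption for illustration only; the demonstration did not state how much data was replicated, and the estimate ignores compression and protocol overhead.

```python
def full_replication_hours(disk_gib: float, limit_mbit_s: float = 35.0) -> float:
    """Estimate the hours needed to copy disk_gib GiB at a capped rate.

    Assumes 1 Mbit = 1,000,000 bits and no compression or protocol overhead.
    """
    total_bits = disk_gib * 1024**3 * 8
    seconds = total_bits / (limit_mbit_s * 1_000_000)
    return seconds / 3600


# Hypothetical 100 GiB disk at the 35 Mbit/s limit used in the demonstration.
print(f"{full_replication_hours(100):.1f} hours")
```

Under these assumptions the first full copy takes several hours, which is why the low RPO applies only after the system has entered the real-time replication status.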

Data: The Core of Digital Operations

In the digital economy era, data is growing exponentially. In just a few years, the amount of data has jumped from the TB level to the PB or even ZB level.

According to surveys conducted by research institutions, the total amount of global data exceeded 15.2 ZB in 2017, a year-on-year increase of 35.7%, and is expected to reach 19.4 ZB in 2018. Over the next few years, global data is expected to grow by more than 25% per year and to hit 50 ZB by 2020.

Undeniably, data is the core of digital operations and data security determines the survival of enterprises.

Case Study: Data Center Accidents

In August 2018, an international cloud service provider reported a data leak that occurred because its sales staff did not follow the configuration specifications for a storage bucket.

In July 2018, a serious failure was exposed on a Chinese cloud platform. The failure directly led to the loss of all data of a startup company, which then faced an unprecedented shutdown crisis.

On May 12, 2017, the WannaCry worm incident caused chaos across the world. ATMs in banks were out of order, computers in gas stations broke down, and students’ theses were encrypted.

In January 2017, the O&M personnel of a code hosting platform mistook the production environment for a test environment during multi-terminal switching and accidentally deleted the database of the production environment.

In November 2014, a financial payment company experienced a system failure, causing nearly 400 million duplicated receipts.

According to IDC statistics, 55% of the companies that experienced disasters in the past decade collapsed right away. Another 29% went bankrupt within two years due to data loss. Only 16% survived.

According to a Gartner report, two-fifths of the companies that experienced system downtime due to major disasters never resumed operations again, and one-third of the remaining companies went bankrupt within two years.

In this context, enterprises urgently need to strengthen data protection.

The hybrid cloud backup and disaster recovery solution is the first choice for enterprises undergoing digital transformation. In the past, the traditional backup and disaster recovery solution relied on a disaster recovery center built with an architecture similar to that of the production center. Although this solution meets the data backup and replication needs of the production center, it poses many challenges to enterprises because of its long and cumbersome implementation, expensive devices, and complex O&M.

Compared to the traditional backup and disaster recovery solution, the hybrid cloud backup and disaster recovery solution features high efficiency, high availability, high cost effectiveness, and no O&M. This modern solution helps customers securely and efficiently back up files, databases, virtual machines (VMs), and even the whole system, locally or on the cloud. In addition, the application server system backed up on the cloud can work as a virtual server, which delivers the required RPO and RTO to ensure business continuity and disaster recovery on the cloud.

Alibaba Cloud HBR and HDR

Hybrid Backup Recovery (HBR) is an easy-to-use and cost-effective online backup service. This service helps customers back up data from desktops, servers, or VMs to a backup repository on Alibaba Cloud, which ensures secure and efficient cloud storage, backup, and management of customer data.

Typical Application Scenario for Hybrid Backup Recovery (HBR)

Data Protection in Various IT Environments

A customer requires data protection in different IT environments, including physical machines and virtual platforms in a local Internet data center (IDC), ECS servers on Alibaba Cloud, and servers on other public cloud platforms.

The RPO and RTO requirements are not demanding, but data security and recoverability must be ensured.

In this scenario, HBR has the following advantages:

  1. Features easy-to-use, out-of-the-box services, and rapid deployment within minutes, reducing learning costs.
  2. Uses encryption and multiple copies to ensure secure and reliable cloud backup. The data reliability is 99.999999999%.
  3. Uses deduplication and compression to reduce bandwidth usage and backup costs, so data can be backed up to the cloud even over very limited bandwidth.
  4. Supports elastic scaling and the Pay-As-You-Go mode, and provides multiple discount packages to meet the requirements of prepaid customers.

Centralized Backup of Multiple Branches + Cross-Region Disaster Recovery

A customer has established multiple branches in different cities, provinces, and regions. Each branch has data to be backed up.

Backup devices need to be deployed in each physical IDC, resulting in scattered and complex management of devices. It is also difficult to ensure successful data backup.

Based on the multi-region disaster recovery requirements, the customer hopes that a copy of the backup data can be stored in another region to prevent data or services from becoming unavailable due to regional failures.

In this scenario, HBR has the following advantages:

  1. Backup storage hardware devices do not need to be deployed or managed in each IDC, reducing the management complexity and O&M costs.
  2. Backup data can be managed in a uniform manner.
  3. Variable-length deduplication at the data source achieves a deduplication ratio of up to 30:1. This reduces network bandwidth and storage resource consumption, and shortens the backup window.
  4. Client data is backed up to the cloud on a permanent incremental basis, while the HBR repository holds a full copy at each time point. This improves backup and recovery efficiency and enables quick recovery.
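
The combination of variable-length deduplication and incremental backup described in the list above can be illustrated with a small sketch. This is not HBR's implementation: the Gear-style rolling hash, the chunk size parameters, and the in-memory `store` dictionary are all assumptions chosen to keep the example short.

```python
import hashlib

# One pseudo-random 32-bit value per byte value, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def chunk(data: bytes, mask: int = 0x3FF, min_size: int = 256,
          max_size: int = 8192) -> list:
    """Split data at content-defined boundaries using a Gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut where the hash matches the mask (or the chunk grows too large).
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def backup(data: bytes, store: dict) -> list:
    """Upload only chunks the store has not seen; return the full recipe."""
    recipe = []
    for c in chunk(data):
        digest = hashlib.sha256(c).hexdigest()
        store.setdefault(digest, c)  # existing chunks are not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild one backup version from its recipe."""
    return b"".join(store[d] for d in recipe)
```

Because chunk boundaries depend on content rather than on fixed offsets, a small edit near the start of a file typically changes only a few chunks, so a second `backup` call adds little to `store`. Each recipe still describes a complete version, which is one way to picture an incremental upload combined with a full copy at each time point.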

Unlike HBR, HDR provides “Cloud + Local” dual backup and cloud disaster recovery for enterprise applications. HDR protects servers, files, and applications. The disaster recovery server is deployed in the local IDC so that local data can be recovered quickly. At the same time, the backup data is synchronized to the cloud disaster recovery repository for disaster recovery on the cloud. If the IDC fails, HDR can recover the business server on the cloud to guarantee business continuity. HDR can also be used for disaster recovery drills or data analysis. To protect data in a big data cluster architecture, Alibaba Cloud has released the first big data backup and disaster recovery solution on the public cloud.

Typical Application Scenario for Hybrid Disaster Recovery (HDR)

Real-Time Replication of the Core Business on the Cloud + Cloud Disaster Recovery Takeover

A customer needs to replicate the production environment in real time to ensure that the business can be quickly taken over on the cloud if a business failure occurs in the local IDC.

The business system is built based on a complete set of servers, including database servers, file servers, and application servers.

In this scenario, HDR has the following advantages:

  1. Meets the customer’s requirements for core business data protection and enables remote disaster recovery with public cloud platform capabilities.
  2. Ensures continuous data protection, application consistency, and second-level RPO.
  3. Supports elastic scaling and on-demand configuration. During normal times, ECS instances do not need to be created on the cloud, thereby reducing the cost by at least 20% compared to traditional real-time replication solutions at the OS layer.
  4. Provides an orchestrated one-click disaster recovery mode that enables the customer to pre-deploy the key steps of the disaster recovery takeover, achieving minute-level RTO.
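
The idea of pre-deploying the takeover steps can be sketched as a small runbook runner. The function and step names below are hypothetical and do not correspond to any HDR API; the steps mirror the sequence shown in the on-stage demonstration (create the instance, attach the replicated disk, start it, wait for the service, switch DNS).

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_failover(steps: List[Step], is_healthy: Callable[[], bool],
                 switch_dns: Callable[[], None],
                 timeout_s: float = 300.0, poll_s: float = 1.0) -> float:
    """Run the pre-defined steps in order, wait until the recovered service
    is healthy, switch DNS, and return the elapsed seconds (observed RTO)."""
    started = time.monotonic()
    for _name, action in steps:
        action()
    deadline = started + timeout_s
    while not is_healthy():
        if time.monotonic() >= deadline:
            raise TimeoutError("recovered service did not become healthy")
        time.sleep(poll_s)
    switch_dns()
    return time.monotonic() - started


# Stub actions stand in for real cloud calls (assumptions for illustration).
log = []
steps = [
    ("create instance", lambda: log.append("create")),
    ("attach replicated disk", lambda: log.append("attach")),
    ("start instance", lambda: log.append("start")),
]
elapsed = run_failover(steps, is_healthy=lambda: True,
                       switch_dns=lambda: log.append("dns"), poll_s=0.0)
```

Keeping the steps as data means the same sequence can be rehearsed in a drill and then reused unchanged during a real takeover, and the returned elapsed time gives a direct RTO measurement for each run.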

Local Backup + On-Demand Disaster Recovery Configuration on the Cloud

A customer needs to replicate the production environment locally to ensure quick recovery of data in the local IDC if the data is deleted by mistake or a disk failure occurs.

The customer also requires remote disaster recovery for the local backup data to ensure that business can be quickly restored in the disaster recovery center if disasters occur in the local IDC, shortening the business interruption time.

The business system is built based on a complete set of servers, including database servers, file servers, and application servers.

In this scenario, HDR has the following advantages:

  1. Meets the customer’s local backup requirements, and supports on-demand remote disaster recovery. Uses public cloud platform capabilities to allow the customer to recover business quickly if disasters occur, without the need to build disaster recovery IDCs, thereby reducing disaster recovery costs by 70%.
  2. Supports elastic scaling and on-demand configuration. During normal times, ECS instances on the cloud do not need to be restored or started. The customer uses only the disaster recovery repository resources on the cloud. Such resources support elastic scaling.
  3. Enables the off-cloud IDC to back up data regularly, ensuring an hour-level RPO for data backup and an hour-level RTO for system data recovery on or off the cloud.
  4. Features high reliability. Based on the Alibaba Cloud infrastructure, HDR ensures reliable, secure, and timely data recovery when disasters occur.

Big Data Cluster Disaster Recovery + Elastic Scaling on the Cloud

A customer has deployed a Hadoop big data cluster, which contains hundreds of TB of data, in the local IDC. If a remote disaster recovery cluster of the same size were built, many resources would sit idle and the costs would be high.

The customer’s required RPO is close to zero, which cannot be satisfied by the traditional DistCp solution.

In this scenario, HDR has the following advantages:

  1. Makes full use of public cloud resources to build a big data cluster on the cloud.
  2. Uses asynchronous real-time replication technology, which brings the RPO close to zero and enables smooth scaling to the cloud.
  3. Uses two active clusters on and off the cloud to run different types of business. No resources are idle and the total cost of ownership (TCO) is low.
  4. Expands nodes elastically on the cloud to provide fast and stable computing resources to meet fluctuating traffic demands.
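
With asynchronous replication, the effective RPO at any moment is the age of the oldest write that has not yet reached the replica. The toy model below illustrates that relationship; the class and its methods are assumptions for illustration and say nothing about how HDR replicates a Hadoop cluster.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: writes are accepted locally and shipped to the replica later."""

    def __init__(self) -> None:
        self.pending = deque()  # (timestamp, payload) not yet on the replica
        self.replica = []       # writes that have reached the replica

    def write(self, ts: float, payload: str) -> None:
        self.pending.append((ts, payload))

    def ship(self, max_items: int) -> None:
        """Move up to max_items of the oldest pending writes to the replica."""
        for _ in range(min(max_items, len(self.pending))):
            self.replica.append(self.pending.popleft())

    def rpo(self, now: float) -> float:
        """Seconds of writes that would be lost if the source failed at now."""
        return now - self.pending[0][0] if self.pending else 0.0


r = AsyncReplicator()
for t in (0.0, 1.0, 2.0):
    r.write(t, f"block-{t}")
r.ship(2)            # the write made at t=2.0 is still pending
lag = r.rpo(now=3.0)
```

The RPO stays close to zero only as long as shipping keeps up with the write rate, which is why a batch copy that runs periodically cannot deliver it.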

Combining Alibaba Cloud Hybrid Backup Recovery with Hybrid Disaster Recovery

Grading RPO and RTO

HBR and HDR support multi-level RPO and RTO to meet different business disaster recovery requirements.

HBR and HDR ensure real-time replication of the core business, regular disaster recovery of key business, and regular backup of common business.
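
One way to picture this grading is a lookup from a required RPO to the least demanding protection method that still satisfies it. The tier names and methods below come from the sentence above; the RPO figures attached to them are assumptions for illustration (the 5-second value echoes the on-stage demonstration), not published service levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionTier:
    business: str
    method: str
    achievable_rpo_s: float  # illustrative figure, not a service guarantee


# Ordered from least to most demanding.
TIERS = [
    ProtectionTier("common business", "regular backup", 24 * 3600),
    ProtectionTier("key business", "regular disaster recovery", 3600),
    ProtectionTier("core business", "real-time replication", 5),
]


def select_tier(required_rpo_s: float) -> ProtectionTier:
    """Return the least demanding tier whose RPO meets the requirement."""
    for tier in TIERS:
        if tier.achievable_rpo_s <= required_rpo_s:
            return tier
    raise ValueError("no tier can meet the required RPO")


# A system that can tolerate losing at most one minute of data.
print(select_tier(60).method)
```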

Quick and Easy-to-Use

No hardware or gateway device is required.

The backup space is ready for use upon purchase and easy to expand.

Cost-Effective and Purchase on Demand

With a deduplication ratio of up to 30:1, 30 copies can occupy the space originally needed for only one copy.

The permanent incremental technology prevents redundant data from being uploaded repeatedly, so data can be moved to the cloud quickly without a dedicated line.

Charges are based on the storage space actually consumed on the cloud rather than on the amount of source data backed up, minimizing backup costs.
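
The effect of charging for stored rather than source data can be shown with simple arithmetic. The figures below are hypothetical, and 30:1 is the best-case ratio quoted above, so real savings depend on how repetitive the data is.

```python
def stored_gb(source_gb: float, dedup_ratio: float) -> float:
    """Space consumed on the cloud for source_gb of backed-up data."""
    if dedup_ratio < 1:
        raise ValueError("dedup_ratio must be at least 1")
    return source_gb / dedup_ratio


# Hypothetical: 30 full backups of a 1,000 GB system (30,000 GB of source data)
# at the best-case 30:1 ratio.
billable = stored_gb(30 * 1000, 30)
```

In this best case the billable space equals the size of a single copy, which matches the 30-copies-in-the-space-of-one description.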

Building on public cloud infrastructure reduces costs by more than 70% compared to a traditional disaster recovery center.

Safe and Reliable, and Multi-Region Disaster Recovery

The data is backed up in multiple versions. A full copy is stored at each time point.

The cloud storage reliability is 99.9999999999%.

The multi-availability zone (AZ) backup repository for disaster recovery relies on the support of multiple zones on the cloud platform to protect data in multiple copies.

The configurable cross-region remote disaster recovery mode prevents data loss caused by major regional failures.

Whole-System Disaster Recovery; No Business Transformation Required

The whole system is backed up and recovered. Customers can recover the same business system on the cloud without the need to change applications or IP addresses.

While the whole system is running on the cloud, its data can flow back to the offline IDC, which helps to restore applications to their original state once the offline IDC is recovered.

Mainstream Platform Support, and Protection of Structured and Unstructured Data

OS: Windows and Linux

System platform: VMware, Hyper-V, and physical servers

Database: SQL Server, Oracle, and other databases
