Best Practices of Database Disaster Recovery in the DT Era
With the arrival of the Data Technology (DT) era, enterprises have become increasingly dependent on data. Data protection is now essential, and only enterprises that prepare thoroughly in advance can survive a disaster. In the Best Practices for Enterprise Database Session at The Computing Conference 2018, topics related to disaster recovery attracted much attention. This article introduces best practices for using the Alibaba Cloud database product portfolio to tailor a disaster recovery solution to the development stage of an enterprise.
The Value of Data for Enterprises
Data is an important production resource for an enterprise. If data is lost, the enterprise may lose its customer information, technical documents, and financial accounts, which can disrupt customer relations, transactions, and production. In general, the causes of data loss fall into three levels:
- Logical errors, including software bugs, virus attacks, and corruption of data blocks
- Physical damages, including server damages and disk damages
- Natural disasters, such as fires and earthquakes that may tear down the data centers
To limit the economic loss caused by data loss, enterprises must take disaster recovery measures to protect their data. The higher an enterprise’s degree of informatization, the more important these measures are.
Enterprise-Class Database Disaster Recovery System
Definition of Disaster Recovery
Disaster recovery involves two elements: disaster tolerance and backup.
- Backup means keeping one or more copies of the important data generated by the application systems, or of the original important data.
- Disaster tolerance means deploying two or more IT systems with the same functions at two sites that are far apart, in the same city or in different cities. These systems monitor each other’s health and support switchover upon failure. If one system stops working because of an accident (a natural or man-made disaster), the entire application system is switched over to another system so that services continue without interruption.
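The switchover behavior described above can be illustrated with a minimal Python sketch. It is not how any Alibaba Cloud product implements failover; the `Site` class, the site names, and the `choose_active` function are hypothetical and only show the decision made after one health check.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical IT system deployed at one location."""
    name: str
    healthy: bool = True


def choose_active(active: Site, standby: Site) -> Site:
    """Return the site that should serve traffic after one health check."""
    if active.healthy:
        return active          # no switchover needed
    if standby.healthy:
        return standby         # switch over to the surviving site
    raise RuntimeError("no healthy site available")


primary = Site("site-a")       # hypothetical site names
secondary = Site("site-b")

before_failure = choose_active(primary, secondary)
primary.healthy = False        # simulate a disaster at the primary site
after_failure = choose_active(primary, secondary)
print(before_failure.name, "->", after_failure.name)
```

In a real deployment the health flag would come from continuous monitoring rather than being set by hand, and the switchover would also have to redirect clients and confirm that the standby data is up to date.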
Pain Points of Backup
- Backup failures
- Slow recovery speed
- Lossy recovery
- High costs of remote backup
- Poor cost-effectiveness
Pain Points of Disaster Tolerance
- The disaster tolerance solution supports only a few scenarios and cannot meet requirements of scenarios with different data sizes.
- The disaster tolerance solution lacks global control and management over the system because links are not monitored and faults cannot be identified quickly.
- The inspection capability is lacking.
- The fault recovery costs are high, and it is difficult to make decisions in data verification, comparison, and correction.
- Collaboration is difficult in switchover of multi-layer disaster recovery tools.
- The contingency plan is not properly controlled, and the O&M process cannot be automated.
An enterprise-class database disaster recovery system should be selected based on business requirements, with full consideration given to the following factors: recovery point objective (RPO), recovery time objective (RTO), costs, and scalability. The system must also cover the full range of database disaster recovery tasks, including building the disaster recovery environment, data synchronization, monitoring and alarms, drills, failover, and data verification and repair.
Core Products for Enterprise-Class Database Disaster Recovery
After multiple rounds of iteration, the disaster recovery capabilities of Alibaba Cloud products are well proven. The following core products can help enterprises develop database disaster recovery solutions for different scenarios and requirements.
- ApsaraDB for RDS is an on-demand database service that frees you from the administrative tasks of managing a database and leaves you more time to focus on your core business. ApsaraDB for RDS is a ready-to-use service that is offered on MySQL, SQL Server, and PostgreSQL. RDS handles routine database tasks such as provisioning, patching, backup, recovery, and failure detection and repair. ApsaraDB for RDS can also protect against network attacks and intercept SQL injections, brute force attacks, and other types of database attacks.
- Data Transmission Service (DTS) is a data streaming service provided by Alibaba Cloud to support data exchange between different types of data sources. It provides data transmission capabilities such as data migration, real-time data subscription, and real-time data synchronization. In a database disaster recovery solution, you can use Data Transmission Service to implement data migration and real-time synchronization between various databases, laying a solid foundation for database disaster recovery.
- Hybrid Backup Recovery (HBR) is a simple and cost-effective Backup as a Service (BaaS) solution. It protects customer data in a number of scenarios: enterprise level data centers, remote centers, branch offices, or on the cloud. HBR supports data encryption, compression, and deduplication, and helps you back up your data to the cloud securely and efficiently.
In a disaster recovery scenario, we recommend that you also integrate other Alibaba Cloud products such as DRDS and OSS. These products have been verified both inside and outside Alibaba Cloud and have proven to be highly reliable. You can combine them flexibly in your disaster recovery solution.
Typical Application Scenarios
If you have demanding requirements for data backup, for example, continuous real-time backup that does not affect business operations, you can buy Database Backup Service to implement hot backup of databases. This service supports real-time incremental backup and data recovery in seconds. The following figure shows the architecture of the solution:
The architecture design is described as follows.
Deployment of key components:
- Two databases, including the production database and recovery database, are deployed in the local area and used for storage of production data and data recovery after faults occur, respectively.
- The storage service is bought in two regions of Alibaba Cloud, for example, China (Shenzhen) and China (Qingdao). The storage service can be Object Storage Service (OSS) or Network Attached Storage (NAS).
- Database Backup Service is bought for real-time hot backup of the local databases to the cloud storage.
Backup of the off-cloud production data onto the cloud (use either of the following methods):
- Deploy one more local storage system to back up the production data to the storage of the local IDC, and then copy this backup from the storage of the local IDC to the cloud storage.
- Use Database Backup Service for direct hot backup of data from the local production database to the cloud storage in two regions.
Data recovery:
- If the production database fails but the storage runs normally in the local IDC, recover data from the local storage to the local recovery database.
- If both the production database and the storage fail in the local IDC, or the local storage is not deployed, use Database Backup Service to recover data from the cloud storage to the local recovery database.
Evaluation:
- Advantage: high technical requirements, good consistency, and short recovery time.
- Disadvantage: The RTO varies according to the size of the database.
- Application scenario: The real-time backup solution is a sophisticated solution applicable to most relational databases.
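The recovery rules above, and the reason the RTO grows with database size, can be sketched in a few lines of Python. This is an illustration only: the function names, the returned labels, and the restore throughput are assumptions, not part of Database Backup Service.

```python
def pick_recovery_source(production_db_ok: bool,
                         local_storage_deployed: bool,
                         local_storage_ok: bool) -> str:
    """Choose where to restore from, following the recovery rules in the text."""
    if production_db_ok:
        return "none"              # nothing to recover
    if local_storage_deployed and local_storage_ok:
        return "local-storage"     # restore from the local IDC storage
    return "cloud-storage"         # restore from the cloud backup


def estimate_restore_minutes(size_gb: float, throughput_mb_per_s: float) -> float:
    """Rough restore time, assuming a constant restore throughput (an assumption)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1024 / throughput_mb_per_s / 60


print(pick_recovery_source(False, True, True))    # database failed, local storage healthy
print(pick_recovery_source(False, True, False))   # database and local storage failed
print(pick_recovery_source(False, False, False))  # no local storage deployed
print(round(estimate_restore_minutes(100, 100), 1), "minutes for 100 GB at 100 MB/s")
```

The estimate ignores log replay and verification time, so it is a lower bound at best, but it shows why a larger database leads to a longer RTO at the same throughput.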
Multiple Remote Active Backups
The enterprise-class database disaster recovery system covers all of the following solutions: on-cloud elastic disaster tolerance, dual or multiple active backups, and three centers in two locations. This section uses multiple remote active backups as an example. This solution supports data-level remote dual active backups and one-click switchover to another data center, which allows flexible scale-up or scale-down and linear expansion in the future.
- Unit-based reconstruction is performed on applications.
- Data Transmission Service is deployed to provide bi-directional synchronization between databases in two or more locations, which removes the single point of failure within one city.
- HDM is deployed to monitor and manage the architecture with dual or multiple active backups and to support switchover and failover.
- The two data centers support read/write splitting, and local users read data from the nearest data center.
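The nearest-data-center read routing in the last item can be sketched as follows. The data center names and the latency figures are hypothetical values chosen for illustration, and the lookup table stands in for whatever proximity measure a real deployment uses.

```python
from typing import Dict, List, Tuple

# Hypothetical round-trip latencies in milliseconds: (user region, data center).
LATENCY_MS: Dict[Tuple[str, str], int] = {
    ("shenzhen", "dc-shenzhen"): 5,
    ("shenzhen", "dc-qingdao"): 40,
    ("qingdao", "dc-shenzhen"): 40,
    ("qingdao", "dc-qingdao"): 5,
}


def route_read(user_region: str, healthy_dcs: List[str]) -> str:
    """Send a read to the healthy data center with the lowest latency."""
    if not healthy_dcs:
        raise RuntimeError("no healthy data center available")
    return min(healthy_dcs, key=lambda dc: LATENCY_MS[(user_region, dc)])


both = ["dc-shenzhen", "dc-qingdao"]
print(route_read("shenzhen", both))            # nearest data center
print(route_read("qingdao", both))             # nearest data center
print(route_read("shenzhen", ["dc-qingdao"]))  # falls back when the nearest one is down
```

Because the databases are synchronized in both directions, a read that falls back to the remote data center still sees the same data, subject to synchronization delay.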
New Product: Database Backup Service
Database Backup Service acts as a channel for backing up databases to the cloud and is used together with OSS to build a cloud database backup solution. Such a solution can be set up in as little as five minutes and provides real-time backup with an RPO measured in seconds. (The RPO is the maximum period of data loss that is acceptable when the database fails. A smaller RPO is generally better.)
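To make the RPO definition concrete, the following Python sketch compares a nightly backup with a real-time backup against a five-second RPO. The timestamps, the two-second backup lag, and the function names are hypothetical; the sketch only computes the window of changes that no backup has captured at the moment of failure.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backed_up: datetime, failure: datetime) -> timedelta:
    """Span of changes that no backup has captured when the failure occurs."""
    if failure < last_backed_up:
        raise ValueError("failure time precedes the last backed-up change")
    return failure - last_backed_up


def meets_rpo(last_backed_up: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data lost at failure time stays within the RPO."""
    return data_loss_window(last_backed_up, failure) <= rpo


failure = datetime(2018, 11, 11, 14, 30, 0)       # hypothetical failure time
nightly_backup = datetime(2018, 11, 11, 2, 0, 0)  # last nightly full backup
realtime_backup = failure - timedelta(seconds=2)  # assumed 2 s incremental lag
rpo = timedelta(seconds=5)

nightly_ok = meets_rpo(nightly_backup, failure, rpo)
realtime_ok = meets_rpo(realtime_backup, failure, rpo)
print(data_loss_window(nightly_backup, failure), nightly_ok)
print(data_loss_window(realtime_backup, failure), realtime_ok)
```

With the nightly backup, twelve and a half hours of changes are lost, while the real-time backup loses only the changes made during its lag.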
When Database Backup Service is deployed, the entire backup process is lock-free and does not block any service requests on the databases. You can choose to back up the entire instance or a single table. Once a misoperation is detected, you can use Database Backup Service to recover data to any point in time. Data of the entire instance or of the specified table can be recovered to its state one second before the misoperation. Database Backup Service is available in multiple specifications, which meet the backup requirements of databases ranging in size from hundreds of MB to hundreds of GB.
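The idea of recovering to the state one second before a misoperation can be sketched as a full backup plus a replay of logged changes up to a target time. This is a simplified model, not the internal mechanism of Database Backup Service: the key-value "table", the log format, and the function names are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# One logged change: (timestamp, key, new value); None means the key was deleted.
LogEntry = Tuple[datetime, str, Optional[str]]


def restore_point(misoperation_at: datetime) -> datetime:
    """Target time: one second before the misoperation."""
    return misoperation_at - timedelta(seconds=1)


def replay(full_backup: Dict[str, str], log: List[LogEntry],
           target: datetime) -> Dict[str, str]:
    """Apply logged changes to a copy of the full backup, stopping at the target time."""
    state = dict(full_backup)
    for ts, key, value in sorted(log, key=lambda entry: entry[0]):
        if ts > target:
            break                      # skip the misoperation and everything after it
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


backup = {"order-1": "paid"}                        # hypothetical table contents
bad_delete_at = datetime(2018, 11, 11, 10, 0, 5)    # hypothetical misoperation time
log: List[LogEntry] = [
    (datetime(2018, 11, 11, 10, 0, 0), "order-2", "new"),
    (bad_delete_at, "order-1", None),               # the accidental delete
]

restored = replay(backup, log, restore_point(bad_delete_at))
print(restored)
```

The valid insert made before the misoperation is kept, and the accidental delete is never applied.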
The backup capability of Database Backup Service has already been proven by a large number of users. Database Backup Service not only supports real-time backup with a second-level RPO, but also provides table-level recovery. Because users can recover only the data they need, the RTO can be reduced to several minutes.
It is worth mentioning that real-time backup has been tested over years of Double 11 shopping festivals. Database Backup Service will also provide an online query function. After a data backup task is completed, you can immediately run SQL statements to query the backup data without waiting. You can also export the query results to Excel or Word files for further analysis, or generate INSERT and REPLACE statements to correct data.