Exploration and Practice of Database Disaster Recovery in the DT Era
As we enter the Data Technology (DT) era, enterprises depend increasingly on data, and data protection becomes ever more essential. Only enterprises that take preventive measures and make sufficient technical preparations can survive IT disasters. During the Best Practices for Enterprise Databases session at The Computing Conference 2018, topics related to disaster recovery attracted much attention. Based on the information shared by the Alibaba Cloud database team in that session, this document explains how to use the Alibaba Cloud database product portfolio to tailor a disaster recovery solution to an enterprise's stage of development.
Data is an important production resource for an enterprise. When data is lost, the enterprise's customer information, technical documentation, and financial accounts may be lost with it, which can damage customer relations, transactions, and production. In general, the causes of data loss fall into three types:
- Logical errors, including software bugs, virus attacks, and corrupted data blocks
- Physical damage, including server damage and disk damage
- Natural disasters, such as fires and earthquakes that can destroy data centers
To prevent the economic loss caused by data loss, enterprises must take disaster recovery measures to protect data. As the informatization of enterprises progresses, the importance of disaster recovery measures increases.
Enterprise-Class Database Disaster Recovery System
Definition of Disaster Recovery
Disaster recovery comprises two practices: backup and disaster tolerance.
- Backup means preparing one or more copies of the important data generated by application systems, or of critical original data.
- Disaster tolerance means deploying two or more IT systems with the same functions in separate locations that are distant from each other, in the same city or in different cities. These systems monitor each other's health and support switchover upon failure. If one system stops working because of a natural or man-made disaster, the entire application is switched to the other system so that services continue.
Disaster Preparation Pain Points
- Backup pain points
  - Backup failures
  - Slow recovery
  - Lossy recovery
  - High remote backup costs
  - Poor cost-effectiveness
- Disaster tolerance pain points
  - A single disaster tolerance solution supports only a few scenarios and cannot meet the requirements of scenarios with different data sizes.
  - The disaster tolerance solution lacks global control and management over the system because link monitoring and quick fault identification are missing.
  - The inspection capability is unavailable.
  - The fault recovery costs are high, and it is difficult to make decisions regarding data verification, comparison, and correction.
  - Switchover is hard to coordinate across multi-layer disaster recovery tools.
  - The contingency plan lacks proper control, and the O&M process cannot be automated.
An enterprise-class database disaster recovery system should be selected based on business requirements, taking four factors into full consideration: the recovery point objective (RPO, the maximum amount of data loss that is acceptable, measured in time), the recovery time objective (RTO, the maximum acceptable time to restore service), cost, and scalability. The system must also cover the full range of database disaster recovery tasks: setting up the disaster recovery environment, data synchronization, monitoring and alerting, drills, failover, and data verification and correction.
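To make the selection criteria concrete, the following minimal Python sketch filters candidate plans by RPO and RTO targets and then picks the cheapest one that remains. The `Plan` class, the `cheapest_plan` function, and every figure in it are hypothetical illustrations, not measurements of any Alibaba Cloud product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate disaster recovery plan (all figures are illustrative)."""
    name: str
    rpo_seconds: float   # worst-case window of lost data
    rto_seconds: float   # worst-case time to restore service
    monthly_cost: float  # in any single currency unit


def cheapest_plan(plans, max_rpo_seconds, max_rto_seconds):
    """Return the lowest-cost plan that meets both targets, or None."""
    eligible = [
        p for p in plans
        if p.rpo_seconds <= max_rpo_seconds and p.rto_seconds <= max_rto_seconds
    ]
    return min(eligible, key=lambda p: p.monthly_cost, default=None)


# Hypothetical candidates; replace with figures from your own drills.
plans = [
    Plan("daily full backup", rpo_seconds=86_400, rto_seconds=14_400, monthly_cost=100),
    Plan("real-time incremental backup", rpo_seconds=5, rto_seconds=1_800, monthly_cost=300),
    Plan("multi-site active-active", rpo_seconds=1, rto_seconds=60, monthly_cost=2_000),
]

choice = cheapest_plan(plans, max_rpo_seconds=60, max_rto_seconds=3_600)
print(choice.name if choice else "no plan meets the targets")
```

Scalability is left out of the sketch because it is hard to reduce to a single number; in practice it would be weighed alongside the result.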
Core Products for Enterprise-Class Database Disaster Recovery
After multiple rounds of iteration, the disaster recovery capabilities of Alibaba Cloud products are well proven. The following core products can help enterprises develop database disaster recovery solutions for different scenarios and requirements:
- Database Backup Service (DBS) is a backup service that provides continuous protection for databases at a low cost. It protects data in various environments, including enterprise data centers and the platforms of other cloud vendors. Database Backup Service provides a complete data backup and recovery solution, and supports real-time incremental backup and data recovery in seconds. In a database disaster recovery solution, you can use Database Backup Service to back up data between databases.
- Data Transmission Service (DTS) is a data streaming service provided by Alibaba Cloud to support the data exchange between different types of data sources. It provides data transmission capabilities such as data migration, real-time data subscription, and real-time data synchronization. In a database disaster recovery solution, you can use Data Transmission Service to implement data migration and real-time synchronization between various databases, laying a solid foundation for database disaster recovery.
- Hybrid Cloud Database Management (HDM) is a platform that helps enterprises connect the different components of a hybrid cloud database architecture. It also supports central management of multiple environments, quick and elastic data migration to the cloud, and failover. In the hybrid cloud disaster recovery scenario, you can use HDM to synchronize data quickly from the local data center (IDC) to the cloud and to conduct disaster recovery drills. When a fault occurs, you can perform failover on the HDM platform to keep databases available.
In the disaster recovery scenario, we recommend that you use HDM in conjunction with other Alibaba Cloud products, such as Distributed Relational Database Service (DRDS) and Object Storage Service (OSS). These products have been verified both inside and outside Alibaba Cloud and have proven highly reliable. Combining them gives you considerable flexibility in designing a disaster recovery solution.
If you need continuous, real-time backup that does not affect business operations, you can purchase Database Backup Service to implement hot backups of your databases. This service supports real-time incremental backups and data recovery within seconds. The following figure shows the architecture of the solution:
The design of the architecture is described as follows:
- Deployment of key components:
  - Two databases, the production database and the recovery database, are deployed locally for production data storage and data recovery, respectively.
  - Storage is purchased in two Alibaba Cloud regions, for example, China (Shenzhen) and China (Qingdao). The storage service can be Object Storage Service (OSS) or Network Attached Storage (NAS).
  - Database Backup Service is purchased for real-time hot backup of the local databases to the cloud.
- Backup of the off-cloud production data to the cloud (use either of the following methods):
  - Deploy another local storage system to back up the production data to the storage space of the local IDC, and then copy this backup data from the local IDC to the cloud.
  - Use Database Backup Service for direct hot backup of data from the local production database to the cloud storage spaces in both regions.
- Data recovery:
  - If the production database fails but the storage space of the local IDC is operating normally, restore data from the local storage space to the local recovery database.
  - If both the production database and the storage space in the local IDC fail, or the local storage space is otherwise unavailable, use Database Backup Service to recover data from the cloud storage space to the local recovery database.
- Architecture characteristics:
  - Advantage: Meets demanding technical requirements, with good consistency and quick recovery.
  - Disadvantage: The RTO varies depending on the size of the database.
  - Application scenario: The real-time backup solution is a sophisticated solution applicable to most relational databases.
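The recovery rules above, and the dependence of the RTO on database size, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names, the health flags, and the throughput figure are hypothetical, and a real restore also spends time replaying incremental logs and validating data.

```python
def choose_restore_source(local_backup_ok, cloud_copies_ok):
    """Pick the backup copy used to rebuild the local recovery database.

    local_backup_ok: True if the storage space in the local IDC is usable.
    cloud_copies_ok: dict mapping a region name to True/False, listed in
        order of preference (dicts preserve insertion order).
    """
    if local_backup_ok:
        return "local IDC storage"
    for region, healthy in cloud_copies_ok.items():
        if healthy:
            return region
    raise RuntimeError("no usable backup copy is available")


def estimate_restore_seconds(database_gb, restore_mb_per_second):
    """Rough restore time: data volume divided by sustained throughput."""
    return database_gb * 1024 / restore_mb_per_second


# Hypothetical situation: the local storage space has failed as well.
cloud = {"China (Shenzhen)": True, "China (Qingdao)": True}
source = choose_restore_source(local_backup_ok=False, cloud_copies_ok=cloud)
seconds = estimate_restore_seconds(database_gb=500, restore_mb_per_second=200)
print(source, seconds)
```

With these assumed figures, a 500 GB database restored at 200 MB/s takes 2,560 seconds, or about 43 minutes, which shows why larger databases push the RTO up.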
Multiple Remote Active Backups
The enterprise-class database disaster recovery system covers the following solutions: on-cloud elastic disaster tolerance, dual or multiple active backups, and three centers in two locations. The following example describes a solution that uses multiple remote active backups. This solution supports data-level remote dual active backups and one-click switchover to another data center, which allows flexible scale-up or scale-down and linear expansion in the future.
- Applications are restructured into units (unit-based reconstruction).
- Data Transmission Service is deployed to implement bi-directional synchronization between databases in two or more locations, which removes the single point of failure of a single-city deployment.
- HDM is deployed to monitor and manage the architecture with dual or multiple active backups. HDM also supports switchover and failover.
- The two data centers support read/write splitting, and local users read data from the nearest data center.
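One way to picture the routing in such a deployment is the minimal Python sketch below. It assumes, beyond what the text states, that each user has a fixed home unit for writes (a common way to avoid write conflicts under bi-directional synchronization), that reads go to the nearest unit, and that traffic falls back to any healthy unit on failure. The unit names and the functions `home_unit` and `route` are hypothetical.

```python
import hashlib

UNITS = ["unit-a", "unit-b"]  # hypothetical names for the two data centers


def home_unit(user_id, units=UNITS):
    """Map a user to a fixed unit with a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return units[int.from_bytes(digest[:8], "big") % len(units)]


def route(user_id, operation, nearest_unit, healthy_units):
    """Choose the unit that should serve one request.

    Reads prefer the unit nearest to the user; writes prefer the user's
    home unit. If the preferred unit is down, fall back to a healthy one.
    """
    preferred = nearest_unit if operation == "read" else home_unit(user_id)
    if preferred in healthy_units:
        return preferred
    if not healthy_units:
        raise RuntimeError("no healthy unit is available")
    return sorted(healthy_units)[0]


both = {"unit-a", "unit-b"}
print(route("user-42", "read", nearest_unit="unit-b", healthy_units=both))
print(route("user-42", "write", nearest_unit="unit-b", healthy_units=both))
```

The fallback branch corresponds to the switchover described above: when a unit fails, its users are served by the surviving unit, which already holds their data through synchronization.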
New Product: Database Backup Service
As an on-cloud database backup agent, Database Backup Service is used with OSS to create a cloud database backup solution. The solution can be set up in about five minutes and provides real-time backups with a second-level RPO (the recovery point objective: the maximum window of data that may be lost when the database fails; smaller is better).
When Database Backup Service is deployed, the backup process does not block any service requests to the databases. You can choose to back up either the entire instance or a single table. When a misoperation is detected, you can use Database Backup Service to restore a copy of the data as of any point in time. In this way, the entire instance or the specified table can be restored to its state one second before the misoperation. In addition, Database Backup Service is available in multiple specifications, meeting the backup requirements of databases whose sizes range from hundreds of MBs to hundreds of GBs.
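Point-in-time recovery of this kind is commonly built from a full backup plus incremental logs. The Python sketch below illustrates that general technique, not the internal implementation of Database Backup Service: it picks the latest full backup taken before the target time and lists the log segments to replay up to one second before the misoperation. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def plan_point_in_time_restore(full_backups, log_segments, misoperation_at):
    """Return (base_backup, segments_to_replay, stop_at).

    full_backups: datetimes at which full backups were taken.
    log_segments: (start, end) datetime pairs covering incremental logs.
    """
    stop_at = misoperation_at - timedelta(seconds=1)
    usable = [taken_at for taken_at in full_backups if taken_at <= stop_at]
    if not usable:
        raise ValueError("no full backup precedes the requested point in time")
    base = max(usable)
    # Keep every segment that overlaps the window (base, stop_at].
    replay = [
        (start, end)
        for start, end in sorted(log_segments)
        if end > base and start <= stop_at
    ]
    return base, replay, stop_at


day = datetime(2018, 9, 2)
full_backups = [day - timedelta(days=1), day]
log_segments = [(day + timedelta(hours=h), day + timedelta(hours=h + 1)) for h in range(3)]
base, replay, stop_at = plan_point_in_time_restore(
    full_backups, log_segments, misoperation_at=day + timedelta(hours=1, minutes=30)
)
print(base, len(replay), stop_at)
```

The last segment in `replay` extends past `stop_at`, so a real restore would apply it only up to that timestamp.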
The backup system provided by Database Backup Service has been proven in production by a large number of users. Database Backup Service not only supports real-time backups and a second-level RPO but also offers table-level recovery. This lets users restore only the data they need, which reduces the RTO to minutes.
Real-time backup has been tested during Double Eleven (the November 11 shopping festival) over several years. Database Backup Service will also provide an online query function. With this function, after a data backup task is completed, you can run SQL statements to query the backup data immediately. You can also export the query results to Excel or Word files for further analysis, or generate INSERT and REPLACE statements to correct the data.
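The idea of correcting data with generated REPLACE statements can be illustrated with a small, self-contained Python sketch. It uses two in-memory SQLite databases as stand-ins for the backup copy and the production database; it does not call Database Backup Service, and the table, the column names, and the helper `replace_statements` are hypothetical.

```python
import sqlite3


def replace_statements(table, columns, rows):
    """Yield (sql, params) pairs that overwrite rows by primary key.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'REPLACE INTO "{table}" ({column_list}) VALUES ({placeholders})'
    for row in rows:
        yield sql, tuple(row)


backup = sqlite3.connect(":memory:")      # stand-in for the queryable backup
production = sqlite3.connect(":memory:")  # stand-in for the damaged database
for db in (backup, production):
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
backup.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250)])
# In production, the row with id 1 was overwritten by a misoperation.
production.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 0), (2, 250)])

# Query the backup for the known-good row, then replay it on production.
good_rows = backup.execute(
    "SELECT id, balance FROM accounts WHERE id = ?", (1,)
).fetchall()
for sql, params in replace_statements("accounts", ["id", "balance"], good_rows):
    production.execute(sql, params)

repaired = production.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(repaired)
```

Binding the values as parameters, rather than formatting them into the statement text, avoids quoting mistakes when the corrected rows contain strings.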
To learn more about Alibaba Cloud Database Backup Service, visit https://www.alibabacloud.com/products/database-backup