How to Build the Most Effective Backup System — A Conversation with the Expert
Database backup is a hot topic. It may seem simple at first, but in practice, O&M personnel often encounter various problems when it comes to backing up databases. So, what typical challenges are presented and how can we build an effective backup system? Which solutions are applicable? To answer these questions, we interviewed Heng Tiegang, a database backup expert at Alibaba.
Heng Tiegang (nickname: Pei’en), an Alibaba database backup expert
Why Should Databases Be Backed Up?
I think the answer to this question is already obvious. So, rather than answering it, I would like to answer another question: what risks can database backup protect against? From the moment it is created, data is exposed to the risk of loss caused by natural disasters, power failures, network faults, hardware faults, software faults, and human error.
The point is, even if your database survives a hardware fault today, a lightning strike tomorrow, and a power failure the day after tomorrow, you may still delete data by mistake through a slip of the hand three days from today.
Which Challenges Are Presented by Database Backups?
The first challenge is taking stock of database assets. An individual user may have just one instance and knows the assets without any stocktaking. An enterprise user, however, especially one from a large enterprise, can have many instances and various database types due to business diversity. In this case, the O&M personnel need to clearly know the number, distribution, types (for example, production or core databases), and functions of the different databases.
The second challenge is the evaluation of the backup system. Although backup is a basic, daily practice, people often find that it does not help when it is needed most. As a basic task, backup does not directly promote the business, and as long as no problems occur, few people remember it. Once a problem occurs, however, backup immediately becomes the center of attention. Backups often fail in emergencies mainly because people do not take them seriously enough, so investment in backup is insufficient. Many enterprises claim that backups are a top priority, but never implement them properly.
I recommend that you ask your technical team right away: Is your backup system really effective?
What Is an Effective Backup System?
Different databases are used for different purposes, and what counts as an effective backup system varies accordingly. According to their functions, databases can be classified as test databases, production databases, and core databases.
For a test database, first determine how important the database is based on its intended use. If the test database is used for personal tests, in most cases data is imported and cleared without being backed up. If the test database is used for R&D, we recommend that you enable the backup function and do not underestimate the importance of backups. This is because all development and testing personnel in the enterprise work on the test database, and a single data problem can immediately cause trouble for the entire team. In addition, a test database is likely to encounter more problems than a production database.
For a production database, first ensure that you have enabled the backup function. Then, evaluate whether the backup cycle meets the requirements. With a full backup on a daily basis, for example, up to one day of new data is lost when a failure occurs. You also need to check whether the most recent backup has been successfully restored in a test and whether the backup data is valid.
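The link between the backup cycle and the amount of data at risk can be made concrete with a small calculation. The following Python sketch is illustrative only: the function names `data_loss_window` and `meets_target` and the sample timestamps are assumptions made for this example, not part of any backup product.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return the time between the latest backup taken at or before the
    failure and the failure itself, or None if no such backup exists."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        return None  # no usable backup: all data is lost
    return failure_time - max(earlier)


def meets_target(backup_times, failure_time, max_loss):
    """Check whether the data lost in this failure stays within max_loss."""
    window = data_loss_window(backup_times, failure_time)
    return window is not None and window <= max_loss


if __name__ == "__main__":
    # Hypothetical schedule: one full backup per day at 02:00.
    backups = [datetime(2024, 1, day, 2, 0) for day in (1, 2, 3)]
    failure = datetime(2024, 1, 3, 23, 30)
    print(data_loss_window(backups, failure))                    # 21:30:00
    print(meets_target(backups, failure, timedelta(days=1)))     # True
    print(meets_target(backups, failure, timedelta(hours=1)))    # False
```

With a daily cycle the window can approach 24 hours, which is why a tighter target calls for more frequent or continuous backup.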
A core database is more important than a test or production database, so in addition to the preceding measures, you need to take some others. Real-time backup has become a must-have when an enterprise selects a backup solution for a core database, because it minimizes the amount of data lost when a fault occurs. Fast recovery also plays an increasingly significant role for the core database. Based on the risks of potential faults, you can select the optimal recovery solution, perform regular drills on the entire backup and recovery system, and restore samples of the backup data to test the recovery function. I recommend that you develop a policy that automatically and regularly runs the entire recovery process and produces drill reports.
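An automated drill of this kind could be structured roughly as follows. This is a minimal sketch, not a description of how any product implements drills: `DrillReport`, `run_recovery_drill`, and the `restore` and `validate` callables are hypothetical names, and a real policy would plug in its own restore and validation steps and run the drill from a scheduler.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DrillReport:
    backup_id: str
    restored: bool           # did the restore step finish without error?
    valid: bool              # did the restored data pass validation?
    duration_seconds: float  # elapsed time for restore plus validation
    error: str = ""


def run_recovery_drill(
    backup_id: str,
    restore: Callable[[str], Any],     # hypothetical: restores a backup, returns a handle
    validate: Callable[[Any], bool],   # hypothetical: checks the restored data
) -> DrillReport:
    """Run one restore-and-validate cycle and summarize the outcome."""
    start = time.monotonic()
    try:
        restored_db = restore(backup_id)
    except Exception as exc:
        return DrillReport(backup_id, False, False, time.monotonic() - start, str(exc))
    try:
        valid, error = bool(validate(restored_db)), ""
    except Exception as exc:
        valid, error = False, str(exc)
    return DrillReport(backup_id, True, valid, time.monotonic() - start, error)


if __name__ == "__main__":
    # Stand-in steps for illustration only.
    report = run_recovery_drill(
        "nightly-backup",
        restore=lambda backup_id: {"rows": 3},
        validate=lambda db: db["rows"] == 3,
    )
    print(report)
```

Recording the duration in each report also gives a running measurement of how long recovery actually takes, which is useful when judging whether fast recovery is achievable in practice.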
Notes:
- Not verifying the validity of the backup data is even worse than not backing up the data. Imagine that all of your business data has been completely destroyed in a disaster. However, when you want to recover the data, you may find that the backup data is corrupted, the files that you backed up are incorrect, or some other terrible thing has happened. In this case, what can you do? A data backup solution without validation can be an even bigger disaster.
You must validate the backup content to ensure that the data has been properly backed up and can be used for recovery. Don’t wait until it is too late.
- Don’t insist on a single large, all-encompassing solution. Diversified requirements must be met by a variety of solutions. In particular, for the core database, the entire instance must be backed up regularly to protect against hardware failures and damage to instances. In addition, each table must be backed up in real time, which can reduce the data recovery time in an emergency by up to 90%.
- Whether manual or automatic, data validation aims to verify the validity of the backup data used for recovery (also referred to as the recovery data). Verifying the integrity of the recovery data is quite challenging. In most cases, the recovery data and production data are sampled and compared with each other based on the business characteristics. Alternatively, the recovery database serves as the secondary database and is synchronized with the primary database to verify data integrity.
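The sampling approach described in the last note can be sketched as follows. This is an illustrative example under stated assumptions, not a production validator: it uses in-memory SQLite databases as stand-ins for the production and recovery databases, and the function name `sample_mismatches`, the `orders` table, and its columns are hypothetical.

```python
import random
import sqlite3


def sample_mismatches(prod, restored, table, key, sample_size=100, seed=0):
    """Sample primary keys from production and return, in sorted order,
    the keys whose rows are missing or different in the restored copy."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    keys = [row[0] for row in prod.execute(f"SELECT {key} FROM {table} ORDER BY {key}")]
    rng = random.Random(seed)  # fixed seed makes a drill repeatable
    sampled = rng.sample(keys, min(sample_size, len(keys)))
    query = f"SELECT * FROM {table} WHERE {key} = ?"
    mismatched = []
    for k in sampled:
        prod_row = prod.execute(query, (k,)).fetchone()
        restored_row = restored.execute(query, (k,)).fetchone()  # None if missing
        if prod_row != restored_row:
            mismatched.append(k)
    return sorted(mismatched)


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    restored = sqlite3.connect(":memory:")
    for conn in (prod, restored):
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)",
                         [(i, i * 10) for i in range(1, 6)])
    # Simulate a damaged backup: one changed row and one missing row.
    restored.execute("UPDATE orders SET amount = 0 WHERE id = 3")
    restored.execute("DELETE FROM orders WHERE id = 5")
    print(sample_mismatches(prod, restored, "orders", "id", sample_size=5))  # [3, 5]
```

Because production data keeps changing, a comparison like this is only meaningful for rows that have not been modified since the point in time the backup was restored to. Which tables and rows to sample depends on the business characteristics, as noted above.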
Which Solutions Are Applicable?
Again, be prepared before the data is lost. Act now to protect your database. Here are some of the solutions that are deployed based on Alibaba Cloud products:
- If your database is located on an Alibaba Cloud ECS instance, use Database Backup Service (DBS) to back up the data to OSS. It takes as little as five minutes to purchase, configure, and start the backup service.
- If your database is located in a local IDC and is reachable over the Internet, use DBS to back up the data directly. If you have activated Express Connect, use DBS to back up data to OSS. Depending on the DBS region of your choice, you can also implement remote backup.
- If your database is hosted by a cloud vendor other than Alibaba Cloud and is reachable over the Internet, use DBS to back up the database directly. If you have activated the deployment agent service or Express Connect, use DBS to back up data to OSS and implement cross-cloud backup on Alibaba Cloud.
Could You Give Us a Brief Introduction to Your Work?
I am currently in charge of an Alibaba Cloud product called DBS. Have you ever heard of it? DBS is a database backup channel that is already in commercial use, and it works together with OSS to form a cloud database backup solution. Such a solution takes only five minutes to set up and provides real-time backup with a Recovery Point Objective (RPO) measured in seconds. The RPO is the maximum period of data loss that is acceptable when the database fails. And of course, a smaller RPO is always desired.
In addition to providing continuous data protection and a low-cost backup service for databases, DBS also provides powerful data protection in different environments, including public clouds, enterprise-built IDCs, and clouds operated by other vendors. DBS features low cost, high performance, and zero risks. It provides users with an ideal cloud database backup solution.
The backup system built on DBS has already been proven in use by a large number of users. DBS not only supports real-time backup and an RPO measured in seconds but also incorporates table-level recovery. It helps users recover only the data they need and reduces the Recovery Time Objective (RTO) to several minutes.
About the Author
Heng Tiegang (nickname Pei’en) joined Alibaba in 2011, and was once the MySQL DBA of Alibaba Group. He is currently a database product manager responsible for designing database backup products.