SQL Server Best Practices: Using Filegroups to Implement Cold and Hot Data Isolation Backup Solution

By Wang Jianming, Senior Engineer

In the SQL Server Best Practices series, we have discussed three key topics: Using Asymmetric Keys to Implement Column Encryption, Database Recovery Models and Backups, and Monitoring Backup and Restore Progress.

In this article, we’ll share how SQL Server can use filegroups to implement the cold and hot data isolation backup solution.

Scenario Description

In our example scenario, we’ll assume that a company has a critical large database (over 10 TB) and faces the following scenarios:

  1. The database contains nearly 10 years of user payment information, which is business-critical.
  2. Data is archived by year, with each year's data stored in its own table. Tables for previous years are read-only (historical payment information no longer needs to be modified), and only the current year's table is both readable and writable.
  3. Each full database backup takes too long (more than 20 hours), and a full database restore takes even longer (more than 30 hours).

As a Database Administrator, you need to optimize the database design and the backup and recovery system to make backup and restore more efficient. The question is, how can you do this in SQL Server?

Introduction to Filegroups

This article will not focus on the details of filegroups. However, as it is the core technology involved, it is necessary to briefly introduce filegroups in SQL Server in terms of their advantages, creation, and usage.

Advantages of Using Filegroups

SQL Server supports storing table and index data in non-Primary filegroups, which provides the following benefits when a database has multiple filegroups:

  1. Distributes I/O pressure to different filegroups. If files in different filegroups are on different disks, disk pressure can be distributed.
  2. DBCC CHECKFILEGROUP operations can be run on different filegroups of the same database in parallel, reducing the maintenance time for very large databases.
  3. Backup and Restore operations can be performed at the filegroup level to control backup and restore strategies in a more granular manner.

Creating a Filegroup When Creating a Database

We can create a filegroup directly when we create a database. The code is as follows:
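A minimal sketch of such a statement is shown below (the database name TestFGDB, the file paths, and the sizes are illustrative assumptions):

```sql
-- Create a database with one user-defined filegroup (FGTest)
-- containing two data files. Paths and sizes are illustrative.
CREATE DATABASE TestFGDB
ON PRIMARY
(
    NAME = 'TestFGDB_Primary',
    FILENAME = 'D:\SQLData\TestFGDB_Primary.mdf',
    SIZE = 100MB, FILEGROWTH = 100MB
),
FILEGROUP FGTest
(
    NAME = 'TestFGDB_FGTest_1',
    FILENAME = 'D:\SQLData\TestFGDB_FGTest_1.ndf',
    SIZE = 100MB, FILEGROWTH = 100MB
),
(
    NAME = 'TestFGDB_FGTest_2',
    FILENAME = 'D:\SQLData\TestFGDB_FGTest_2.ndf',
    SIZE = 100MB, FILEGROWTH = 100MB
)
LOG ON
(
    NAME = 'TestFGDB_log',
    FILENAME = 'D:\SQLData\TestFGDB_log.ldf',
    SIZE = 100MB, FILEGROWTH = 100MB
);
```

Note that both files in FGTest use the same initial size and growth increment.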


To ensure balanced I/O across the files of a filegroup, keep the initial size and growth increment of all files consistent so that SQL Server's round-robin, proportional-fill allocation algorithm works properly.

Adding a Filegroup to an Existing Database

If the database already exists, we can also add filegroups. The code is as follows:
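For example, a sketch along these lines (the filegroup name FGTest2 and the file path are illustrative assumptions):

```sql
-- Add a new filegroup to an existing database,
-- then add a data file to that filegroup.
ALTER DATABASE TestFGDB ADD FILEGROUP FGTest2;

ALTER DATABASE TestFGDB
ADD FILE
(
    NAME = 'TestFGDB_FGTest2_1',
    FILENAME = 'D:\SQLData\TestFGDB_FGTest2_1.ndf',
    SIZE = 100MB, FILEGROWTH = 100MB
)
TO FILEGROUP FGTest2;
```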

The final filegroup information is shown as follows:

Using Filegroups

After filegroups are created, we can put the table and index into the corresponding filegroup. For example:
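A sketch of placing a table and its indexes on a filegroup by appending ON &lt;filegroup&gt; to the CREATE statements (the table dbo.Orders and its columns are illustrative assumptions):

```sql
-- The data rows live with the clustered index, so putting the
-- clustered index on FGTest puts the table's data there too.
CREATE TABLE dbo.Orders
(
    Order_ID  INT IDENTITY(1, 1) NOT NULL,
    OrderDate DATETIME NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (Order_ID)
        ON FGTest                -- clustered index (and data) on FGTest
) ON FGTest;

CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
ON FGTest;                       -- nonclustered index on the same filegroup
```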

Solution Design

Now that we have briefly introduced filegroups, we are going to cover how to use them to design the cold and hot data isolation backup solution.

Design Analysis

The payment database is very large (over 10 TB), so a single full backup takes more than 20 hours. A regular full backup therefore produces an oversized backup file and takes too long, and the I/O consumed by the backup operation may even affect normal operations.

If we think about it carefully, we find that although the database is very large, the amount of data that actually changes is small relative to the whole. Only the current year's table is constantly changing (hot data), while the tables for previous years are never modified (cold data).

So we design the database to place the tables for previous years in read-only filegroups and the current year's table in a read-write filegroup. The backup system then only needs to back up the Primary filegroup and the filegroup containing the current year's table (after a one-time full backup of the entire database the first time). This greatly reduces the I/O consumed by backups, isolates the backup of hot and cold data, distributes file I/O pressure, and ultimately optimizes both the database design and the backup system.

Based on the analysis above, the design is illustrated in the following diagram:

Description of the Design Diagram

The design diagram is detailed below for a more intuitive and thorough understanding of the design solution.

The entire database contains 13 files, including:

  1. 1 Primary filegroup: Used to store system object information, such as database system tables and views; this filegroup is readable and writable.
  2. 10 user-defined read-only filegroups: Used to store the data in tables for previous years and the corresponding index data, and each year’s data is stored in a separate filegroup.
  3. 1 user-defined read-write filegroup: Used to store the data of the table for the current year and the corresponding index data. To ensure that the data in this table is readable and writable, the filegroup must be readable and writable.
  4. 1 database transaction log file: Used to store the database transaction log, which we must back up regularly.

Solution Implementation

After the solution design is completed, the next step is the concrete implementation, which includes:

  1. Creating a database
  2. Creating a table
  3. Filegroup settings
  4. Implementing cold and hot backup

Creating a Database

While creating the database, we create the Primary filegroup and the filegroups for 2008 to 2017. Note that the initial size and growth increment of the files in all filegroups must be the same. The code is as follows:
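A sketch of such a script (the file paths and sizes are illustrative assumptions; in practice each file can be placed on a different disk):

```sql
-- Create the Payment database with the PRIMARY filegroup and one
-- filegroup per year from 2008 to 2017. All files use the same
-- initial size and growth increment.
CREATE DATABASE Payment
ON PRIMARY
(
    NAME = 'Payment_Primary',
    FILENAME = 'D:\SQLData\Payment_Primary.mdf',
    SIZE = 500MB, FILEGROWTH = 500MB
),
FILEGROUP FGPayment2008
(
    NAME = 'Payment_FG2008',
    FILENAME = 'D:\SQLData\Payment_FG2008.ndf',
    SIZE = 500MB, FILEGROWTH = 500MB
),
FILEGROUP FGPayment2009
(
    NAME = 'Payment_FG2009',
    FILENAME = 'D:\SQLData\Payment_FG2009.ndf',
    SIZE = 500MB, FILEGROWTH = 500MB
),
-- ... repeat the same pattern for FGPayment2010 through FGPayment2016 ...
FILEGROUP FGPayment2017
(
    NAME = 'Payment_FG2017',
    FILENAME = 'D:\SQLData\Payment_FG2017.ndf',
    SIZE = 500MB, FILEGROWTH = 500MB
)
LOG ON
(
    NAME = 'Payment_log',
    FILENAME = 'D:\SQLData\Payment_log.ldf',
    SIZE = 500MB, FILEGROWTH = 500MB
);
```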

Considering that we add a new filegroup to the database every year, the filegroup for 2018 is created separately, as follows:
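A sketch of adding the new year's filegroup (file path and size are illustrative assumptions):

```sql
-- Add the read-write filegroup for the new year (2018)
-- and attach a data file to it.
ALTER DATABASE Payment ADD FILEGROUP FGPayment2018;

ALTER DATABASE Payment
ADD FILE
(
    NAME = 'Payment_FG2018',
    FILENAME = 'D:\SQLData\Payment_FG2018.ndf',
    SIZE = 500MB, FILEGROWTH = 500MB
)
TO FILEGROUP FGPayment2018;
```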

Finally, reconfirm the database filegroup information. The code is as follows:
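One way to do this is to query the filegroup catalog views (a sketch; the column selection is our own):

```sql
-- List every filegroup in the Payment database together with its
-- files, read-only flag, and default flag.
USE Payment;
SELECT
    fg.name            AS filegroup_name,
    fg.is_read_only,
    fg.is_default,
    df.name            AS file_logical_name,
    df.physical_name,
    df.size * 8 / 1024 AS size_mb   -- size is stored in 8 KB pages
FROM sys.filegroups AS fg
JOIN sys.database_files AS df
    ON df.data_space_id = fg.data_space_id
ORDER BY fg.name;
```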

The result is shown in the figure below:

Creating a Table

After the database and corresponding filegroups are created, we create corresponding tables and insert some test data, as follows:
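A sketch of the table creation (the column definitions and the identity seeds below are illustrative assumptions; the key point is that each year's table is placed on that year's filegroup):

```sql
-- One payment table per year, each stored on its own filegroup.
-- The IDENTITY seed differs per table so Payment_ID values never
-- collide when querying across years.
CREATE TABLE dbo.Payment_2008
(
    Payment_ID BIGINT IDENTITY(2008000001, 1) NOT NULL,
    OrderID    BIGINT NOT NULL,
    PayTime    DATETIME NOT NULL,
    Amount     DECIMAL(18, 2) NOT NULL,
    CONSTRAINT PK_Payment_2008 PRIMARY KEY CLUSTERED (Payment_ID)
        ON FGPayment2008
) ON FGPayment2008;

-- ... create Payment_2009 through Payment_2017 the same way,
--     each with its own identity seed and filegroup ...

CREATE TABLE dbo.Payment_2018
(
    Payment_ID BIGINT IDENTITY(2018000001, 1) NOT NULL,
    OrderID    BIGINT NOT NULL,
    PayTime    DATETIME NOT NULL,
    Amount     DECIMAL(18, 2) NOT NULL,
    CONSTRAINT PK_Payment_2018 PRIMARY KEY CLUSTERED (Payment_ID)
        ON FGPayment2018
) ON FGPayment2018;
```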

Two things need special attention here:

  1. Due to space limitations, the creation of the tables for 2010 to 2017 is omitted from the code. Please complete it yourself.
  2. The initial value of the Payment_ID field is different for each table, so that the values do not overlap when payment information is queried across all the tables.

Next, we check the filegroup distribution on all the tables as follows:
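One way to check this is via the index and data-space catalog views (a sketch; the filter on the table-name prefix is our own):

```sql
-- Show which filegroup each table and index is stored on.
-- index_id 0/1 represents the table's data (heap/clustered index).
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS index_name,
    i.type_desc,
    ds.name                  AS filegroup_name
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE OBJECT_NAME(i.object_id) LIKE 'Payment[_]%'
ORDER BY table_name, index_name;
```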

The query results are partially truncated as follows, from which we can see that all tables and indexes are distributed to the corresponding filegroups as expected.

Finally, we enter some data in the corresponding tables for testing:
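For example (the values are illustrative; Payment_ID is generated by each table's identity column):

```sql
-- Insert a sample row into a historical table and the current table.
INSERT INTO dbo.Payment_2008 (OrderID, PayTime, Amount)
VALUES (100001, '2008-06-01 10:00:00', 99.00);

INSERT INTO dbo.Payment_2018 (OrderID, PayTime, Amount)
VALUES (100002, '2018-06-01 10:00:00', 199.00);
```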

Filegroup Settings

After tables are created and the test data is initialized, we set the read and write properties of the filegroup. The code is as follows:
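A sketch of the filegroup property settings (only two of the historical filegroups are shown; note that marking a filegroup READ_ONLY requires exclusive access to the database):

```sql
-- Mark the historical filegroups read-only; the current year's
-- filegroup stays read-write.
ALTER DATABASE Payment MODIFY FILEGROUP FGPayment2008 READ_ONLY;
ALTER DATABASE Payment MODIFY FILEGROUP FGPayment2009 READ_ONLY;
-- ... repeat for FGPayment2010 through FGPayment2017 ...
ALTER DATABASE Payment MODIFY FILEGROUP FGPayment2018 READ_WRITE;
```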

Eventually, the read and write properties of the filegroups are as follows:

The screenshot is shown in the following figure:

Implementing Cold and Hot Backup

After all the filegroups have been created and their read and write properties configured, we perform full and differential backups of the readable and writable filegroups, together with database-level transaction log backups. To facilitate testing, we insert a record between consecutive backups. In general, the backup operations consist of the following:

  1. Perform a one-time full backup of the entire database.
  2. Perform a periodic full backup of readable and writable filegroups.
  3. Perform a periodic differential backup of readable and writable filegroups.
  4. Perform a periodic transaction log backup of the entire database.
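The four steps above can be sketched as follows (the backup paths are illustrative assumptions):

```sql
-- 1. One-time full backup of the entire database.
BACKUP DATABASE Payment
TO DISK = 'D:\Backup\Payment_FULL.bak'
WITH INIT, COMPRESSION;

-- 2. Periodic full backup of the readable and writable filegroups.
BACKUP DATABASE Payment
FILEGROUP = 'PRIMARY', FILEGROUP = 'FGPayment2018'
TO DISK = 'D:\Backup\Payment_FG_FULL.bak'
WITH INIT, COMPRESSION;

-- 3. Periodic differential backup of the same filegroups.
BACKUP DATABASE Payment
FILEGROUP = 'PRIMARY', FILEGROUP = 'FGPayment2018'
TO DISK = 'D:\Backup\Payment_FG_DIFF.bak'
WITH DIFFERENTIAL, INIT, COMPRESSION;

-- 4. Periodic transaction log backup of the entire database.
BACKUP LOG Payment
TO DISK = 'D:\Backup\Payment_LOG.trn'
WITH INIT, COMPRESSION;
```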

The advantage of this approach is that we only need to perform full and differential backups of the readable and writable filegroups, PRIMARY and FGPayment2018 (the Primary filegroup contains system objects that change little, but in a real scenario it still needs to be backed up), while the 10 read-only filegroups do not need to be backed up at all, because their data no longer changes. In this way, we have implemented the cold and hot data isolation backup solution.

The next question is: if a disaster strikes the Payment database and causes data loss, how can we restore the database from the backup sets? We can restore them in the following order:

  1. Restore the one-time full backup of the entire database
  2. Restore the last full backup of all readable and writable filegroups
  3. Restore the last differential backup of readable and writable filegroups
  4. Restore all transaction log backups of the entire database
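The restore sequence above can be sketched as follows (the file names match the illustrative backup paths assumed earlier; every step except the last uses NORECOVERY so that subsequent backups can still be applied):

```sql
-- 1. Restore the one-time full backup of the entire database.
RESTORE DATABASE Payment
FROM DISK = 'D:\Backup\Payment_FULL.bak'
WITH NORECOVERY, REPLACE;

-- 2. Restore the last full backup of the read-write filegroups.
RESTORE DATABASE Payment
FILEGROUP = 'PRIMARY', FILEGROUP = 'FGPayment2018'
FROM DISK = 'D:\Backup\Payment_FG_FULL.bak'
WITH NORECOVERY;

-- 3. Restore the last differential backup of the same filegroups.
RESTORE DATABASE Payment
FILEGROUP = 'PRIMARY', FILEGROUP = 'FGPayment2018'
FROM DISK = 'D:\Backup\Payment_FG_DIFF.bak'
WITH NORECOVERY;

-- 4. Restore all transaction log backups, then recover the database.
RESTORE LOG Payment
FROM DISK = 'D:\Backup\Payment_LOG.trn'
WITH RECOVERY;
```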

Finally, we check the results of the data restore. Based on the test data we inserted, there should be four records.
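For example, querying the tables we inserted test data into (a sketch; the table names follow the assumed schema above):

```sql
-- Each SELECT returns one result set; the inserted test rows
-- should all be present after the restore.
SELECT * FROM dbo.Payment_2008;
SELECT * FROM dbo.Payment_2018;
```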

According to the execution results, there are four result sets that meet our expectations. The screenshot is as follows:


This article has shown how to use SQL Server filegroups to implement and optimize a cold and hot data isolation backup solution, which greatly improves the efficiency of database backup and restore, balances the I/O load across files, and improves the performance of the entire database.

To learn more about Alibaba Cloud Database Backup product, visit https://www.alibabacloud.com/products/database-backup


Follow me to keep abreast with the latest technology news, industry insights, and developer trends.
