SQL Server Best Practices: Using Filegroups to Implement Cold and Hot Data Isolation Backup Solution

By Wang Jianming, Senior Engineer

In the SQL Server Best Practices Series, we have discussed three key topics, including: Using Asymmetric Keys to Implement Column Encryption, Database Recovery Models and Backups, and Monitoring Backup and Restore Progress.

In this article, we’ll share how SQL Server can use filegroups to implement the cold and hot data isolation backup solution.

Scenario Description

In our example scenario, we’ll assume that a company has a critical large database (over 10 TB) and faces the following scenarios:

  1. The database contains user payment information for nearly 10 years, which is crucial

As a Database Administrator, you need to optimize the design of the database and Backup Recovery System to make backup and restore more efficient. But the question is, how can you do this on a SQL Server?

Introduction to Filegroups

This article will not focus on the details of filegroups. However, as it is the core technology involved, it is necessary to briefly introduce filegroups in SQL Server in terms of their advantages, creation, and usage.

Advantages of Using Filegroups

SQL Server supports the storage of table and index data in Non-Primary filegroups, which provides the following benefits when a database has multiple filegroups:

  1. Distributes I/O pressure to different filegroups. If files in different filegroups are on different disks, disk pressure can be distributed.

Creating a Filegroup When Creating a Database

We can create a filegroup directly when we create a database. The code is as follows:

USE master
GO
EXEC sys.xp_create_subdir 'C:\SQLServer\Data\'
EXEC sys.xp_create_subdir 'C:\SQLServer\Logs\'
CREATE DATABASE [TestFG]
ON PRIMARY
( NAME = N'TestFG', FILENAME = N'C:\SQLServer\Data\TestFG.mdf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2010]
( NAME = N'FG2010', FILENAME = N'C:\SQLServer\Data\FG2010.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2011]
( NAME = N'FG2011', FILENAME = N'C:\SQLServer\Data\FG2011.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2012]
( NAME = N'FG2012', FILENAME = N'C:\SQLServer\Data\FG2012.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB )
LOG ON
( NAME = N'TestFG_log', FILENAME = N'C:\SQLServer\Logs\TestFG_log.ldf' , SIZE = 5MB , FILEGROWTH = 50MB)
GO

Note:

To ensure the load balancing capability of the database filegroup I/O, keep consistent the initial size and increment parameters of all files to ensure the polling scheduling allocation algorithm works properly.

Creating a Filegroup Alone

If the database already exists, we can also add filegroups. The code is as follows:

-- Add filegroup FG2013
USE master
GO
ALTER DATABASE [TestFG] ADD FILEGROUP [FG2013];
-- Add data file to FG2013
ALTER DATABASE [TestDb]
ADD FILE (NAME = FG2013, SIZE = 5MB , FILEGROWTH = 50MB ,FILENAME = N'C:\SQLServer\Data\FG2013.ndf')
TO FILEGROUP [FG2013]
GO
USE [TestFG]
GO
SELECT * FROM sys.filegroups

The final filegroup information is shown as follows:

Image for post
Image for post

Using Filegroups

After filegroups are created, we can put the table and index into the corresponding filegroup. For example:

Put the clustered index into the PRIMARY filegroup; table and index data are placed in the FG2010 filegroup. The code is as follows:
USE [TestFG]
GO
CREATE TABLE [dbo].[ Orders_2010](
[OrderID] [int] IDENTITY(1,1) NOT NULL,
[OrderDate] [datetime] NULL,
CONSTRAINT [PK_Orders_2010] PRIMARY KEY CLUSTERED
(
[OrderID] ASC
) ON [PRIMARY]
) ON [FG2010]
GO
CREATE NONCLUSTERED INDEX IX_OrderDate
ON [dbo].[ Orders_2010](OrderDate)
ON [FG2010];

Scheme Design

Now that we have briefly introduced filegroups as described in the introduction, we are going to cover the use of SQL Server filegroups to implement the solution design for cold and hot data isolation backup.

Design Analysis

The payment database is too large (over 10 TB), so a single full backup takes more than 20 hours. If a full backup is performed normally, the backup file will be too large and the process will take too long, and even normal operations may be affected due to the I/O capacity consumption of the backup operation. If we think about it carefully, we find that, although the database is very large, the amount of data that is undergoing data change operations is not large compared with the entire database. This is because only the data in the table for the current year is constantly changing (hot data), and the data in tables for previous years are not modified (cold data). So, we design the database to put the data in tables for previous years into read-only filegroups and the data in the table for the current year into read-write filegroups. The backup system only needs to back up the Primary and the filegroup containing the table for the current year (but it still needs to perform a one-time full backup of the database the first time). This greatly reduces the I/O capacity consumed by the backup, implements the isolation backup operation for hot and cold data, distributes the I/O pressure of files, and ultimately achieves optimization of the database design and backup system.

According to the text analysis above, a beautiful design diagram and visual representation is shown as follows:

Image for post
Image for post

Description of the Design Diagram

The design diagram is detailed below for a more intuitive and thorough understanding of the design solution.

The entire database contains 13 files, including:

  1. 1 Primary filegroup: Users store object information, such as database system tables and views, and filegroups can be read from and written to.

Solution Implementation

After the solution design is completed, the next step is its collective implementation. The specific implementation includes:

  1. Create a database

Creating a Database

While creating the database, we create the Primary filegroup and the filegroup for 2008 to 2017. Note that the initial size and increment of files in all filegroups must be the same. The code is as follows:

USE master
GO
EXEC sys.xp_create_subdir 'C:\DATA\Payment\Data\'
EXEC sys.xp_create_subdir 'C:\DATA\Payment\Log\'
CREATE DATABASE [Payment]
ON PRIMARY
( NAME = N'Payment', FILENAME = N'C:\DATA\Payment\Data\Payment.mdf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2008]
( NAME = N'FGPayment2008', FILENAME = N'C:\DATA\Payment\Data\Payment_2008.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2009]
( NAME = N'FGPayment2009', FILENAME = N'C:\DATA\Payment\Data\Payment_2009.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2010]
( NAME = N'FGPayment2010', FILENAME = N'C:\DATA\Payment\Data\Payment_2010.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2011]
( NAME = N'FGPayment2011', FILENAME = N'C:\DATA\Payment\Data\Payment_2011.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2012]
( NAME = N'FGPayment2012', FILENAME = N'C:\DATA\Payment\Data\Payment_2012.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2013]
( NAME = N'FGPayment2013', FILENAME = N'C:\DATA\Payment\Data\Payment_2013.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2014]
( NAME = N'FGPayment2014', FILENAME = N'C:\DATA\Payment\Data\Payment_2014.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2015]
( NAME = N'FGPayment2015', FILENAME = N'C:\DATA\Payment\Data\Payment_2015.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2016]
( NAME = N'FGPayment2016', FILENAME = N'C:\DATA\Payment\Data\Payment_2016.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2017]
( NAME = N'FGPayment2017', FILENAME = N'C:\DATA\Payment\Data\Payment_2017.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB )
LOG ON
( NAME = N'Payment_log', FILENAME = N'C:\DATA\Payment\Log\Payment_log.ldf' , SIZE = 5MB , FILEGROWTH = 50MB)
GO

Considering that we add new filegroups to the database every year, the filegroups for 2018 are created separately as follows:

-- Add filegroup FGPayment2018
USE master
GO
ALTER DATABASE [Payment] ADD FILEGROUP [FGPayment2018];
-- Add data file to FGPayment2018
ALTER DATABASE [Payment]
ADD FILE (NAME = FGPayment2018, SIZE = 5MB , FILEGROWTH = 50MB ,FILENAME = N'C:\DATA\Payment\Data\Payment_2018.ndf')
TO FILEGROUP [FGPayment2018]
GO

Finally, reconfirm the database filegroup information. The code is as follows:

USE [Payment]
GO
SELECT file_name = mf.name, filegroup_name = fg.name, mf.physical_name,mf.size,mf.growth
FROM sys.master_files AS mf
INNER JOIN sys.filegroups as fg
ON mf.data_space_id = fg.data_space_id
WHERE mf.database_id = db_id('Payment')
ORDER BY mf.type;

The result is shown in the figure below:

Image for post
Image for post

Creating a Table

After the database and corresponding filegroups are created, we create corresponding tables and insert some test data, as follows:

USE [Payment]
GO
CREATE TABLE [dbo].[ Payment_2008](
[Payment_ID] [bigint] IDENTITY(12008,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2008] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2008]
) ON [FGPayment2008]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2008]([OrderID])
ON [FGPayment2008];
CREATE TABLE [dbo].[ Payment_2009](
[Payment_ID] [bigint] IDENTITY(12009,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2009] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2009]
) ON [FGPayment2009]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2009]([OrderID])
ON [FGPayment2009];
-- The creation of tables for 2010 to 2017 is omitted here. Refer to the above table creation and index code to complete it yourself
CREATE TABLE [dbo].[ Payment_2018](
[Payment_ID] [bigint] IDENTITY(12018,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2018] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2018]
) ON [FGPayment2018]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2018]([OrderID])
ON [FGPayment2018];

Two things need special attention here:

  1. Due to space limitations, the creation of tables for 2010 to 2017 is omitted from the table creation code. Please complete it yourself

Next, we check the filegroup distribution on all the tables as follows:

USE [Payment]
GO
SELECT table_name = tb.[name], index_name = ix.[name], located_filegroup_name = fg.[name]
FROM sys.indexes ix
INNER JOIN sys.filegroups fg
ON ix.data_space_id = fg.data_space_id
INNER JOIN sys.tables tb
ON ix.[object_id] = tb.[object_id]
WHERE ix.data_space_id = fg.data_space_id
GO

The query results are partially truncated as follows, from which we can see that all tables and indexes are distributed to the corresponding filegroups as expected.

Image for post
Image for post

Finally, we enter some data in the corresponding tables for testing:

USE [Payment]
GO
SET NOCOUNT ON
INSERT INTO [Payment_2008] SELECT 2008;
INSERT INTO [Payment_2009] SELECT 2009;
-- The code for 2010¨C2017 has been omitted. Please complete it yourself
INSERT INTO [Payment_2018] SELECT 2018;

Filegroup Settings

After tables are created and the test data is initialized, we set the read and write properties of the filegroup. The code is as follows:

USE master
GO
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2008] READ_ONLY;
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2009] READ_ONLY;
-- The setting of the read-only property for the 2010¨C2017 filegroups is omitted here. Please complete it yourself
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2018] READ_WRITE;

Eventually, the read and write properties of the filegroups are as follows:

USE [Payment]
GO
SELECT name, is_default, is_read_only FROM sys.filegroups
GO

The screenshot is shown in the following figure:

Image for post
Image for post

Implementing cold and hot backup

After all the filegroups have been created successfully and the read and write properties have been configured, we perform a full backup, differential backup, and database-level log backup for the readable and writable filegroups in the database. To facilitate testing, we insert a piece of data between the two backups. Generally, the backup operation consists of the following:

  1. Perform a one-time full backup of the entire database.
-- Make a one-time full backup of payment database
USE [master];
GO
BACKUP DATABASE [Payment]
TO DISK = N'C:\DATA\Payment\BACKUP\Payment_20180316_full.bak'
WITH COMPRESSION, Stats=5
;
GO
-- For testing, init one record
USE [Payment];
GO
INSERT INTO [dbo].[ Payment_2018] SELECT 201801;
GO
-- Make a full backup for each writable filegoup (just backup FGPayment2018 as an example)
BACKUP DATABASE [Payment]
FILEGROUP = 'FGPayment2018'
TO DISK = 'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_full.bak'
WITH COMPRESSION, Stats=5
;
GO
-- for testing, insert one record
INSERT INTO [dbo].[ Payment_2018] SELECT 201802;
GO
--Take a differential backup for each writable filegoup (just backup FGPayment2018 as an example)
BACKUP DATABASE [Payment]
FILEGROUP = N'FGPayment2018'
TO DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_diff.bak'
WITH DIFFERENTIAL, COMPRESSION, Stats=5
;
GO
-- for testing, insert one record
INSERT INTO [dbo].[ Payment_2018] SELECT 201803;
GO
-- Take a transaction log backup of database payment
BACKUP LOG [Payment]
TO DISK = 'C:\DATA\Payment\BACKUP\Payment_20180316_log.trn';
GO

The advantage of this backup is that we only need to perform full and differential backups of the readable and writable filegroups (FGPayment2018) (Primary contains system objects with little change. In the actual scenario, Primary filegroup also needs to be backed up), while the other 9 read-only filegroups do not need to be backed up because the data will no longer change. In this way, we have implemented the cold and hot data isolation backup solution.

The next problem is, how can we recover the database from the backup set in case of Payment data disaster, resulting in data loss? We can recover the backup set as follows:

  1. Restore the one-time full backup of the entire database
-- We restore full backup
USE master
GO
RESTORE DATABASE [Payment_Dev]
FROM DISK=N'C:\DATA\Payment\BACKUP\Payment_20180316_full.bak' WITH
MOVE 'Payment' TO 'C:\DATA\Payment_Dev\Data\Payment_dev.mdf',
MOVE 'FGPayment2008' TO 'C:\DATA\Payment_Dev\Data\FGPayment2008_dev.ndf',
MOVE 'FGPayment2009' TO 'C:\DATA\Payment_Dev\Data\FGPayment2009_dev.ndf',
MOVE 'FGPayment2010' TO 'C:\DATA\Payment_Dev\Data\FGPayment2010_dev.ndf',
MOVE 'FGPayment2011' TO 'C:\DATA\Payment_Dev\Data\FGPayment2011_dev.ndf',
MOVE 'FGPayment2012' TO 'C:\DATA\Payment_Dev\Data\FGPayment2012_dev.ndf',
MOVE 'FGPayment2013' TO 'C:\DATA\Payment_Dev\Data\FGPayment2013_dev.ndf',
MOVE 'FGPayment2014' TO 'C:\DATA\Payment_Dev\Data\FGPayment2014_dev.ndf',
MOVE 'FGPayment2015' TO 'C:\DATA\Payment_Dev\Data\FGPayment2015_dev.ndf',
MOVE 'FGPayment2016' TO 'C:\DATA\Payment_Dev\Data\FGPayment2016_dev.ndf',
MOVE 'FGPayment2017' TO 'C:\DATA\Payment_Dev\Data\FGPayment2017_dev.ndf',
MOVE 'FGPayment2018' TO 'C:\DATA\Payment_Dev\Data\FGPayment2018_dev.ndf',
MOVE 'Payment_log' TO 'C:\DATA\Payment_Dev\Log\Payment_dev_log.ldf',
NORECOVERY,STATS=5;
GO
-- restore writable filegroup full backup
RESTORE DATABASE [Payment_Dev]
FILEGROUP = N'FGPayment2018'
FROM DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_full.bak'
WITH NORECOVERY,STATS=5;
GO
-- restore writable filegroup differential backup
RESTORE DATABASE [Payment_Dev]
FILEGROUP = N'FGPayment2018'
FROM DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_diff.bak'
WITH NORECOVERY,STATS=5;
GO
-- restore payment database transaction log backup
RESTORE LOG [Payment_Dev]
FROM DISK = N'C:\DATA\Payment\BACKUP\\Payment_20180316_log.trn'
WITH NORECOVERY;
GO
-- Take database online to check
RESTORE DATABASE [Payment_Dev] WITH RECOVERY;
GO

Finally, check the results of data restore. According to the test data we inserted, there should be four records.

USE [Payment_Dev]
GO
SELECT * FROM [dbo].[ Payment_2018] WITH(NOLOCK)

According to the execution results, there are four result sets that meet our expectations. The screenshot is as follows:

Image for post
Image for post

Summary

This monthly report shows how to use SQL Server filegroup to implement and optimize cold and hot data isolation backup, which greatly improve the efficiency of database backup and restore and provides load balancing of I/O resources, improving and optimizing the performance of the entire database.

To learn more about Alibaba Cloud Database Backup product, visit https://www.alibabacloud.com/products/database-backup

Reference:https://www.alibabacloud.com/blog/sql-server-best-practices-using-filegroups-to-implement-cold-and-hot-data-isolation-backup-solution_594480?spm=a2c41.12584120.0.0

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store