SQL Server Best Practices: Using Filegroups to Implement Cold and Hot Data Isolation Backup Solution

Scenario Description

In our example scenario, we’ll assume that a company has a critical large database (over 10 TB) and faces the following scenarios:

  1. The database contains user payment information for nearly 10 years, which is crucial
  2. The annual data is archived and stored in the table for the current year. The data in tables for previous years is read only (the historical payment information no longer needs to be modified), and only the data in the current year’s table is both readable and writable
  3. Each full database backup takes too long, more than 20 hours; the database restore operation takes even longer, more than 30 hours

Introduction to Filegroups

This article will not focus on the details of filegroups. However, as it is the core technology involved, it is necessary to briefly introduce filegroups in SQL Server in terms of their advantages, creation, and usage.

Advantages of Using Filegroups

SQL Server supports the storage of table and index data in Non-Primary filegroups, which provides the following benefits when a database has multiple filegroups:

  1. Distributes I/O pressure to different filegroups. If files in different filegroups are on different disks, disk pressure can be distributed.
  2. DBCC CHECKFILEGROUP operations are performed for different filegroups, and the same database can process multiple processes in parallel, reducing the maintenance time for big data.
  3. Backup and Restore operations can be performed at the filegroup level to control backup and restore strategies in a more granular manner.

Creating a Filegroup When Creating a Database

We can create a filegroup directly when we create a database. The code is as follows:

USE master
GO
EXEC sys.xp_create_subdir 'C:\SQLServer\Data\'
EXEC sys.xp_create_subdir 'C:\SQLServer\Logs\'
CREATE DATABASE [TestFG]
ON PRIMARY
( NAME = N'TestFG', FILENAME = N'C:\SQLServer\Data\TestFG.mdf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2010]
( NAME = N'FG2010', FILENAME = N'C:\SQLServer\Data\FG2010.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2011]
( NAME = N'FG2011', FILENAME = N'C:\SQLServer\Data\FG2011.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FG2012]
( NAME = N'FG2012', FILENAME = N'C:\SQLServer\Data\FG2012.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB )
LOG ON
( NAME = N'TestFG_log', FILENAME = N'C:\SQLServer\Logs\TestFG_log.ldf' , SIZE = 5MB , FILEGROWTH = 50MB)
GO

Creating a Filegroup Alone

If the database already exists, we can also add filegroups. The code is as follows:

-- Add filegroup FG2013
USE master
GO
ALTER DATABASE [TestFG] ADD FILEGROUP [FG2013];
-- Add data file to FG2013
ALTER DATABASE [TestDb]
ADD FILE (NAME = FG2013, SIZE = 5MB , FILEGROWTH = 50MB ,FILENAME = N'C:\SQLServer\Data\FG2013.ndf')
TO FILEGROUP [FG2013]
GO
USE [TestFG]
GO
SELECT * FROM sys.filegroups

Using Filegroups

After filegroups are created, we can put the table and index into the corresponding filegroup. For example:

Put the clustered index into the PRIMARY filegroup; table and index data are placed in the FG2010 filegroup. The code is as follows:
USE [TestFG]
GO
CREATE TABLE [dbo].[ Orders_2010](
[OrderID] [int] IDENTITY(1,1) NOT NULL,
[OrderDate] [datetime] NULL,
CONSTRAINT [PK_Orders_2010] PRIMARY KEY CLUSTERED
(
[OrderID] ASC
) ON [PRIMARY]
) ON [FG2010]
GO
CREATE NONCLUSTERED INDEX IX_OrderDate
ON [dbo].[ Orders_2010](OrderDate)
ON [FG2010];

Scheme Design

Now that we have briefly introduced filegroups as described in the introduction, we are going to cover the use of SQL Server filegroups to implement the solution design for cold and hot data isolation backup.

Design Analysis

The payment database is too large (over 10 TB), so a single full backup takes more than 20 hours. If a full backup is performed normally, the backup file will be too large and the process will take too long, and even normal operations may be affected due to the I/O capacity consumption of the backup operation. If we think about it carefully, we find that, although the database is very large, the amount of data that is undergoing data change operations is not large compared with the entire database. This is because only the data in the table for the current year is constantly changing (hot data), and the data in tables for previous years are not modified (cold data). So, we design the database to put the data in tables for previous years into read-only filegroups and the data in the table for the current year into read-write filegroups. The backup system only needs to back up the Primary and the filegroup containing the table for the current year (but it still needs to perform a one-time full backup of the database the first time). This greatly reduces the I/O capacity consumed by the backup, implements the isolation backup operation for hot and cold data, distributes the I/O pressure of files, and ultimately achieves optimization of the database design and backup system.

Description of the Design Diagram

The design diagram is detailed below for a more intuitive and thorough understanding of the design solution.

  1. 1 Primary filegroup: Users store object information, such as database system tables and views, and filegroups can be read from and written to.
  2. 10 user-defined read-only filegroups: Used to store the data in tables for previous years and the corresponding index data, and each year’s data is stored in a separate filegroup.
  3. 1 user-defined read-write filegroup: Used to store the data of the table for the current year and the corresponding index data. To ensure that the data in this table is readable and writable, the filegroup must be readable and writable.
  4. 1 database transaction log file: Used for the database transaction log, and we must back up the database transaction log regularly.

Solution Implementation

After the solution design is completed, the next step is its collective implementation. The specific implementation includes:

  1. Create a database
  2. Creating a table
  3. Filegroup settings
  4. Implementing cold and hot backup

Creating a Database

While creating the database, we create the Primary filegroup and the filegroup for 2008 to 2017. Note that the initial size and increment of files in all filegroups must be the same. The code is as follows:

USE master
GO
EXEC sys.xp_create_subdir 'C:\DATA\Payment\Data\'
EXEC sys.xp_create_subdir 'C:\DATA\Payment\Log\'
CREATE DATABASE [Payment]
ON PRIMARY
( NAME = N'Payment', FILENAME = N'C:\DATA\Payment\Data\Payment.mdf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2008]
( NAME = N'FGPayment2008', FILENAME = N'C:\DATA\Payment\Data\Payment_2008.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2009]
( NAME = N'FGPayment2009', FILENAME = N'C:\DATA\Payment\Data\Payment_2009.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2010]
( NAME = N'FGPayment2010', FILENAME = N'C:\DATA\Payment\Data\Payment_2010.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2011]
( NAME = N'FGPayment2011', FILENAME = N'C:\DATA\Payment\Data\Payment_2011.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2012]
( NAME = N'FGPayment2012', FILENAME = N'C:\DATA\Payment\Data\Payment_2012.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2013]
( NAME = N'FGPayment2013', FILENAME = N'C:\DATA\Payment\Data\Payment_2013.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2014]
( NAME = N'FGPayment2014', FILENAME = N'C:\DATA\Payment\Data\Payment_2014.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2015]
( NAME = N'FGPayment2015', FILENAME = N'C:\DATA\Payment\Data\Payment_2015.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2016]
( NAME = N'FGPayment2016', FILENAME = N'C:\DATA\Payment\Data\Payment_2016.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB ),
FILEGROUP [FGPayment2017]
( NAME = N'FGPayment2017', FILENAME = N'C:\DATA\Payment\Data\Payment_2017.ndf' , SIZE = 5MB ,FILEGROWTH = 50MB )
LOG ON
( NAME = N'Payment_log', FILENAME = N'C:\DATA\Payment\Log\Payment_log.ldf' , SIZE = 5MB , FILEGROWTH = 50MB)
GO
-- Add filegroup FGPayment2018
USE master
GO
ALTER DATABASE [Payment] ADD FILEGROUP [FGPayment2018];
-- Add data file to FGPayment2018
ALTER DATABASE [Payment]
ADD FILE (NAME = FGPayment2018, SIZE = 5MB , FILEGROWTH = 50MB ,FILENAME = N'C:\DATA\Payment\Data\Payment_2018.ndf')
TO FILEGROUP [FGPayment2018]
GO
USE [Payment]
GO
SELECT file_name = mf.name, filegroup_name = fg.name, mf.physical_name,mf.size,mf.growth
FROM sys.master_files AS mf
INNER JOIN sys.filegroups as fg
ON mf.data_space_id = fg.data_space_id
WHERE mf.database_id = db_id('Payment')
ORDER BY mf.type;

Creating a Table

After the database and corresponding filegroups are created, we create corresponding tables and insert some test data, as follows:

USE [Payment]
GO
CREATE TABLE [dbo].[ Payment_2008](
[Payment_ID] [bigint] IDENTITY(12008,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2008] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2008]
) ON [FGPayment2008]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2008]([OrderID])
ON [FGPayment2008];
CREATE TABLE [dbo].[ Payment_2009](
[Payment_ID] [bigint] IDENTITY(12009,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2009] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2009]
) ON [FGPayment2009]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2009]([OrderID])
ON [FGPayment2009];
-- The creation of tables for 2010 to 2017 is omitted here. Refer to the above table creation and index code to complete it yourself
CREATE TABLE [dbo].[ Payment_2018](
[Payment_ID] [bigint] IDENTITY(12018,100) NOT NULL,
[OrderID] [bigint] NOT NULL,
CONSTRAINT [PK_Payment_2018] PRIMARY KEY CLUSTERED
(
[Payment_ID] ASC
) ON [FGPayment2018]
) ON [FGPayment2018]
GO
CREATE NONCLUSTERED INDEX IX_OrderID
ON [dbo].[ Payment_2018]([OrderID])
ON [FGPayment2018];
  1. Due to space limitations, the creation of tables for 2010 to 2017 is omitted from the table creation code. Please complete it yourself
  2. The initial value of the Payment_ID field is different for each table to avoid any duplication of this field value from querying all payment information
USE [Payment]
GO
SELECT table_name = tb.[name], index_name = ix.[name], located_filegroup_name = fg.[name]
FROM sys.indexes ix
INNER JOIN sys.filegroups fg
ON ix.data_space_id = fg.data_space_id
INNER JOIN sys.tables tb
ON ix.[object_id] = tb.[object_id]
WHERE ix.data_space_id = fg.data_space_id
GO
USE [Payment]
GO
SET NOCOUNT ON
INSERT INTO [Payment_2008] SELECT 2008;
INSERT INTO [Payment_2009] SELECT 2009;
-- The code for 2010¨C2017 has been omitted. Please complete it yourself
INSERT INTO [Payment_2018] SELECT 2018;

Filegroup Settings

After tables are created and the test data is initialized, we set the read and write properties of the filegroup. The code is as follows:

USE master
GO
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2008] READ_ONLY;
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2009] READ_ONLY;
-- The setting of the read-only property for the 2010¨C2017 filegroups is omitted here. Please complete it yourself
ALTER DATABASE [Payment] MODIFY FILEGROUP [FGPayment2018] READ_WRITE;
USE [Payment]
GO
SELECT name, is_default, is_read_only FROM sys.filegroups
GO

Implementing cold and hot backup

After all the filegroups have been created successfully and the read and write properties have been configured, we perform a full backup, differential backup, and database-level log backup for the readable and writable filegroups in the database. To facilitate testing, we insert a piece of data between the two backups. Generally, the backup operation consists of the following:

  1. Perform a one-time full backup of the entire database.
  2. Perform a periodic full backup of readable and writable filegroups.
  3. Perform a periodic differential backup of readable and writable filegroups.
  4. Perform a periodic transaction log backup of the entire database.
-- Make a one-time full backup of payment database
USE [master];
GO
BACKUP DATABASE [Payment]
TO DISK = N'C:\DATA\Payment\BACKUP\Payment_20180316_full.bak'
WITH COMPRESSION, Stats=5
;
GO
-- For testing, init one record
USE [Payment];
GO
INSERT INTO [dbo].[ Payment_2018] SELECT 201801;
GO
-- Make a full backup for each writable filegoup (just backup FGPayment2018 as an example)
BACKUP DATABASE [Payment]
FILEGROUP = 'FGPayment2018'
TO DISK = 'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_full.bak'
WITH COMPRESSION, Stats=5
;
GO
-- for testing, insert one record
INSERT INTO [dbo].[ Payment_2018] SELECT 201802;
GO
--Take a differential backup for each writable filegoup (just backup FGPayment2018 as an example)
BACKUP DATABASE [Payment]
FILEGROUP = N'FGPayment2018'
TO DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_diff.bak'
WITH DIFFERENTIAL, COMPRESSION, Stats=5
;
GO
-- for testing, insert one record
INSERT INTO [dbo].[ Payment_2018] SELECT 201803;
GO
-- Take a transaction log backup of database payment
BACKUP LOG [Payment]
TO DISK = 'C:\DATA\Payment\BACKUP\Payment_20180316_log.trn';
GO
  1. Restore the one-time full backup of the entire database
  2. Restore the last full backup of all readable and writable filegroups
  3. Restore the last differential backup of readable and writable filegroups
  4. Restore all transaction log backups of the entire database
-- We restore full backup
USE master
GO
RESTORE DATABASE [Payment_Dev]
FROM DISK=N'C:\DATA\Payment\BACKUP\Payment_20180316_full.bak' WITH
MOVE 'Payment' TO 'C:\DATA\Payment_Dev\Data\Payment_dev.mdf',
MOVE 'FGPayment2008' TO 'C:\DATA\Payment_Dev\Data\FGPayment2008_dev.ndf',
MOVE 'FGPayment2009' TO 'C:\DATA\Payment_Dev\Data\FGPayment2009_dev.ndf',
MOVE 'FGPayment2010' TO 'C:\DATA\Payment_Dev\Data\FGPayment2010_dev.ndf',
MOVE 'FGPayment2011' TO 'C:\DATA\Payment_Dev\Data\FGPayment2011_dev.ndf',
MOVE 'FGPayment2012' TO 'C:\DATA\Payment_Dev\Data\FGPayment2012_dev.ndf',
MOVE 'FGPayment2013' TO 'C:\DATA\Payment_Dev\Data\FGPayment2013_dev.ndf',
MOVE 'FGPayment2014' TO 'C:\DATA\Payment_Dev\Data\FGPayment2014_dev.ndf',
MOVE 'FGPayment2015' TO 'C:\DATA\Payment_Dev\Data\FGPayment2015_dev.ndf',
MOVE 'FGPayment2016' TO 'C:\DATA\Payment_Dev\Data\FGPayment2016_dev.ndf',
MOVE 'FGPayment2017' TO 'C:\DATA\Payment_Dev\Data\FGPayment2017_dev.ndf',
MOVE 'FGPayment2018' TO 'C:\DATA\Payment_Dev\Data\FGPayment2018_dev.ndf',
MOVE 'Payment_log' TO 'C:\DATA\Payment_Dev\Log\Payment_dev_log.ldf',
NORECOVERY,STATS=5;
GO
-- restore writable filegroup full backup
RESTORE DATABASE [Payment_Dev]
FILEGROUP = N'FGPayment2018'
FROM DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_full.bak'
WITH NORECOVERY,STATS=5;
GO
-- restore writable filegroup differential backup
RESTORE DATABASE [Payment_Dev]
FILEGROUP = N'FGPayment2018'
FROM DISK = N'C:\DATA\Payment\BACKUP\Payment_FGPayment2018_20180316_diff.bak'
WITH NORECOVERY,STATS=5;
GO
-- restore payment database transaction log backup
RESTORE LOG [Payment_Dev]
FROM DISK = N'C:\DATA\Payment\BACKUP\\Payment_20180316_log.trn'
WITH NORECOVERY;
GO
-- Take database online to check
RESTORE DATABASE [Payment_Dev] WITH RECOVERY;
GO
USE [Payment_Dev]
GO
SELECT * FROM [dbo].[ Payment_2018] WITH(NOLOCK)

Summary

This monthly report shows how to use SQL Server filegroup to implement and optimize cold and hot data isolation backup, which greatly improve the efficiency of database backup and restore and provides load balancing of I/O resources, improving and optimizing the performance of the entire database.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com