There are numerous articles about best practice partitioning approaches for SQL Server, and they essentially deal with a similar scenario – continuous population of historical data where a date column is coupled to the data’s relevant lifespan. The partitioning scheme uses this date as the partitioning column, and the oldest partition is archived out of the primary table on regular time intervals (usually via the ALTER TABLE … SWITCH statement into a staging table).
But what approach is best when the historical data is a mixed bag – where some data needs to persist indefinitely (regardless of its origin date) and the rest fits into the more traditional approach described above? This article describes a solution for this business problem.
A car service has an audit database that stores years of the before and after values of a subset of its OLTP tables. Not surprisingly, the large data volumes in the audit database’s tables require partitioning to allow for easier management of data files and backups.
Included in the OLTP subset are tables containing timeless business data like customers (name, address, contact info, etc…) and drivers (name, availability, etc…), and tables containing expirable data like data related to a specific trip including pickup location, duration, and flight arrival or departure information.
Each change in the OLTP database becomes an audit header row identifying the table and change date and child detail rows with the before and after values of the columns updated in transaction.
A user of the archive’s reporting system would need to see the history of customer or driver changes across all time, but only need to see trip data for trip’s that occurred within the last six months.
A “standard” partition function and scheme on the change date works for the “expiring” data but not for the timeless data. What is needed is an abstraction layer to calculate a partition from the source table and change date.
These are the abbreviated scripts and sample data of the primary database objects which implement this solution. Detailed descriptions follow.
CREATE TABLE [Partition]( [PartitionNumber] [smallint] NOT NULL, [IsArchived] [bit] NOT NULL DEFAULT ((0)), [IsReadOnly] [bit] NOT NULL DEFAULT ((0)), [FinalBackupTaken] [bit] NOT NULL DEFAULT ((0)) )
CREATE TABLE [PartitionMap]( [SourceTableId] [smallint] NOT NULL, [StartDate] [date] NOT NULL, [EndDateExclusive] [date] NOT NULL, [PartitionNumber] [smallint] NOT NULL )
CREATE PARTITION FUNCTION [pfAuditHistory](smallint) AS RANGE RIGHT FOR VALUES (1, 2, 3, 4, 5, ...) CREATE PARTITION SCHEME [psAuditHistory] AS PARTITION [pfAuditHistory] TO ([fg00000], [fg00001], [fg00002], [fg00003], [fg00004], [fg00005], ...) CREATE TABLE [HistChangeHeader]( [SourceTableId] [smallint] NOT NULL, [AuditDateTime] [datetime] NOT NULL, [PartitionNumber] [smallint] NOT NULL ) ON [psAuditHistory]([PartitionNumber])&lt;/span&gt; CREATE TABLE [HistChangeDetail]( [HeaderId] [bigint] NOT NULL, [PropertyNameId] [smallint] NOT NULL, [BeforeValue] [varchar](max) NULL, [AfterValue] [varchar](max) NULL, [PartitionNumber] [smallint] NOT NULL ) ON [psAuditHistory]([PartitionNumber])
The Partition table persists the state of a given partition. Partition 1, which is our timeless partition, is never archived or read-only, and its “final backup” will never be taken since it is constantly being added to by the timeless audit data. All other partitions are bound to month values (how this is done will be explained shortly). For example, partitions 2-5 represent data for months in the year 2009, and partitions 100-101 represent 3/2017 and 4/2017 respectively.
Stored procedures drive all the partition operations through this table so that the physical state of the partition (and corresponding filegroup) is always in sync with the Partition table.
- Manage ReadOnly – look for any rows in Partition where ReadOnly = 0 and EndDateExclusive has passed and alter the partition’s filegroup to read-only and the ReadOnly bit in the Partition table to 1.
- Manage Backup – manages the backup of each partition (via BACKUP DATABASE … FILEGROUP=xxx) and sets the FinalBackupTaken bit to 1 if ReadOnly is set to 1 and FinalBackupTaken = 0. This means that the final backup for this partition is complete and no additional backups need to be taken.
- Manage Archive – Once FinalBackupTaken = 1 and the data is no longer needed for reporting (for example, data older than 5 years), the partition can be migrated (via ALTER TABLE … SWITCH) to a staging table and then bcp’ed out of the database. The export can be moved to cheap storage if it ever needs to be imported back into the database which is assumed to be a rare if ever occurrence.
The PartitionMap table defines the mapping from table and date to its partition. When changes are sent to the audit database, a stored procedure is invoked with the source table and change date. This procedure looks for a matching entry in the PartitionMap table, obtains the PartitionNumber and uses it for the insert into the header and detail rows. In the sample data above, tables 2 & 3 are timeless, and use a distant end date so that for the foreseeable future all data for these tables will be assigned to partition 1. Tables 5 & 6 are partitioned by month which allows for older data for these tables to be marked read-only and eventually archived.
Each partition is assigned to a filegroup which in this scenario contains a single file of 10GB. This was calculated via analysis of the already present historical data. There were a few outlier historical months which needed a slightly larger file size, this will obviously be a function of your unique data requirements.
The header and detail tables use the PartitionNumber column as their partitioning column and all indexes created for these tables are aligned with the same partitioning scheme. This was primarily done to allow for the use of ALTER TABLE … SWITCH for easy archiving of a partition’s data, but also allows for partition elimination in queries against this data. For example, once the header rows are found via their business key (e.g. trip number), we can use partition elimination for the join to the detail table which is several times larger than the header table. Having to look across the header’s partitions has not negatively impacted performance, and the scheduled management of archiving off older partitions can mitigate against potential issues.
SQL Server partitioning is a simple but extremely powerful feature for managing your physical data. But like most implementations in the real world, the “standard” scenarios sometimes fall short of your goals.
Hopefully, this helps with your own creative approaches of how to extend partitioning to fit your business needs.