SQL Server Partitioning using Multiple Varying Date Ranges

Overview

There are numerous articles about best practice partitioning approaches for SQL Server, and they essentially deal with a similar scenario – continuous population of historical data where a date column is coupled to the data’s relevant lifespan.   The partitioning scheme uses this date as the partitioning column, and the oldest partition is archived out of the primary table on regular time intervals (usually via the ALTER TABLE … SWITCH statement into a staging table).

But what approach is best when the historical data is a mixed bag – where some data needs to persist indefinitely (regardless of its origin date) and the rest fits into the more traditional approach described above?  This article describes a solution for this business problem.

Scenario

A car service has an audit database that stores years of the before and after values of a subset of its OLTP tables.  Not surprisingly, the large data volumes in the audit database’s tables require partitioning to allow for easier management of data files and backups.

Included in the OLTP subset are tables containing timeless business data like customers (name, address, contact info, etc…) and drivers (name, availability, etc…), and tables containing expirable data related to a specific trip, including pickup location, duration, and flight arrival or departure information.

Each change in the OLTP database becomes an audit header row identifying the table and change date, plus child detail rows with the before and after values of the columns updated in the transaction.

A user of the archive’s reporting system needs to see the history of customer or driver changes across all time, but only needs to see trip data for trips that occurred within the last six months.

Approach

A “standard” partition function and scheme on the change date works for the “expiring” data but not for the timeless data.  What is needed is an abstraction layer to calculate a partition from the source table and change date.

Database Objects

These are the abbreviated scripts and sample data of the primary database objects which implement this solution.  Detailed descriptions follow.

[code language=”sql”]
CREATE TABLE [Partition](
[PartitionNumber] [smallint] NOT NULL,
[IsArchived] [bit] NOT NULL DEFAULT ((0)),
[IsReadOnly] [bit] NOT NULL DEFAULT ((0)),
[FinalBackupTaken] [bit] NOT NULL DEFAULT ((0))
)
[/code]

[code language=”sql”]
CREATE TABLE [PartitionMap](
[SourceTableId] [smallint] NOT NULL,
[StartDate] [date] NOT NULL,
[EndDateExclusive] [date] NOT NULL,
[PartitionNumber] [smallint] NOT NULL
)
[/code]

[code language=”sql”]
CREATE PARTITION FUNCTION [pfAuditHistory](smallint) AS RANGE RIGHT FOR VALUES (1, 2, 3, 4, 5, …)

CREATE PARTITION SCHEME [psAuditHistory] AS PARTITION [pfAuditHistory] TO ([fg00000], [fg00001], [fg00002], [fg00003], [fg00004], [fg00005], …)

CREATE TABLE [HistChangeHeader](
[SourceTableId] [smallint] NOT NULL,
[AuditDateTime] [datetime] NOT NULL,
[PartitionNumber] [smallint] NOT NULL
) ON [psAuditHistory]([PartitionNumber])

CREATE TABLE [HistChangeDetail](
[HeaderId] [bigint] NOT NULL,
[PropertyNameId] [smallint] NOT NULL,
[BeforeValue] [varchar](max) NULL,
[AfterValue] [varchar](max) NULL,
[PartitionNumber] [smallint] NOT NULL
) ON [psAuditHistory]([PartitionNumber])
[/code]

The Partition table persists the state of a given partition.  Partition 1, which is our timeless partition, is never archived or read-only, and its “final backup” will never be taken since it is constantly being added to by the timeless audit data.  All other partitions are bound to month values (how this is done will be explained shortly).  For example, partitions 2-5 represent data for months in the year 2009, and partitions 100-101 represent 3/2017 and 4/2017 respectively.  
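For illustration, rows in the Partition table reflecting the states described above might look like this (the specific flag combinations are invented for the example, guided by the month assignments mentioned in the text):

[code language=”sql”]
-- Hypothetical sample rows for the Partition table.
INSERT INTO [Partition] (PartitionNumber, IsArchived, IsReadOnly, FinalBackupTaken)
VALUES (1,   0, 0, 0),  -- timeless partition: never read-only, archived, or final-backed-up
       (2,   1, 1, 1),  -- a 2009 month: read-only, final backup taken, archived out
       (100, 0, 1, 1),  -- 3/2017: read-only with final backup taken, not yet archived
       (101, 0, 0, 0);  -- 4/2017: still actively receiving audit rows
[/code]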

Stored procedures drive all the partition operations through this table so that the physical state of the partition (and corresponding filegroup) is always in sync with the Partition table.  

  • Manage ReadOnly – looks for rows in Partition where IsReadOnly = 0 and whose EndDateExclusive (from PartitionMap) has passed, alters the partition’s filegroup to read-only, and sets the IsReadOnly bit in the Partition table to 1.
  • Manage Backup – manages the backup of each partition (via BACKUP DATABASE … FILEGROUP=xxx) and, where IsReadOnly = 1 and FinalBackupTaken = 0, sets the FinalBackupTaken bit to 1. This indicates that the final backup for the partition is complete and no additional backups need to be taken.
  • Manage Archive – once FinalBackupTaken = 1 and the data is no longer needed for reporting (for example, data older than 5 years), the partition can be migrated (via ALTER TABLE … SWITCH) to a staging table and then bcp’ed out of the database. The export can be moved to cheap storage in case it ever needs to be imported back into the database, which is assumed to be a rare occurrence.
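As an illustration, the Manage ReadOnly step could be sketched roughly as follows. The cursor loop, the fgNNNNN filegroup-naming helper, and the use of ALTER DATABASE CURRENT (SQL Server 2012+) are assumptions for the sketch, not the production code:

[code language=”sql”]
-- Illustrative sketch only: mark expired, still-writable partitions read-only.
DECLARE @PartitionNumber smallint, @Sql nvarchar(max);

DECLARE expired CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT p.PartitionNumber
    FROM [Partition] p
    JOIN [PartitionMap] pm ON pm.PartitionNumber = p.PartitionNumber
    WHERE p.IsReadOnly = 0
      AND p.PartitionNumber > 1                 -- never touch the timeless partition
      AND pm.EndDateExclusive <= GETDATE();     -- the partition's window has passed

OPEN expired;
FETCH NEXT FROM expired INTO @PartitionNumber;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Filegroup name assumed to follow the fgNNNNN convention, e.g. fg00100
    SET @Sql = N'ALTER DATABASE CURRENT MODIFY FILEGROUP [fg'
             + RIGHT('00000' + CAST(@PartitionNumber AS varchar(5)), 5)
             + N'] READ_ONLY';
    EXEC (@Sql);

    UPDATE [Partition] SET IsReadOnly = 1 WHERE PartitionNumber = @PartitionNumber;
    FETCH NEXT FROM expired INTO @PartitionNumber;
END
CLOSE expired; DEALLOCATE expired;
[/code]

Note that setting a filegroup read-only may require exclusive access to the database, so a step like this is best run in a maintenance window.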

The PartitionMap table defines the mapping from table and date to its partition.  When changes are sent to the audit database, a stored procedure is invoked with the source table and change date.  This procedure looks for a matching entry in the PartitionMap table, obtains the PartitionNumber and uses it for the insert into the header and detail rows.  In the sample data above, tables 2 & 3 are timeless, and use a distant end date so that for the foreseeable future all data for these tables will be assigned to partition 1.  Tables 5 & 6 are partitioned by month which allows for older data for these tables to be marked read-only and eventually archived.
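A minimal sketch of such a lookup procedure, with hypothetical names, might be:

[code language=”sql”]
-- Illustrative sketch: resolve the partition for a source table and change date.
CREATE PROCEDURE [GetPartitionNumber]
    @SourceTableId   smallint,
    @ChangeDate      date,
    @PartitionNumber smallint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Half-open interval match: StartDate <= @ChangeDate < EndDateExclusive.
    SELECT @PartitionNumber = PartitionNumber
    FROM [PartitionMap]
    WHERE SourceTableId = @SourceTableId
      AND @ChangeDate >= StartDate
      AND @ChangeDate <  EndDateExclusive;
END
[/code]

The timeless tables simply map to partition 1 with a distant EndDateExclusive, so the same lookup works for both kinds of data.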

Each partition is assigned to a filegroup which, in this scenario, contains a single file of 10GB. This size was calculated via analysis of the historical data already present. A few outlier historical months needed a slightly larger file size; this will obviously be a function of your unique data requirements.

The header and detail tables use the PartitionNumber column as their partitioning column, and all indexes created for these tables are aligned with the same partition scheme. This was primarily done to allow the use of ALTER TABLE … SWITCH for easy archiving of a partition’s data, but it also allows for partition elimination in queries against this data. For example, once the header rows are found via their business key (e.g. trip number), we can use partition elimination for the join to the detail table, which is several times larger than the header table. Having to look across the header’s partitions has not negatively impacted performance, and the scheduled archiving of older partitions mitigates potential issues.
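For example, a reporting query of the following shape lets the optimizer eliminate detail partitions via the equality predicate on PartitionNumber. The parameter names are assumed, and the header key column (omitted from the abbreviated script above) is assumed here to be HeaderId:

[code language=”sql”]
-- Hypothetical reporting query: the join predicate on PartitionNumber allows
-- SQL Server to scan only the matching partition(s) of the much larger detail table.
SELECT h.AuditDateTime, d.PropertyNameId, d.BeforeValue, d.AfterValue
FROM [HistChangeHeader] h
JOIN [HistChangeDetail] d
    ON  d.HeaderId        = h.HeaderId
    AND d.PartitionNumber = h.PartitionNumber   -- enables partition elimination
WHERE h.SourceTableId  = @SourceTableId
  AND h.AuditDateTime >= @From
  AND h.AuditDateTime <  @To;
[/code]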

Summary

SQL Server partitioning is a simple but extremely powerful feature for managing your physical data.  But like most implementations in the real world, the “standard” scenarios sometimes fall short of your goals.

Hopefully, this helps with your own creative approaches of how to extend partitioning to fit your business needs.

Planning and Events Product

In 2017, EmpireCLS wanted to simplify the management of events. An event involved a large number of people being driven to and from venues, and could last anywhere from a single day to several days or weeks. This was an extremely manual process and required many staff hours to manage and validate.

We developed the entire MVP using a “no DB” approach. To cut down on the friction of any type of database, we used in-memory repositories to track all data. A generic set of classes managed the in-memory repository and offered simple migration to full DB support. We used domain models from the repositories as the application’s view of the data; the repositories would then map the domain models into whatever storage we wanted: Mongo, SQL, etc.

We used Typewriter to generate our client TypeScript code based upon the Web API interface, API models, and domain models. By authoring a few templates in Typewriter, we had all of the client-side Angular HTTP services, DTOs, and view models generated for us. This streamlined the development process immensely.

Technologies… C#, ASP.NET MVC, ASP.NET Web API, Angular, Typewriter

SQL Server Archiving Software

In 2015, EmpireCLS had a 1-terabyte database that was difficult to manage. A restore of this database would take several hours, which was not suitable as a disaster recovery option.

We analyzed the database and usage and were able to use a custom partitioning scheme to keep rolling windows of table and index data on a monthly basis. We were able to then roll off the windows after a certain time period.

Daily backups would only occur on the active filegroups, and inactive filegroups would be treated as read-only. The restore of the last active filegroup would then take minutes because it was extremely small.

Another goal of this partitioning was to make filegroups easier to move around the network and to allow individual filegroups to be restored without affecting disk performance. It was determined that restoring the terabyte backup would not only take a long time but would also saturate the SAN that EmpireCLS was using. By only needing to restore 10GB backups, these issues were removed.

A more technical explanation can be found here…SQL Server Custom Partitioning

Logical Locking Software

In 2015, EmpireCLS needed a way to alleviate the physical locking issues they were facing when users edited a reservation. The legacy Linux system used a physical database lock that would stay active when a user attempted to edit a reservation; this lock would remain in effect until the user explicitly released it or their process was killed by an admin. Because killing the process became the normal procedure, the user holding the lock would lose any in-process edits. This problem directly impacted daily operations, automated booking feeds, and the IT staff.

We built a cross-platform logical locking framework that enforced the same edit semantics the EmpireCLS staff had grown accustomed to, but allowed supervisors to release locks without killing a process and losing a user’s in-process changes.

This implementation required both a Linux and Windows code base so we used C# / Mono to create the cross platform libraries.

Technologies… C#, C, Linux, ASP.NET, Mono

Assessments and Processes

Microsoft Licensing and Asset Management

In 2015, EmpireCLS needed to institute a formalized process for managing its Microsoft licenses. Up until this point, EmpireCLS would never know until the “true up” period what the cost of licensing would be for the previous year and what upcoming years would hold.

We derived a simple inventory procedure to track software licenses: SharePoint, SQL Server, Windows Server, Exchange, etc. This procedure allowed EmpireCLS to make better decisions throughout the year when provisioning new servers and performing various infrastructure maintenance tasks.

Some highlights…

  • SQL Server Software Assurance was discontinued, saving hundreds of thousands of dollars over a 10-year period. Since EmpireCLS did not need to be on the latest versions of the software, and did not need Mobility Rights for SQL Server, this was a simple change.
  • Exchange Software Assurance was discontinued with the same type of savings, along with a migration plan to Office 365 which would cut down even more on the cost of the 400 MS Office licenses.
  • Training EmpireCLS staff on which Windows OS licenses to use and when was also a big win in overall savings on Datacenter vs. Standard licenses.

HP Hardware Assessments

In 2015, EmpireCLS needed to figure out whether moving to the latest HP storage solution was necessary. They had 3 storage units in place, with 1 coming close to end of life. Looking at the upcoming recurring maintenance costs vs. new storage and future maintenance costs, we were able to guide EmpireCLS toward staying the course with existing hardware. This allowed them to save $300,000 to $400,000 in the first year and prolong the life of existing hardware. We tried to cut back on the “refresh your hardware” scenario that vendors push every 3 years.

Cloud Assessments

In 2016, EmpireCLS was weighing the benefits and costs of moving their data center to Microsoft Azure. We did a deep analysis of SQL Server, which we expected to be the most expensive component. We pointed out that since EmpireCLS already owned its SQL licenses through 2014 in perpetuity, it was more cost effective to leave the data center intact. This took into consideration data center power and space charges, FTE costs, and hardware licensing.

We then did another assessment to determine whether it would be beneficial to move the various websites to Azure. Since EmpireCLS already owned much of its hardware, we found it was more cost effective to cut maintenance contracts on existing hardware, use parts from unused server inventory, and move all websites to an internal VM, removing server licensing costs.

The final assessment we did was for migrating Exchange to the cloud, via Office 365. After weighing the costs of Exchange Software Assurance, hardware maintenance, storage, and Office client licenses, it was determined that the move to Office 365 would greatly benefit EmpireCLS. I believe we were able to save $200,000 to $400,000 over a 10-year period.

These assessments were difficult because EmpireCLS had already made significant capital investments in software and hardware. Had it been a ground-up build, it might have been a different story.

SDLC Procedures

In 2014, EmpireCLS needed help structuring their development procedures for multiple teams, both local and remote. These procedures included full tracking of both critical-path projects and day-to-day maintenance tasks.

We developed a set of Excel spreadsheets to pull in all TFS work items and to do analysis for forecasting future project completion dates, and to help with end of year financial cost analysis. We also used the spreadsheets to track project hours in relation to the various contractors that were working on the projects.

We instituted a modified SCRUM process where we would have a weekly sprint meeting, ad hoc impediment discussions, and pre-assigned work items.

We released manually every two weeks. New features were exposed to subsets of users and controlled with feature toggles and security. We had a notion of beta users who could execute the latest code and give feedback. Some modules could be placed in a read-only validation state, allowing cross-referencing between multiple systems for validation before write-back was enabled.

GPS Tracking Software

In 2014, EmpireCLS needed a better GPS solution to track all of its vehicles. This service was a direct dependency of the “auto dispatch” system.

We ended up writing GPS query services against both Bing and Google. Cost was a constant factor, as hundreds of thousands of queries needed to be made each day.

Technologies… C#, Google API

Suggestive Dispatch Software

In 2014, EmpireCLS wanted to optimize their dispatch department by automating the chauffeur-to-reservation assignments that they had been doing manually for decades.

Building on the rules of the newly created dispatch application, this application would attempt to automate as much of the work as possible. Some of the rules that needed to be taken into consideration are as follows: driver requests by customer, driver proximity to the customer, VIP customer handling, where the driver started their day, shift time, etc.

A real-time GPS tracking system was also required, to provide constant visibility into driver location and the time or duration to reach a customer location or complete a drop-off.

We ended up treating the output of the system as suggestions that the dispatcher could simply accept, as opposed to 100% automation. There were edge cases that could not be covered for 100% automation, so suggesting vs. automating was the better solution.

A dispatcher using this tool was able to assign all of the work for the first shift drivers in seconds, where previously it took over an hour.

Technologies… C#, asp.net MVC, Mongo, SQL Server

Promotion Software

In 2013, EmpireCLS needed the ability to offer promotions to their customers. The existing method of simply crediting an account after the fact didn’t allow the marketing flexibility they were looking for.

We designed and implemented the system that would allow EmpireCLS to have time based, instance based, and customer based promotions.

Once the project was completed, the marketing team had the tools to create promotions that would allow EmpireCLS to offer services during low-volume times, lure back customers who hadn’t used the service in a while, and bring on new customers who had never heard of them before.

Chauffeur App 2.0

In 2013, EmpireCLS needed to revamp their mobile chauffeur application because the Windows Mobile 6 platform was to be deprecated. Windows Phone 8 was the platform of choice for this application that supported their 400+ mobile workforce.

A large amount of client code was re-used from the original mobile app, along with the entire back end.

Highlights…

  • Chauffeurs could notify central dispatch that they were awake and ready for work. Before this feature, dispatchers would call chauffeurs on the phone to validate that they were ready and would make their first pickup of the day. This saved thousands of man-hours a year.
  • Built using cross-platform development patterns to allow other mobile platforms to be implemented. The goal of using Xamarin to create iOS and Android apps was taken into consideration up front to cut down on the overall cost of bringing up these 2 platforms at a later date.

Technologies… Windows Phone 8, C#, WCF, PCL