OK, so we’re still evaluating SQL Server 2014, but are you ready now to explore Microsoft SQL Server 2016? No rest for the weary! Community Technology Preview (CTP) 2.2 is now available as a public preview of next year’s release. But don’t panic! In this blog entry I will examine three key new features that I found most interesting when exploring the CTP. Specifically, I will be looking at: Always Encrypted, Temporal Data, and the Stretch Database.
Microsoft tells us that the three themes for the SQL Server 2016 release are:
- Mission Critical Performance
- Deeper Insights Across Data
- Hyperscale Cloud
Mission Critical Performance
Selected Feature: Always Encrypted - “The Imitation Game”
Security is more important than ever for mission critical systems especially as we start to explore cloud-based data options. Starting with SQL Server 2005, Microsoft allowed us to use column-level encryption natively within the database engine. Sensitive columns could be encrypted by an application and decrypted as needed providing an “end-to-end” security option. Selected columns would be encrypted on disk, on backup, in memory and over the network. However, column level encryption required application code changes to use functions such as EncryptByKey and DecryptByKey. A Database Encryption Key and an appropriate Certificate were needed to be accessible for encryption or decryption to occur successfully. With SQL Server 2008, Microsoft introduced Transparent Data Encryption (TDE) as a feature that would automatically encrypt a whole database without having to change application code even for third-party applications. A major benefit was that the “data at rest” would be encrypted, including database files on disk and backups on tape or disk. This feature provided additional security to counter possible physical security vulnerabilities. However, as soon as data was “in motion”, for instance, into memory or over the network, the information was automatically decrypted and therefore not protected.
Now with SQL Server 2016, Microsoft has implemented an “Always Encrypted” option which includes end-to-end encryption without the need for application code changes therefore giving us a best-of-both-worlds solution. A major benefit of this strategy is that Always Encrypted provides a separation between users who own the data and users who manage the data, such as administrators, because the encryption/decryption occurs at the client layer. With the advent of cloud-based data, this separation is especially important. A Column Master Key (CMK) and Column Encryption Key (CEK) are required to be accessible for encryption or decryption to occur successfully using an Always Encrypted-enabled driver installed on the client computer.
Although the application code does not need to change, the data definitions for sensitive columns will need to be redefined to include the ENCRYPTED WITH clause. The ENCRYPTION_TYPE for a column can be DETERMINISTIC or RANDOMIZED. The DETERMINISTIC setting means that, given a certain input data, the encrypted data will always be the same output. This option allows grouping, filtering, joining and indexing with encrypted values. However, it may provide an opportunity for unauthorized users to deduce data for columns that have only a few distinct values. The RANDOMIZED option provides more security by encrypting in a less predictable way, however, it does not provide support for grouping, filtering, joining and indexing with encrypted values.
Two types of encryption keys are required for encryption: A Column Encryption Key (CEK) and a Column Master Key (CMK). The CEK is used to perform fast symmetric data encryption while the CMK is used to encrypt the CEK asymmetrically for high security. The CMK must be available in a Trusted Key Store, typically using a Certificate, on the client machine.
Once the appropriate keys and certificates are available and authorized to a client, a special connection string option needs to be specified, namely, “Column Encryption Setting=Enabled."
If any of the required objects are inaccessible or unauthorized, decryption will not occur and data values will be presented as encrypted, assuming the user has security permissions on the data columns themselves.
Always Encrypted is therefore a mission critical feature of Microsoft SQL Server 2016 for both “On-Premises” and “Cloud-based” data that require an automatic and transparent end-to-end security solution. (Spoiler alert!) If you’ve not seen the Imitation Game, just remember not to start all your top secret memos with the same greeting!
Deeper Insights Across Data
Selected Feature: Temporal Data – “Time Travel for DBAs”
When comparing the differences between operational systems and business intelligence (BI) solutions, we typically see that the operational viewpoint is that of the current, latest data values whereas the importance of BI provides the ability to analyze historical trends and patterns of data changes over time. A common way of supporting historical analysis in a BI solution is by defining a Slowly Changing Dimension (SCD) in the Data Warehouse (DW) design. The DW provides a historical database as the foundation for business analysis supporting Facts and Dimensions in a “Dimensional Model”. Facts are typically numeric values that are used to assess the business, such as revenue, costs, profits, salaries etc. Dimensions add context to the analysis of facts, for instance, analyzing revenue by region or profits by year or average manager salary by department. In these examples, region, year and department would be dimensions in the DW with revenue, profits and salary implemented as facts. Consider when a new manager is assigned to a department, should their salary be applied to the whole year average or only the time period that they were assigned? If we choose the former, we are defining an “SCD Type 1” which only stores current data values and the potential for inaccurate detailed analysis. For accurate analysis over time periods, we must choose the latter option which requires the careful design of an “SCD Type 2” for historical data. The typical design method of an SCD Type 2 is to provide the ability for multiple rows for a given dimension member to allow for storage of historical changes by the addition of a “surrogate” key. The surrogate key provides uniqueness and allows duplicate values of the business key used in the operational system. Typically added also, would be a start date/time and an end date/time column for accurate time analysis. The design and population of this type of dimension is usually a complex, manual, time-consuming process. Now in SQL Server 2016, we have the option of Temporal Data which automates this process via system tables and is part of the ANSI SQL 2011 specification.
Temporal Tables, also known as “system versioned tables” can be defined in any database and will automatically track data changes with start and end date/times. The temporal table actually consists of two tables internally, the current table that shows the latest data values and a history table which shows the previous changes and when they were current.
Once created, the temporal table will automatically track changes and maintain the start and end date/time columns, effectively implementing an SCD Type 2 via the history table while keeping the current table as normal with the latest data values. Both tables are visible in the SSMS Object Explorer and can be queried as separate tables, if required.
Note that the current table represents the latest data only and that the history table represents the previous updates with the SysStartTime and SysEndTime columns representing the period of time that those data values were current. Temporal data can be used for auditing or for recovering data values after accidental updates but most importantly can be directly copied to an SCD Type 2 dimension table in a Data Warehouse without manipulation, saving complex processing when incrementally loading data.
For existing data, tables can be altered to become temporal data tables without having to change existing application queries. In addition, if you wish to create new queries against temporal data tables that use the FOR SYSTEM_TIME clause, SQL Server will access the current table and the underlying history table as necessary returning time-sensitive data comparisons. This is loosely referred to as “Time Travel” effectively allowing you to view data as it was at any previous point in time. Dr Who fans, aka Whovians, might enjoy this “Wibbly Wobbly Timey Wimey” feature. Or maybe not. Anyway, Temporal data gives extra insight into data changes over time in a relatively simple and effective way.
Hyperscale Cloud
Selected Feature: Stretch Database - “Julio! Get the Stretch!”
Microsoft’s transition to a cloud-computing services-based company has been rapid and largely successful with major services such as Office 365 and Azure leading the way. Cloud-based databases provide increased flexibility and “elastic” scalability, ideal for rapidly growing organizations. There are still many evolving security-based questions, but some data may be appropriate to host in the cloud and synchronize with sensitive “on-premises” data. This distributed approach to our data is a growing trend in the industry. Microsoft is at the forefront of this technology with its Azure platform supporting Azure SQL Database in the cloud and also Azure Virtual Machines (VMs) which can support the full version of SQL Server, if needed. Once again, ease of use and lower costs are Microsoft’s value propositions. Azure SQL Database is now compatible with SQL Server 2016 and supports a new hybrid option called the “Stretch Database."
The Stretch database concept is designed for a database that has large historical tables that are required for query access but in an infrequent manner. The objective is to store the most frequently accessed tables locally with the occasionally accessed historical tables off-loaded to the cloud, reducing local storage costs and resource demands. Application queries require no changes providing a transparent implementation.
The Stretch database consists of a local database hosted by an on-premises instance of SQL Server 2016 and a remote database hosted by Azure SQL Database (V12 and above). Large historical tables can be identified as “stretch tables” making the data “eligible” for migration to the remote database. The stretch tables are fully migrated over time to the remote database, however, migration can be paused and resumed as required to optimize available network resources. Fail-safe mechanisms are built-in so that no data loss will occur during migration including automatic retry functionality. Again, existing queries can remain unchanged both during and after migration as the query optimizer understands the current state of migration. Effectively, a stretch table consists of a local table with eligible data and a remote table with migrated data. After full migration, the local table will be empty and the remote table will contain all the data. Any new rows added to the local table will then become eligible for migration. A stretch-enabled table does not support update or delete operations currently.
In order to allow a stretch database to be hosted by a SQL Server 2016 instance, the “remote data archive” option must be turned on by using sp_configure. Then the required database can be enabled for stretch operations using the appropriate task wizard in SSMS.
Once the Enable Database for Stretch wizard starts, it prompts for the Azure subscription and sign-in credentials that should be used for the remote database server. The wizard creates a linked server definition to the remote Azure SQL Database for secure access.
When the database is enabled for stretch operations, you can enable selected stretch tables within the database using the Enable Stretch option in SSMS. Again, occasionally accessed historical data tables would be ideal candidates for stretch tables.
Once enabled, the stretch table will begin migration of data from the local table store to the remote table store using a specially designed low-impact “trickle migration” process. The migration can be paused and resumed as necessary to control network usage. Application queries against the stretch-enabled table remain unchanged, however, the query optimizer will create an execution plan that includes access to the local table and the remote table combined. Therefore regardless of the current state of migration, the correct data will be returned by the query. Over time, the entire table data will be migrated to the remote table-store freeing up storage and resources for the more active local non-stretch tables. The stretch table will be available for occasional access as needed however, it must be noted that performance of remote queries against stretch tables will be slower than the access to local high-performance non-stretch tables, by design. The theory is that based on less storage and memory requirements, the local queries will actually run faster. As always, performance metrics must be assessed for acceptability to established service level agreements.
You can interrogate the system catalog for information regarding stretch databases and tables using new system catalog views and dynamic management objects such as sys.remote_data_archive_databases, sys.remote_data_archive_tables and sys.dm_db_rda_migration_status.
The remote Azure SQL database, associated linked server definition, remote endpoint and appropriate Azure firewall settings are created automatically when the selected database is enabled as a stretch database.
With features such as this, Microsoft has successfully incorporated the cloud as an extension of the SQL Server architecture, rather than as a completely separate platform, and as such is allowing us to leverage existing technologies in what is being termed the Hybrid Cloud.
So if your database is getting too big because of historical data, just shout “Julio! Get the Stretch!”
Conclusion: “Don’t Panic! SQL Server 2016 is mostly harmless!”
Microsoft SQL Server 2016 has some great new features that will allow you to develop high performing, more secure, scalable applications using the hybrid cloud. But as Douglas Adams advised us: “DON’T PANIC!”. Microsoft is building on the established foundation of SQL Server 2012 and 2014. Using similar architecture and management tools, we should be able to smoothly upgrade our systems as and when we need the new features, based on our own schedule. Just don’t forget your towel!