January 25, 2016

Data privacy is a major concern today for any organization that manages sensitive data or personally identifiable information (PII). Examples include customer information such as phone numbers, email addresses and bank details; HR data on employees; and financial business data.

This sensitive information is often stored in the database, and it should only be available to specific people on a need-to-know basis. Beyond that need, the sensitive data should not be exposed via the application, or to developers or IT personnel who access the production database directly.

Traditionally, the logic for obfuscating sensitive data has been developed in the application layer, which requires duplicating it across every module and application that accesses the data. Alternatively, special views have been created to avoid exposing the sensitive fields in the database itself, although these can impact database operations and are susceptible to errors. In other cases, third-party tools have been introduced to manage the concealment of the restricted data.

SQL Server 2016 and Azure SQL DB now offer a built-in feature that helps limit access to those particular sensitive data fields: Dynamic Data Masking (DDM).

DDM can be used to hide or obfuscate sensitive data, by controlling how the data appears in the output of database queries. It is implemented within the database itself, so the logic is centralized and always applies when the sensitive data is queried. Best of all, it is incredibly simple to configure DDM rules on sensitive fields, which can be done on an existing database without affecting database operations or requiring changes in application code.

How DDM works

Dynamic Data Masking rules can be defined on particular columns, indicating how the data in those columns will appear when queried. There are no physical changes to the data in the database itself; the data remains intact and is fully available to authorized users or applications. Database operations remain unaffected, and the masked data has the same data type as the original data, so DDM can often be applied without making any changes to database procedures or application code.

To add a data mask on a column in your database, all you need to do is alter that column by adding a mask and specifying the required masking function. You can choose default masking (default()), which fully masks the original value; partial masking (partial()), where you specify how much of the start and end of the value to expose; or random masking (random()), which replaces a numeric value with a random value within a specified range. There is also an email masking function (email()), which exposes the first character and keeps the email format.
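As a rough sketch of the email and random masking functions, the statements below assume a hypothetical Membership table with a string Email column and a numeric DiscountCode column. Neither name comes from this post; substitute your own table and columns.

```sql
-- Hypothetical table and column names, for illustration only.

-- email(): exposes the first character and keeps an email-like shape.
ALTER TABLE Membership
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- random(low, high): returns a random number from the given range
-- in place of the stored numeric value.
ALTER TABLE Membership
    ALTER COLUMN DiscountCode ADD MASKED WITH (FUNCTION = 'random(1, 100)');
```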

Full masking:

Configure masking function:
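A minimal sketch of a full (default) mask, assuming a hypothetical Membership table with a LastName string column; the names are placeholders rather than anything defined in this post.

```sql
-- Hypothetical table and column names, for illustration only.
-- default() fully masks the value according to the column's data type.
ALTER TABLE Membership
    ALTER COLUMN LastName ADD MASKED WITH (FUNCTION = 'default()');
```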

Results:
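One way to see the effect is to query the table as a user who lacks the UNMASK permission. The sketch below assumes a hypothetical Membership table whose LastName column already carries a default() mask, and creates a throwaway user named TestUser purely for the test.

```sql
-- Hypothetical user, table, and column names, for illustration only.
CREATE USER TestUser WITHOUT LOGIN;
GRANT SELECT ON Membership TO TestUser;

-- Run the query in the low-privilege user's context, then switch back.
EXECUTE AS USER = 'TestUser';
SELECT FirstName, LastName FROM Membership;
REVERT;

-- With a default() mask on a string column, TestUser should see a
-- placeholder such as XXXX in LastName rather than the stored value.
```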

Partial masking:

Configure masking function:
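A minimal sketch of a partial mask, assuming a hypothetical Membership table with a Phone string column. The three arguments to partial() are the number of leading characters to expose, the padding string, and the number of trailing characters to expose.

```sql
-- Hypothetical table and column names, for illustration only.
-- Expose the first character, pad with XXXXXXX, expose no trailing characters.
ALTER TABLE Membership
    ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)');
```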

Results:
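To check a partial mask, the query can be run as a non-privileged user. This sketch assumes a hypothetical Membership table whose Phone column is masked with partial(1,"XXXXXXX",0), and an existing database user named TestUser that has SELECT permission on the table but not UNMASK.

```sql
-- Hypothetical user, table, and column names, for illustration only.
EXECUTE AS USER = 'TestUser';
SELECT FirstName, Phone FROM Membership;
REVERT;

-- Under that mask, a stored value such as 555.123.1234 would be
-- returned to TestUser as 5XXXXXXX.
```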

You can also configure masking functions on columns at the time of table creation:

Creating a table with Dynamic Data Masking:
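A sketch of what such a table definition might look like. The Membership table and its columns are invented for illustration; the MASKED WITH clause follows the column's data type.

```sql
-- Hypothetical table definition, for illustration only.
CREATE TABLE Membership
(
    MemberID  int IDENTITY PRIMARY KEY,
    -- Partial mask: show the first character, pad the rest.
    FirstName varchar(100) MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)') NULL,
    -- No mask: returned as stored to every user with SELECT permission.
    LastName  varchar(100) NOT NULL,
    -- Full mask.
    Phone     varchar(12)  MASKED WITH (FUNCTION = 'default()') NULL,
    -- Email mask: first character plus an email-shaped placeholder.
    Email     varchar(100) MASKED WITH (FUNCTION = 'email()') NULL
);
```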

Enabling access to privileged users

Configuring Dynamic Data Masking rules leaves the underlying data unaffected, so privileged users can still access the real data. Administrators of the database are always exempt from masking, so they always get the real data when performing queries. You can also let specific users see the actual data by granting them the UNMASK permission:

Assigning the UNMASK permission:
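A minimal sketch, assuming a database user named TestUser already exists (the name is a placeholder).

```sql
-- Hypothetical user name, for illustration only.
-- Allow TestUser to see unmasked data in query results.
GRANT UNMASK TO TestUser;

-- Later, return TestUser to seeing masked data.
REVOKE UNMASK FROM TestUser;
```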

Common questions about DDM

Does DDM apply for all database clients, like Java or Node.js?
Yes, query results always contain masked data for nonprivileged users, regardless of the client used to connect to the database.

What happens if a user copies data from a masked column out of the table and into a TEMP table?
In this case, the data is masked when it is retrieved from the original table, so it is written to the target table in masked format (unless a privileged user is retrieving it). This means the original data cannot be restored from the TEMP table, and users who do not have access to unmasked data cannot expose the real data by copying it elsewhere. Note: A user with write permission could copy masked values back over the real data, so to avoid data corruption, be sure to assign database read/write permissions appropriately.
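The copy scenario can be sketched as follows. It assumes a hypothetical Membership table with a masked Phone column, and that the session belongs to a user without the UNMASK permission.

```sql
-- Hypothetical table and column names, for illustration only.
-- Assumes the current session runs as a user WITHOUT the UNMASK permission.

-- The mask is applied as rows are read, so masked values are what get stored.
SELECT MemberID, Phone
INTO #MembershipCopy
FROM Membership;

-- The temp table holds only the masked values; the real phone numbers
-- cannot be recovered from it.
SELECT MemberID, Phone FROM #MembershipCopy;
```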

What is the performance impact of using DDM?
Since the data masking is performed only at the end of the database query operation, right before the data is returned, the performance impact is minimal and often negligible. You should still validate the exact performance impact for your workload.

Part of a comprehensive security solution

Note that Dynamic Data Masking is not a replacement for access control mechanisms, and is not a method for physical data encryption. DDM is intended to simplify the obfuscation of sensitive data by centralizing the logic in your database, but it does not provide complete protection against malicious users who run exhaustive ad-hoc queries to infer the underlying values, and it does not restrict administrators, who are exempt from masking. Dynamic Data Masking is complementary to other SQL Server security features (auditing, encryption, Row-Level Security, etc.), and it is highly recommended to use it in conjunction with them to better protect the sensitive data in your database.
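To illustrate the ad-hoc query limitation, here is a sketch assuming a hypothetical Employees table with a masked numeric Salary column, queried by a user without the UNMASK permission. Masking changes what is displayed, but query predicates are still evaluated against the real values.

```sql
-- Hypothetical table and column names, for illustration only.
-- Salary is displayed masked, but the WHERE clause filters on the
-- stored value, so any row returned reveals that the salary is 100000.
SELECT EmployeeID, Salary
FROM Employees
WHERE Salary > 99999 AND Salary < 100001;
```

Repeating this with different ranges lets a determined user narrow down real values, which is why DDM should be paired with proper access control and auditing.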

Getting started

You can get started immediately with Dynamic Data Masking to restrict users from seeing sensitive information in your database. All you need to do is identify the sensitive columns in your database and configure data masking for those columns, specifying how much of the data to reveal.


See the other posts in the SQL Server 2016 blogging series.