Data Masking

Data masking is the process of hiding data elements while keeping the original data set usable.

Most common use of data masking is to create test data for non-production environments, such as application development and QA, while protecting personally identifiable, personally sensitive or commercially sensitive data. Creation and management of test data sets is a very important part of data governance solution in every organization. Development and QA department personnel usually do not have required security clearance to access actual production data. However, in order to create and test robust business applications, they need to work with data sets that closely resemble real-world data. These test data sets should also pass application-level constraints and validations, which is impossible to achieve using just random test data generators.

Server Studio provides two options for creating test data sets based on a production data.

The first option is to export production data to text files while performing data masking for selected table columns in memory during the export operation. The exported text files, which contain obfuscated data, can be imported into development and QA databases that have corresponding database schema.

The second option is to mask data directly in database tables, which represent a copy of production data created in some secure environment accessible only to cleared personnel. Subsequently, modified tables can be exported and then imported (or backed up and then restored) into development or QA environment using other tools or utilities.

A person responsible for creation of a test data set for a development or QA environment, can choose from a variety of data masking methods and apply them to database table’s columns that require obfuscation.

The following masking methods are available:

TIP: Use Object Explorer’s Columns folder under a database node to browse all columns for all tables in your database to locate the columns that require masking. The Columns Properties panel allows to sort all columns by name and filter columns using column name patterns to help you to identify columns that store sensitive data, such as credit cards numbers, Social Security Number, email address, mailing address, phone, etc. After you find all columns that require data masking, you can open the Data Masking wizard for tables that contain these columns and apply data masking rules.

Random Substitution

Actual data is substituted by randomly generated values that are appropriate for a column’s datatype. Users can select a number of options, such as selecting predefined formats for character fields (for example US Social Security Number, ZIP Code or Telephone Number) or specifying a range of values for numeric and date fields.

Substitution from Database Object

Actual data is substituted by values from datasets previously created by users and stored in a database tables or views. For example, a person responsible for creating a test data for application developers can create datasets with first and last names, list of product parts numbers and descriptions, etc., which correlate to the actual production data but do not represent the real production data.

Substitution from File

Actual data is substituted by values from datasets previously created by developers and stored in text files. For example, a person responsible for creating test data for application developers can create files with first and last names, list of product parts numbers and descriptions, etc., which correlate to the actual production data but does not represent the real production data.

Shuffling

Shuffling method is similar to the substitution method but it derives the substitution set from the same column of data that is being masked. The original data is randomly shuffled within the column. If shuffling is used on several columns at the same time, it is very difficult to re-create the original data out of the newly generated records. It is recommended to use shuffling method in combination with other methods, such as Substitution and Variance methods to make it impossible to reverse engineer original records from the test data set.

Number and Date Variance

Variance masking method allows generation of random values that represent the original data within user defined percentage proximity. This method can be used to mask numeric financial columns and date/time related columns. When applying a small percentage variance to numeric fields, such as +/-15%, you still retain the same range of data as the original data but the actual original values are obfuscated. For Date and Datetime fields, you can apply a variance generation that represents +/-5 days or 2 months of the original values. It allows you to keep the same demographic or chronological distribution as the original data without compromising actual values, such as birth dates or transaction dates.

Masking Out

This method applies to character fields, such as CHAR, VARCHAR, TEXT, LVARCHAR. This method allows obfuscation of only parts of the original value, for example, a specified number of first, middle or last characters in account numbers. You can also apply it to address fields, by replacing only numerical portion of the address, such as house and apartment numbers, but keeping names of the streets, to create a more realistically looking data with exactly the same statistical data distribution as the original data.

Nulling Out

This method allows you to hide data by setting values to NULL in all rows of the selected columns.

Data Masking Wizard – Export to Text Files

To open the Data Masking Wizard in the ‘Export to Files’ mode, select Tools -> Data Manager – Mask Data top-level menu, choose ‘Mask Data and Export to Files’ radiobutton and press OK button.

Choose a list of tables to include into the data masking wizard. You can choose a mix of tables, some of which contain data that require masking and some of which do not. It helps you to export a complete data set with both obfuscated and not obfuscated data that can be imported into a development or QA database. If some tables do not have column-level data masking rules defined, the data masking wizard works for these tables exactly the same way as a regular data export wizard. Press the OK button to open the data masking wizard.

Data masking wizard contains a grid with tables selected for data masking and export at the top. The details panel at the bottom displays a list of columns and column-level data masking rules for a table selected in the top grid.

For those tables in the list that require data masking, choose the columns that you want to mask and for each column select a masking method using Data Masking Method dropdown. Data Masking Method dropdown contains only methods applicable for the selected column’s datatype. If a selected column is a part of the primary or foreign key, it cannot be masked in order to protect data referential integrity in a database.

Data Sample panel allows you to preview how the masked data for a selected column with a specified data masking method will look. It uses a small subset of the actual data from a table’s column as a source and applies to it the data masking method selected in the dropdown.

If you want to add additional tables, press Add Tables button located below the grid with the list of tables.

After you defined column’s data masking rules for required tables, press Next button. The next page of the wizard allows you to specify export file names, locations and data format for each table. By default, an export file name is set to the same name as a table, which serves a source of data for this file. An export file name has .UNL extension and a default separator character is - ‘|’. File Path and export file format by default is applied to all export files in the wizard.

Press the Next button to get to the execution status page of the wizard. If you have more than one table added as data masking/export jobs, this page allows you to specify which action to take when one of the jobs fails. You can choose to ignore the error and continue execution of other jobs or to stop execution process immediately after a first error occurs. If you choose to ignore errors and continue execution, you will be able to review all failed jobs after the entire process for all jobs is completed using the Status column in the top grid. After the cause of errors is fixed, you can repeat execution only for failed jobs. In addition, you can manually break the execution process for all jobs by pressing Cancel button.

Press Start button to start the data masking process. The top grid shows an execution status for each table, for which data masking job was executed. The details panel, located below the grid, shows the status of currently executed job. If any errors occur, the Status field in the grid displays the error message. Click on the table in the grid to see the specific job error’s details in the panel below the grid. Use Failures radio button located below the grid to see only failed jobs.

Data Masking Wizard – Modify Existing Data in Tables

To open the Data Masking Wizard in the ‘Modify Existing Tables’ mode, select Tools -> Data Manager – Mask Data top-level menu, choose ‘Mask Data and Update Table’ radiobutton and press OK button.

IMPORTANT NOTE: This option modifies original data in tables so you need to have a valid copy of this data that can be used to restore original data. Do NOT use this option against production tables - only against a copy of production data.

Choose a list of tables to include into the data masking wizard. You should include only those tables that contain columns that should be masked.

Data masking wizard contains a grid with tables selected for data masking at the top. The details panel at the bottom displays a list of columns and column-level data masking rules for a table selected in the top grid.

For each table in the list, choose the columns that you want to mask and for each column select a masking method using Data Masking Method dropdown. Data Masking Method dropdown contains only methods applicable for the selected column’s datatype. If a selected column is a part of the primary or foreign key, it cannot be masked in order to protect data referential integrity in a database.

Data Sample panel allows you to preview how the masked data for a selected column with a specified data masking method will look. It uses a small subset of the actual data from a table’s column as a source and applies to it the data masking method selected in the dropdown.

If you want to add additional tables, press Add Tables button located below the grid with the list of tables.

After you defined column’s data masking rules for required tables, press Next button to get to the execution status page of the wizard. If you have more than one table added as data masking jobs, this page allows you to specify which action to take when one of the jobs fails. You can choose to ignore the error and continue execution of other jobs or to stop execution process immediately after a first error occurs. If you choose to ignore errors and continue execution, you will be able to review all failed jobs after the entire process for all jobs is completed using the Status column in the top grid. After the cause of errors is fixed, you can repeat execution only for failed jobs. In addition, you can manually break the execution process for all jobs by pressing Cancel button.

Press Start button to start the data masking process. The top grid shows an execution status for each table, for which data masking job was executed. The details, panel located below the grid, show the status of currently executed job. If any errors occur, the Status field in the grid displays the error message. Click on the table in the grid to see the specific job error’s details in the panel below the grid. Use Failures radio button located below the grid to see only failed jobs.