Tuesday, April 16, 2013

PL/SQL: Safely Publishing Data For Analysis/ Obfuscating Data

The purpose of this blog post is to publish a way of obfuscating data so that you can publish data which can be analysed to spot trends but which does not include confidential information while at the same time allowing you to track back from the published information to the original data if external analysis raises issues which require investigation.

We have used this source code to allow us to provide information to our Internal Audit function which includes user and vendor information while keeping that information anonymous.

The first step is to configure the database to store the information. This is fairly simple, you just need to create a single sequence to provide the unique identifiers and a table to store the mappings between the original information and the identifies. By allowing the identifiers to be grouped (using Data Types) this reduces the possibility of someone using a piece of "known data" to work out other pieces of data.

Here's the SQL to create the sequence;

create sequence HIDEDATA_SEQ
start with 1;

And here's the SQL to create the table, and indexes for speed, to store the mappings;

-- Create table
  data_type varchar2(20) not null,
  id        number not null,
  value     varchar2(255) not null
-- Create/Recreate indexes
create unique index HIDDENDATASTORE_PK on HIDDENDATASTORE (data_type, id);
create unique index HIDDENDATASTORE_IDX1 on HIDDENDATASTORE (data_type, value);

Once these have been created your can create the package with the two functions (to Hide and Unhide - with apologies to the English language for that second one!).

The source code for the package is available here (via Google Drive).

Once you've created the package here's a quick test script;

 SELECT hidedata.Hide('TEST',
                     'the quick brown fox jumped over the slow lazy dog') Hide,
                                     'the quick brown fox jumped over the slow lazy dog')) UnHide

The result in my database, in which I've been doing some testing (exactly 23 times to be precise!) is;

Test Results Hiding/Unhiding Data

Post a Comment