Data loss is a huge problem for organizations of all sizes and in all industries. In 2008, the Verizon Business RISK Team investigated 90 breaches in which more than 285 million records were compromised. Most of those records involved sensitive payment card data. The thieves who stole the records then sold the information or used it themselves for fraud; in other words, to steal money from our credit and debit accounts.
And these are only the breaches we know about; they are just the tip of the iceberg. Many more incidents occur every day, exposing sensitive information such as Social Security numbers, customer account information, intellectual property, authentication credentials, corporate financial data and more.
Many organizations have turned to data encryption to protect sensitive data. Encrypting data is certainly an improvement over using, moving and storing it in plain text, but encryption has its drawbacks. Managing the keys used to encrypt and decrypt the information can be expensive and cumbersome, especially if the organization wants to use the data in numerous applications. For instance, a retail business may want to use customer payment card data to provide loyalty rewards or to analyze buying trends. Each of these applications would need the means to decrypt the data while it's in use and re-encrypt it afterward.
Now there's a relatively new technology called tokenization that's gaining interest from organizations that have a lot to lose in the event of a data breach. Like encryption, tokenization replaces the sensitive data with an alternate string of characters; in this case, that string is called a token. However, the token is not ciphertext derived from the original data; rather, it's a randomly generated string of characters. If tokenized data is lost or stolen, it has no meaning to anyone who views it. A token can only be "unlocked" to reveal the original data by an authorized party that has access to the token server.
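To make the difference from encryption concrete, here is a minimal sketch of a token vault, assuming an in-memory dictionary stands in for the token server's secure storage. The class name `TokenVault` and its methods are hypothetical, chosen for illustration only. The key point is that the token comes from a random source (Python's `secrets` module) rather than being computed from the sensitive value, so there is no key that turns a stolen token back into the original data; only a lookup in the vault can do that.

```python
import secrets


class TokenVault:
    """Hypothetical, in-memory token vault used only to illustrate the idea.

    A real token server would encrypt the stored values, persist them
    securely and authenticate every caller; none of that is modeled here.
    """

    def __init__(self) -> None:
        # Maps each issued token to the original sensitive value.
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # The token is random, not derived from the value, so it cannot be
        # "decrypted"; it only has meaning via the vault's lookup table.
        token = secrets.token_hex(16)  # 128 random bits as 32 hex characters
        while token in self._token_to_value:  # guard against a (very unlikely) collision
            token = secrets.token_hex(16)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")  # a well-known test card number
original = vault.detokenize(token)
```

In this sketch, tokenizing the same value twice yields two different tokens; some systems instead reuse one token per value so that records can still be matched, which is a design choice rather than a requirement.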
Here's a simple overview of how tokenization works. I'll use the example of payment card data because that is the most frequent use of tokenization today.
A merchant has a point-of-sale system where customers swipe their credit or debit cards to initiate a payment transaction. The data read from the magnetic stripe on the back of the card includes a 16-digit number called the primary account number (PAN). Any thief who gains access to the PAN has enough information to use the card fraudulently. The PAN (i.e., the cardholder data) is sent to a token server, where it is encrypted and placed in a secure data vault. The server generates a token, which replaces the PAN in the merchant's storage systems and business applications. If the merchant needs the original cardholder data again, say to issue a refund to the credit card, the merchant, as an authorized party, can have the PAN looked up in the secure data vault.
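The flow above can be sketched as follows. This is a hypothetical illustration, not any product's API: `TokenServer`, its allowlist of caller names and the `order_record` layout are all assumptions. For brevity, the sketch stores PANs unencrypted in a dictionary, whereas the process described above encrypts them in the vault; authorization is reduced to a simple set of permitted callers.

```python
import secrets


class TokenServer:
    """Hypothetical token server that issues tokens and guards the PAN vault.

    Simplifications (assumptions): the vault is an in-memory dict, stored
    PANs are NOT encrypted here, and authorization is a caller allowlist.
    """

    def __init__(self, authorized_callers: set[str]) -> None:
        self._vault: dict[str, str] = {}
        self._authorized = set(authorized_callers)

    def tokenize(self, pan: str) -> str:
        token = secrets.token_urlsafe(16)  # random, unrelated to the PAN
        while token in self._vault:  # guard against a (very unlikely) collision
            token = secrets.token_urlsafe(16)
        self._vault[token] = pan
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Only authorized applications may reach into the vault.
        if caller not in self._authorized:
            raise PermissionError(f"{caller} may not retrieve cardholder data")
        return self._vault[token]


# Point of sale: the swiped PAN goes straight to the token server.
server = TokenServer(authorized_callers={"refund-service"})
token = server.tokenize("4111111111111111")

# The merchant's own systems store only the token, never the PAN.
order_record = {"order_id": 1001, "card": token, "amount": 49.95}

# Refund: an authorized application has the token server look up the PAN.
pan_for_refund = server.detokenize(order_record["card"], caller="refund-service")
```

A loyalty or analytics application can keep working with the token itself; only an allowlisted caller such as the hypothetical `"refund-service"` ever sees the real PAN, and any other caller gets a `PermissionError`.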
Here's an example of a format-preserving tokenization process. In this process, the length and data type are preserved, but the data values change enough to confound a data thief.
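A minimal sketch of format-preserving tokenization, under stated assumptions: the token keeps the PAN's length and stays all digits, so existing database columns and applications can hold it unchanged. Two further choices here are assumptions rather than part of the process described above: the last four digits are kept (a common choice so receipts can still show them), and any candidate that passes the Luhn checksum used by card numbers is rejected, so a token is never mistaken for a real card number. The function names are hypothetical, and storing the token-to-PAN mapping in a vault is omitted.

```python
import secrets


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def format_preserving_token(pan: str, keep_last: int = 4) -> str:
    """Hypothetical format-preserving token: same length, digits only.

    Keeps the last `keep_last` digits, randomizes the rest and rejects
    candidates that equal the PAN or pass the Luhn check (assumed policy).
    """
    if not pan.isdigit():
        raise ValueError("expected a digits-only PAN")
    if not 0 <= keep_last < len(pan):
        raise ValueError("keep_last must leave at least one digit to randomize")
    kept = pan[len(pan) - keep_last:]  # empty string when keep_last == 0
    n_random = len(pan) - keep_last
    while True:
        random_part = "".join(secrets.choice("0123456789") for _ in range(n_random))
        token = random_part + kept
        if token != pan and not luhn_valid(token):
            return token


token = format_preserving_token("4111111111111111")
```

Roughly nine out of ten random candidates fail the Luhn check, so the loop usually returns after one or two tries.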