This token gesture secures sensitive data

2008 was a record year for the amount of sensitive information compromised through data breaches. Much of the purloined data was payment card data, which allowed thieves to use millions of credit and debit cards fraudulently. A newer technology called tokenization offers great promise for protecting sensitive data. If your organization must comply with the Payment Card Industry Data Security Standard (PCI DSS), you need to know about tokenization and how it can help you achieve and maintain compliance while reducing the cost of doing so.

Data loss is a huge problem for organizations of all sizes and in all industries. In 2008, the Verizon Business RISK Team investigated 90 breaches in which more than 285 million records were compromised. Most of those records involved sensitive payment card data. The thieves who pilfered the records then turned around and sold or used the information for fraudulent purposes -- in other words, to steal money from our credit and debit accounts.

And those are just the breaches we know about; they are the tip of the iceberg. Many more incidents occur every day involving sensitive information such as Social Security numbers, customer account information, intellectual property, authentication credentials, corporate financial data and more.

Many organizations have turned to data encryption to protect sensitive data. While encrypting data is certainly an improvement over using, moving and storing it in plain text form, encryption has its drawbacks. It can be expensive and cumbersome to manage the keys to encrypt and decrypt the information, especially if the organization wants to use the data in numerous applications. For instance, a retail business may want to use customer payment card data to provide loyalty rewards or to analyze buying trends. Each of these applications would need the means to decrypt the data while it's in use and re-encrypt it afterward.
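To make that key-management burden concrete, here is a minimal sketch of symmetric encryption in Python, assuming the third-party cryptography package (an illustration, not the scheme any particular product uses). Every application that needs the real value must hold the same key, and every copy of that key is one more thing to protect and rotate.

```python
# Minimal symmetric-encryption sketch; assumes "pip install cryptography".
from cryptography.fernet import Fernet

# One key must be generated, distributed to every application that
# touches the data, protected, and periodically rotated.
key = Fernet.generate_key()
cipher = Fernet(key)

pan = b"1234567890123456"
ciphertext = cipher.encrypt(pan)   # the form that is stored or moved

# A loyalty-rewards app or trend-analysis app needs the same key just
# to read the number, and must re-encrypt it when finished.
plaintext = cipher.decrypt(ciphertext)
assert plaintext == pan
```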

Now there's a relatively new technology called tokenization that's gaining interest from organizations that have a lot to lose in the event of a data breach. Like encryption, tokenization replaces sensitive data with an alternate string of characters called a token. Unlike encryption, though, the token is not ciphertext; rather, it's a randomly generated string of characters. If the token data is lost or stolen, it has no meaning to anyone who views it. It can be "unlocked" to reveal the original data only by an authorized party with access to the token server.
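That difference from encryption can be shown in a few lines of Python. In this purely illustrative sketch, token_vault is a hypothetical stand-in for the token server's lookup table; nothing about the token is mathematically derived from the original value.

```python
import secrets

token_vault = {}   # hypothetical stand-in for the token server's table

def tokenize(sensitive_value: str) -> str:
    token = secrets.token_hex(16)          # 32 random hex characters
    token_vault[token] = sensitive_value   # the only link back to the data
    return token

token = tokenize("1234567890123456")
# The token by itself is just random characters; with no key to reverse
# it, only a lookup in the server's vault can recover the original.
```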

Here's a simple overview of how tokenization works. I'll use the example of payment card data because that is the most frequent use of tokenization today.

A merchant has a point-of-sale system where customers swipe their credit or debit cards to initiate a payment transaction. Among the information on the magnetic stripe on the back of the card is a 16-digit number called the primary account number (PAN). Any thief who gains access to the PAN has enough information to use the card data fraudulently. The PAN (i.e., the cardholder data) is sent to a token server, where it is encrypted and placed into a secure data vault. A token is generated to replace the PAN in the merchant's storage systems and business applications. If the merchant needs access to the original cardholder data again -- say, to issue a refund on the credit card -- an authorized request can reach into the secure data vault to look up the PAN.
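A toy model of that flow might look like the sketch below. TokenServer, tokenize and detokenize are illustrative names rather than any vendor's actual API; the in-memory dictionary stands in for the secure data vault, and the boolean flag stands in for a real access-control check.

```python
import secrets
from cryptography.fernet import Fernet  # pip install cryptography

class TokenServer:
    def __init__(self):
        self._vault_key = Fernet.generate_key()   # never leaves the server
        self._vault = {}                          # token -> encrypted PAN

    def tokenize(self, pan: str) -> str:
        token = secrets.token_hex(8)   # random token, meaningless by itself
        # The PAN is encrypted before it is placed in the data vault.
        self._vault[token] = Fernet(self._vault_key).encrypt(pan.encode())
        return token

    def detokenize(self, token: str, caller_authorized: bool) -> str:
        # Only an authorized party -- say, one issuing a refund -- may
        # reach into the vault for the original cardholder data.
        if not caller_authorized:
            raise PermissionError("caller may not access the data vault")
        return Fernet(self._vault_key).decrypt(self._vault[token]).decode()

server = TokenServer()
t = server.tokenize("1234567890123456")  # the merchant stores only the token
pan = server.detokenize(t, caller_authorized=True)   # e.g., for a refund
```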

Here's an example of a format-preserving tokenization process. In this process, the length and data type are preserved, but the data values change enough to confound a data thief.

The original data is 1234 56789012 3456, the equivalent of a 16-digit PAN. After tokenization, the representative number might be 1234 59244701 3456. The first four digits (the "head") and the last four digits (the "tail") remain the same. The middle eight digits (the "body") are randomly scrambled enough to obscure the real data value. The tokenized number is in the same format as the original, so it can be used in business applications without modifying the application or maintaining keys to decode the data.
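A minimal sketch of that head/body/tail scheme in Python follows. It captures only the format-preserving idea; a production system would generate the body on the token server and guarantee the result never collides with a live card number.

```python
import secrets

def format_preserving_token(pan: str) -> str:
    digits = pan.replace(" ", "")
    head, tail = digits[:4], digits[-4:]   # preserved as-is
    # Random eight-digit body replaces the middle of the number.
    body = "".join(secrets.choice("0123456789") for _ in range(8))
    return f"{head} {body} {tail}"

print(format_preserving_token("1234 56789012 3456"))
# e.g. "1234 59244701 3456" -- same length and data type, new value
```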

This process offers several advantages for the merchant. First and foremost, it takes highly sensitive data out of the business processes that would use customer data, reducing the likelihood that the real data can be stolen from servers or applications. If a thief steals tokenized data, he can't use it to retrieve the real data, since he isn't authorized to access the secure data vault. Instead, he ends up with a bunch of random numbers that mean nothing to him.

For companies that must meet the PCI DSS, taking the sensitive data out of storage and business applications shrinks the cardholder data environment that is subject to PCI compliance and assessments. In effect, a merchant that doesn't store or use plain text or encrypted cardholder data can vastly reduce its spending on PCI compliance and maintenance. Some large merchants that have used tokenization for this purpose have already proven they can save millions of dollars a year.

Data tokenization is relatively new but catching on. I expect that within a few years it will be considered a best practice for the protection of sensitive data. In fact, I wouldn't be surprised to see a future version of the PCI DSS include tokenization as a requirement. For good insight on this, I recommend reading the article "'Tokenization' touted to increase credit card data security."

Next week, we'll explore some of the companies that offer tokenization solutions.
