
How to Secure Big Data in Hadoop

By Thor Olavsrud, CIO
November 09, 2012 08:25 AM ET

CIO - The potential is enormous: as businesses transform into data-driven machines, the data held by your enterprise is likely to become the key to your competitive advantage. As a result, security for both your data and your infrastructure becomes more important than ever before.

Big Data Could Be Toxic Data If Lost

In many cases, organizations will wind up with what Forrester Research calls "toxic data." For instance, imagine a wireless company that is collecting machine data (who's logged onto which towers, how long they're online, how much data they're using, whether they're moving or staying still) that can be used to provide insight into user behavior.

That same wireless company may have lots of user-generated data as well: credit card numbers, Social Security numbers, data on buying habits and patterns of usage, in short any information that a human has volunteered about their experience. The capability to correlate that data and draw inferences from it could be valuable, but it is also toxic: if that correlated data were to leave the organization and wind up in someone else's hands, it could be devastating to both the individual and the organization.

With Big Data, Don't Forget Compliance and Controls

9 Tips for Securing Big Data

1. Think about security before you start your big data project. You don't lock your doors after you've already been robbed, and you shouldn't wait for a data breach incident before you secure your data. Your IT security team and others involved in your big data project should have a serious data security discussion before installing and feeding data into your Hadoop cluster.

2. Consider what data may get stored. If you're planning to use Hadoop to store and run analytics against data subject to regulation, you will likely need to comply with specific security requirements. Even if the data you're storing doesn't fall under regulatory jurisdiction, assess your risks (including loss of goodwill and potential loss of revenue) if data like personally identifiable information (PII) is lost.

3. Centralize accountability. Right now, your data probably resides in diverse organizational silos and data sets. Centralizing the accountability for data security ensures consistent policy enforcement and access control across these silos.

4. Encrypt data both at rest and in motion. Add transparent data encryption at the file layer. SSL encryption can protect big data as it moves between nodes and applications. "File encryption addresses two attacker methods for circumventing normal application security controls," says Adrian Lane, analyst and CTO of security research and advisory firm Securosis. "Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats."
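To make the "in motion" half of tip 4 concrete, here is a minimal sketch of how a client application might insist on an encrypted, verified connection before it sends data to a service in front of the cluster. It uses only Python's standard `ssl` module. The function name `make_client_tls_context` and the optional `ca_file` parameter are assumptions made for illustration; they are not part of Hadoop or of any product mentioned in this article, and a real cluster would also need its own wire-encryption settings.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Build a client-side TLS context that verifies the server certificate.

    ca_file is a hypothetical path to an internal CA bundle; when it is None,
    the operating system's default trust store is used.
    """
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy SSL and early TLS protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

An application would then wrap its socket with `context.wrap_socket(sock, server_hostname=host)` before sending any records, so the connection fails outright if the server's certificate cannot be verified.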
