With "big data" grabbing headlines and Hadoop being the poster child for implementing big data, there is tremendous interest in this open source software, largely because the software is cheap and runs on commodity hardware. Hadoop's big shortcoming is its lack of inherent security. Zettaset addresses the security issue, as well as availability and manageability, with a “wrapper” that adds features that make Hadoop ready for the enterprise.
If you visit the Hadoop page on the Apache Software Foundation website, you’ll see the Hadoop project described as one that “develops open-source software for reliable, scalable, distributed computing.”
Conspicuously absent from that description is the word “secure,” and that’s a problem for a lot of companies. “Big data” computing based on Hadoop open source software is almost as great as sliced bread, except for the fact that data security isn’t baked into it.
There are myriad use cases for big data applications. For example, pharmaceutical companies may want to analyze the efficacy of a new drug in development. Oil and gas companies may need to analyze seismic data from a geological formation. Insurance companies may need to analyze actuarial information in order to calculate policy premiums. And yes, a government may need to dig through billions of phone records to search for communications between suspected terrorists.
All of these scenarios and many more can benefit from big data science. The fact is, companies are beginning to face a data deluge and they are looking for new ways to handle it. Instead of relying on a standalone database server, they are looking at distributed architectures that spread the workload over multiple servers.
There are many distributed computing solutions, but Hadoop is the one that is catching everyone’s interest. It’s relatively cheap to use since the software is open source and can run on commodity hardware. Hadoop’s shortfall, however, is its lack of security and general management features. That makes it hard for enterprises to embrace Hadoop, because the platform can put companies in violation of mandatory compliance requirements for HIPAA, PCI, GLBA and other regulations.
Zettaset Inc. has taken note of this gap and is filling it with a management layer that adds security, performance and high availability of clusters, and ease-of-use features for any Hadoop distribution.
A distributed computing architecture is fundamentally different from a standalone system, so you can’t just take what was architected for a single machine and use it for a large data cluster. Zettaset developed its Orchestrator solution specifically to match the needs of a system that pushes data and discrete programming jobs out among numerous computers functioning as a cluster.
Zettaset Orchestrator is not a Hadoop distribution. Rather, it operates as an independent management layer that sits on top of a Hadoop distribution as a security “wrapper” to make the environment enterprise-ready. Zettaset has taken four decades’ worth of best practices in distributed data science and distilled them into a software package that addresses policy, compliance, access control and risk management in a Hadoop cluster environment.
There are many aspects to security, and Zettaset started with the most basic requirement: user login. Logging into one system is easy, but applying something that simple to 20 or 200 or 2,000 machines gets complicated. Every one of these machines needs to know who the user is, because that user’s data could be spread across any combination of them. The cluster also needs to know how to keep one user from seeing another user’s data. Zettaset has implemented role-based access control (RBAC) to address these specific issues.
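To make the RBAC idea concrete, here is a minimal Python sketch of the kind of check each node in a cluster could run against a shared policy. It is an illustration only and not Zettaset’s implementation: the role names, the `ROLE_PERMISSIONS` table, the `User` and `Dataset` types and the `is_allowed` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table shared by every node in the cluster.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "manage"},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset  # role names assigned to this user


@dataclass(frozen=True)
class Dataset:
    path: str
    owner: str  # name of the user who owns the data


def is_allowed(user: User, action: str, dataset: Dataset) -> bool:
    """Return True if the user's roles permit the action on the dataset."""
    # Collect every permission granted by any of the user's roles.
    granted = set()
    for role in user.roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    if action not in granted:
        return False
    # Assumed rule: only roles with "manage" may touch other users' data.
    return "manage" in granted or dataset.owner == user.name
```

Because the decision depends only on the user’s roles and the dataset’s owner, any machine holding the same policy table reaches the same answer, which is the property a multi-node login scheme needs.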
Next the company addressed the need for encryption on the distributed platform, and specifically how to distribute and protect the keys across the entire cluster. Orchestrator also includes support for LDAP and Active Directory, which enables Hadoop clusters to seamlessly integrate with existing security policies within the enterprise environment.
In terms of high availability and performance, Orchestrator is designed to handle potential failures inside a cluster. This answers questions such as: If a job fails, how do we recover from that failure? How do we keep data from being corrupted? How do we restart the job automatically, or notify the user to restart the job? Orchestrator eliminates the Hadoop bogeyman known as the NameNode single point of failure. In a Hadoop cluster, the NameNode is the main node that holds all the metadata describing where the data is stored. If the NameNode fails, that metadata, which is the data map for the entire cluster, can be lost, and the cluster can no longer locate its data. Zettaset borrowed techniques from the world of high-performance computing and codified the processes into software to create a resilient computing environment that guards against such failures.
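One common way to remove a single point of failure like this is to keep a standby copy of the metadata and promote it when the active node stops responding. The Python sketch below illustrates that general active/standby pattern. It is not a description of how Orchestrator works internally, and the `MetadataNode` and `FailoverMonitor` names, the heartbeat timeout and the in-memory `namespace` map are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MetadataNode:
    name: str
    last_heartbeat: float  # time of the most recent heartbeat, in seconds
    namespace: dict        # file path -> list of block IDs (the "data map")


class FailoverMonitor:
    """Hypothetical monitor that promotes a standby metadata node."""

    def __init__(self, active: MetadataNode, standby: MetadataNode,
                 timeout: float = 10.0):
        self.active = active
        self.standby = standby
        self.timeout = timeout

    def replicate(self, path: str, blocks: list) -> None:
        """Apply a metadata change to both nodes so the standby stays current."""
        self.active.namespace[path] = list(blocks)
        self.standby.namespace[path] = list(blocks)

    def check(self, now: float) -> MetadataNode:
        """Swap roles if the active node is stale and the standby is healthy."""
        active_stale = now - self.active.last_heartbeat > self.timeout
        standby_fresh = now - self.standby.last_heartbeat <= self.timeout
        if active_stale and standby_fresh:
            self.active, self.standby = self.standby, self.active
        return self.active
```

Because every metadata change is written to both nodes before a failure occurs, the promoted standby already holds the full data map, and jobs can be pointed at it without rebuilding the namespace.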
Orchestrator also adds centralized configuration management, logging, and auditing, which maintain control of ingress and egress points in the cluster and enable Hadoop clusters to meet compliance requirements for reporting and forensics.
Zettaset Orchestrator installs either in the cloud or in the data center. It adds ease-of-use features that abstract the complexity of distributed systems and make Hadoop a more secure, more enterprise-ready environment.
Linda Musthaler is a Principal Analyst with Essential Solutions Corporation. You can write to her at LMusthaler@essential-iws.com.
About Essential Solutions Corp:
Essential Solutions (http://www.essential-iws.com) researches the practical value of information technology, and how it can make individual workers and entire organizations more productive. Essential Solutions offers consulting services to computer industry and corporate clients to help define and fulfill the potential of IT.