Storage on a budget: GlusterFS shines in open source storage test
But beware of tradeoffs when it comes to documentation, management tools and failover
Review: Cloud storage
One alternative to buying expensive storage-area networks or other hardware-based dedicated storage is to deploy open source storage software on existing server hardware. For this test, we evaluated three such open source storage products: GlusterFS 3.3, Ceph 0.72 and Apache Hadoop 2.2.0.
All three did a good job, but as you might expect, there is a tradeoff when it comes to using open source storage. This is a DIY project: the documentation might not be as comprehensive as you'd like, installation can be tricky, GUI-based management tools might not be available, and if anything goes wrong, you're pretty much on your own.
We liked GlusterFS for its hashing algorithm, which for the most part eliminates the bottleneck and single point of failure risk associated with products that use centralized management. However, GlusterFS, which is being developed by Red Hat, lacks GUI-based management tools.
Ceph also impressed us with its algorithm-based approach to data placement. We also liked how Ceph provides object, block and file storage in one system. However, while Ceph is an interesting product to keep an eye on, it's not ready for prime-time deployment in the enterprise. The vendor does not yet recommend CephFS (the file system) for production environments.
Apache Hadoop is a popular, full-featured product with a nice, web-based management console. Our concern with Hadoop HDFS is the potential bottleneck and single point of failure of the centralized server that stores the metadata. Currently there are ways to fail over manually to a secondary metadata server, and work is underway to make failover automatic, but at the time of publication that feature was not yet available.
Here are the individual reviews:
GlusterFS
GlusterFS is a POSIX-like distributed file system currently being developed by Red Hat. In addition to offering it as a standalone open source storage solution, Red Hat is increasingly integrating GlusterFS with Red Hat Enterprise Linux (RHEL) products such as the latest RHEL 6.5 release. GlusterFS can be used with commodity hardware as well as virtual and cloud resources.
It utilizes existing disk file systems such as ext3, ext4 and xfs to store data. GlusterFS, although owned by Red Hat, is not tied to Red Hat OS products; in fact, we tested it with Ubuntu. The only file system requirement is support for extended attributes, which let users associate files with metadata that the file system itself does not interpret.
GlusterFS fits between traditional storage solutions like Network Attached Storage (NAS) and more expensive models such as SANs. It essentially allows for linear scaling, taking advantage of aggregated storage and memory. Scaling does not affect the user, and with GlusterFS you can add and remove servers on the fly. The minimum server requirement to get started is a dual-core CPU with 4GB of RAM. For storage you can use JBODs (Just a Bunch of Disks) or DAS (Direct Attached Storage). RAID is not required with the open source version of GlusterFS, but RAID6 is required if GlusterFS is used as part of RHEL.
GlusterFS uses FUSE (Filesystem in Userspace) to hook itself into the Virtual File System (VFS) layer. FUSE is a mechanism that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space with the FUSE module providing a "bridge" to the actual kernel interfaces.
Unlike some other open source storage solutions, GlusterFS does not use a distributed or centralized metadata model. Instead it relies on a hashing algorithm, which the vendor calls the Elastic Hash Algorithm, to manage how data is distributed across the aggregated GlusterFS storage. It hashes each location (path/file name) into a unique identifier, much like an md5sum, so hashing the same location always produces the same result. The hash is not applied to a specific 'brick' (the GlusterFS term for a unit of storage), but rather to a virtual volume, which allows the virtual volume to span multiple bricks.
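To illustrate the basic idea (this is plain md5sum, not Gluster's actual hash function), hashing the same path always yields the same digest, so any peer can work out where a file belongs without asking a lookup server:
echo -n "/videos/demo.mkv" | md5sum    # the same path produces the same 32-character digest every time
echo -n "/videos/other.mkv" | md5sum   # a different path maps to a different digest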
We tested GlusterFS 3.3 by creating a basic cluster consisting of two servers running Ubuntu Server 12.04. We wanted to test the product on a non-Red Hat OS and keep the test environment open source from A to Z. Once we had the OS up and running, we needed to configure the disks, setting up at least one brick (a unit of storage used as a GlusterFS building block) on each server, before downloading and installing GlusterFS.
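As a minimal sketch of that disk preparation, assuming a spare disk partition at /dev/sdb1 on each server (device names and mount points will vary), the brick setup looked roughly like this:
sudo mkfs.xfs -i size=512 /dev/sdb1    # format the brick disk with xfs
sudo mkdir -p /export/brick1           # create a mount point for the brick
sudo mount /dev/sdb1 /export/brick1    # mount it; add an fstab entry to make the mount persistent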
Once the disks were prepared, installing GlusterFS was a straightforward operation that involved issuing a few commands from the command prompt. However, getting to the correct commands for our installation type was not so straightforward, as installation parameters vary significantly depending on the setup environment (virtual, bare metal or cloud). We would have liked to see more comprehensive, step-by-step instructions broken out for each type of installation. We also handicapped ourselves a bit by using Ubuntu, since the how-tos and instructions were all Red Hat/Fedora/CentOS-centric.
After completing the installation, we created a trusted pool using our two commodity Ubuntu servers. This was done by issuing a single command on each of the servers that were part of the storage pool. Next we created a GlusterFS volume by issuing a simple "volume create" command. A volume is a group of bricks "passed through" translators and presented to the end user as the actual share.
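Using hypothetical host names server1 and server2 and the brick path from the sketch above, the whole sequence amounted to something like this:
sudo gluster peer probe server2        # run on server1 to form the trusted pool
sudo gluster volume create testvol replica 2 server1:/export/brick1 server2:/export/brick1
sudo gluster volume start testvol      # the volume must be started before clients can mount it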
When creating the volume, you specify several parameters, such as the number of servers to replicate across and the location of the storage bricks on each server. There are also two daemons that run on the peers in a cluster. Glusterd is the elastic volume management daemon; it runs once on each peer. Glusterfsd is the brick daemon that manages the bricks; it runs once per brick, so a given peer may run multiple instances depending on how many bricks it hosts.
Once a GlusterFS volume has been created and started, it can be accessed via NFS, CIFS or the native GlusterFS client. The native client is FUSE-based and is the recommended method, as it provides high performance and transparent failover on Linux clients. GlusterFS volumes can be mounted manually or automatically by configuring the 'fstab' file, which applies to both the native client and NFS v3. We installed the GlusterFS client piece on a separate Linux desktop and mounted our test volume manually from the command line. Installing the client requires only a few commands, but the GlusterFS client software does need to be downloaded and installed on each client machine first.
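For reference, mounting our hypothetical testvol volume with the native client looked like the following; the commented line shows the equivalent fstab entry for automatic mounting:
sudo mkdir -p /mnt/gluster
sudo mount -t glusterfs server1:/testvol /mnt/gluster
# fstab equivalent:  server1:/testvol  /mnt/gluster  glusterfs  defaults,_netdev  0 0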
The GlusterFS Command Console is a command-line utility used to configure and manage the GlusterFS storage environment. On the server storage side, most of the management is accomplished by issuing various commands at the prompt, using a "gluster" prefix. Commands can be entered as gluster + the command, or you can switch to the gluster shell by typing "gluster", which saves you from typing the prefix each time. As an example, adding servers to a storage pool is accomplished with a simple "gluster peer probe servername" command.
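A few commands we found ourselves running repeatedly, either with the gluster prefix or from inside the gluster shell:
gluster                  # drop into the interactive gluster> shell
gluster peer status      # list the peers in the trusted pool and their state
gluster volume info      # show configured volumes, their bricks and options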
Currently, GlusterFS does not come with a native web or other management GUI; however, oVirt, a third-party open source virtualization management tool, supports GlusterFS in its current release. There are also some non-GUI third-party tools available, such as the open source Puppet-Gluster, which lets administrators streamline GlusterFS installation by automating tasks such as installing packages and partitioning/formatting bricks. It can also help manage the glusterd service, open firewall ports and create volumes.
GlusterFS supports geo-replication, that is, replication over LANs or WANs. GlusterFS claims it can handle up to 72 brontobytes of data, though that figure is of course a function of the available "commodity" resources: enough sufficiently large hard drives plus adequate network, CPU and RAM. In the public cloud, GlusterFS is currently supported on Amazon Web Services (EC2 and EBS).
Ceph
Originally developed as a doctoral dissertation and funded by the Department of Energy and National Nuclear Security Administration, Ceph is currently in its fourth release since 2012. Ceph's Reliable Autonomic Distributed Object Store (RADOS) provides object, block and file system storage in a single cluster. The Ceph Filesystem (CephFS) is a POSIX-compliant file system that uses a Ceph Storage Cluster to store its data.
Like GlusterFS, Ceph relies on an algorithm to compute the location of data in a cluster. Both the Ceph clients and OSD daemons use the CRUSH (Controlled Replication Under Scalable Hashing) algorithm, developed specifically for placing replicated data, instead of relying on a centralized lookup table. Ceph OSD daemons store all data as objects in a flat namespace, that is, with no hierarchy of directories.
Although Ceph is designed to run on commodity hardware, its metadata servers do require quite a bit of CPU power to distribute the load. Ceph recommends quad-core processors for the metadata servers and dual-core processors for the OSD daemons, which store the actual data. As for RAM, 1GB per daemon instance is recommended on the metadata server and at least 500MB per OSD daemon. However, during recovery operations much more memory may be needed, and Ceph recommends 1GB per 1TB of storage per daemon. As for storage, a minimum of 1TB on each OSD daemon is recommended.
Ceph recommends a cluster with three storage nodes plus an admin node as a good starting point and that’s what we proceeded with, using several Ubuntu 12.04 servers to create our test cluster. There aren’t too many prerequisites to get started and Ceph provides a handy online preflight checklist. First we needed to install an SSH server (openssh).
We then created a password-less key and copied it to all nodes. It is also recommended that you set up a user with root privileges on each of the nodes. Then it was time to add the Ceph packages to the repository on the admin node and install the "ceph-deploy" tool. Ceph-deploy is a stand-alone method of deploying and decommissioning Ceph clusters.
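As a minimal sketch of that preflight, assuming an admin user named cephuser, hypothetical node names and a Ceph package repository already added per the online checklist:
ssh-keygen                      # accept the defaults and leave the passphrase empty
ssh-copy-id cephuser@node1      # repeat for node2 and node3
sudo apt-get update && sudo apt-get install ceph-deploy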
It should be noted that we ran into some issues when trying to run the update and deploy commands, as packages for the "precise" release were not found. To move beyond this critical roadblock we had to do a series of web searches to find suitable workarounds, which involved changing some of the configuration files. After some trial and error, we were able to get back on track, but this definitely caused some delays.
Once the preflight was completed, we configured the Ceph node cluster and installed the Ceph software on each of the nodes. Using a series of commands with the aforementioned "ceph-deploy" tool, we also made one of our nodes an admin node and two of the other nodes OSD daemons. Once a cluster has been created, it can be started automatically with "sysvinit" or "upstart", depending on which Linux flavor you're running.
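With hypothetical node names, the sequence we followed looked roughly like this (the exact sub-commands vary somewhat between ceph-deploy releases):
ceph-deploy new node1                    # designate node1 as the initial monitor node
ceph-deploy install node1 node2 node3    # install the Ceph packages on every node
ceph-deploy mon create-initial           # create the monitor and gather the keys
ceph-deploy osd prepare node2:/var/local/osd0 node3:/var/local/osd1
ceph-deploy osd activate node2:/var/local/osd0 node3:/var/local/osd1
ceph-deploy admin node1                  # push the config file and admin keyring to the admin node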
Ceph data is stored and replicated across the cluster dynamically. A Ceph cluster stores data in pools, which are essentially logical groups. The CRUSH map is at the core of a Ceph cluster; it is used to determine how data is stored and retrieved, and it allows clients to interact directly with the OSD daemons without going through a centralized server. When you first set up a Ceph cluster, a default storage map is created, but for larger clusters you can customize the map to improve performance.
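Creating a pool is a one-liner; the placement-group count (128 in this hypothetical example) is something you size to your cluster per Ceph's guidelines:
ceph osd pool create testpool 128    # create a pool named testpool with 128 placement groups
ceph osd lspools                     # list the pools in the cluster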
According to Ceph, the most common use of Ceph block devices is to provide block images to virtual machines. Block devices can also be used by kernel modules and cloud-based systems through OpenStack and CloudStack. In order to use a Ceph block device, a Ceph client needs to contact one of the Ceph monitors before it can access the cluster to read and write data. A user can mount Ceph as a provisioned block device using Ceph’s object storage system. The Ceph block device is currently supported on virtualization platforms such as OpenStack and OpenNebula.
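A minimal block-device sketch, assuming a client that has the admin keyring and the rbd kernel module loaded (the image name is our own):
rbd create testimage --size 4096           # create a 4GB image in the default rbd pool
sudo rbd map testimage                     # map it; a device such as /dev/rbd/rbd/testimage (or /dev/rbd0) appears
sudo mkfs.ext4 /dev/rbd/rbd/testimage      # then format and mount it like any other block device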
In order to use the Ceph filesystem, at least one metadata server needs to be created with a "ceph-deploy" command. Importantly, Ceph does not currently recommend using CephFS for production data. Mounting CephFS on a client can be done either manually or automatically, using a kernel driver or FUSE (Filesystem in Userspace). We were able to mount CephFS on one of our test machines by adding an entry to the file systems table (fstab) similar to the following:
id=nwwuser,conf=/etc/ceph/cluster.conf /mnt/ceph2 fuse.ceph defaults 0 0
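The manual equivalents, using either the FUSE client or the kernel driver, look like this (the monitor address, user name and secret file are placeholders):
sudo ceph-fuse -m node1:6789 /mnt/ceph2
sudo mount -t ceph node1:6789:/ /mnt/ceph2 -o name=admin,secretfile=/etc/ceph/admin.secret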
Applications can access Ceph's RADOS object store through software libraries for Java, Python, PHP, C and C++. RADOS also provides a RESTful interface, via its gateway, that is compatible with the Amazon S3 and OpenStack Swift APIs.
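For quick experimentation, the rados command-line tool exposes the same object store; here is a small example using our hypothetical testpool:
echo "hello" > hello.txt
rados -p testpool put hello-object hello.txt    # store a file as an object
rados -p testpool ls                            # list the objects in the pool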