Skip Links

A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

By Sergey Bushik, senior R&D engineer at Altoros Systems Inc., special to Network World
October 22, 2012 04:26 PM ET

Network World - "The more alternatives, the more difficult the choice." -- Abbe' D'Allanival

In 2010, when the world became enchanted by the capabilities of cloud systems and new databases designed to serve them, a group of researchers from Yahoo decided to look into NoSQL. They developed the YCSB framework to assess the performance of new tools and find the best cases for their use. The results were published in the paper, "Benchmarking Cloud Serving Systems with YCSB."

The Yahoo guys did a great job, but like any paper, it could not include everything:

● The research did not provide all the information we needed for our own analysis. 
● Though Cassandra, HBase, Yahoo's PNUTS, and a simple sharded MySQL implementation were analyzed, some of the databases we often work with were not covered. 
● Yahoo used high-performance hardware, while it would be more useful for most companies to see how these databases perform on average hardware.

IN THE NEWS: MySQL users caution against NoSQL fad

As R&D engineers at Altoros Systems Inc., a big data specialist, we were inspired by Yahoo's endeavors and decided to add some effort of our own. This article is our vendor-independent analysis of NoSQL databases, based on performance measured under different system workloads.

What makes this research unique?

Often referred to as NoSQL, non-relational databases feature elasticity and scalability in combination with a capability to store big data and work with cloud computing systems, all of which make them extremely popular. NoSQL data management systems are inherently schema-free (with no obsessive complexity and a flexible data model) and eventually consistent (complying with BASE rather than ACID). They have a simple API, serve huge amounts of data and provide high throughput.

In 2012, the number of NoSQL products reached 120-plus and the figure is still growing. That variety makes it difficult to select the best tool for a particular case. Database vendors usually measure productivity of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. We wanted to do independent and unbiased research to complement the work done by the folks at Yahoo.

Using Amazon virtual machines to ensure verifiable results and research transparency (which also helped minimize errors due to hardware differences), we have analyzed and evaluated the following NoSQL solutions:

● Cassandra, a column family store 
● HBase (column-oriented, too) 
MongoDB, a document-oriented database 
● Riak, a key-value store

We also tested MySQL Cluster and sharded MySQL, taking them as benchmarks.

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

Our Commenting Policies
Latest News
rssRss Feed
View more Latest News