
A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

By Sergey Bushik, senior R&D engineer at Altoros Systems Inc., special to Network World
October 22, 2012 04:26 PM ET


Sharded MySQL showed the best read performance. MongoDB, accelerated by its memory-mapped-file cache, came close to that result; MongoDB used memory-mapped files for all disk I/O. Cassandra's key and row caches enabled very fast access to frequently requested data. With the off-heap row cache added in version 0.8, it showed excellent read performance while using less memory per row. The key cache held key locations in memory on a per-column-family basis, recording the offset of each row in the SSTable that stored it, so there was no need to look up the row's position in the SSTable index file. The row cache went further: cached rows did not have to be read from the SSTable data file at all. In other words, each key cache hit saved one disk seek and each row cache hit saved two. HBase was slower at random reads. Both Cassandra and HBase, however, can speed up data access with per-column-family compression.
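The seek arithmetic for Cassandra's caches can be made concrete with a toy model. This is a minimal sketch, not Cassandra's implementation: it assumes an uncached read costs exactly two seeks (one into the SSTable index file, one into the data file), and the class `ToyReadPath` and helper `seeks_for_repeated_reads` are hypothetical names used only for illustration.

```python
class ToyReadPath:
    """Counts disk seeks for a simplified Cassandra-style read path.

    Simplifying assumption: an uncached read costs two seeks, one into the
    SSTable index file (to find the row's offset) and one into the data file.
    """

    def __init__(self, index, data, use_key_cache=True, use_row_cache=True):
        self.index = index              # row key -> offset in the data file
        self.data = data                # offset -> row contents
        self.use_key_cache = use_key_cache
        self.use_row_cache = use_row_cache
        self.key_cache = {}             # row key -> offset (skips the index seek)
        self.row_cache = {}             # row key -> row (skips both seeks)
        self.seeks = 0

    def read(self, key):
        # Row cache hit: the whole row is already in memory, so no disk seeks.
        if self.use_row_cache and key in self.row_cache:
            return self.row_cache[key]
        # Key cache hit: the offset is already known, so skip the index seek.
        if self.use_key_cache and key in self.key_cache:
            offset = self.key_cache[key]
        else:
            self.seeks += 1             # seek into the SSTable index file
            offset = self.index[key]
            if self.use_key_cache:
                self.key_cache[key] = offset
        self.seeks += 1                 # seek into the SSTable data file
        row = self.data[offset]
        if self.use_row_cache:
            self.row_cache[key] = row
        return row


def seeks_for_repeated_reads(use_key_cache, use_row_cache, reads=2):
    # Read the same hypothetical row several times and report total seeks.
    path = ToyReadPath({"user1": 0}, {0: {"name": "Alice"}},
                       use_key_cache, use_row_cache)
    for _ in range(reads):
        path.read("user1")
    return path.seeks


print(seeks_for_repeated_reads(False, False))  # 4: two seeks per read
print(seeks_for_repeated_reads(True, False))   # 3: second read skips the index seek
print(seeks_for_repeated_reads(True, True))    # 2: second read is served from memory
```

Reading one row twice shows the savings described above: the key cache removes one seek from the repeat read, and the row cache removes both.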

* Workload B: Update. Thanks to deferred log flushing, HBase showed very high throughput with extremely low latency under heavy writes. With deferred log flush turned on, edits were first committed to the memstore, and the aggregated edits were then flushed to the HLog asynchronously. On the client side, the HBase write buffer cached writes when the autoFlush option was set to false, which also improved performance greatly. For durability, HBase confirms every write only after its write-ahead log has reached a set number of in-memory HDFS replicas. HBase's write latency with an in-memory commit was roughly equal to the latency of transmitting the data over the network. Cassandra also demonstrated great write throughput, because it first appends each write to the commit log, a fast operation, and then writes it to a per-column-family in-memory store called the Memtable.
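Why deferred log flushing raises write throughput can be shown by counting log syncs. The sketch below is a simplified model under stated assumptions, not HBase code: real deferred log flush syncs the log on a timer, whereas here a hypothetical `batch_size` stands in for that interval, and `ToyRegion` and `count_syncs` are illustrative names only. Each edit goes to the in-memory store first and is appended to a pending log buffer that is synced in batches.

```python
class ToyRegion:
    """Simplified write path contrasting per-edit log syncs with deferred flushing.

    Assumption: a hypothetical batch size replaces the time-based flush
    interval so that the number of log syncs is easy to count.
    """

    def __init__(self, deferred=False, batch_size=100):
        self.deferred = deferred
        self.batch_size = batch_size    # hypothetical stand-in for the flush interval
        self.memstore = {}              # in-memory store of the latest values
        self.pending_log = []           # edits not yet synced to the write-ahead log
        self.durable_log = []           # edits that have been synced
        self.log_syncs = 0

    def _sync_log(self):
        # Append the whole pending batch to the log in a single sync.
        if self.pending_log:
            self.durable_log.extend(self.pending_log)
            self.pending_log.clear()
            self.log_syncs += 1

    def put(self, key, value):
        self.memstore[key] = value      # the edit is committed to memory first
        self.pending_log.append((key, value))
        # Without deferral every edit is synced; with it, only full batches are.
        if not self.deferred or len(self.pending_log) >= self.batch_size:
            self._sync_log()

    def close(self):
        self._sync_log()                # flush whatever is still buffered


def count_syncs(deferred, writes=1000):
    region = ToyRegion(deferred=deferred)
    for i in range(writes):
        region.put(f"row{i}", i)
    region.close()
    return region.log_syncs, len(region.durable_log)


print(count_syncs(deferred=False))  # (1000, 1000)
print(count_syncs(deferred=True))   # (10, 1000)
```

Both runs end with the same 1,000 logged edits, but the deferred run performs a hundredth as many syncs. The cost is durability: in this model, edits still sitting in `pending_log` would be lost if the process crashed before the next sync.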

* Workload C: Read-only. Settings for the workload: 
1) Read/update ratio: 100/0 
2) Zipfian request distribution
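A Zipfian distribution means a small set of "hot" keys receives most of the requests, which is why caching matters so much in this workload. The sketch below generates a Workload C style operation stream under stated assumptions: key rank k is drawn with probability proportional to 1/k^s, the exponent `s = 0.99` and the `user<N>` key format are chosen here for illustration, and the helpers `zipfian_keys` and `workload_c` are hypothetical. The actual benchmark tool's generator may differ in detail (for example, in how popular keys are spread across the keyspace).

```python
import random
from collections import Counter


def zipfian_keys(num_keys, num_requests, s=0.99, seed=42):
    """Draw key indices where rank k (1-based) has weight 1 / k**s."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_requests)


def workload_c(num_keys=1000, num_requests=10_000, seed=42):
    # Read/update ratio 100/0: every operation is a read.
    return [("read", f"user{k}")
            for k in zipfian_keys(num_keys, num_requests, seed=seed)]


ops = workload_c()
counts = Counter(key for _, key in ops)
print(counts.most_common(3))            # the lowest-ranked keys dominate
```

Counting the generated keys shows the skew: the first few keys account for a large share of all reads, while most of the keyspace is requested rarely.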
