CrateDB: The IoT and machine data-focused database

CrateDB looks like a valuable addition to the growing list of database tools out there

In recent years there has been a lot of debate in the database world about which type of database is best suited to modern applications, and most of it has centered on the SQL versus NoSQL wars.

On the one hand are the traditional SQL-based databases, which all store data in rows and columns. These databases have existed since pretty much year dot and have proved themselves to be good all-around tools.


The advent of social media, and with it the need for database approaches that handle the unstructured data these properties generate, led to the rise of NoSQL databases. These databases don’t follow, or at least don’t only follow, the standard tabular approach, so storing and retrieving data isn’t bound to rigid rows and columns.

More recently, however, the SQL versus NoSQL discussion has broadened, and people are looking at how a database will actually be used and what specific requirements it has to meet.

An example of this is CrateDB, the open-source distributed database developed by Crate.io. CrateDB promises the scalability and performance of NoSQL with the power and ease of standard SQL. In terms of usage focus, though, CrateDB was designed specifically to support IoT and machine data applications, and, in another nod to more modern ways of doing things, it is optimized for containerized environments.

Crate.io announced the general availability of CrateDB today. With the release, it hopes that developers of machine data applications, who were formerly forced to work in a NoSQL world, will now be able to use familiar SQL constructs for machine data development.

“The growth of machine data and the opportunities that businesses have to capitalize on it are outstripping the ability of their data management infrastructure to act on it,” said Jason Stamper, analyst, Data Platforms and Analytics, at 451 Research. “CrateDB’s power lies in its ability to enable users to collect and analyze vast amounts of data in real time, using SQL commands they already know.”   

Although it is only now becoming generally available, CrateDB has already seen real traction: since its creation in 2014, it has been downloaded more than 1 million times.

CrateDB's key differentiators

As for the obvious question of what sets CrateDB apart for developers, Crate.io says its unique capabilities are enabled by the following innovations:

  • Distributed SQL query engine for faster JOINs, aggregations and ad hoc queries: Columnar field caches and a fully distributed query planner enable CrateDB to perform complex queries in real time and overcome many of the performance and flexibility limitations of first-generation distributed SQL databases.

  • SQL with integrated search for data and query versatility: CrateDB is a unique combination of SQL and search technology, which enables a wide range of analytics, including machine learning and predictive analytics, on time series, full text, JSON, geospatial, and other structured and unstructured data—without having to use different database engines to do so.

  • Container architecture and automatic data sharding for simple scaling: Database scalability is vital for handling variations in machine data volume, but this is normally difficult to do. CrateDB can run as a cluster of containers, which enables it to be scaled easily with Docker, Kubernetes or Mesos container platforms. In addition, CrateDB automatically shards and redistributes data across the cluster as it changes size to optimize performance and high availability.
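The article does not describe how CrateDB implements automatic sharding. What follows is only a minimal Python sketch of the general technique the third bullet alludes to, built on our own assumptions. Rows are routed to a fixed number of shards by hashing a routing value, and when the cluster grows, whole shards rather than individual rows are moved onto the new node. Every name here is hypothetical and is neither CrateDB's API nor its actual allocation algorithm: `shard_for`, `assign_shards`, `rebalance`, `NUM_SHARDS`, and the node and sensor names. The CRC32 hash is likewise just a stable stand-in.

```python
import zlib
from collections import Counter

NUM_SHARDS = 6  # hypothetical: shard count fixed when the table is created


def shard_for(routing_value: str, num_shards: int) -> int:
    """Route a row to a shard with a stable hash (CRC32 is a stand-in, not CrateDB's hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def assign_shards(num_shards: int, nodes: list[str]) -> dict[int, str]:
    """Initial placement: spread shard ids across nodes round-robin."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def rebalance(assignment: dict[int, str], nodes: list[str]) -> dict[int, str]:
    """Move whole shards from the busiest node to the idlest until loads differ by at most 1.

    Only handles a growing cluster: every current owner must still be in `nodes`.
    """
    new = dict(assignment)
    load = {n: sorted(s for s, owner in new.items() if owner == n) for n in nodes}
    while True:
        busiest = max(nodes, key=lambda n: len(load[n]))
        idlest = min(nodes, key=lambda n: len(load[n]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return new
        shard = load[busiest].pop()
        load[idlest].append(shard)
        new[shard] = idlest


sensor_ids = [f"sensor-{i}" for i in range(1000)]
# A row's shard depends only on its routing value and the shard count,
# so it never changes when nodes join the cluster.
row_to_shard = {sid: shard_for(sid, NUM_SHARDS) for sid in sensor_ids}

before = assign_shards(NUM_SHARDS, ["node-a", "node-b"])
after = rebalance(before, ["node-a", "node-b", "node-c"])

moved = [s for s in range(NUM_SHARDS) if before[s] != after[s]]
print("shards moved:", moved)  # shards moved: [4, 5]
print("shards per node:", Counter(after.values()))
```

Because the shard count stays fixed, adding a third node moves only two shards and leaves every row's shard id untouched. Production allocators typically also weigh factors this sketch ignores, such as replicas and disk usage.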

“When we founded Crate.io, we set out to reinvent SQL for the machine data era. Today, 75 percent of our customers use CrateDB to manage machine and IoT data because of its superior ease of use, performance and versatility,” said Christian Lutz, CEO of Crate.io. “The general availability of the product and our expansion to San Francisco mark a new phase in our growth, and we look forward to driving further innovation of the platform both internally and by extension through the open source community.”

CrateDB seems to have had good pick-up from some top-shelf vendors. A good example is Skyhigh Networks, a cloud access security broker (CASB) that has seen lots of enterprise adoption and would therefore be unlikely to trust any part of its operation to an unproven offering.

Sekhar Sarukkai, co-founder, chief scientist and vice president of engineering at Skyhigh Networks, says:

“More than 40 percent of the Fortune 500 customers depend on Skyhigh to help address their cloud security needs. CrateDB is an important part of our data stack, giving us the performance and horizontal scalability to meet our rapidly growing business needs.”

CrateDB looks like a valuable and interesting addition to the ever-increasing list of database tools out there.
