Review: Spark lights up machine learning

Spark brings efficient machine learning to large compute clusters and combines with TensorFlow for deep learning

Become An Insider

Sign up now and get FREE access to hundreds of Insider articles, guides, reviews, interviews, blogs, and other premium content. Learn more.
At a Glance
  • Spark ML 2.01

    InfoWorld Rating
    Learn more
    on Apache Software Foundation

    Spark ML has a wide assortment of machine learning tools, but little in the way of deep learning.

As I wrote in March of this year, the Databricks service is an excellent product for data scientists. It has a full assortment of ingestion, feature selection, model building, and evaluation functions, plus great integration with data sources and excellent scalability. The Databricks service provides a superset of Spark as a cloud service. Databricks the company was founded by the original developer of Spark, Matei Zaharia, and others from U.C. Berkeley’s AMPLab. Meanwhile, Databricks continues to be a major contributor to the Apache Spark project.

In this review, I’ll discuss Spark ML, the open source machine learning library for Spark. To be more accurate, Spark ML is the newer of two machine learning libraries for Spark. As of Spark 1.6, the DataFrame-based API in the Spark ML package was recommended over the RDD-based API in the Spark MLlib package for most functionality, but was incomplete. Now, as of Spark 2.0, Spark ML is primary and complete and Spark MLlib is in maintenance mode.

To continue reading this article register now

Notice to our Readers
We're now using social media to take your comments and feedback. Learn more about this here.