Talend Brings Premier Open Source Data Integration To Big Data

Hadoop integration and partnerships with Hortonworks and Lyris give Talend a leg up in Big Data

Before my life was hijacked by RSA Conference week, I had a chance to speak with Jim Walker, Director of Product Marketing, and Yves de Montcheuil, VP of Marketing, at Talend. Talend is a bright spot in the commercial open source world. The company has taken open source software and built a very successful business around data integration tools, and it is a recognized leader in its field. Now Talend is tackling the Big Data market, bringing similar quality open source tools and services to the Hadoop world.

Talend first announced its entry into the big data market with the release of Talend Open Studio for Big Data. The product is licensed under an Apache license, the same as Hadoop itself. Talend also teamed up with Hadoop pioneer Hortonworks, announcing that Talend's Open Studio will now be bundled with the Hortonworks Data Platform. Now the company has announced that Lyris, which was already using Talend's data integration tools, has selected Open Studio for Big Data for its big data projects.

There are many folks hoping that Talend can do for Big Data and Hadoop what it has done for data integration in general. A recurring theme I have heard recently around Hadoop is that it is too hard and not mature enough for many organizations. The tool sets are not developed or powerful enough, and there are not enough developers who can help. These are the kinds of things that could stop Hadoop's momentum dead in its tracks.

The Talend platform is not exclusive to Hortonworks. It will work with any Apache Hadoop distribution. According to Walker, the Talend Big Data platform provides:

  • Big Data Integration: Loading Big Data in Hadoop via HDFS, HBase, Sqoop or Hive is considered an operational data integration problem. Talend Platform for Big Data provides an intuitive set of graphical components and workspace that allows for interaction with a big data source or target without the need to learn and write complicated code.
  • Big Data Quality: Talend Platform for Big Data presents data quality functions that take advantage of the massively parallel environment of Hadoop. It enables developers to use that high-performance processing environment to identify duplicate records across these huge data stores in moments, not days. Because the Talend data quality functions can be employed for big data tasks, it also extends to profiling big data and to other important quality issues.
  • Project Optimization: With Talend Platform for Big Data, the ability to schedule, monitor and deploy any big data job is included, built on a shared repository, so that data analysts can collaborate and share project metadata and artifacts.

As I said earlier, Talend's products are open source. The company offers premium features, services and support over and above the open source projects. Talend also does a great job of supporting the community, which is core to the company's success. New companies eyeing an open source business model should watch Talend.

In the meantime it will be interesting to see if they can do for Big Data what they have done for data integration in general!
