Look out, Hadoop: there is a new/old kid in town who promises to handle the big data problem better than you can. HPCC (High Performance Computing Cluster) Systems from LexisNexis has been evolving and growing for over 10 years in the company’s pressure-cooker environment. Handling terabytes and petabytes of data, HPCC has been honed to meet the biggest data needs. Now the engine that runs one of the biggest data jobs in the world is being open sourced by LexisNexis and made available to everyone. I had a chance to sit down today with Armando Escalante, Senior Vice President and Chief Technology Officer of LexisNexis Risk Solutions, to discuss this further.
Living and working in Boca Raton, Florida, it is not often that I get to meet with the subjects of my articles in person, unless I am at a tech conference. But lo and behold, LexisNexis has a major data center and office right here in Boca, and Armando is based here. So I got the full tour and met several key members of the HPCC team. Besides Armando, I met David Bayliss, Chief Data Scientist and "father" of ECL (Enterprise Control Language), the language HPCC runs on, which he co-developed with Gavin Halliday. I also met Stu Ort, Director of Software Engineering; David Hof, Director of Business Development; and Kristina Grammatico, Director of Public Relations.
The tour took me through the LexisNexis data center here, where HPCC has been handling the company’s demanding big data needs for years. So after all these years of using HPCC for its own supercomputer-scale data handling, why has LexisNexis decided to release its engine? Simple: the company has realized what many in the open source community already knew. Opening up the code will accelerate the continued development and evolution of HPCC.
Armando and his team have watched for the past 3 or 4 years as Hadoop has continued to make progress. At first they didn’t think it would amount to much. But with the support of the community, Hadoop has come a long way. According to Escalante it is not nearly as mature as HPCC yet, but David Bayliss and Stu Ort saw that if LexisNexis didn’t act, Hadoop could surpass HPCC within a few years. With all of the years of work and the millions of dollars in resources sunk into HPCC, Armando knew that he had to open HPCC up to compete and keep its edge. Plus, Escalante says, competition is the American way. With a choice in the market between Hadoop and HPCC, each solution will have to evolve and grow to be successful. Armando welcomes the competition. He has been riding his horse for a long time and he knows he has a winner. So does the rest of the HPCC team.
This is not some new venture-funded startup. LexisNexis is a major company with some of the biggest data needs in the world. The team has developed and continually refined HPCC to exceed those demands, and they are confident it already handles, and will keep handling, the biggest data jobs. HPCC is actually made up of several components that the team has developed over the years. The two main parts are Thor and Roxie. Thor is the data processing engine and is the direct Hadoop equivalent. Roxie delivers the data. The entire project runs on the ECL language. There are various other modules that Ort and Richard Chapman have developed as well. Overall, HPCC is a rich environment that is battle tested. There is much more to the technology, which you can read about at the web site. For instance, there is a Roxie ECO IDE graphical user interface, and there are Roxie Pipes, which allow inter-cloud communication. For a good comparison of HPCC to Hadoop you can click here.
You can tell HPCC has been put through its paces. As Armando told me, “we may be new to the open source software world, but we are not new to big data.” HPCC is responsible for 90% of the multiple billions of dollars in revenue that LexisNexis generates.
While offering the open sourced community version of the product, HPCC Systems also offers a commercially licensed version that includes services, support, hosting options and other modules not available in the free version. The company thinks that HPCC will in and of itself become a major product with both private and public sector customers.
HPCC already handles half a trillion records every 8 hours or so, which works out to an average of roughly 17 million records per second. With that kind of documented performance, who is going to argue? So now the gauntlet has been thrown down. May the best big data solution win.
As co-founder and Managing Partner at The CISO Group, Alan Shimel is responsible for driving the vision and mission of the company. The CISO Group offers security consulting and PCI compliance management for the payment card industry. Prior to The CISO Group, Alan was the Chief Strategy Officer at StillSecure. Shimel was the public persona of StillSecure as it grew from start up to helping defend some of the largest and most sensitive networks in the world.
Shimel is an often-cited personality in the technology community and is a sought-after speaker at industry and government conferences and events. His commentary about the state of security, open source and life is followed closely by many industry insiders via his blog and podcast, "Ashimmy, After All These Years" (www.ashimmy.com). Alan is now also a regular contributor to The CISO Group’s security.exe blog and podcast.
Alan has helped build several successful technology companies by combining a strong business background with a deep knowledge of technology. His legal background, long experience in the field, and New York street smarts combine to form a unique personality.
Disclosure: The CISO Group sells a software-as-a-service PCI compliance application called SAQPro. The company is independent and does not represent any other vendor's products as a reseller.
Copyright © 1994 - 2010 Computerworld Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.