Network World

Greenplum touts super-quick data loading

By Tom Jowitt, TechWorld
March 17, 2009 10:30 AM ET

Greenplum has released new technology that it says can speed the loading of data into large-scale databases without compromising overall performance.

San Mateo, California-based Greenplum provides a high-performance database management system (DBMS) typically used in data warehousing and large-scale analytical processing (or business intelligence) applications. It powers the Sun Data Warehouse Appliance, and its customers include LinkedIn, Nasdaq, NYSE Euronext, Fox Interactive Media, and MySpace.

Data loading is rapidly becoming an issue for companies facing exponential data growth. "For many companies data loading is a bottleneck," said Ben Werther, director of product marketing at Greenplum. "Data loading is traditionally done at night, but more data and longer loading cycles sometimes mean this extends into the working day."

"The amount of data is growing on a daily or weekly basis," said Paul Salazar, VP of corporate marketing. "Companies are seeking to gain competitive advantage from analysing the data they capture and they are also choosing to store more data about specific events."

Salazar said that if customers can gain field intelligence quickly, by shortening data loading times to a couple of hours instead of overnight or longer, then there is a definite competitive advantage to be had.

To this end, Greenplum has introduced technology it is calling 'MPP Scatter/Gather Streaming' (or SG Streaming for short). SG Streaming technology is available immediately with the Greenplum Database. It is included at no extra charge to Greenplum customers, and the company says it eliminates the bottlenecks associated with other approaches to data loading.

Indeed, Greenplum cites customers that are achieving production loading speeds of over 4TB per hour. "The loading capabilities of this database are remarkable," said Brian Dolan, director of research analytics at Fox Interactive Media. "We're loading at rates of four terabytes an hour, consistently."
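
To put the quoted rate in perspective, the following back-of-the-envelope sketch in Python converts 4TB per hour into a sustained per-second figure and into the volume that would fit a loading window. The eight-hour overnight window and the decimal units (1TB = 1,000GB) are assumptions made for illustration, not figures supplied by Greenplum or its customers.

```python
def gb_per_second(tb_per_hour):
    """Convert a load rate in TB/hour to GB/second (decimal units assumed)."""
    return tb_per_hour * 1000 / 3600


def tb_loaded(tb_per_hour, window_hours):
    """Volume in TB that can be loaded within a window at a constant rate."""
    return tb_per_hour * window_hours


quoted_rate = 4.0   # TB per hour, as quoted in the article
night_window = 8    # hours; a hypothetical overnight loading window

print(f"{gb_per_second(quoted_rate):.2f} GB/s sustained")
print(f"{tb_loaded(quoted_rate, night_window):.0f} TB in a {night_window}-hour window")
```

Under those assumptions, the quoted rate works out to roughly 1.1GB per second, or about 32TB across a single overnight window.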

"This is definitely the fastest in the industry," said Greenplum's Werther. "Netezza for example quotes 500GB an hour, and we have not seen anyone doing more than 1TB an hour."

According to Werther, Greenplum utilises a "parallel-everywhere" approach to loading, in which data flows from one or more source systems to every node of the database without any sequential choke points. This differs from the traditional "bulk loading" technologies used by most mainstream database and MPP appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels, resulting in fundamental bottlenecks and ever-increasing load times. Greenplum's approach also avoids the need for a "loader" tier of servers, as required by some other MPP database vendors.

The SG Streaming technology ensures parallelism by "scattering" data from all source systems across hundreds or thousands of parallel streams that flow simultaneously to all nodes of the Greenplum Database. Performance scales with the number of Greenplum Database nodes, and the technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations.
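
The general idea of scattering rows from several sources across parallel streams, one per source and node pair, can be sketched in a few lines of Python. This is only a toy illustration of the pattern described above, not Greenplum's implementation: the function names (`scatter`, `load_into_node`, `parallel_load`), the hash-on-first-column distribution rule, and the use of threads in place of network streams are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def scatter(rows, num_nodes):
    """Split rows into one bucket per node by hashing the first column.

    Hashing on the first column is an assumed distribution rule.
    """
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        key = str(row[0]).encode("utf-8")
        buckets[crc32(key) % num_nodes].append(row)
    return buckets


def load_into_node(node_id, rows):
    """Hypothetical stand-in for a per-node loader; it only counts rows."""
    return node_id, len(rows)


def parallel_load(sources, num_nodes):
    """Scatter every source, then run all (source, node) streams concurrently."""
    streams = []
    for source in sources:
        for node_id, rows in enumerate(scatter(source, num_nodes)):
            streams.append((node_id, rows))

    totals = [0] * num_nodes
    with ThreadPoolExecutor(max_workers=len(streams) or 1) as pool:
        for node_id, count in pool.map(lambda s: load_into_node(*s), streams):
            totals[node_id] += count
    return totals


# Two source systems feeding a four-node database.
sources = [
    [(i, "web") for i in range(100)],
    [(i, "ads") for i in range(100, 250)],
]
print(parallel_load(sources, num_nodes=4))
```

Because every source talks to every node directly, no single channel carries the whole load, which is the property the article contrasts with single-source bulk loading.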

Comments (1)

For the record
By Anonymous on March 18, 2009, 6:53 pm
Netezza's load throughput is understated in this article; it is currently 1TB per hour for uncompressed data.
