Amazon Data Pipeline promises to ease management of data stored in multiple locations

Amazon focused on data management and enabling analytics

Amazon Web Services today launched Data Pipeline, a new tool designed to make it easier for users to integrate data stored in multiple, disparate locations so they can manage and analyze it.

In addition to Data Pipeline, AWS announced two new instance types that could be ideal for processing big data and doing analytics. Both announcements came on the second and final day of AWS's first-ever user conference in Las Vegas, named AWS re:Invent.

The launch follows other data-related news from AWS yesterday, when it announced Redshift, a cloud-based data warehousing tool. Data Pipeline is meant to take data stored in Redshift or AWS's other storage services, such as DynamoDB - the company's NoSQL database tool - or its Simple Storage Service (S3), and manipulate the data for easier management and exposure to analysis tools.

Data Pipeline has a drag-and-drop graphical interface that lets users manipulate and glean insights from data stored either in AWS's cloud or on their own premises. For example, during a demonstration, officials showed how a DynamoDB database can be configured to automatically replicate information into S3 or some business intelligence tool. "This is really meant to be a light-weight web service to integrate disparate data sets," says Matt Wood, AWS's Big Data guru.

The service rounds out AWS's storage and business intelligence options. Earlier this year, AWS launched Glacier, a long-term storage service. At its user conference this week, AWS announced that its S3 service now holds more than 1 trillion files, and Redshift was the highlight of AWS's announcements on the first day of the conference. AWS has also recently released a Big Data section of its AWS Marketplace, a collection of business intelligence applications that are optimized to run in AWS's cloud.

In addition to the Data Pipeline news, AWS announced two new instance types for its Elastic Compute Cloud (EC2) service, aimed specifically at helping users process large amounts of data. The first, a Cluster high-memory instance type, comes with 240GB of RAM and two 120GB volumes of SSD-backed disk space. CTO Werner Vogels says the high-memory instances are ideal for large-scale in-memory database analytics. The second is a high-storage option, hs1.8xlarge, which comes with 117GB of RAM and 48TB of disk space. That news follows the announcement of new instance types the company launched just a few weeks ago, also aimed at high-performance computing workloads.

Network World staff writer Brandon Butler covers cloud computing and social collaboration. He can be found on Twitter at @BButlerNWW.

Copyright © 2012 IDG Communications, Inc.