MIT baby-talk project spawns massive IP SAN
MIT Media Lab builds a 1.4-petabyte SAN to study how babies learn to talk.
By Lucas Mearian, Computerworld, 05/16/2006
Imagine a storage array with capacity that's equivalent to a stack of iPods three times the height of the Empire State Building
but that can be managed with common Ethernet networking tools, and you'll get what a group of MIT scientists and four storage
vendors are in the process of building.
The storage array will support an MIT Media Lab project called the Human Speechome Project that is studying how babies develop
the ability to talk. The project started three months ago when MIT associate professor Deb Roy began recording his baby's everyday
life through the use of 14 fish-eye lens cameras set up throughout his house, giving researchers a bird's-eye view of every
room.
To store and then process the video and audio data, the project needed a massive storage-area network (SAN) to archive and
search what is expected to be 1.4 petabytes (1,400TB) of data over the span of the three-year project.
The SAN is being built from commodity hardware and uses a 10 Gigabit Ethernet IP network for data transfer between the backend
SAN and hundreds of servers.
"I think here what we're seeing is what the future of storage is going to be like. This is a great marriage between industry
and the academic world," said Frank Moss, director of the Media Lab and a former CEO of Tivoli Systems, a maker of storage
management software now owned by IBM.
Moss spoke at a press conference held Monday at MIT's Media Lab in Cambridge, Mass.
The Human Speechome Project computing infrastructure is expected to be composed of more than 300 Hammer Z-Rack storage enclosures
from Bell Microproducts, about 3,000 SATA (Serial Advanced Technology Attachment) hard disk drives from Seagate Technology
and more than 100 10 Gigabit Ethernet switches and 400 blade processors from Marvell Technology Group Ltd.
The high-throughput switches are needed for the storage I/O anticipated by researchers who believe they'll be processing 700TB
of data during every 12-hour analytical run. To meet the performance requirements, 150-drive stripes (aggregated
virtual volumes) will be created using the native virtualization capabilities of Bell's Z-SAN. Protection against data loss
will be delivered through RAID 10 mirrors (duplicate copies) of the raw video data, transform data, and metadata files.
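The 700TB-per-run figure implies a sustained transfer rate that helps explain the switch count. The following is a rough back-of-envelope sketch in Python, not the project's own sizing method. It assumes decimal units (1TB = 10^12 bytes), a load spread perfectly evenly across roughly 3,000 drives, and no protocol or mirroring overhead; all variable names are illustrative.

```python
# Back-of-envelope throughput for one analytical run, using the article's figures.
# Assumptions (not from the article): decimal units, perfectly even load,
# no protocol overhead, and reads served by all drives at once.

TB = 10**12

run_bytes = 700 * TB           # data processed per run (from the article)
run_seconds = 12 * 3600        # one 12-hour analytical run
drive_count = 3000             # "about 3,000" SATA drives
link_bits_per_s = 10 * 10**9   # line rate of one 10 Gigabit Ethernet link

bytes_per_s = run_bytes / run_seconds
bits_per_s = bytes_per_s * 8

aggregate_gb_per_s = bytes_per_s / 10**9            # sustained aggregate rate
saturated_links = bits_per_s / link_bits_per_s      # fully loaded 10GbE links
per_drive_mb_per_s = bytes_per_s / drive_count / 10**6

print(f"Aggregate rate: {aggregate_gb_per_s:.1f} GB/s")
print(f"Equivalent saturated 10GbE links: {saturated_links:.1f}")
print(f"Average per-drive rate: {per_drive_mb_per_s:.1f} MB/s")
```

Under these assumptions the run needs roughly 16GB per second in aggregate, about the line rate of 13 fully saturated 10 Gigabit Ethernet links, while each drive averages only a few megabytes per second.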
"Our approach allows us to eliminate a lot of cost by using high-volume, commonly available systems," said Jeff Greenberg,
senior director of product marketing at Zetera, the vendor designing the SAN.
The project has been amassing several terabytes of audio and video data per week, capturing early childhood learning and
socialization in order to model human language acquisition.
"If you take all parallel tracks of data over three years you'll have 400,000 hours of video and audio data," Roy said.
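Roy's 400,000-hour figure can be set against the 1.4-petabyte estimate for a rough sense of scale. The sketch below is illustrative arithmetic only. It assumes decimal units and treats the full 1.4 petabytes as recorded media, which overstates the per-hour rate because the total also covers transform data and metadata. The minimum track count assumes round-the-clock recording for three 365-day years, which the article does not state.

```python
# Rough scale check using figures quoted in the article.
# Assumptions (not from the article): decimal units, the whole 1.4PB is
# recorded media, and recording runs continuously for three 365-day years.

TB = 10**12

total_bytes = 1400 * TB        # 1.4 petabytes expected over the project
recorded_hours = 400_000       # hours of video and audio, per Roy
project_hours = 3 * 365 * 24   # wall-clock hours in three years

bytes_per_recorded_hour = total_bytes / recorded_hours
gb_per_hour = bytes_per_recorded_hour / 10**9
mbit_per_s = bytes_per_recorded_hour * 8 / 3600 / 10**6

# If every track recorded nonstop, this many parallel tracks would be needed
# to reach 400,000 hours; less-than-continuous recording would need more.
min_tracks = recorded_hours / project_hours

print(f"Upper bound per recorded hour: {gb_per_hour:.1f} GB")
print(f"Equivalent stream rate: {mbit_per_s:.1f} Mbit/s")
print(f"Minimum parallel tracks: {min_tracks:.1f}")
```

On these assumptions each recorded hour accounts for at most about 3.5GB, and reaching 400,000 hours takes at least 15 tracks recording in parallel.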
Roy said an application the university built allows researchers to quickly home in on video and audio streams that involve
his child's development while avoiding video playback of empty rooms or footage of mundane tasks, such as getting a drink
of water or making coffee.
For more enterprise computing news, visit Computerworld. Story copyright Computerworld, Inc.