by Salvatore Salamone

Data deluge

Feb 24, 2003

Specialized storage systems help life sciences firms manage fixed content.

Storage management isn’t easy for any industry, but biotech firms face some particularly vexing challenges. Research and diagnostic tools routinely generate huge amounts of data. Complicating matters is the need to store much of this data in a way that meets a range of regulatory requirements. What’s more, some of this information needs to be kept for 35 years or more.

“We have eight mass spectrometer machines that produce 60 gigabytes of data per hour, per machine running around the clock,” says Lloyd Segal, president and CEO of Caprion Pharmaceuticals in Montreal. The company uses a mix of Sun StorEdge T3 disk arrays and StorEdge L700 tape library systems. The online stored data is kept on the StorEdge T3 systems, which account for about 5 terabytes of capacity.
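To put Segal's figures in perspective, the short calculation below works out the raw daily output they imply. It is a back-of-the-envelope sketch only: it assumes all eight machines sustain the quoted rate continuously, that a terabyte is 1,000 gigabytes, and that nothing is compressed or discarded before storage. The article does not say how the data is split between disk and tape, and the function names are illustrative.

```python
# Figures quoted in the article.
MACHINES = 8
GB_PER_HOUR_PER_MACHINE = 60
ONLINE_CAPACITY_TB = 5

# Assumptions made for this sketch (not stated in the article).
HOURS_PER_DAY = 24   # "running around the clock"
GB_PER_TB = 1000     # decimal terabytes


def daily_output_tb(machines=MACHINES, gb_per_hour=GB_PER_HOUR_PER_MACHINE):
    """Raw data produced per day, in terabytes."""
    return machines * gb_per_hour * HOURS_PER_DAY / GB_PER_TB


def hours_to_fill(capacity_tb=ONLINE_CAPACITY_TB,
                  machines=MACHINES,
                  gb_per_hour=GB_PER_HOUR_PER_MACHINE):
    """Hours of raw output needed to fill the given capacity."""
    hourly_tb = machines * gb_per_hour / GB_PER_TB
    return capacity_tb / hourly_tb


if __name__ == "__main__":
    print(f"Raw output per day: {daily_output_tb():.2f} TB")
    print(f"Hours to fill {ONLINE_CAPACITY_TB} TB: {hours_to_fill():.1f}")
```

Under those assumptions the machines would produce about 11.5 terabytes a day, enough to fill the 5 terabytes of online capacity in roughly 10 hours, which suggests why a tape tier sits behind the disk arrays.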

Life sciences firms face an array of regulatory requirements

Industrywide, biotech companies must deal with raw data that doubles about every six to 12 months, according to experts. Much of this data never changes. Most biotech research and development experiments generate lab results that, once produced, are simply kept on file somewhere. And data collected in drug clinical trials – including X-rays, medical history and patient reactions to drugs – is collected once and never modified.
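The doubling rate quoted above compounds quickly. The sketch below projects how a data store would grow at the two ends of the six-to-12-month range. The 5-terabyte starting point and the three-year horizon are hypothetical inputs chosen for illustration, not figures reported for any company.

```python
def projected_size_tb(initial_tb, months, doubling_months):
    """Size after `months` if the data doubles every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_tb = 5        # hypothetical starting size
    horizon = 36        # hypothetical planning horizon, in months
    for doubling in (12, 6):
        size = projected_size_tb(start_tb, horizon, doubling)
        print(f"Doubling every {doubling} months: {size:.0f} TB after {horizon} months")
```

With a 12-month doubling period the hypothetical store grows eightfold in three years; with a six-month period it grows 64-fold.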

All this data often must be retained for more than a decade if it is to be used as part of a Food and Drug Administration new drug submission. Keeping data for that long is a storage management challenge.

There have been no specific studies to determine what percentage of biotech data does not change – so-called fixed content data. However, across all markets, about 75% of new digital data is fixed content, according to Hal Varian, dean of the School of Information Management and Systems at the University of California, Berkeley.

For such long-term storage “there are lots of problems with tape and optical,” Varian says. “The [data storage medium] formats keep changing. And whenever you have a change in format, you have a big problem with data migration. It’s easier to have the data available on hard drives because migrating becomes a much smaller problem.”

A number of storage vendors recently have launched products that try to deal with this issue.

In December, IBM Storage Systems Group released a hardware and software bundle for sharing, managing and securing clinical trial patient information such as magnetic resonance imaging, electrocardiograms and other digital images. The bundle includes IBM TotalStorage hardware, Tivoli Storage Manager software and hierarchical storage management software to manage data migration from network-attached storage and storage-area network devices to tape libraries. And several third-party document management vendors have built links to EMC’s Centera storage systems to simplify the way data is retrieved.

Storage management problems were one reason sister companies Celera Genomics and Applied Biosystems overhauled their computing and storage infrastructure last year. The firms replaced a 100-terabyte storage system from HP and an HP AlphaServer data center with EMC Centera systems and IBM eServer p60s.

Market composition: Pharmaceutical, genomic research and biotech companies, as well as academic and government laboratories.
Size: According to Ernst & Young, there are 1,457 biotechnology companies in the U.S. The publicly traded companies accounted for a market capitalization of $224 billion in 2002.
Average time/cost to develop a new drug: It takes between 12 and 15 years and $400 million to $800 million, reports The Tufts Center for the Study of Drug Development.
Worldwide biotech IT spending: IDC forecasts this to grow from $12.2 billion in 2001 to $30.6 billion in 2006.

The net gain in processing power in the switch from the AlphaServers to the IBM eServer p60s was minimal – total processing power increased from 1.7 teraFLOPS (1.7 trillion floating point operations per second) to 2 teraFLOPS. However, three EMC storage systems took the place of 20 HP/Compaq StorageWorks systems and other storage devices.

Within the company, the move is seen as a continuation of an evolving process to keep up with data storage demands while keeping management costs in check.

“We are trying to provide high data-throughput reliability, and migrating to newer storage technology helps us meet this goal,” says a senior manager at Celera who couldn’t allow his name to be used. “An added benefit of moving to newer technology is that the capacity of the systems chosen allows us to reduce the number of discrete storage devices we need to manage.”

As biotech companies work to handle the combination of longer-term storage and regulatory compliance, they are starting to see smarter storage systems in general and smarter storage networks in particular. “Advanced functions, such as volume management and storage virtualization, can be implemented in the fabric,” says Dan Tanner, an analyst at Aberdeen Group. “Storage network buyers will soon find themselves evaluating storage applications and then considering which networks run them.”

That was the case for Quantum Diagnostic Imaging, a Dallas company that provides diagnostic imaging tests for referring physicians. The firm recently moved to PACSbuilder, a new digital imaging workflow application from Merge eFilm.

Merge eFilm bundles its application with EMC’s Centera storage systems. The imaging application taps into EMC Centera’s ability to manage long-term storage of fixed content data. The combination offloads many mundane management tasks, such as keeping track of specific locations of files.

Once the system stores an image, Centera gives it a unique identifier, which is all the application needs to know to retrieve that image. That means there’s no need to keep track of the specific drive, directory or disk volume to which an image is saved.
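The idea of retrieving an object by identifier rather than by location can be illustrated with a toy in-memory store. This is a minimal sketch, not Centera's actual interface: the class and method names are hypothetical, and deriving the identifier from a SHA-256 hash of the content is an assumption made here, since the article says only that the identifier is unique.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store that hands back an identifier for each object.

    Hypothetical illustration only; it does not model any vendor API.
    """

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Assumption: the identifier is a hash of the content itself,
        # so identical content always maps to the same identifier.
        content_id = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(content_id, bytes(data))
        return content_id

    def get(self, content_id: str) -> bytes:
        # The caller supplies only the identifier: no drive, directory
        # or volume is involved in the lookup.
        return self._objects[content_id]


if __name__ == "__main__":
    store = ContentAddressedStore()
    image_id = store.put(b"example image bytes")
    assert store.get(image_id) == b"example image bytes"
    print("Stored and retrieved object", image_id[:12], "...")
```

Because the application holds only the identifier, the storage system is free to move the underlying data between devices without the application needing to change.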

The benefit of the new system is that it lets radiologists and physicians more easily access medical images through a Web browser. “This system will help us maintain operational efficiency that will in turn help us deliver better patient care,” says Doug Schapiro, Quantum’s COO. “The combination of the [EMC and Merge eFilm products] will let us quickly deliver images to the physicians.”

Integral to this trend is the intimate linking of storage systems with the applications that generate or access the data. “We are dealing with data now that is fundamentally different than anything we were dealing with 10 years ago,” says Mike Poidinger, CEO of the Australian Genomic Information Centre at the University of Sydney. He and others note that because of the vast array of experimental techniques used in biotechnology, companies need help from application vendors to do more intelligent searches of this collection of disparate data types.