As researchers at The University of Texas MD Anderson Cancer Center work at "making cancer history," they're doing so with the help of compute power and storage capacity from a private cloud.
But this is no ordinary cloud.
After all, when you're researching something as complex as the human genome, you tend to think big, and MD Anderson's cloud reflects that type of ambition and scale. We're talking 8,000 processors and a half-dozen shared "large memory machines" with hundreds of terabytes of data storage attached, says Lynn Vogel, vice president and CIO of MD Anderson, in Houston.
A different path
And while MD Anderson's general server infrastructure uses virtualization, the typical foundational technology for cloud, this specialized research environment doesn't. Rather, the organization uses an AMD-based HP high-performance computing (HPC) cluster to underpin the research cloud.
"We're currently implementing the largest high-performance computing environment in the world devoted exclusively to cancer," says Vogel, who was recently named Premier 100 IT Leader honoree by our sister publication, Computerworld.
The data and processing capacity are available to the MD Anderson cancer researchers as needed, whether they're sequencing human genomes, investigating radiation physics or epidemiology, calculating doses for radiation therapy or running simulations for clinical trial activities. About three dozen principal investigators, who each have anywhere from two to 10 assistants, regularly tap into the research cloud, Vogel says.
To access the cloud, they use a service-oriented architecture-based Web portal called ResearchStation.
"When you look at the classic definition of cloud computing as enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released, that's in fact how we're approaching our environment," Vogel says.
However, he notes, the MD Anderson cloud doesn't currently have a chargeback mechanism - an oft-cited but, at this point, little-used cloud attribute. "We don't require a chargeback mechanism because we manage demand largely by a peer review process. The actual determination of priority for using resources is driven by clinicians and researchers themselves, not by IT people," Vogel says.
What this means, he adds, is that he never needs to plead a case for, say, more storage. "They're the ones going to executive management, saying, 'You really have to increase the capacity of this capability or that capability for us to continue to do our work and maintain our rating as one of the top cancer centers in the world,'" Vogel explains.
More, more, more
In addition, MD Anderson doesn't experience the typical up-and-down spikes in usage that other enterprises might encounter.
"We find that both the clinicians and researchers in the field of medicine have what I would label 'an insatiable demand' for computing resources, and the demand curve just keeps going up," Vogel says.
He notes that the 8,000-processor HPC sitting at the heart of the private cloud already operates at 80% to 90% capacity, as did its predecessor, a mere 1,100-processor machine. Memory-intensive applications rely on six 512GB, 32-CPU servers.
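To put those utilization figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function and variable names are illustrative assumptions, not anything MD Anderson runs.

```python
# Back-of-the-envelope utilization math using figures quoted in the article.
# All names here are hypothetical and chosen only for illustration.
CURRENT_PROCESSORS = 8000     # size of the current HPC cluster
PREVIOUS_PROCESSORS = 1100    # size of its predecessor
UTILIZATION_RANGE = (0.80, 0.90)


def busy_processors(total, utilization):
    """Return how many processors are in use at the given utilization."""
    return round(total * utilization)


def idle_processors(total, utilization):
    """Return how many processors remain free at the given utilization."""
    return total - busy_processors(total, utilization)


if __name__ == "__main__":
    for u in UTILIZATION_RANGE:
        print(
            f"At {u:.0%}: {busy_processors(CURRENT_PROCESSORS, u)} busy, "
            f"{idle_processors(CURRENT_PROCESSORS, u)} idle"
        )
    growth = CURRENT_PROCESSORS / PREVIOUS_PROCESSORS
    print(f"Cluster is about {growth:.1f}x the size of its predecessor")
```

In other words, even after a roughly sevenfold jump in processor count, only about 800 to 1,600 processors would be free at any given time.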
The cloud build-out at MD Anderson dovetails with the organization's expansion into a third data center, due to open this summer.
This will be the second new data center the organization has opened in a four-year period - and these are good-sized operations, with 12,000 to 15,000 square feet of raised floor in each, Vogel says. "We thought our second data center would last us four to five years, but it was full within 18 to 20 months. We had to turn our disaster recovery site into a production data center as we built another one," he adds.
The MD Anderson data centers house roughly 3 petabytes of data, a "somewhat surprising amount," Vogel says, since the cancer center is primarily a 500-bed hospital. But the volume of research data, at about 1.4 PB, now exceeds the amount of clinical data at MD Anderson.
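Those two figures imply a little arithmetic worth spelling out. The Python sketch below is a minimal illustration built on the approximate numbers quoted above. The idea of a third, "other" category of data is an inference from the arithmetic, not something Vogel states, and all names are hypothetical.

```python
# Rough storage breakdown from the approximate figures quoted in the article.
# The "other" category is an inference, not a figure from MD Anderson.
TOTAL_PB = 3.0       # roughly 3 PB housed in the data centers
RESEARCH_PB = 1.4    # about 1.4 PB of research data


def research_share(total_pb, research_pb):
    """Fraction of all stored data that is research data."""
    return research_pb / total_pb


def min_other_pb(total_pb, research_pb):
    """Smallest amount of non-research, non-clinical data required
    for research data to be at least as large as clinical data."""
    remainder = total_pb - research_pb   # clinical plus everything else
    return max(0.0, remainder - research_pb)


if __name__ == "__main__":
    print(f"Research share: {research_share(TOTAL_PB, RESEARCH_PB):.0%}")
    print(f"Non-research data: {TOTAL_PB - RESEARCH_PB:.1f} PB")
    print(f"Implied other data: more than {min_other_pb(TOTAL_PB, RESEARCH_PB):.1f} PB")
```

Taken at face value, research accounts for a bit under half of the total, so clinical data can only be the smaller share if something on the order of 0.2 PB falls into neither category. Since both figures are rounded, that remainder is only indicative.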
"Anybody who looks at genomic medicine and the sequencing of human genomes begins to realize that there's a tsunami of data coming out of those processes," he notes. "So, ironically, today at MD Anderson we have more data storage capacity devoted to research than we do for clinical care, and that includes all of our images. We're being hit by extraordinary amounts of data that needs to be managed and stored."
To handle the cloud's storage requirements, MD Anderson uses an HP-Ibrix system that supports extreme scale-out. It chose the Ibrix system because of its reliability and its ability to present storage seamlessly over Ethernet or InfiniBand, using CIFS, FTP, HTTP, the Linux client, NFS and other technologies, Vogel says. "This capability also enables us to do data tiering through the cluster," he adds.
Manageability also has been a boon. "Having HP as the end-to-end vendor ensures that all parts will fit together and fit into our monitoring system without any clashes," Vogel says.
While MD Anderson uses HP Storage Essentials and CIM to manage each storage unit, it relies on the Ibrix management server, Fusion Manager, for a top-level view. Each server also reports into Fusion Manager, Vogel says.
"As an added bonus, and very much a consideration in a constrained healthcare personnel environment, is the ability to operate our entire cloud configuration with minimal personnel involvement - just two people," Vogel says.
Public cloud: Not on your life
Vogel says he's talked to some public cloud providers who would love to host MD Anderson's MRIs, CT scans and other clinical images - more than 1 billion of them - within their infrastructures. But no can do, he says.
"We've looked into this, but quite honestly, we've found on performance, access and in the management of that data, going to a public cloud is more risky than we're willing to entertain," Vogel says. "This goes directly to the point that this is identifiable patient data ... and we're just not comfortable with the cloud given the actionable capability of a patient should there be a breach."
What's more, public cloud providers simply can't provide the level of business knowledge that MD Anderson's IT staffers can, some of whom are PhD scientists themselves, Vogel says.
"When you're in the business of biology, which we are, it's a different ballgame in terms of understanding the structures of data, the kinds of access and models used, and the applications that need to be available," Vogel says. "As much as public cloud providers would like us all to believe, this is not just about dumping data into a big bucket and letting somebody else manage it."