Unbridled growth in data storage and the rise in Web 2.0 applications are forcing a storage rethink.
With storage growth tracking at 60% annually, according to IDC, enterprises face a dire situation. Throwing disks at the problem simply doesn't cut it anymore. Andrew Madejczyk, vice president of global technology operations at pre-employment screening company Sterling Infosystems, in New York, likens the situation to an episode of "House," the popular medical drama.
"On 'House,' there are two ways to approach a problem. You treat the symptoms, or you find out what the root cause is and actually end the problem," Madejczyk says. "With storage, up until now, the path of least resistance was to treat the symptoms and buy more disks" - a method that surely would ignite the ire of the show's caustic but brilliant Dr. Gregory House.
Were the doctor prone to giving praise, he'd commend the enterprise IT managers who are rethinking this traditional approach to storage. He'd love that technologists are willing to go outside their comfort zones to find a solution, and he'd thrive on the experimentation and contentiousness that surround the diagnosis.
House probably would find an ally in Casey Powell, CEO at storage vendor Xiotech. "Everybody acknowledges the problem and understands it, but nobody's solving it. As technologists, we have to step back, look at the problem and design a different way," Powell says.
Optimizing the SAN
Today most organizations store some combination of structured, database-type data and unstructured, file-based data. In most cases, they rely on storage-area network (SAN) technologies to improve efficiency and overall storage utilization, keeping costs down as storage needs increase.
In and of themselves, SANs aren't enough, however. Enterprises increasingly are turning to technologies that promise to provide an even bigger bang for the buck, including these:
• Data deduplication, which helps reduce redundant copies of data so firms can shrink not only storage requirements but also backup times (see the sketch after this list).
• Thin provisioning, which increases storage utilization by making storage space that has been overprovisioned for one application available to others on an as-needed basis (see "Thin provisioning: Don't let the gotchas getcha").
• Storage tiering, which uses data policies and rules to move noncritical data to slower, less expensive storage media and leave expensive Tier 1 storage free to handle only the most mission-critical applications.
• Storage resource management software, which helps users track and manage storage usage and capacity trends.
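For readers who want to see the mechanics behind the first bullet, here's a minimal sketch of the deduplication idea: chunk the data, fingerprint each chunk with a hash, and store any given chunk only once. This is illustrative Python, not any vendor's implementation; real products add variable-size chunking and far more sophisticated indexing.

```python
import hashlib

class DedupStore:
    """Toy block-level dedup: fixed-size chunks keyed by their SHA-256 hash."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.blocks = {}  # digest -> chunk; each unique chunk stored exactly once

    def write(self, data: bytes) -> list:
        """Chunk the data, store only unseen chunks, return a recipe of digests."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:   # new, unique chunk: keep it
                self.blocks[digest] = chunk
            recipe.append(digest)           # duplicates cost only a reference
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe of digests."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupStore()
backup = b"unchanged payroll records " * 500 + b"one new hire"
monday = store.write(backup)
tuesday = store.write(backup)  # identical nightly backup: zero new chunks stored
print(len(monday) + len(tuesday), "chunks referenced;", len(store.blocks), "stored")
```

The payoff shows up on the second nightly backup: every chunk is already in the store, so the repeat run costs only references, which is why deduplication shrinks backup windows as well as capacity.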
"In the classic SAN environment, these tools don't just provide a partial solution. They allow you to make fundamental improvements," says Rob Soderbery, senior vice president of Symantec's Storage and Availability Management Group. Clients that have pursued these strategies even have been able to freeze storage spending for a year at a time, he says. "And when they get back on the storage spending cycle, they get back on at about half the spending rate they were at before," he adds. (Get more insight from Soderbery in this podcast, “Is storage resource management right for you?”).
Although few IT executives report such dramatic reductions in storage spending, many are pursuing such strategies.
At Sterling, for example, moving from tape- to disk-based backups via Sepaton's S2100-ES2 virtual tape library cut nightly backups from 12 hours to just a few, Madejczyk says. Sepaton's deduplication technology provides an added measure of capacity savings. In addition, he has virtualized more than 90% of his server environment, "reducing our footprint immensely," and implemented EMC thin provisioning and storage virtualization.
Still, his company's storage needs keep growing, Madejczyk says. "In this economy, Sterling is being very responsible and careful about what we spend on," he says. "We're concentrating on the data-management part of the problem, and we're seeing results. But it's a difficult problem to solve."
Tom Amrhein, CIO at Forrester Construction in Rockville, Md., has seen similar growth. The company keeps all data associated with its construction projects in a project management database, so the vast majority of that stored data is structured in nature. Regulatory and compliance issues have led to increased storage needs nonetheless.
"Most companies need to keep their tax records for seven years, and that's as long as they need to keep anything," Amrhein says. "But tax records are our shorter-cycle data. Depending on the jurisdiction, the time we could be at fault for any construction defect is up to 10 years - and we're required to have the same level of discovery response for a project completed nine years ago as we would for a project that's been closed out two weeks."
Forrester Construction has cut down a bit on storage needs by keeping the most data-intensive project pieces - building drawings, for example - on paper. "Because the scanning rate is so high and paper storage costs so low, retaining those as physical paper is more cost-effective," Amrhein says.
The real key to keeping costs in check, however, is storage-as-a-service, Amrhein says. IT outsourcer Connectria hosts the company's main application servers - Microsoft Exchange, SQL Server and SharePoint, plus project management, finance, CRM and Citrix systems - and handles all the associated storage, leaving Forrester Construction with a predictable, flat monthly fee.
"I pay for a set amount of gigabytes of storage as part of the SLA [service-level agreement], and then I pay $1 per gig monthly for any excess," Amrhein explains. "That includes the backup, restore and all the services around those. I'm paying $25K a month to Connectria, plus paying for about 10GB over my SLA volume. That overage is a wash."
For the firm's unstructured data, Forrester Construction uses Iron Mountain's Connected Backup for PCs service, which automatically backs up all PCs nightly via the Internet. If a PC is not connected to the Internet at night, the user receives a backup prompt on the next connection.
"With 60% of the people out of the office, this is perfect for us," Amrhein says. "Plus, Iron Mountain helps us reduce the data volume by using deduplication," he says. "Even for people on a job site with a wireless card or low-speed connection, it's just a five- or 10-minute thing."
Still, the unstructured side is where the construction company sees its biggest storage growth. E-mail and saved documents are the biggest problem areas.
The rise in Web 2.0 data
Forrester Construction is not alone there. In the enterprise, IDC reports, structured, transactional data will grow at a 27.3% compounded annual rate over the next three to five years. The rise in unstructured, file-based data will dwarf that growth rate, however. IDC expects the amount of storage required for unstructured, file-based data to increase at an unprecedented 69.4% clip. By 2010, enterprises for the first time will find unstructured storage needs outstripping traditional, structured storage demands.
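Those two IDC rates compound quickly, which is why the crossover arrives so soon. A quick sketch with hypothetical starting volumes shows how even a smaller unstructured base overtakes a larger structured one within a couple of years:

```python
# Compound IDC's projected growth rates forward from hypothetical starting
# volumes to see when unstructured storage demand overtakes structured.
structured_tb, unstructured_tb = 100.0, 60.0        # assumed year-0 volumes, TB
structured_rate, unstructured_rate = 0.273, 0.694   # IDC CAGRs cited above

for year in range(1, 6):
    structured_tb *= 1 + structured_rate
    unstructured_tb *= 1 + unstructured_rate
    marker = " <- unstructured now dominates" if unstructured_tb > structured_tb else ""
    print(f"year {year}: structured {structured_tb:7.1f} TB, "
          f"unstructured {unstructured_tb:7.1f} TB{marker}")
```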
The rub here is that although SANs are extremely efficient at handling structured, transactional data, they are not well optimized for unstructured data. "SANs are particularly ill-suited to Web 2.0, scale-out, consumer-oriented-type applications," Symantec's Soderbery says. "No. 1, the applications' architecture is scale-out, so you have hundreds or thousands of systems working on the same problem instead of one big system, like you would have with a database. And SANs aren't designed that way. And No. 2, these new applications - like storing photos on Facebook or video or display ads or consumer backup data - are tremendously data intensive."
Symantec hit the wall with this type of data in supporting its backup-as-a-service offering, which manages 26 petabytes of data, Soderbery says. "That probably puts us in the top 10 or 20 storage consumers in the world. We could never afford to implement a classic Tier 1 SAN architecture," he says.
Instead, Symantec went the commodity path, using its own Veritas Storage Foundation Scalable File Server software to tie it all together. "The Scalable File Server allows you to add file server after file server, and you get a single namespace out of that cluster of file servers. This in turn allows you to scale up your application and the amount of data arbitrarily. And the software runs on pure commodity infrastructure," Soderbery explains. Plus, the storage communicates over a typical IP network vs. a more expensive Fibre Channel infrastructure.
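The single-namespace idea - many commodity file servers presented as one logical pool - can be pictured with a generic routing sketch. This is not how Storage Foundation Scalable File Server is actually implemented; it just shows the shape of the approach:

```python
# Generic illustration of one namespace spanning commodity file servers: every
# client sees a single path space, while a hash decides which node holds each
# file. Node names are invented for this example.
import hashlib

NODES = ["fs-node-01", "fs-node-02", "fs-node-03"]  # commodity file servers

def node_for(path: str) -> str:
    """Deterministically map a path in the shared namespace to a node."""
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

for path in ["/photos/u1/a.jpg", "/photos/u2/b.jpg", "/backups/pc-042.img"]:
    print(path, "->", node_for(path))
# Adding capacity means appending a node; with consistent hashing (not shown
# here), only a fraction of the paths would need to move.
```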
Symantec's approach is similar to that of the big cloud players, such as Google and Amazon.com. "We happen to build packaged software to enable this, whereas some of the early adopters built their own software and systems. But it all works the same way," Soderbery says.
The prudent approach to storage as it continues to grow, Soderbery says, is to optimize and use SANs only for those applications that merit them - such as high-transaction, mission-critical ERP applications. Look to emerging commodity-storage approaches for more scale-out applications, such as Web 2.0, e-mail and interactive call-center programs.
Does that mean enterprises need to support SANs and new cloud-like scale-out architectures to make sure they're managing storage as efficiently as possible? Perhaps.
Eventually, however, the need to support unstructured, scale-out data will trump the need to support structured, SAN-oriented data, IDC research shows. With that in mind, smart organizations gradually will migrate most applications off SANs and onto new, less expensive, commodity storage setups.
A new enterprise approach
One interesting strategy could provide an evolutionary steppingstone in the interim: using Web services. Championed primarily by Xiotech, the idea is to use the Web-services APIs and standards available from such organizations as the World Wide Web Consortium (W3C) as the communications link between applications and storage.
"The W3C has a nifty, simple model for how you talk between applications and devices. It includes whole sets of standards that relate to how you provision resources in your infrastructure, back to the application," says Jon Toigo, CEO of analyst firm Toigo Partners International. "All the major application providers are Web-services enabled in that they ask the infrastructure for services. But nobody on the storage hardware side is talking back to them."
Nobody, that is, except Xiotech.
Xiotech's new Intelligent Storage Element (ISE) is the first storage system to talk back, although other vendors quickly are readying similar technology, Toigo says. ISE, based on technology Xiotech acquired from Seagate Technology, is a commodity building block of storage - supporting as many as 40 disk drives plus processing power and cache - that can be added to any storage infrastructure as needed. Xiotech claims ISE can support anything from high-performance transaction processing to scale-out Web 2.0 applications.
All storage vendors should work to Web-services-enable their hardware and software so they can communicate directly with applications, Xiotech's Powell says. This would preclude vendor lock-in and let enterprises build storage environments using best-in-breed tools instead of sticking with the all-in-one-array approach. "They'd be able to add more storage, services or management, without having to add everything monolithically to a SAN," Powell says.
Eventually, Web services support will let virtualized storage environments realize even greater efficiencies, to the point where applications themselves provision and de-provision storage. "Today, when we provision storage, we have to guess, and typically, we either over- or underprovision," Powell says. "And then, when the user is no longer using it, we seldom go back and reclaim the storage. But the application knows exactly what it needs, and when it needs it. Via Web services, it can request what it needs on the fly, and as long as its request is within the parameters and policies we set up initially, it gets it."
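What Powell describes boils down to a policy-gated request/response exchange. The sketch below is purely illustrative - the message fields, endpoint behavior and policy checks are invented for this example, not Xiotech's or the W3C's actual interfaces - but it captures an application asking the storage layer for capacity within limits an administrator set in advance:

```python
# Hypothetical sketch of an application provisioning its own storage through a
# web-services call, gated by policies an administrator set up beforehand.
import json

POLICY = {"max_gb_per_request": 500, "allowed_tiers": {"archive", "standard"}}

def provision_request(app_id: str, gb: int, tier: str) -> str:
    """Build the JSON body an application might POST to a storage service."""
    return json.dumps({"app": app_id, "capacity_gb": gb, "tier": tier})

def handle_provision(body: str) -> dict:
    """Storage-side handler: grant the request only if it fits policy."""
    req = json.loads(body)
    if req["capacity_gb"] > POLICY["max_gb_per_request"]:
        return {"granted": False, "reason": "exceeds per-request limit"}
    if req["tier"] not in POLICY["allowed_tiers"]:
        return {"granted": False, "reason": "tier not permitted"}
    # In a real system this is where volumes would be carved out and mapped.
    return {"granted": True, "volume_id": f"vol-{req['app']}-001"}

print(handle_provision(provision_request("crm", 200, "standard")))
# {'granted': True, 'volume_id': 'vol-crm-001'}
```

Because the policy check runs on every request, storage is granted only within the parameters set up initially - exactly the guardrail Powell describes.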
Web services already have proved an efficient storage tack at ISE user Raytown Quality Schools in Missouri, says Justin Watermann, technology coordinator for the school system. The system went with Xiotech shortly after it moved to a new data center and created an all-virtual server infrastructure. A big plus has been Xiotech's Virtual View software, which uses Web services to communicate with VMware's VirtualCenter management console for its ESX servers, Watermann says. He can manage his virtualized server and storage infrastructure from a single console.
"When you create a new data store, Virtual View shows you what port and LUN [logical unit number] is available to all of your ESX hosts in that cluster," Watermann says. "And when you provision it, it uses Web services to communicate with VirtualCenter, and says, 'OK, this is the data store for these ESX hosts.' And you automatically have the data store there and available to use. You don't even have to refresh or restore."