Unstructured, yet essential

May 24, 2004 | 7 mins
Data Center | HTML | Microsoft Word

Don’t overlook audio clips, Word documents and other desktop data when plotting your new data center storage strategy.

At Genesys Health System, CIO Dave Holland thought he had his storage problems licked. He would ditch cumbersome, costly departmental storage in favor of a next-generation enterprise architecture that would give him a big-picture view while using storage resources more wisely. Information life-cycle management (ILM) tools, which move data from one storage tier to the next based on business value, featured prominently in his plan. He envisioned a day when all company data would move automatically, based on certain “enterprise parameters,” from a high-end EMC Symmetrix system to midlevel storage such as IBM’s FAStT system and then to optical disk for the long term.
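To make the tiering idea concrete, here is a minimal sketch of an age-based placement rule across three tiers like those Holland describes. The thresholds, the tier labels and the `choose_tier` function are assumptions made for illustration; the article does not say which “enterprise parameters” Genesys planned to use, and real ILM products weigh far more than last-access date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds (days since last access) and tier labels.
# The article names the kinds of tiers but gives no actual policy values.
TIERS = [
    (90, "high-end"),    # touched within the last 90 days
    (365, "midlevel"),   # touched within the last year
]
LONG_TERM_TIER = "optical"  # everything older


@dataclass
class StoredItem:
    name: str
    last_accessed: date


def choose_tier(item: StoredItem, today: date) -> str:
    """Pick a storage tier from the number of days since last access."""
    age_days = (today - item.last_accessed).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    return LONG_TERM_TIER
```

A scheduled job could call `choose_tier` for each file and queue a move whenever the result differs from the file’s current location. Business value, not just age, would drive a real policy.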

While the plan worked well for database-resident, structured data, Holland soon realized that it failed to account for the unstructured files critical to daily operations at the Flint, Mich., company. These included electronic patient charts and digital images such as X-rays and MRIs. “When we got started with this whole project, we really didn’t think of unstructured data. We really didn’t understand its value,” Holland says.

Spending time with physicians as they did their work brought the issue into focus. “I realized how much they looked at paper and how driven they were by those paper documents,” he says, referring to the patient charts that are then scanned and turned into electronic files. “I also realized how impossible it would be for me to convert all that data from unstructured content to structured content in order to make it available. So I said, ‘I’ve got to figure out a way to deal with unstructured data today because it’s how they work, and I can’t ignore that.’”

From content to storage management

Corporations everywhere are finding that unstructured content – data that traditionally has been managed by content managers, not the storage administrator – is ballooning. Today, about 80% of a company’s content is unstructured – such as Word documents, PDFs, spreadsheets, digital images and audio clips, Enterprise Storage Group says. New federal regulations that mandate better access to corporate data are forcing the storage management issue.

“Content management systems employ databases to sort and order, provide access control, and search files, PowerPoints, documents, PDFs, whatever is in that system. But as you begin to get into issues of compliance, you need to think about things in a life-cycle manner,” William Hurley, a senior analyst at the Enterprise Application Group, says.

Geoffrey Bock, a senior consultant with Patricia Seybold Group, agrees. “As long as enterprise content management [ECM] systems were departmental in nature and were not necessarily concerned about maintaining the corporate memory of a company for many years to come, storage was not really an issue,” he says. “Now that we’re building [enterprise] content repositories, which are multiple terabytes in capacity, and now that we have to organize and store this content in a meaningful way, storage is becoming more of an issue.”

At Genesys, Holland is looking at an IBM software combination to integrate ECM and ILM. Genesys already uses IBM’s DB2 Content Manager ECM system to give physicians access to electronic patient data 24 hours a day and is deploying Tivoli Storage Manager for ILM. By linking the two, Holland expects unstructured content to be moved and managed within the storage system along with the typical structured data.

But integrating ECM and ILM within the new data center might be hard. With departmental content management systems, IT executives might be contending with a variety of disparate content repositories. Also, first ECM implementations tend to be messy, Bock says. “They first need to straighten out that mess and then figure out what their storage architecture is,” he says.

Rising to the challenge

Users can soon expect help from vendors on the integration challenge. For instance, ECM vendor Documentum (now part of EMC) announced in March its acquisition of Xerox’s askOnce business unit. The deal gave Documentum technology for building a virtual repository across sources such as other content management systems, enterprise applications and search engines.

“So now we can federate non-Documentum repositories as well as our own repositories, which means we can include things like Lotus Notes, FileNet and OpenText, into our federation and search across them, workflow across them and manage them,” says Dave DeWalt, Documentum president.

Partnering for content management

Enterprise content management providers and storage vendors are teaming over unstructured data. Activity includes:

  • EMC acquired Documentum, creating a separate business unit for enterprise content management (ECM).
  • Documentum acquired askOnce, a Xerox business unit with virtual repository technology.
  • Network Appliance and FileNet have teamed to integrate their respective information life-cycle management and ECM tools.
  • Veritas Software has partnered with companies such as search vendor Autonomy and FileNet as part of a data life-cycle management initiative.
  • HP has teamed with companies such as FileNet, Documentum and IXOS, which OpenText acquired.

Analysts say enterprise users should expect to see more partnerships and possibly acquisitions as the industry coalesces around the idea of integrating content management and storage management tools.

By integrating Documentum with EMC storage, the content repository becomes aware of its storage options, DeWalt adds. “We have the ability now to tag information in our repository and tell that information where to store it, how long to store it, when to destroy it, when to archive it, when to compress it and what to do to it,” he says.
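The tagging DeWalt describes can be pictured as a small policy record attached to each item in the repository. The sketch below is illustrative only: `StoragePolicy`, its fields and `due_actions` are hypothetical names, not Documentum or EMC interfaces, and the day counts in any example are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class StoragePolicy:
    # Hypothetical fields mirroring the decisions DeWalt lists:
    # where to store, when to compress, when to archive, when to destroy.
    tier: str
    compress_after_days: int
    archive_after_days: int
    destroy_after_days: int


def due_actions(created: date, policy: StoragePolicy, today: date) -> List[str]:
    """Return the life-cycle actions an item is due for, given its age."""
    age_days = (today - created).days
    if age_days >= policy.destroy_after_days:
        return ["destroy"]  # destruction overrides every other action
    actions = []
    if age_days >= policy.compress_after_days:
        actions.append("compress")
    if age_days >= policy.archive_after_days:
        actions.append("archive")
    return actions
```

Under this sketch, an invoice tagged with a policy of 30, 365 and 2,555 days would be due for compression after a month, for archiving after a year and for destruction after roughly seven years.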

That’s functionality about which business-to-business office supply firm Corporate Express is particularly pleased. As a longtime customer of both vendors, Corporate Express is working with EMC and Documentum to implement a better, more cost-effective way to store unstructured content, says Wayne Aiello, vice president of eBusiness Services at the Broomfield, Colo., firm.

Corporate Express uses the Documentum software to manage about 22 million customer invoices and reports, mostly unstructured PDF and XML files. It is seeing rapid growth in the amount of unstructured data within the company. Aiello attributes that growth in large part to the company’s use of Documentum to store those XML files and HTML and other Web content.

“Today, we store quite a bit of data in what I would call fairly expensive storage. We basically treat a lot of our data as production-level quality. And then we take very old data and archive it off. We want to get a more intelligent approach, have a more tiered approach to that,” Aiello says. “It’s very effective from a business perspective because the data is very accessible for quite a long period of time. But from a cost perspective, we stand to save quite a bit if we can learn to better archive it and put it on to the proper storage mechanism depending on the need. To do that without some sort of content management software like Documentum would be very difficult.”

Bob Terdeman, vice president and chief information architect at Rogers Communications, feels the same about the ECM-storage integration project he’s undertaken at the Toronto company. There, he says, more than 80% of the data is unstructured.

“[By integrating content management and storage tools], you’ll see a huge leveling in the growth of high-speed storage that we’ve been using for traditional requirements,” he says. “A great example is the number of documents, whether PowerPoints or Word, that are now sitting on conventional storage, that really belong in content-addressable storage. It could free up huge quantities of storage that could be returned to mission-critical use.”