
Users turn to virtual data marts

Apr 19, 2004
Enterprise Applications | IBM

As companies work to integrate new and legacy application systems, the cost of that effort continues to climb – already it consumes about 35% of the total cost of installing, writing or modifying an application, according to Gartner.

IT departments are under pressure to make operational data trapped in transactional systems available faster and to a broader audience of users. The problem is spawning new products from IBM, Oracle and BEA Systems – and a crop of start-ups including Avaki, CenterBoard and Composite Software – that help companies glean business information on the fly.

So-called data integration tools let users run analytic queries against distributed data sources without having to replicate the data or alter existing application sources. Queries run over corporate networks, polling internal and external systems. The software then builds federated or virtual databases that provide access to structured and unstructured data, including XML documents, e-mail and multimedia files, as if it were stored in one place.
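The idea of querying distributed sources as if they were one database can be sketched in a few lines. This is a toy illustration, not any vendor's actual engine; the source names and fields are invented, and each "source" stands in for a remote system that would normally be reached over the network.

```python
# Two independent sources stay where they are; the integrator fetches
# rows at query time and joins them on the fly -- nothing is replicated.

crm_source = [  # stands in for, say, a CRM database
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
]
orders_source = [  # stands in for a mainframe order system
    {"cust_id": 1, "total": 250.0},
    {"cust_id": 1, "total": 100.0},
    {"cust_id": 2, "total": 75.0},
]

def federated_order_totals():
    """Join the two sources at query time, as a virtual view would."""
    names = {row["cust_id"]: row["name"] for row in crm_source}
    totals = {}
    for order in orders_source:
        name = names[order["cust_id"]]
        totals[name] = totals.get(name, 0.0) + order["total"]
    return totals

print(federated_order_totals())  # {'Acme': 350.0, 'Globex': 75.0}
```

The point of the sketch is that the join happens per query: if a row changes in either source, the next query sees it, because there is no intermediate copy to go stale.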

Building such virtual data marts is catching on among users: The data integration market grew 3.3% in 2002 – a year in which worldwide software spending decreased for the first time in the market’s history, according to IDC. Data integration software revenue, now about $1 billion, is expected to near $1.5 billion by 2007 with compound annual growth of more than 8%, the research firm says.

Meanwhile, vendors are improving their data integration wares. IBM is preparing to release a new version of its year-old DB2 Information Integrator software in the second half of the year. The release will incorporate search technology the company is developing, code-named Project Masala, to help users more easily cull data from a broader range of enterprise content, including intranets, extranets, Web sites, relational database systems and content repositories.

Among the start-ups, Avaki last week announced Avaki 5.0, which adds features that let users access third-party software, such as data-cleansing and business-intelligence tools, and make them available as services within Avaki’s data grid.

For users, data integration offers an alternative to – though not necessarily a replacement for – traditional enterprise application integration (EAI) and data warehousing.

EAI software, such as that from Tibco, SeeBeyond and webMethods, is used to link application interfaces and shepherd requests for data among systems. It tends to be more focused on secure message delivery and business process management than on examining data.

A data warehouse involves a central repository for the data that business applications collect; the warehouse typically is populated through scheduled batch processes that copy data from enterprise systems to the central repository.
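The batch-copy model can be contrasted with the sketch above. Again a toy, with invented names: the warehouse holds a snapshot, so anything that arrives between scheduled loads is invisible until the next run.

```python
# Toy batch load: on a schedule, copy rows from a source system into a
# central repository. The warehouse is only as fresh as the last run.
import copy

source_system = [{"order_id": 101, "qty": 3}]
warehouse = []

def nightly_batch_load():
    """Replace the warehouse's copy with a snapshot of the source."""
    warehouse.clear()
    warehouse.extend(copy.deepcopy(source_system))

nightly_batch_load()
source_system.append({"order_id": 102, "qty": 1})  # arrives after the load
print(len(warehouse))  # 1 -- the new order won't appear until the next run
```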

Rather than copying data from reference sources, data integration tools maintain a central directory that provides a map of data formats and locations; links between systems are set up and broken down as data aggregation sessions are executed.
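A minimal sketch of such a directory, with hypothetical system names: the catalog records where each logical dataset lives and in what format, and a lookup replaces the copy step.

```python
# Toy central directory: a map from logical data names to source
# location and format; connections to those sources would be made only
# for the duration of a data aggregation session.
catalog = {
    "customers": {"system": "crm_db", "format": "relational"},
    "invoices": {"system": "erp_host", "format": "xml"},
}

def resolve(logical_name):
    """Look up where a dataset lives instead of copying it centrally."""
    entry = catalog.get(logical_name)
    if entry is None:
        raise KeyError(f"no source registered for {logical_name!r}")
    return entry["system"], entry["format"]

print(resolve("invoices"))  # ('erp_host', 'xml')
```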

Comparing technologies

Data integration tools are a complement to data warehouses, not a replacement. Here’s how they stack up in several categories.
                  | Data integration software                                    | Data warehouse software
Applications      | Best suited for unplanned, varying queries.                  | Best suited for frequent, planned queries.
Methodology       | Leaves data in its place.                                    | Copies data to a central repository.
Timeliness        | Can poll sources in real time.                               | Only as current as the latest batch upload.
Query performance | Data is remote; computations depend on network availability. | Data is local, enabling fast computations.

Data warehouses are best suited for planned analytic queries that are executed on a regular basis, says Eric Austvold, research director at AMR Research. A data warehouse typically is designed with specific queries in mind and programmed to collect data relevant to those queries, he says.

The strength of data integration tools is in answering questions that are unplanned and not repeated regularly. “The beauty of information integration is being able to answer questions where the data has yet to be organized,” Austvold says.

With data integration tools, a user can formulate a question, devise a query and get the results immediately. Conceptually, the idea makes sense, especially considering recent advances in the speed and capacity of networks, hardware and databases, he says.

But skeptics ask whether these tools place an unreasonable burden on computing systems. “You don’t want to send a query and have the lights dim,” Austvold says.

Pacific Capital Bancorp is rolling out software from Avaki that consolidates, transforms and publishes data from the bank’s existing databases, operational systems and data warehouse. It did consider the impact of the software on network performance, says Scott Matthew, vice president in the office of technology and applied practices at the Santa Barbara, Calif., bank.

But Pacific Capital found performance was no more of an issue than it is with any new software addition – the bank is mulling other bandwidth-intensive applications, such as VoIP, Matthew says. “It’s always a balancing act with networks,” he says.

On the plus side, because Avaki’s specialty is access – making data from multiple, heterogeneous sources available to applications – it doesn’t take on the entire computational burden. Avaki leaves finer-grained querying to specialists, such as data-cleansing or business activity monitoring vendors. So the bank’s data engines – its business intelligence software from Informatica, for example – still do some of the computational work, and Avaki simply makes the results available to other sources, Matthew says.

The opportunity to reuse analytic resources was a big draw for Pacific Capital. Avaki lets users create “data services” that are similar to the concept of a stored procedure on a database engine. Avaki centralizes these stored procedures so multiple applications can use them, Matthew says.
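The stored-procedure analogy can be sketched as a simple registry. This is illustrative only, not Avaki's API; the service name and routine are invented.

```python
# Toy "data service" registry: an analytic routine is registered once,
# centrally, and any application can then invoke it by name -- loosely
# analogous to a stored procedure shared across systems.
services = {}

def data_service(name):
    """Decorator that publishes a function under a service name."""
    def register(func):
        services[name] = func
        return func
    return register

@data_service("balance_summary")
def balance_summary(accounts):
    return sum(accounts)

# Any caller reuses the same centrally registered routine:
print(services["balance_summary"]([100, 250, 50]))  # 400
```

The reuse benefit the bank describes falls out of the indirection: callers depend on the service name, not on where or how the routine is implemented.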

Kawasaki uses IBM’s DB2 Information Integrator to streamline some of its inventory operations. With it, the Irvine, Calif., company has cut the time it takes to ship parts to its 8,000 dealers by reducing the lag in passing transactional data from its mainframe to warehouse IT systems.

Instead of waiting for a batch process to run overnight, DB2 Information Integrator pulls orders from Kawasaki’s mainframe soon after dealers place them, says Victor Martinez, manager of the data administration and information access services group at Kawasaki.

Using information integration technology is a shift for Kawasaki. The company initially started building traditional data marts, but found it couldn’t keep up with demand. As users became aware of the analytic data available from data marts, requests to build more data marts started flowing, Martinez says.

DB2 Information Integrator provided an alternative. “It allowed us to pierce the veil of the mainframe, go in and grab the data we needed without moving it to a data mart,” he says. “We can bring up five virtual data marts in the time it took to build one data mart.”

For Sutter Health, data integration provided an alternative to costly platform migrations. The Sacramento, Calif., healthcare organization needed a way to provide a single, current view of each patient, including medical history. But it didn’t want to have to migrate dozens of patient systems, spread across its 30 hospitals, to one format.

Sutter maintains more than 100 databases; records pertaining to any one patient might be in 50% to 60% of them, says John Hummel, CIO at Sutter Health. “It sounds simplistic enough, the idea of centralizing patient data,” Hummel says. “But it’s extremely difficult to get all the disparate pieces of information together and have them make sense.”

To accomplish that, Sutter is rolling out software from Initiate Systems, which specializes in customer data integration for healthcare companies. Initiate’s Identity Hub software links data from disparate systems without requiring changes to existing records.
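The linking-without-changing-records idea can be sketched as a master index built alongside the source systems. This is a toy, not Initiate's product; the field names and the exact-match rule are invented (real systems use far more sophisticated probabilistic matching).

```python
# Toy master patient index: link records from separate systems by
# matching identity attributes, leaving the source records untouched.
lab_system = [{"name": "jane doe", "dob": "1970-01-01", "lab": "A+"}]
billing_system = [{"name": "Jane Doe", "dob": "1970-01-01", "acct": 42}]

def build_index(*systems):
    """Map a normalized identity key to record positions in each system."""
    index = {}
    for sys_id, records in enumerate(systems):
        for rec_id, rec in enumerate(records):
            key = (rec["name"].lower(), rec["dob"])
            index.setdefault(key, []).append((sys_id, rec_id))
    return index

index = build_index(lab_system, billing_system)
# One patient, two systems -- the index points at both original records:
print(index[("jane doe", "1970-01-01")])  # [(0, 0), (1, 0)]
```

Because the index stores references rather than copies, the underlying patient systems need no retrofitting, which is the cost advantage Hummel describes below.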

That’s a key cost advantage, Hummel says. Other healthcare organizations have paid up to $1 million per hospital to retrofit systems, he says. With Identity Hub, Sutter is spending about $4 million, or $1 per patient, across the health group’s 30 hospitals to create a master patient index.