Microsoft to relieve 'Excel hell' with Web crawler for enterprise data

Project Barcelona at Microsoft creating metadata information server

Microsoft's Project Barcelona is creating Web-like indexing tools to manage the explosion of enterprise data.

Business data is growing so fast that the task of managing it all is becoming nearly as complicated as indexing the Web, and new technologies are needed to help enterprises cope.

That's the message from Microsoft researcher Andrew Conrad, who is leading the company's "Project Barcelona" to create a metadata information server to help businesses "understand and facilitate management of data across the enterprise." The project will provide crawlers to extract metadata from Microsoft products and an index server with an API to allow querying.

Introducing Project Barcelona earlier this month, Conrad compared the sprawling web of enterprise data to the World Wide Web.

"The modern Web is [a] vast and decentralized topology of websites and services connected via an almost infinite amount of links," Conrad writes. "Fortunately, as the Web has grown more complex, tools for understanding and leveraging the Web have kept pace."

Web crawlers index the Web, helping us discover sites and information through search engines "that we could not possibly find outside of random chance," he notes, adding that "by contrast, as the modern enterprise has trended towards becoming more Web-like, the tools for understanding and leveraging the enterprise data topology have been almost nonexistent."

Although relational databases have become the "corporate standard for storing data," Conrad says several trends have made the current model inefficient: the low cost of acquiring and storing data; the ease with which data can be moved and changed; the proliferation of self-service technologies such as databases and Web portals, which has led non-developers to build and maintain data-producing services; and "Excel hell," which Conrad defines as "the great proliferation of Excel (and Access, SharePoint) as the enterprise data management tool."

These trends have led to big productivity gains but also "made even the simplest DBA and ETL developer tasks increasingly complex and error prone," he says. "On top of that, it is almost impossible for information workers to know anything about enterprise data outside of their specific data silos."

The Microsoft team working on a solution to these problems is jokingly calling it the "Marauder's Map," after the magical map in "Harry Potter" that shows the location of every person in Hogwarts.

Project Barcelona will provide multiple crawlers for SQL Server, Excel, SharePoint and other Microsoft products, which will extract metadata and "enterprise dataflow information" for indexing in the Barcelona Index Server. For sources that can't be crawled, Barcelona "will provide a declarative way of describing the metadata and dataflow information."

The Index Server will cache all the harvested data and "expose an API for querying, augmenting, and annotating the metadata and dataflow information." There will also be tools for administrators to manage the crawlers and Index Server, and database administration tools to handle advanced tasks.

Rather than relying on a centrally controlled metadata repository, Project Barcelona's "overall design embraces the decentralized and web-like nature of the modern enterprise," Conrad writes.

Conrad declined an interview request, saying, "We anticipate being able to do those once we firm up release plans," which should be in late summer. Conrad also said his team will answer technical questions on the Project Barcelona blog and Twitter feed.

The project team will also seek community input through a series of technology previews.

"Although we are designing the first iteration of the product to be a DBA/ETL developer solution, we believe that the long term value will grow significantly beyond this," Conrad writes. "Hence, from the start, the base platform for the product will be completely open. For example, developers can plug in their own crawlers or metadata providers. They can also access the harvested metadata and dataflow information via the query API. Finally, we will support metadata augmentation and have rich annotation support (both crawler support and via server API) which will allow producers and consumers of the system to leverage the crawlers and Index server in ways we haven't even thought about."
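Microsoft has not published Barcelona's API, so the following is only a toy Python sketch of the architecture Conrad describes, and every name in it (DataflowEdge, Crawler, IndexServer, DeclaredSource, harvest, annotate, upstream, and the "sql:"/"etl:"/"xlsx:" asset labels) is hypothetical. Pluggable crawlers report dataflow edges to an index; a declared source stands in for the declarative descriptions of sources that can't be crawled; and the index can be annotated and queried for everything that feeds a given asset.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass(frozen=True)
class DataflowEdge:
    """Hypothetical record: data flows from `source` into `target`."""
    source: str
    target: str


class Crawler(Protocol):
    """Hypothetical plug-in interface: any object with a crawl() method
    that reports the dataflow edges it finds in one kind of source."""
    def crawl(self) -> Iterable[DataflowEdge]: ...


@dataclass
class DeclaredSource:
    """Hypothetical stand-in for a source that can't be crawled:
    its dataflow is simply declared by hand."""
    edges: list

    def crawl(self) -> Iterable[DataflowEdge]:
        return self.edges


@dataclass
class IndexServer:
    """Toy in-memory index of harvested dataflow edges plus annotations."""
    edges: set = field(default_factory=set)
    annotations: dict = field(default_factory=dict)

    def harvest(self, crawler: Crawler) -> None:
        # Cache whatever edges the plug-in crawler reports.
        self.edges.update(crawler.crawl())

    def annotate(self, asset: str, note: str) -> None:
        # Attach free-form notes to an asset, as producers or consumers might.
        self.annotations.setdefault(asset, []).append(note)

    def upstream(self, asset: str) -> set:
        """Return every asset that feeds `asset`, directly or indirectly."""
        found, frontier = set(), [asset]
        while frontier:
            current = frontier.pop()
            for edge in self.edges:
                if edge.target == current and edge.source not in found:
                    found.add(edge.source)
                    frontier.append(edge.source)
        return found


# Usage: a table feeds an ETL job, which feeds a spreadsheet.
index = IndexServer()
index.harvest(DeclaredSource([
    DataflowEdge("sql:Sales.Orders", "etl:NightlyLoad"),
    DataflowEdge("etl:NightlyLoad", "xlsx:Q3Forecast.xlsx"),
]))
index.annotate("xlsx:Q3Forecast.xlsx", "owned by finance")
print(sorted(index.upstream("xlsx:Q3Forecast.xlsx")))
# prints ['etl:NightlyLoad', 'sql:Sales.Orders']
```

Modeling the index as a graph of dataflow edges, rather than a flat catalog, is what lets a single query answer the DBA's question "what breaks if this table changes?" in reverse; a real system would of course persist and scale this very differently.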

Microsoft is also tackling the big data problem with new data warehousing appliances using SQL Server.

Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin
