Cyber data mining catching fire

The deep study and analysis of vast amounts of online data continues to pick up steam. This week four research agencies teamed up to launch an international competition they hope will heat up humanities and social science research that uses large-scale data analysis. The aims are to develop international partnerships and to explore vast digital resources, including electronic repositories of books, newspapers and photographs, in order to identify opportunities for cyberscholarship.

New techniques of large-scale data analysis let researchers discover relationships, detect discrepancies, and perform computations on data sets that are so large that they can be processed only using computing resources and computational methods developed and made economically affordable within the past few years.  With books, newspapers, journals, films, artworks, and sound recordings being digitized on a massive scale, it is possible to apply data analysis techniques to large collections of diverse cultural heritage resources as well as scientific data, the group said.

The Digging into Data Challenge will be sponsored by the Joint Information Systems Committee (JISC) from the United Kingdom, the National Endowment for the Humanities (NEH) and the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council (SSHRC) from Canada.

According to the group, the challenge will work like this: Applicants will form international teams with members from at least two of the participating countries. Winning teams will receive grants from two or more of the funding agencies and, one year later, will be invited to present their work at a special conference. These teams, which may be composed of researchers, scholars and scientists, will be asked to demonstrate how data mining and data analysis tools currently used in the sciences can improve humanities and social science scholarship. To apply, applicants must first submit a letter of intent by March 15, 2009. Final applications will be due July 15, 2009. Further information about the competition and the application process can be found here.

The group is just one of the latest to explore cyber data mining. The NSF recently said it is looking for highly interpretive technology to help all manner of government and private researchers evaluate the massive amounts of data generated in health care, computational biology, security and other fields.

In a nutshell, the NSF said it is seeking mathematical and computational algorithms and techniques that will fundamentally improve the ability of the law enforcement and intelligence communities to transform large, often streaming data sets, such as e-mails, images, numbers and sounds, into a form that better supports visualization and analytic reasoning. To enable visual-based data exploration, it is necessary to discover new algorithms that will represent and transform all types of digital data into mathematical formulations and computational models that will subsequently enable efficient, effective visualization and analytic reasoning techniques, the NSF stated.

This latest round of research is part of a five-year, $3 million project known as the Foundation on Data Analysis and Visual Analytics (FODAVA) research initiative, led by the Georgia Institute of Technology. In August, the Department of Homeland Security (DHS) and the NSF anointed Georgia Tech to lead the effort to establish FODAVA as a distinct research field and to build a community of top-quality researchers that will collaborate on research workshops and conferences, industry engagement and technology transfer. One example of the FODAVA programs is a Georgia Tech system known as Jigsaw, which helps analysts better assess, analyze and make sense of large document collections.

Meanwhile, interpreting data is at the root of recently announced artificial intelligence (AI) research. The Defense Advanced Research Projects Agency (DARPA) said it wants to develop software known as a Machine Reading Program (MRP) that can capture knowledge from naturally occurring text and transform it into the formal representations used by AI reasoning systems.

For example, with such a system, all of the text on the World Wide Web will become available for automated monitoring and analysis of the technological and political activities of nations; the plans, rhetoric, and activities of transnational organizations; and scientific discovery within various disciplines, DARPA stated.

Copyright © 2009 IDG Communications, Inc.
