President Obama targets $200 million for big data boost

U.S. government looks to bolster techniques to access, organize, and collect information from huge volumes of digital data.

The U.S. government is the poster child for big data, and today President Obama is set to announce a $200 million research program to bolster the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.

"In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security," said Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, in a statement.

The White House Office of Science and Technology Policy and a number of key federal departments and agencies will be part of the Big Data Research and Development Initiative.

The participating agencies and their contributions to the program include:

• National Science Foundation and the National Institutes of Health: NSF is implementing a long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development. Specifically, NSF is: encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers; funding a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information: machine learning, cloud computing, and crowdsourcing; providing the first round of grants to support "EarthCube," a system that will let geoscientists access, analyze, and share information about our planet; issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data; and providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways.

• NIH: The health agency is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.

• Department of Defense: The DoD is "placing a big bet on big data," investing approximately $250 million annually (with $60 million available for new research projects) across the military departments in a series of programs that will harness and utilize massive data in new ways and bring together sensing, perception, and decision support to make truly autonomous systems that can maneuver and make decisions on their own. The department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe. In addition, the DoD will announce a series of open prize competitions over the next several months.

• Defense Advanced Research Projects Agency (DARPA): As part of the DoD, DARPA is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured and unstructured (such as text documents and message traffic). Central challenges to be addressed include developing scalable algorithms for processing imperfect data in distributed data stores, and creating effective human-computer interaction tools that facilitate rapidly customizable visual reasoning for diverse missions. The XDATA program will support open-source software toolkits that enable flexible software development, so that users can process large volumes of data in timelines commensurate with the mission workflows of targeted defense applications.

• National Institutes of Health: NIH is announcing that the world's largest set of data on human genetic variation, produced by the international 1000 Genomes Project, is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free, and researchers will pay only for the computing services they use.

• Department of Energy: Through its Scientific Discovery Through Advanced Computing (SciDAC) program, the Department of Energy will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the Energy Department's Lawrence Berkeley National Laboratory, the SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the department's supercomputers, further streamlining the processes that lead to discoveries made by scientists using the department's research facilities. The need for these new tools has grown as the simulations running on the department's supercomputers have increased in size and complexity.

• U.S. Geological Survey: USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.

