Network World - How did European researchers working on the Higgs boson recently make one of the most revolutionary physics discoveries in recent decades? From an IT perspective, they relied on a good old-fashioned grid computing infrastructure, though a new cloud-based one may be in the offing.
The decade-old grid computing infrastructure at CERN, the European Organization for Nuclear Research, has been used extensively during the past few years for research that culminated in the discovery of the Higgs boson, the so-called "God Particle."
Unlike a public cloud, where data and compute resources are typically housed in one or more centrally managed data centers with users connecting to those resources, CERN's interconnected grid network relies on more than 150 computing sites across the world sharing information with one another.
For the first couple of years after the grid computing infrastructure was created, it handled 15 petabytes to 20 petabytes of data annually. This year, CERN is on track to produce up to 30 PB of data. "There was no way CERN could provide all that on our own," says Ian Bird, CERN's computing grid project leader. Grid computing was once a buzz phrase, much as cloud computing is now. "In a certain sense, we've been here already," he says.
CERN, home of the Large Hadron Collider that is the focal point of the Higgs boson research, is considered Tier 0 within the grid. That's where scientific data is produced by smashing particles together in the 17-mile LHC tunnel. Data from those experiments is then sent out through the grid to 11 Tier 1 sites, which are major laboratories with large-scale data centers that process much of the scientific data. Those sites then produce datasets that are distributed to more than 120 academic institutions around the world, where further testing and research are conducted.
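The tier structure described above can be sketched as a small tree. This is only an illustration: the `Site` class, the site names, and the round-robin assignment of institutions to Tier 1 parents are assumptions made for the example, and the article gives only "more than 120" for the number of institutions, so 120 is used as a lower bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """One computing site in the grid (hypothetical model)."""
    name: str
    tier: int
    children: List["Site"] = field(default_factory=list)


def build_grid(n_tier1: int = 11, n_tier2: int = 120) -> Site:
    """Build Tier 0 -> Tier 1 -> institution fan-out.

    Institutions are attached to Tier 1 sites round-robin, which is an
    assumption for illustration, not how the real grid assigns them.
    """
    tier0 = Site("CERN", 0)
    tier0.children = [Site(f"T1-{i}", 1) for i in range(n_tier1)]
    for j in range(n_tier2):
        parent = tier0.children[j % n_tier1]
        parent.children.append(Site(f"T2-{j}", 2))
    return tier0


def count_by_tier(root: Site) -> Dict[int, int]:
    """Walk the tree and count how many sites sit at each tier."""
    counts: Dict[int, int] = {}
    stack = [root]
    while stack:
        site = stack.pop()
        counts[site.tier] = counts.get(site.tier, 0) + 1
        stack.extend(site.children)
    return counts


if __name__ == "__main__":
    print(count_by_tier(build_grid()))
```

A strict tree is a simplification; it is used here only to make the fan-out from one producer to many consumers concrete.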
The entire grid has a capacity of 200 PB of disk and 300,000 cores, with most of the 150 computing centers connected via 10Gbps links. "The grid is a way of tying it all together to make it look like a single system," Bird says. Most sites are standardized on Red Hat Linux distributions, along with custom-built storage and compute interfaces that also provide information services describing what data is at each site.
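To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal petabytes (10^15 bytes), a 365-day year, and perfectly even data production, none of which the article states; real traffic is bursty and datasets are copied to multiple sites, so treat the results as order-of-magnitude estimates only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year
PB = 10**15                         # decimal petabyte (assumption)


def average_rate_gbps(petabytes_per_year: float) -> float:
    """Average sustained rate, in gigabits per second, for a yearly volume."""
    bytes_per_second = petabytes_per_year * PB / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


def disk_per_core_gb(disk_pb: float = 200, cores: int = 300_000) -> float:
    """Disk capacity per core, in gigabytes, using the article's totals."""
    return disk_pb * PB / cores / 1e9


if __name__ == "__main__":
    print(f"30 PB/year averages {average_rate_gbps(30):.2f} Gbps")
    print(f"Disk per core: {disk_per_core_gb():.0f} GB")
```

Under these assumptions, 30 PB a year works out to an average of roughly 7.6 Gbps, and the grid holds about 667 GB of disk per core.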
Research that contributes to a ground-breaking discovery like the Higgs announcement, though, is not always centrally organized. Bird says it is in fact quite a chaotic process, which makes it difficult to plan for the amount of compute resources that will be needed for testing at the various sites. For example, when there is a collision in the LHC, the particles involved leave traces throughout the detector. A first level of analysis reconstructs the collision and tracks the paths of the various particles; this is mostly done at the Tier 0 (CERN) and Tier 1 sites. For other levels of analysis, the data is broken into smaller datasets and distributed to the partnering academic institutions. From there, a variety of statistical analyses, histograms and data mining are carried out. If a certain discovery is made, an analysis might be refined and another test may be run. "You really can't predict the workflows," he says.
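The split-and-combine pattern in that workflow can be illustrated with a toy example: one dataset is divided into smaller pieces, each piece is histogrammed independently (as a partner site might do), and the partial histograms are merged. The function names and input values below are invented for illustration and are not real measurements or CERN software.

```python
from collections import Counter
from typing import List, Sequence


def split(values: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Divide one dataset into n_chunks smaller datasets."""
    return [values[i::n_chunks] for i in range(n_chunks)]


def histogram(values: Sequence[float], bin_width: float) -> Counter:
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def merge(partials: List[Counter]) -> Counter:
    """Combine per-site histograms by adding their bin counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical toy values, not real measurements.
    values = [124.8, 125.3, 91.2, 126.1, 90.7, 125.9]
    partials = [histogram(chunk, 5) for chunk in split(values, 3)]
    print(dict(merge(partials)))
```

Because bin counts simply add, the merged result matches a histogram of the full dataset, which is what lets this kind of work be spread across many sites.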