When the Large Hadron Collider (LHC) starts back up in June, the data collected and distributed worldwide for research will surpass the 200 petabytes exchanged among LHC sites the last time the collider was operational. Network challenges at this scale are different from what enterprises typically confront, but Harvey Newman, Professor of Physics at Caltech, who has been a leader in global scale networking and computing for the high energy physics community for the last 30 years, and Julian Bunn, Principal Computational Scientist at Caltech, hope to introduce a technology to this rarified environment that enterprises are also now contemplating: Software Defined Networking (SDN). Network World Editor in Chief John Dix recently sat down with Newman and Bunn to get a glimpse inside the demanding world of research networks and the promise of SDN.
Can we start with an explanation of the different players in your world?
NEWMAN: My group is a high energy physics group with a focus on the Large Hadron Collider (LHC) program that is about to start data taking at a higher energy than ever before, but over the years we’ve also had responsibility for the development of international networking for our field. So we deal with many teams of users located at sites throughout the world, as well as individuals and groups that are managing data operations, and network organizations like the Energy Sciences Network, Internet2, and GEANT (in addition to the national networks in Europe and the regional networks of the United States and Brazil).
For the last 12 years or so we’ve developed the concept of including networks along with computing and storage as an active part of global grid systems, and had a number of projects along these lines. Working together with some of the networks, like Internet2 and ESnet, we were able to use dynamic circuits to support a set of flows or dataset transfers and give them priority, with some level of guaranteed bandwidth.
So that was a useful thing to do, and it’s been used to some extent. But that approach is not generalizable in that not everybody can slice up the network into pieces and assign themselves guaranteed bandwidth. And then you face the issue of how well they’re using the bandwidth they reserved, and whether we would be better off assigning large data flows to slices or just do things in a more traditional way using a shared general purpose network.
So I presume that’s where the interest in SDN came in?
What’s happening with SDN is a couple of things. We saw the possibility to intercept selected sets of packets on the network side, and assign flow rules to them so we don’t really have to interact much with the application. There is some interaction but it’s not very extensive, and that allows us to identify certain flows without requiring that a lot of the applications to be changed in any pervasive way. The other thing is, beyond circuits you have different classes of flows to load-balance, and you want to prevent saturating any sector of the infrastructure. In other words, we want mechanisms so our community can handle these large flows without impeding what other people are doing. And in the end, once the basic mechanisms are in place we want to apply machine learning methods to optimize the LHC experiments’ distributed data analysis operations.
Most advanced research and education networks in the last year or two have made the transition from 10 Gb/sec (Gbps) backbones to 100 Gbps; so people tend to say, “Wow, now you have lots of bandwidth.” But the laboratory and university groups in our field people have deployed facilities where many petabytes of data are stored, along with very large numbers of servers which are moving from 1 Gbps to 10 Gbps and in some cases 40 Gbps network interfaces. And 100 Gbps interfaces are expected within the next few months, so as fast as the core networks progress, the capabilities at the edges are progressing even faster, so this is a real issue.
Are you looking at using SDN on specific research networks or trying to implement the capabilities across a range of them?
NEWMAN: There is one project called the LHC Open Network Environment (LHCONE) that was originally conceived to help with operations that involved multiple centers. To understand this, though, I have to explain the structure of the data and computing facilities.
The LHC Computing Model was originally a hierarchical picture that included a set of “tiered” facilities. We called CERN the “Tier 0” where the data taken at the LHC are first analyzed. There are now 13 Tier 1 centers, which are major national computing centers including centers at the Fermi National Accelerator Laboratory (Fermilab) and the Brookhaven National Lab (BNL) in the US.
There are also more than 160 so-called Tier 2 centers at universities and other labs throughout the world, each of which serves a region of a large country like the United States, or in some cases they serve an entire country. Then every physics group has a so-called Tier3 cluster, and there are about 300 of those. All of these facilities are interconnected by the research and education networks mentioned.
The US is involved mainly in the two biggest experiments at the LHC. The one I work on is called CMS, short for the Compact Muon Solenoid (CMS) which is served by FermiLab, and our competing experiment is called ATLAS, which is served by BNL.
CMS and ATLAS are multipurpose particle physics experiments exploring the most fundamental constituents of matter and forces of nature. In 2012 they both discovered the Higgs boson, thought to be responsible for mass in the universe. And with the restart of the LHC higher energy and luminosity (intensity) we anticipate even greater discoveries of physics beyond the Standard Model of particle physics that embodies our current knowledge.
University connections to the Tier 1 centers are mostly through Internet2 and regional networks. So, for example, I am at Caltech so we work with Internet2 and with CENIC which is the California region network.
So the experiments at the Large Hadron Collider are the data sources and the networks are used to distribute this data to users for analysis?
Right. Data is taken by the experiments, processed for the first time at CERN and then distributed to the Tier 1 centers via a sort of star network with some dedicated crosslinks for further processing and analysis by hundreds of physics teams. Once the data is at the Tier 1s it can be further distributed to the Tier 2s and Tier 3s, and once there, any site can act as the source of data to be accessed, or transferred to another site for further analysis. It is important to realize that the software base of the experiments, each consisting of several millions of lines of code, is under continual development as the physics groups improve their algorithms and their understanding and calibration of the particle detector systems used to take the data, with the goal of optimally separating out the new physics “signals” from the “backgrounds” that result from physics processes we already understand.
The data distribution from CERN to the Tier 1s is relatively straightforward, but data distribution to and among the Tier 2s and Tier 3s at sites throughout the world is complex. That is why we invented the LHCONE concept in 2010, together with CERN: to improve operations involving the Tier 2 and Tier 3 sites, and allow them to make better use of their computing and storage resources in order to accelerate the progress of the LHC program.
To understand the scale, and the level of challenge, you have to realize that more than 200 petabytes of data were exchanged among the LHC sites during the past year, and the prospect is for even greater data transfer volumes once the next round of data taking starts at the LHC this June.
The first thing that was done in LHCONE was to create a virtual routing and forwarding fabric (VRF). This was something proposed and implemented by all the research and education networks, including Internet2, ESnet, GEANT, and some of the leading national networks in Europe and Asia, soon to be joined by Latin America.
That has really improved access. We can see dataflow has improved. It was a very complex undertaking and very hard to scale because we have all of these special routing tables. But now the next part of LHCONE, and the original idea, is a set of point-to-point circuits.
You remember I talked about dynamic circuits and assigning flows to circuits with bandwidth guarantees. A member of our team at Caltech together with a colleague at Princeton has developed an application that sets up a circuit across LHCONE, and then assigns a dataset transfer (consisting of many files, totaling from one to 100 terabytes, typically) to the circuit.
ESnet’s dynamic circuits, which have been in service for quite a while, are called OSCARS. There is an emerging standard which is promoted by the Open Grid Forum called NSI, and we’ll integrate NSI with the application, so that’s one upcoming milestone for the LAT ONE part of this picture.
One might ask, “What sets the feasible scale of worldwide LHC data operations?” Two aspects are the data stored, which is hundreds of petabytes and growing at a rate that will soon reach an Exabyte, and the second major factor is the ability to send the data across networks, over continental and transoceanic distances.
One venue to address the second factor and show the year-to-year progress in the ability to transfer many petabytes of data at moderate cost using multiple generations of technology is the Supercomputing Conference. This is a natural place to bring our efforts on network transfer applications, state of the art network switching and server systems and software defined network architectures together, in one intensive exercise spanning a week from setup to teardown.
Caltech and its partners, notably Michigan, Vanderbilt, Victoria and the US HEP labs (Fermilab and BNL), FIU, CERN, and other university and lab partners, along with the network partners mentioned above, have defined the state of the art (nearly) every year since 2002 as our explorations of high speed data transfers climbed from 10 Gbps to 100 Gbps and, more recently, hundreds of Gbps.
The Supercomputing 2014 event hosted the largest and most diverse exercise yet, defining the state of the art in several areas. We set up a Terabit/sec ring with a total of 24 100 Gbps links among the Caltech, NITRD/iCAIR and Vanderbilt booths, using optical equipment from Padtec, a Brazilian company, and Layer 2 Openflow-capable switching equipment from Brocade (a fully populated MLXe16) and Extreme Networks.
We also connected to the Michigan booth over a 100 Gbps dedicated link and we had four 100 Gbps wide area network links connecting to remote sites in the US, Europe and Latin America over ESnet, Internet2, and the Brazilian national and regional networks RNP and ANSP.
In addition to the networks we constructed a compact data center capable of very high throughput using state of the art servers from Echostreams and Intel with many 40 Gbps interfaces and hundreds of SSDs from Seagate and Intel.
Then apart from the high throughput and dynamic changes at Layer 1, one of the main things was to be able to show software-defined networking control of large data flows. So that’s where our OpenFlow controller, which is written mainly by Julian, came in.
We demonstrated dynamic circuits across this complex network, intelligent network path selection using a variety of algorithms using Julian’s OpenDaylight controller, and the ability of the controller to react to changes in the underlying optical network topology, which were driven by an SDN Layer 1 controller written by a Brazilian team from the university in Campinas.
Once set up, we quickly achieved more than 1 Tbps on the conference floor and about 400 Gbps over the wide area networks. The whole facility was set up, operated with all the SDN related aspects mentioned above, and torn down and packed for shipment in just over one intense week.
The exercise was a great success, and one that we hope will show the way towards next generation extreme-scale global systems that are intelligently managed and efficiently configured on the fly. We’re progressing well, and expect to go from testing to preproduction and we hope into production in the next year or so.
After Supercomputing in 2014, we set up a test bed and Julian has started to work with a number of different SDN-capable switches—including Brocade’s SDN-enabled MLXe router at Caltech and switches at other places. So we’re progressing and we expect to go from testing to preproduction and we hope into production in the next year or so.
This is just one cycle in an ongoing development effort, keeping pace with the expanding needs and working at the limits of the latest technologies and developing new concepts of how to deal with data on a massive scale in each area. One target is the Large Hadron Collider, but other projects in astrophysics, climatology and genomics, among others, could no doubt benefit from our ongoing developments.
So the goal of all these efforts is to enable users to set up large flows using SDN?