The University of Florida is just wrapping up a remarkable year of IT upgrades. The school, which has a 2,000-acre campus and more than 900 buildings, installed a new supercomputer in a new data center, installed a 100Gbps link to Internet2, and upgraded its Campus Research Network from 20Gbps to 200Gbps while adding support for Software Defined Networking (SDN). Network World Editor in Chief John Dix got the lowdown on all of the developments from Erik Deumens, director of research computing.
You folks have accomplished an awful lot in one year. What got the ball rolling?
A few years ago the University of Florida hired a new CIO, Elias Eldayrie, and one of his primary goals was to improve the university's research computing infrastructure. When the Internet2 Innovation Platform movement got going, he said we should be part of that. He talked to the president, the provost, the VP for research and other administrators and got an agreement in principle that it would be a good thing to do.
We wrote a proposal to NSF for a CC-NIE award, which is for campus cyber infrastructure, and got funding for switch equipment to connect to the nearest Internet2 point that had been upgraded to 100 Gig, which is in Jacksonville. And we were lucky because we had another proposal in with the NSF MRI (Major Research Instrumentation) program that was funded to upgrade the internal campus research network from 20Gbps to 200Gbps.
With the awards in place the university agreed to provide some extra funding to pay for the missing things, because there are always components that cost more. And so on the 1st of February of 2013 we deployed the connection from the University of Florida campus network to Internet2 as an innovation platform. And then the month after that we upgraded the core of the campus research network. The full campus research network upgrade has been in place since September.
Was your network tapped out or does the higher capacity just open new doors?
It’s a little bit of both. We were not fully tapped out on the 20 Gig Campus Research Network, but the outgoing link was only 10 Gig, and that reached maximum capacity of 9.6Gbps several times a week. So we really were close to needing to do something, and we decided that going to 100 Gig was the best way to do it.
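To put those figures in perspective, here is a minimal sketch of the utilization arithmetic. The only inputs taken from the interview are the 9.6Gbps peak, the 10 Gig outgoing link and the 100 Gig replacement; the function name is hypothetical.

```python
def utilization_percent(peak_gbps: float, capacity_gbps: float) -> float:
    """Return peak traffic as a percentage of link capacity."""
    if capacity_gbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * peak_gbps / capacity_gbps


# Peak figure quoted in the interview.
PEAK_GBPS = 9.6

old_link = utilization_percent(PEAK_GBPS, 10)    # old 10 Gig outgoing link
new_link = utilization_percent(PEAK_GBPS, 100)   # new 100 Gig link, same peak

print(f"10 Gig link:  {old_link:.1f}% utilized at peak")
print(f"100 Gig link: {new_link:.1f}% utilized at the same peak")
```

The same peak that nearly saturated the old link would use only about a tenth of the new one, assuming traffic stayed at that level.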
One of the reasons we needed the extra capacity is to support the Compact Muon Solenoid (CMS) experiment at CERN’s Large Hadron Collider. This is one of the experiments that contributed to the discovery of the Higgs particle, which was recognized in this year’s Nobel Prize. We have a large research group at the University of Florida that manages what is called a Tier 2 distribution center for the collider data. CERN takes the collider data and distributes it to about 10 labs across the world, with Fermilab here in the US being one of them. And within the US there are another 10 Tier 2 sites that the data gets replicated to, and the University of Florida is one of those.
So we get a lot of traffic from local researchers, but also we are serving up data to the nation. Any high-energy physics researcher who wants to analyze some of that data will request data from our site. So that’s why this network connection is very important to us and why it’s so heavily used.
As I understand it, the networks also support your new supercomputer?
That’s correct. The HiPerGator. We completed a new data center on Jan. 30. It’s a 25,000-square-foot building with 10,000 square feet of machine room, 5,000 dedicated to research and 5,000 to enterprise computing. It’s a Tier 3 data center and the new HiPerGator supercomputer is in the research section. That new building is connected at 200Gbps via our upgraded Campus Research Network to the point of presence where the University of Florida connects to the Florida Lambda Rail regional network and to the Internet2 access point, and also to other machine rooms on campus.
How did the Internet2 part of the project go? Any challenges?
We actually didn’t encounter any challenges. Basically we carefully planned it and we got the funding and everybody was in agreement, even at the highest levels of the administration. The cost was about a million dollars. Some $380,000 of that was for a 100-Gig Brocade switch with extra 100-Gigabit ports for connection to the Campus Research Network and the rest was for the Florida Regional Lambda Rail connection and the Internet2 fees to connect in Jacksonville.
Did you stick with Brocade for the new campus network?
Yes. With a $1.5 million NSF MRI grant we got at the same time we installed several other Brocade switches to upgrade the Campus Research Network.
Is the research network separate from the campus data network?
Yes. Actually the University of Florida was a bit of a pioneer in that regard, because in 2004 when we were seeing data contention and governance conflicts between how to manage the data for research and data for enterprise security and stability, we created a 20Gbps network with separate fiber links between machine rooms that had large data processing equipment. That’s the network we upgraded with this grant.
Is the whole network 200-Gig now?
The core between the most important data centers is 200-Gig, and then there are a few outliers at 40-Gig and a few more at 10-Gig. But all of these are separate fibers, and they’re completely separate from the standard campus network. They also have their own governance structure, because that’s important in terms of keeping the security rules simple so we can have faster turnaround. On the research network, for example, we don’t have a firewall. We just have ACLs, which are higher performance.
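As an illustration of why ACLs tend to be cheaper than a firewall, here is a minimal sketch of stateless, first-match ACL evaluation: each packet is checked against an ordered rule list with no per-connection state to track. This is not the university's configuration. The rule set, the documentation-range prefixes, the port and the names `AclRule` and `evaluate` are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    action: str               # "permit" or "deny"
    src: str                  # source prefix in CIDR notation
    dst_port: Optional[int]   # None matches any destination port


def evaluate(rules: List[AclRule], src_ip: str, dst_port: int) -> str:
    """Return the action of the first matching rule, or "deny" if none match."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_prefix = addr in ipaddress.ip_network(rule.src)
        port_ok = rule.dst_port is None or rule.dst_port == dst_port
        if in_prefix and port_ok:
            return rule.action
    return "deny"  # implicit deny when no rule matches


# Hypothetical rule set using reserved documentation prefixes.
RULES = [
    AclRule("permit", "192.0.2.0/24", 443),
    AclRule("deny", "198.51.100.0/24", None),
]

print(evaluate(RULES, "192.0.2.10", 443))
print(evaluate(RULES, "198.51.100.5", 443))
```

Because the decision depends only on the packet's own header fields, nothing has to be remembered between packets, which is the property that lets switch hardware apply such rules at line rate.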
When you say 200-Gig, I presume you’re talking about multiple 100-Gig interfaces, right?
Yes. And they are both active but we use different paths so if a backhoe hits one we still have the other path.
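The active/active arrangement described here can be sketched as simple capacity arithmetic: two 100-Gig paths give 200 Gig in aggregate, and losing one still leaves 100 Gig. The path names and the function below are hypothetical and are not taken from the university's design.

```python
from typing import Dict, Tuple

# Each path maps to (capacity in Gbps, whether the path is currently up).
Paths = Dict[str, Tuple[int, bool]]


def available_capacity_gbps(paths: Paths) -> int:
    """Sum the capacity of every path that is currently up."""
    return sum(capacity for capacity, is_up in paths.values() if is_up)


# Two physically diverse 100-Gig paths, both active (hypothetical names).
paths: Paths = {"path_a": (100, True), "path_b": (100, True)}
print(available_capacity_gbps(paths))

# Simulate a fiber cut on one path; the other keeps carrying traffic.
paths["path_a"] = (100, False)
print(available_capacity_gbps(paths))
```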
The Campus Research Network connects what?
It links data-center-type rooms, machine rooms and special equipment rooms in about 10 buildings. It doesn’t go to every building. So you have the campus network that goes to the Genetics Institute, but then there is one room inside the Genetics Institute where all the gene sequencers feed their data into a machine that is connected to the Campus Research Network so that data can be easily transferred to High Performance Computing (HPC) resources in another data center.
The other data center rooms on the research network have smaller clusters that are usually associated with certain advanced engineering labs. For instance, there is one lab that is called the Center for Autonomic Computing. It’s an NSF-funded center with several grants to do advanced research on virtual machines, and they also provide Web services to a community of researchers. They’re part of FutureGrid, which is another NSF-funded grant with Purdue in Indiana, which allows them to connect in a more flexible way to reach collaborators across the nation.
Then there’s another machine room that has a small cluster for ocean simulation. These clusters are separately managed by research groups, so they’re usually smaller resources, whereas the HiPerGator is managed by my division, which is a department under Information Technology (UFIT) reporting to the CIO, and we provide services to everybody on campus.
What was powering the existing 20-Gigabit network before the upgrade?
Switches from Force10, a company that was later acquired by Dell. With this new network we needed high-speed capability, but also wanted a robust implementation and a good roadmap for future support of SDN and OpenFlow, because that is one of the thrusts we want to explore. We not only wanted to put in higher bandwidth, we also wanted to enable our computer science students and professors to do research on OpenFlow and software-defined networks. So that was a critical component in our selection for the upgrade, and we chose Brocade because they met the requirements very well.
So you spelled out the need for OpenFlow and SDN capabilities in your request for proposals?
We made that a clear case in both proposals. And that is actually a requirement of the Internet2 organization. When they upgraded their backbone with the big award from the federal government, they basically said, “We’re creating a new class of member called an Internet2 Innovation Platform member, and there are three conditions: One, you need to connect to the backbone at 100Gbps; two, you have to have an assigned Science DMZ (demilitarized zone) so you can do research without having to go through the firewall of a production enterprise network; and three, you have to have active research in software-defined networks.”
And it turned out that, when we went to the April annual meeting of Internet2, the University of Florida was the first university to meet all three conditions. We had our 100-Gig connection, we had the assigned DMZ, and we had several researchers on campus doing active research with NSF GENI (Global Environment for Network Innovations) and FutureGrid projects that involved SDN. So we’re pretty proud of that.
What did you specify in terms of SDN support from Brocade? Did the equipment have to support OpenFlow out of the box?
Out of the box, yes. And we wanted to know their roadmap, to see they were going to stay on top of it in terms of development, because OpenFlow is evolving very rapidly. So if new features were added, we wanted to know they would commit to implement them and make them available quickly. Brocade has official statements about that, which was important. Some of the other vendors we considered made more wishy-washy statements so that’s why they were ruled out.
Are you using the Brocade SDN capabilities yet?
We actually did something that I think is a bit innovative. While we required the Brocade switches to be OpenFlow and SDN-enabled, currently we don’t actually run them in that mode. What we did was buy a bunch of small Pronto 3920 OF and 3290 OF switches from Pronto Systems (now merged with Pica8) to put behind each of the Brocade switches. So when our computer scientists are doing early-stage experiments with OpenFlow, when the research is still a bit unstable, we can use these Pronto switches to support software-defined traffic, and they can break whatever they want without impeding any of the production traffic. And then once we validate that the work is stable using that second layer of Pronto switches, we can move it to the bigger switches.
So that is our long-term strategy to mitigate the risk and have a research network that at the same time can provide reliable production traffic.
I’m not familiar with Pronto Systems.
They’re some of the cheapest SDN-capable, OpenFlow-capable switches you can get. They provide basic functionality, and if anything goes wrong with them we just buy another one. They’re very affordable.
And you dedicate capacity between the Pronto switches?
Yes. These Pronto switches have just been deployed, and we’re working with the researchers on getting some traffic going. But basically we have them on a separate VLAN, and for now we’re not restricting any capacity because there isn’t any traffic yet. But we’re observing the traffic and, if at some point it causes a problem, we will take appropriate action. Because the Campus Research Network is designed to enable innovative research, we don’t want to put in something with strict policies that can cause problems.
Brocade doesn’t offer its own SDN controller. Did that matter to you?
No. Because controllers are general and generic. There’s a basic open-source one that you can deploy on your own on a Linux box. And until people have more experience, I think it will be a while before people buy these as appliances from vendors. We have several controllers already and that’s OK.
How about the importance of OpenFlow? It once seemed like it was going to be the one thing that united the SDN community, but now many vendors are shunning it.
We think OpenFlow is important, and other vendors, even if they support something else today, will eventually join in some version of OpenFlow support. There are some use cases for which you could quickly deploy something special instead of waiting for the standard to be agreed on. And that may be what some vendors are seeing. And with SDN a lot of the complicated logic is done by the controller, and you can have a software implementation that really performs very well. You don’t necessarily have to do this in an ASIC. So I think that may be part of why there is diversity.
So SDN for you folks represents some research opportunities, but do you have any idea yet how it might benefit the running of your own network?
No. That’s still quite open. We can see several use cases that we’re exploring with both the scientists who need to move data and the computer scientists who have the graduate students and the knowledge to try and implement it in an OpenFlow program. Because most of those SDN applications are not going to be developed by the end users who need to move the data, we as research computing IT providers are the go-between.