The network at CoxHealth, a healthcare organization in Springfield, Missouri, with five hospitals, 83 clinics and 10,000 employees, was running out of gas just as new demands were ratcheting up, so Senior Manager for IT Dan Brewer started to rethink everything. Here’s the story he shared with Network World Editor in Chief John Dix.
Dan Brewer, Senior Manager for IT, CoxHealth
Can you describe the environment and the problem you were having that encouraged you to drive some change?
I started here in 2006 and we had a single Cisco 6509 core and mostly Dell switches on the edge, but some Cisco pieces sprinkled in -- including about 600 autonomous Cisco wireless access points -- and everything was a flat Layer 2 network. With the different types of equipment, not everything worked well together, and we had frequent broadcast storms.
By 2008 we were in huge growth mode and the network was running out of gas, but we were budget-challenged so we were just growing organically. As something broke, we would buy a new piece. But in 2010-2011 we started getting budget to do more because we were implementing Electronic Health Records (EHR) and Computer Physician Order Entry (CPOE) and wireless traffic was increasing.
Computer Physician Order Entry is about moving away from paper. Instead of a doctor writing down orders he enters the information directly into a computer and it goes into the patient’s electronic health record.
I presume they’re using tablets or mobile workstations?
Right. They were using a combination of workstations on wheels for the nursing staff, and the physicians were using laptops. One of the criteria the physicians gave us was that they wanted to be able to move around in the hospital, even between floors, without getting kicked off the network and having to log back into Cerner, which is the provider of our Electronic Health Record (EHR) system. So full mobility was the criterion for our wireless network. We actually lined our elevator shafts with access points so doctors could roam at will.
So you had new mobile physician-led initiatives and were still having trouble with the flat Layer 2 network, so everything was coming to a head. What did you do?
We moved to Brocade in all the network closets and started pulling Layer 2 out and pushing Layer 3 to the edge. We’re trying to get to a Layer 3 routed network to make it more carrier-grade. There was a time before EHR when you could take the core down to do something because the nursing staff would go back to paper charting. Now everyone uses electronic health records, and if the network is not there, there is no EHR to use. So the tolerance for downtime is just nonexistent. With OSPF at Layer 3, we were able to add redundant cores and redundant network components so we don’t have planned downtime.
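To make the downtime argument concrete, here is a minimal sketch in Python of the underlying idea: a toy link-state shortest-path calculation (the mechanism OSPF is built on) over a redundant closet/aggregate/core topology. The node names and costs are illustrative, not CoxHealth’s actual design; the point is that when one core is withdrawn for an upgrade, traffic reconverges through the other.

```python
import heapq

# Toy link-state model: shortest path over a redundant topology, before and
# after one core is taken offline. Names and costs are illustrative only.
def shortest_path(graph, src, dst):
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

graph = {
    "closet-sw":  {"agg-1": 1, "agg-2": 1},
    "agg-1":      {"closet-sw": 1, "core-1": 1, "core-2": 1},
    "agg-2":      {"closet-sw": 1, "core-1": 1, "core-2": 1},
    "core-1":     {"agg-1": 1, "agg-2": 1, "datacenter": 1},
    "core-2":     {"agg-1": 1, "agg-2": 1, "datacenter": 1},
    "datacenter": {"core-1": 1, "core-2": 1},
}
print("normal:     ", shortest_path(graph, "closet-sw", "datacenter"))

# Take core-1 down for a code upgrade: remove the node and its links.
down = {n: {k: c for k, c in nbrs.items() if k != "core-1"}
        for n, nbrs in graph.items() if n != "core-1"}
print("core-1 down:", shortest_path(down, "closet-sw", "datacenter"))
```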
The medical devices we support created a couple of challenges when we moved to a Layer 3 network. One, we had a VLAN that spread all over the campus, and we had to split it up and renumber or re-IP a lot of medical devices, and many medical folks do not like to change anything on those devices. And two, a lot of those devices are FDA approved and still using older technology, so it’s a bit of a challenge to support that older technology.
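The renumbering work Brewer describes is, at bottom, subnetting arithmetic. A hedged sketch using Python’s standard ipaddress module, with a made-up 10.20.0.0/16 standing in for the old campus-wide VLAN, shows how one flat block can be carved into routed per-building subnets for re-IP planning:

```python
import ipaddress

# Hypothetical 10.20.0.0/16 stands in for the old campus-wide VLAN; the
# building names are made up. The point is the planning arithmetic: carve
# one flat broadcast domain into routed per-building /24s for the re-IP.
old_vlan = ipaddress.ip_network("10.20.0.0/16")
buildings = ["north-tower", "south-tower", "clinic-a", "clinic-b"]

subnets = old_vlan.subnets(new_prefix=24)
plan = {name: next(subnets) for name in buildings}

for name, net in plan.items():
    gateway = next(net.hosts())  # reserve the first usable address
    print(f"{name}: {net}  gateway={gateway}  usable_hosts={net.num_addresses - 2}")
```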
You had a single core in the Cisco environment, and moved to dual core with the shift to Brocade?
Yes. Now we have dual MLX cores that feed redundant MLX aggregate routers in each building, and those aggregate routers feed about 450 closet access switches. We’ve got about 21,000 access ports across all the hospitals and the clinics. So now if I need to upgrade code I can take one component down without taking the whole place down.
Your closet switches also support your Cisco wireless access points?
Yes. We’re kind of unusual in that I have a complete Cisco wireless network with a Brocade wired network.
You initially described the APs as autonomous. Is that still the case?
When we started in 2006 the Cisco APs were autonomous, meaning each was independent of the other so if you had to add a VLAN or add an SSID you logged into every single AP and made that configuration change. In 2008 we moved to a controller-based Cisco model where the APs look back to a controller for their config.
And marrying Brocade to Cisco wireless has worked out?
Yeah, it’s worked out very well, actually. Brocade plays well with everybody, and Cisco is a solid wireless platform.
How many APs do you have at this point?
About 2,400, and that’s growing by the day. In terms of the wireless devices, I couldn’t give you an exact device count, but at any given time of day a snapshot shows we’re supporting about 4,000-6,000 clients.
Of the 450 access switches, are those all Brocade now or do you still have a mix?
I wish. What we’ve done is upgrade location by location, so of our five hospitals I’ve got one left to upgrade. On our south campus we have ten buildings. Those buildings average about 1,800-2,200 access ports per building, so we redo one building at a time and I will be done with all the buildings by September 2016. Right now I’m about 40% complete on my clinics and about 36% complete on my campus buildings.
I presume before you went with Brocade you looked at other options?
We looked at several vendors. It came down to Brocade and Cisco. Brocade won because of three things. One was the value. Hospitals, contrary to popular belief, are not making tons of money, so I only get so much and I have to be able to stretch those dollars without shortcutting performance. Brocade brought me a lot more value on that side.
Second was the physical aspect of it. If you compare some of the boxes, the Brocade equipment is typically a lot smaller, especially the larger chassis. And it uses less power, so it was a little easier to get into the closets.
Third, Brocade really tries to work with other vendors’ gear. That’s been one of the challenges, because we’ve really been migrating since 2011. I’m always in this halfway state of new technology and old, and that’s tough. Brocade was a lot more willing to try to help me support the legacy gear.
How are all your facilities linked? Do you have a private fiber ring?
I’m actually doing 10Gbps MPLS on lambdas provided by a metro Ethernet supplier here called SpringNet. I get lambdas between my facilities, and their equipment is passive so I don’t have any circuit electronics; to my equipment it looks like dark fiber.
Are you seeing some nice performance gains?
Getting rid of the campus-wide VLAN means we don’t have extensive outages from broadcast storms. And we were feeding the buildings at 1Gbps and now we’re feeding them at 10Gbps, so we’ve had a lot of people come up and say, “What have you guys done, because this is completely different.”
At peak times our radiologists were having trouble loading CT images; it was consistently taking about 40 seconds to load an image, and sometimes up to four minutes. The uplink off the server was getting congested. Normal traffic was around 276Mbps on a 1Gbps link, which wasn’t bad, except we were seeing spikes where it was hitting the ceiling of that 1Gbps link. So we went to a Brocade 7750, which supports our 10Gbps backbone, and we were able to pull those load times down to an average of four seconds, and generally they’re less than two seconds now.
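The arithmetic behind that anecdote is easy to make explicit. In this back-of-envelope Python sketch, only the 1Gbps and 10Gbps link rates come from the interview; the CT study size and utilization figures are assumptions for illustration:

```python
# Back-of-envelope version of the radiology anecdote. The 500MB study size
# and the busy fractions are assumed; the link rates are from the interview.
STUDY_BYTES = 500 * 10**6  # hypothetical CT study: 500MB

def transfer_seconds(link_bps, busy_fraction):
    headroom = link_bps * (1 - busy_fraction)  # capacity left for the image
    return STUDY_BYTES * 8 / headroom

print(f"1Gbps link, 90% busy:  {transfer_seconds(1e9, 0.90):5.1f} s")
print(f"10Gbps link, 10% busy: {transfer_seconds(10e9, 0.10):5.1f} s")
```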
How many data centers do you have?
Our primary data center is at Bluebird Network, which is a colocation facility 85 feet underground in an old limestone mine, and then we have a secondary data center at the south hospital, and a third data center about 30 miles away in another colo facility.
I have four 10Gbps links out of the Bluebird data center supporting my own MPLS network (not carrier MPLS), with two of the links going to the south facility, which is the biggest location, and the other two going to two other facilities. Those facilities also have 10Gbps links to the south campus and to other facilities as well, so it’s more of a mesh than a ring. Between my primary data center and my primary hospital I have four connections; those are load-shared and I’m running traffic active-active.
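Active-active load sharing of this kind is typically done by hashing each flow onto one of the parallel links, so a flow’s packets stay in order while flows as a whole spread across all four paths. A hedged Python sketch of that idea, with made-up link names:

```python
import hashlib

# Flow hashing as typically used for active-active load sharing: each flow's
# 5-tuple hashes to one link, so packets within a flow stay in order while
# flows spread across all four paths. Link names are illustrative.
LINKS = ["dc-link-1", "dc-link-2", "dc-link-3", "dc-link-4"]

def pick_link(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return LINKS[int.from_bytes(digest[:4], "big") % len(LINKS)]

print(pick_link("10.20.1.15", "10.50.0.9", 49152, 443))  # one flow, one link
print(pick_link("10.20.1.16", "10.50.0.9", 49153, 443))  # another flow
```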
How do you quantify how much compute horsepower you have?
We’ve got about 1,200 servers, and about 65%-70% of those are virtual. We’ve got some Dell blades, some HP blades, and not too long ago we rolled in some Cisco UCS blades. Our storage is EMC.
Are you considering Software Defined Networking?
We’re trying to get there, and we see the industry moving to a New IP era of networking. We’re in the process of outfitting our primary data center with Brocade VDX switches for an Ethernet fabric. Those are already OpenFlow 1.3-ready, so we’re looking at SDN and the software-defined data center. And obviously the UCS platform, VMware, everything is headed that way, but some of the Dell equipment I’m switching out is 12 years old.
But I always try to look as far out ahead as possible. We’ve got apps people trying to anticipate what end users will want, and I have to be able to anticipate what that means for the rest of the infrastructure, so when somebody wants something it’s a simple upgrade and not a forklift upgrade, because I just don’t have the budget to do that.
So you’re buying OpenFlow 1.3-ready gear but not doing SDN yet.
We are not. But in June we start to pilot Cerner’s PowerChart Touch ambulatory application, and while we use Cerner in the hospitals, until now we haven’t used it out in the clinics. But just like doctors wanted to be able to move around more in the hospitals without losing connectivity, now they want to be able to go anywhere, including into the clinics and even home, and they want a seamless experience everywhere they go. That’s a big challenge, but I think SDN will give us the flexibility to accommodate that.
When do you anticipate the need to start turning on SDN?
I would like to finish my 2016 upgrade. You’re never finished, but my big upgrade would be done by the end of September of 2016, then we will start looking at firming that up. It may be sooner, but right now that’s my personal hope.
What will that involve? The addition of some controllers, obviously, but is it your impression that it’s a pretty easy upgrade?
It really does seem like it’s going to be a fairly simple upgrade once I get on a uniform platform: the addition of a couple of centralized controllers, and then any change we make centrally gets pushed out to the switches automatically. It’s similar to what we gained when we switched from autonomous wireless access points to the controller-based system. It makes management a lot easier.
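As a hedged illustration of that management model, the sketch below submits one change to a central controller, which fans it out to every switch. The controller URL and REST endpoint are hypothetical, not a real Brocade or OpenFlow API:

```python
import json
import urllib.request

# The controller URL and endpoint are hypothetical, not a real Brocade or
# OpenFlow API; the sketch only shows the management model: one change
# submitted centrally, fanned out to every switch by the controller.
CONTROLLER = "https://sdn-controller.example.org/api/v1"  # hypothetical

def push_vlan_everywhere(vlan_id, name):
    body = json.dumps({"vlan": vlan_id, "name": name}).encode()
    req = urllib.request.Request(
        f"{CONTROLLER}/vlans",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # controller reports the fan-out result

# Before: log into hundreds of closet switches. After: one call.
# push_vlan_everywhere(210, "powerchart-touch-pilot")
```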
One buzzword that’s getting talked about a lot these days is the Internet of Things. It seems to me that hospitals are already living IoT given all the equipment that is network-attached.
Everything has a stinking IP address.
So is it just business as usual?
One thing new in the last year is bedside equipment -- infusion pumps, ventilators, patient vitals. It used to be that nurses monitored all of that, wrote it down, and typed it into the record. Now we’re integrating all of those devices via the network into the EHR, which is neat. It frees the nurse up to take care of the patient. But it’s a challenge to keep up with everything they roll in that needs a network address.
You’re responsible for voice too? Are you a big VoIP shop?
All of our new switches and all of our closets are PoE so we can do VoIP. We have about 20,000 phone handsets and only about 4,500 are VoIP, and part of the reason is I’ve got to get the network ready before I can handle the VoIP traffic. We will never go 100% VoIP, but we will probably get to around 75%.
How about on the cell side? Any call to install a distributed antenna system or micro cell?
Now we’re going down a rabbit hole. Yes, we are building a $185 million, 10-story addition onto our hospital at the south campus right now. Everyone is using low-e glass now, which does not let cell signals through; it’s almost like a Faraday cage. So yes, we put a DAS in the new tower to be able to distribute cell signals, because cell phones are so pervasive. All the patients use them, everyone has them. That’s been a challenge in itself. It is not live yet; it has to be live by June 18.
That tower is also interesting because we took a different approach on the network architecture. We split the network in two, a blue network and a black network. One is cabled with blue cables and blue jacks and one is black cables, black jacks. If a nurse’s station has ten network connections, five are black, five are blue. The blue and the black cables go to different rows, different stacks of access switches, and those switches have independent connections to aggregates and then back to the cores. So even if we lose an access switch, we would only lose half of the devices on the floor or half of them in the nurse’s station. That allows the nurses to continue to work while we work to bring things back up. It’s redundancy to the room level.
We’re doing the same with the access points. We do yellow and orange networks and zigzag them so if you lost the orange network the yellow will automatically turn up the power and fill in the holes until we can get that network back up and running.
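Both the blue/black jacks and the yellow/orange access points follow the same pattern: alternate every other device between two independent paths so a single failure takes out at most half of any room. A minimal Python sketch with illustrative names:

```python
# The zigzag pattern from both answers: alternate every other jack (or AP)
# between two fully independent paths, so one failure takes out at most half
# of any room. Device and network names are illustrative.
def zigzag(devices, nets=("blue", "black")):
    return {dev: nets[i % len(nets)] for i, dev in enumerate(devices)}

jacks = [f"nurse-station-jack-{n}" for n in range(1, 11)]
assignment = zigzag(jacks)
for dev, net in assignment.items():
    print(f"{dev}: {net}")

# Failure drill: lose the blue stack, the black half keeps working.
survivors = [d for d, net in assignment.items() if net == "black"]
print(f"{len(survivors)} of {len(jacks)} jacks still up")
```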
I guess in healthcare you’ve got to think of every situation.
You really do. With everything flowing into the electronic patient record, there is just no tolerance for any down time. We have to be a utility service.