It started out as a simple call to the help desk from an engineer at one of our major development centers: Phone calls were being dropped. Soon, similar complaints were coming in from other engineers, as well as from sales associates, who said the inability to maintain phone calls was making it difficult to close deals.
At issue: Phone and Internet service is severely compromised at a development center.
Action plan: Restore service, then find out what went wrong. And once that's done, take steps to avoid a repeat.
Anything that affects revenue is sure to get someone's attention. The telecom team checked out the Cisco call manager and gateways; they were fine. It wasn't until the help desk received a new set of complaints about Internet connectivity being slow at that same development center that someone decided to get the security department involved.
The head of our network team, who is also responsible for firewall administration, sent me a message I couldn't ignore: "You better come check this out." He showed me that the logs from the firewall protecting the development center were filled with outbound connections over port 445 to several locations on the Internet.
We had to contain that activity quickly to restore Internet and phone service. Our attempt to block the outbound traffic at the firewall didn't succeed, because logging had taken up so much of the firewall's resources that we couldn't do anything at all on the firewall. The network engineer placed an access control list on one of the routers, which eventually allowed him to modify the firewall rule to block the bad traffic. That got us back the Internet and phone service, so the immediate problem was resolved. But what had caused it? I had the engineer back up the logs so we could analyze the data.
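To give a sense of what that kind of log analysis can look like, here is a minimal sketch that counts outbound port 445 connections per source address. It assumes the firewall logs were exported to CSV with the columns `src_ip`, `dst_ip`, `dst_port` and `action`; that layout, the function name `count_outbound_by_source` and the sample addresses are all hypothetical, not the format of any particular firewall.

```python
import csv
import io
from collections import Counter


def count_outbound_by_source(log_file, port=445):
    """Count logged connections to `port`, grouped by source IP.

    Assumes a CSV export with a header row containing
    src_ip, dst_ip, dst_port and action (a hypothetical layout).
    """
    counts = Counter()
    for row in csv.DictReader(log_file):
        if int(row["dst_port"]) == port:
            counts[row["src_ip"]] += 1
    return counts


# Made-up sample data using documentation address ranges.
sample_log = io.StringIO(
    "src_ip,dst_ip,dst_port,action\n"
    "10.20.30.11,203.0.113.5,445,allow\n"
    "10.20.30.11,198.51.100.7,445,allow\n"
    "10.20.30.12,203.0.113.9,445,allow\n"
    "10.20.40.50,198.51.100.80,443,allow\n"
)

# List the noisiest sources first.
for src_ip, hits in count_outbound_by_source(sample_log).most_common():
    print(src_ip, hits)
```

Sorting the sources by connection count makes it easy to see whether the traffic comes from a handful of addresses that can then be traced to a physical location.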
Our review showed that the IP addresses that were generating the traffic were assigned to a classroom. The instructor told me that the trainees had installed a virtual server image on the classroom desktops and, contrary to normal classroom protocol, connected the virtual machines to the corporate network. We found that those virtual machines were not running any antivirus software and hadn't been patched in more than two years, so we ran a virus scan of one of the virtual machines. Suddenly, everything became very clear.
The virtual machine was infected with a virus whose characteristics matched the activity that caused the denial of service to the office. In fact, all 30 desktops in the classroom were infected. But that's not the worst of it.
The installed images were derived from a base image maintained at a cloud provider. That base image contained the virus, which explains how 30 machines became infected.
I then moved on to the person who was responsible for provisioning virtual-machine images to find out why steps hadn't been taken to avoid an infection. He explained that a couple of years ago some patches had caused images to become unstable, so patching was stopped. As for antivirus software, he said he didn't have the budget to install it on more than 1,500 Microsoft Windows images. Perhaps that explanation was supposed to mollify me, but I could barely contain my dismay. Fifteen hundred VM images that had little or no protection from viral infection! And those images were regularly used by several departments on machines operating on our corporate network.
I immediately called a meeting with our CIO and the vice presidents for the divisions that deploy virtual machines. I asked for a mandate, effective at once, to scan all images, install our corporate antivirus software, apply all outstanding patches and put a process in place to ensure that images comply with the company's patch management process.
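A compliance check of that sort could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ImageRecord` fields, the `find_noncompliant` helper, the image names and the 30-day patch window are all hypothetical, and a real process would pull this data from whatever inventory and patch management tools are in use.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ImageRecord:
    """One VM image in a hypothetical inventory."""
    name: str
    last_patched: date
    antivirus_installed: bool


def find_noncompliant(images, today, max_patch_age_days=30):
    """Return {image name: [problems]} for images that fail the policy.

    The 30-day default is an assumed policy value, not a standard.
    """
    limit = timedelta(days=max_patch_age_days)
    findings = {}
    for image in images:
        problems = []
        if not image.antivirus_installed:
            problems.append("no antivirus")
        if today - image.last_patched > limit:
            problems.append("patches out of date")
        if problems:
            findings[image.name] = problems
    return findings


# Made-up inventory for illustration.
inventory = [
    ImageRecord("training-base", date(2009, 1, 15), False),
    ImageRecord("qa-win-01", date(2011, 5, 20), True),
]

for name, problems in find_noncompliant(inventory, today=date(2011, 6, 1)).items():
    print(name, "->", ", ".join(problems))
```

Running a check like this on a schedule, rather than once, is what turns a one-time cleanup into an ongoing process.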
All in a day's work, right?
This week's journal is written by a real security manager, "Mathias Thurman," whose name and employer have been disguised for obvious reasons. Contact him at email@example.com.
This story, "Security Manager's Journal: Virtual machines, real mess," was originally published by Computerworld.