MySpace speeds problem resolution with Splunk

Splunk search technology helps multiple IT teams at MySpace solve performance and other problems on the social networking site.

Network/Systems Management Alert By Denise Dubie, Network World
July 28, 2009 09:56 PM ET

One wouldn’t think that 4,000 errors would be hard to find, but when searching system log data across 10,000 hosts, IT staff at MySpace needed help narrowing down where exactly those problems happened.

“Each server has an individual log and if we wanted to find information on the logs from a specific time, we had to quickly get to them or the logs would roll over,” explains Jeremy Custenborder, senior performance architect at MySpace in Santa Monica, Calif. “When we find an error on the site, we have to search the logs before and after the time we thought it happened.”
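The kind of search Custenborder describes, looking at log entries shortly before and after a suspected error time across many hosts, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Splunk's search language or API; the record format, host names, messages, and the `errors_near` function are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def errors_near(records, suspected_time, window, needle):
    """Count matching log lines per host within +/- window of suspected_time.

    records is an iterable of (host, timestamp, message) tuples.
    This tuple layout is an assumption made for the example.
    """
    start = suspected_time - window
    end = suspected_time + window
    hits = Counter()
    for host, ts, message in records:
        # Keep only lines inside the time window that contain the search text.
        if start <= ts <= end and needle in message:
            hits[host] += 1
    return hits


# Made-up sample data, for illustration only.
sample = [
    ("web-001", datetime(2009, 7, 28, 9, 58), "ERROR render option mismatch"),
    ("web-001", datetime(2009, 7, 28, 10, 1), "INFO request ok"),
    ("web-002", datetime(2009, 7, 28, 10, 3), "ERROR render option mismatch"),
    ("web-003", datetime(2009, 7, 28, 11, 30), "ERROR render option mismatch"),
]

hits = errors_near(
    sample,
    suspected_time=datetime(2009, 7, 28, 10, 0),
    window=timedelta(minutes=5),
    needle="ERROR",
)
for host, count in sorted(hits.items()):
    print(host, count)
```

A script like this only works if every host's log is still available and already collected in one place, which is exactly the difficulty with per-server logs that roll over.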

That’s why Custenborder uses Splunk technology to expedite the process of locating Web site errors. The company has been using Splunk for about a year and now has the latest version, Splunk 4, in production across multiple IT groups at MySpace. Splunk’s IT management software searches for management data across logs, message queues, configuration files, SNMP traps and database transactions to more quickly correlate events that could be related to a failure -- and that network managers would typically have to search manually.

“I needed to track down a problem that only happened 4,000 times, which may not seem like a small number, but it wasn’t a problem in which the end user couldn’t use the site, it was more that a page was presenting the option wrong,” Custenborder explains. “When searching across 10,000 hosts that have millions of requests per day, finding a couple thousand requests is very difficult.”

Using Splunk software, Custenborder can isolate problems more quickly and forward his findings to other IT staffers for the final fix. He says the software allows him to send code signatures to others on staff and delegate the fix after troubleshooting the problem. Other groups, such as the network operations and security teams, use Splunk for their own purposes, Custenborder says, and the software is flexible enough to deliver value across several IT functions.

“There are many options for getting data into Splunk, such as agents and network sockets. It is incredibly configurable,” he says. “It is truly a system that you put your data in and figure out what you want to do with it. In an environment as large as ours, sometimes it is difficult to know you have an error in production, so this technology really helps us find and triage errors.”

July 31, 2009 will mark the 10th annual System Administrator Appreciation Day. I’d like to know from IT pros what their perfect SysAdmin Day would entail from start to finish. How can companies show you their appreciation? How do you want to spend this year’s SysAdmin Day? Let me know at ddubie@nww.com.
