If you've read my bio, you will have noticed that I wrote a book (actually, I was the lead author) - MOM 2005 Unleashed. We're in the final stages of writing the follow-up, System Center Operations Manager 2007 Unleashed. I mention this because it would probably be worthwhile to post some articles related to operations management - what that means, what it's all about - and potentially to get specific about the Microsoft product (without stealing too much thunder from the book, of course!).
I'd like to talk a bit here about why unscheduled downtime is a bad thing. Obviously it's not a good thing if you can't get to your email or run your business applications when you were expecting to, but it is always interesting to try to quantify why it's a bad thing - that is, to put some hard numbers to it.
We can start with a simplified example of the impact of temporarily disrupting an e-commerce site normally available 7x24. The site generates an average of $4,000 per hour in revenue from customer orders for an annual value in sales revenue of $35,040,000 US. If the website were unavailable for six hours due to a security vulnerability, the directly attributable losses for the outage would be $24,000 US.
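The arithmetic behind these figures is easy to check. Here is a minimal sketch in Python, assuming a non-leap year of 365 days and a flat hourly revenue rate; the function names are mine, purely for illustration.

```python
# Assumption: the site is available 7x24 for a 365-day (non-leap) year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_revenue(hourly_revenue):
    """Annual sales revenue at a flat average hourly rate."""
    return hourly_revenue * HOURS_PER_YEAR


def direct_outage_loss(hourly_revenue, outage_hours):
    """Directly attributable loss: average hourly revenue times hours down."""
    return hourly_revenue * outage_hours


print(annual_revenue(4000))         # 35040000
print(direct_outage_loss(4000, 6))  # 24000
```

This covers only the direct loss at the average rate, so it should be read as a floor rather than a full estimate.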
This number is only an average cost; most e-commerce sites generate revenue at a wide range of rates based on time of day, day of week, time of year, marketing campaigns, and so on. Outages also tend to occur during peak times, when the system is already stressed, which greatly increases the cost of a 6-hour loss.
There are other costs incurred from an outage. Some customers may decide to find alternative vendors, resulting in a permanent loss of users and making the revenue loss even higher than the direct loss of sales. The company may decide to spend additional money on advertising to counter the ill will created when customers could not reach the site. The costs from our example 6-hour outage can thus be far higher than its simple hourly proportion of time applied to an average revenue stream.
Another case in point would be a large credit-card processing company that estimates it would stand to lose nearly $400,000 in direct revenue if it experienced a one-hour operational outage affecting its ability to process credit-card transactions. This number assumes an estimated cost of just over $1.00 per missed transaction, and does not include the inevitable decline in revenues due to a loss of confidence from clients were such an outage to happen.
Does this actually happen? Let's look at a real case: Black Friday in 2006. Black Friday is the day after Thanksgiving in the United States and is the busiest day of the year in the retail sector. On that particular Black Friday, the websites for two very large U.S. retailers (Wal-Mart and Macy's) were unavailable starting around 4:00 a.m. for approximately 10 hours, presumably from overload. While it is possible that potential customers tried the sites again at a later time, it is also possible that they took their business to competitors.
There are two types of downtime, of course - scheduled (which usually is a very small window in the wee hours of the morning on a weekend), and unscheduled - the type you can't plan around, and what happened on Black Friday last year. Managing IT Operations means we want to take actions to mitigate the possibility of the unscheduled variety.
You may have heard of something called "5 9's of availability". This means uptime during scheduled hours is 99.999%, which works out to about 5 minutes of unscheduled downtime in a year. That would be high availability - sounds like nirvana! It takes work to attain and maintain 5 9's. Most companies are happy to get 99.9% uptime, which allows for a little under 9 hours of unscheduled downtime in a year. This doesn't mean you don't take systems down for maintenance - but that you manage your systems to reduce the unplanned outages.
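To see where the availability numbers above come from, here is a small Python sketch that converts an availability percentage into allowed unscheduled downtime per year. It assumes a 365-day year with the system scheduled to be up around the clock; the function name is hypothetical.

```python
# Assumption: 365-day year, system scheduled to be up 24 hours a day.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent):
    """Minutes of unscheduled downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.2f} minutes "
          f"({minutes / 60:.2f} hours) of downtime per year")
```

At 99.999% this comes to roughly 5.26 minutes a year; at 99.9% it is about 525.6 minutes, or a little under 9 hours.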
And in a nutshell, that's what operations management is all about.
Kerrie Meyler, MVP, MCSE, MCTS, MCT, is an independent consultant and trainer with over fifteen years of experience in IT. While at Microsoft in Field Technical Sales for four years, she focused on infrastructure and management, presenting at numerous product launches. Kerrie has presented Operations Manager 2007 at TechEd 2007, MMS 2009, MMS 2011, and internal Microsoft conferences, receiving company recognition and awards including a SPAR MGS award. Kerrie worked with Microsoft Learning to develop functional specifications for the original Operations Manager Microsoft courseware, 2550: Implementing Microsoft Operations Manager 2000, and did the beta teach for that course. She also participated in development for several System Center certification exams.
Kerrie is the lead author of Microsoft Operations Manager 2005 Unleashed, System Center Operations Manager 2007 Unleashed, System Center Configuration Manager (SCCM) 2007 Unleashed, System Center Operations Manager 2007 R2 Unleashed, System Center Opalis Integration Server 6.3 Unleashed and System Center Service Manager 2010 Unleashed.
Check out an excerpt from System Center Operations Manager 2007 Unleashed, Chapter 3: Looking Inside OpsMgr.
You can also check out an excerpt from System Center Configuration Manager (SCCM) 2007 Unleashed, Chapter 3: Looking Inside ConfigMgr.
Read a sample chapter of System Center Operations Manager 2007 R2 Unleashed at Chapter 1: Introduction and What's New.
You can also read a sample chapter of System Center Opalis Integration Server 6.3 Unleashed at Chapter 1: Introducing Opalis Integration Server 6.3, and of System Center Service Manager 2010 Unleashed at Chapter 1: Service Management Basics.
System Center Service Manager 2010 Unleashed was selected as the September, 2011 book giveaway for Microsoft Subnet.