How long is your downtime?

How much padding should you put into your downtime estimates?

It's always a fine line to walk, deciding how much to pad your downtime estimates.  The business doesn't want to be down forever, but at the same time they get upset when you run over for whatever reason.  I always pad my tasks because you really never know what's going to happen.  I've had stupid things go wrong with the simplest ops, and they took me a long time to work out.

What are the things that can go wrong?  Well, let's take a simple file move.  You're moving the DB files from one drive to another.  I've had them come up corrupt before, so you've got to protect against that.  Typically, it's faster to move a file than it is to copy it, so I try to do moves whenever possible.  But the possibility of corruption means I have to take the slower copy op or do a full backup ahead of time to make sure I'm protected.  And I have to account for that in my plan.

If something does go wrong, there's time for troubleshooting, then you have to decide on a course of action, and then execute it.  All of that takes time, and the business wants to know how long they'll be down.  So if the file takes 45mins to copy and it comes up corrupt on the other side, then I need to add in, say, 10-20mins of troubleshooting time (which includes time to make a directional decision), and then time to either re-copy or restore from backup.  I also need to figure in backup time, which can be vastly different depending on how the DB is set up.  And if you choose to re-copy and it's corrupted again, then you need to get your Windows guys involved because you've probably got driver problems at that point.  And do you try to allot time for that too?

So while it's necessary to pad your tasks a little, it's really difficult to decide how much.  And everything can't be a 12hr downtime.  That's just ridiculous.
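To make the file-copy arithmetic concrete, here is a rough sketch in Python.  The 45min copy and the 20min troubleshooting figure (the upper end of the 10-20min range) come from the example above; the backup and restore times are made-up placeholders, since those depend entirely on how your DB is set up.  The function name and its parameters are just for illustration.

```python
def estimate_window(copy_minutes, backup_minutes, troubleshoot_minutes, restore_minutes):
    """Return (best_case, worst_case) downtime in minutes for a protected file copy.

    Best case: take the backup, copy the files, and everything comes up clean.
    Worst case: the copy comes up corrupt once, so add troubleshooting time
    plus the slower of the two recovery options (re-copy or restore).
    A second corrupt copy (the "call the Windows guys" case) is NOT covered.
    """
    best_case = backup_minutes + copy_minutes
    recovery = max(copy_minutes, restore_minutes)
    worst_case = best_case + troubleshoot_minutes + recovery
    return best_case, worst_case


# 45 and 20 are from the example in the text; 30 and 30 are assumed placeholders.
best, worst = estimate_window(
    copy_minutes=45,
    backup_minutes=30,        # assumption: depends on how the DB is set up
    troubleshoot_minutes=20,  # upper end of the 10-20min range
    restore_minutes=30,       # assumption
)
print(f"Best case: {best}mins, worst case: {worst}mins")
```

Even with only one failure figured in, the worst case comes out at nearly double the best case, which is why the padding question is so hard to answer with a single number.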

It's about presenting it correctly to the business.  Don't tell them that the downtime ops will take 8hrs because that makes them think there's actually 8hrs of work.  What you should tell them is that the downtime is being submitted for 8hrs to allow for worst case scenarios, but the actual work is only going to take 5hrs.  The business needs to understand that this padding may absolutely be necessary, depending on the vendor and how well they write their upgrade code.  And who knows, maybe you'll finish in 3hrs and everyone will be happy, but you need to put in some padding most of the time.

One thing you don't want to happen is to consistently give them longer windows and consistently finish with ¾ of the time to spare.  It makes you look like you don't know what you're doing because you can't accurately estimate your task timeline.  So by telling them how long the task will take if everything goes well, and how long it could take if you have serious problems, you better prepare them for the realities of the downtime.  For example, you might tell them something like this: in test it took us 3hrs, but that was our 4th attempt and we finally got it to run without any issues.  The different issues we came across on the other trials took us anywhere from 1 to 5hrs to fix.  So we're submitting an 8hr window in case we run across those issues again this time.  And it's impossible to tell how many we'll run into.  So our best case scenario is probably about 3hrs, and our worst case is about 8hrs.  That's a really easy sell to the business, and if it isn't, then that just means that they really don't understand enough about IT to get it.
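Here's a small Python sketch of how you might turn test results into that kind of two-number statement.  The 3hr clean run and the 5hr fix are from the example above.  Adding the clean run to the single longest fix is only one way to get to 8hrs, and it's an assumption on my part for the sake of illustration; the function name is made up as well.

```python
def describe_window(clean_run_hours, longest_fix_hours):
    """Build a best-case/worst-case statement for the business.

    Assumption: the submitted window is the clean run time plus the
    longest single fix seen in testing. Your own rule may differ.
    """
    window_hours = clean_run_hours + longest_fix_hours
    return (
        f"We're submitting a {window_hours}hr window to allow for worst case "
        f"scenarios, but the actual work should take about {clean_run_hours}hrs "
        f"if everything goes well."
    )


# 3hrs clean run and 5hrs longest fix, as in the example in the text.
print(describe_window(clean_run_hours=3, longest_fix_hours=5))
```

The point is that the business hears both numbers at once, so nobody mistakes the window for the amount of actual work.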

At this point you should be using analogies to get your point across.  Talk to them about a surgeon operating on someone and how they can't tell you exactly what they'll find until they get in there.  There are many analogies you can use to show them that expecting you to foresee every contingency is unreasonable.  And oftentimes the ones who are being the most insistent are the ones who refuse to provide you with proper testing facilities to begin with.

So while we do our best to give reasonable downtime timelines, we do have to pad them to allow for unforeseen errors, and it's at those times we need to be honest with the business and set their expectations appropriately.
