5 deadly mistakes in agile IT operations

These mistakes make your life more difficult, slow you down and hurt your reputation with the business.


IT operations is under a set of conflicting mandates and pressures.

The business wants IT operations to be more agile and to be a partner in bringing more business functionality online (also known as digitization).

The executives in charge of IT (most often the CIO) want IT operations to be more cost-effective, which means spending must either shrink or grow more slowly than it has in the past.

Application owners want two inherently conflicting things. They want IT operations to guarantee that the infrastructure will deliver excellent performance for their applications, and at the same time they want IT operations to feel more like a cloud provider, with a rich set of self-service options.

Mistakes that make life worse for IT operations

So, the life of an IT operations executive is hard. Making matters worse, IT operations often makes mistakes that reduce agility and hinder its ability to respond to business needs. Here are five mistakes that can have disastrous results:

1. Letting security rule the roost

In these days of constant attacks and hacks, with information security and data protection now board-level and corporate-liability issues, it is natural to take a security-first approach to everything operating in the IT environment. But the ultimate approach to security would be to allow nothing to happen, which would shut down the business. More realistically, overly stringent and bureaucratic approaches to security can, and do, dramatically impede IT agility at all levels.

So, the key is to find approaches to security that maximize protection against threats while imposing minimal disruption on application and IT agility. Think of security as a tax. We need the tax to pay for some fundamentally essential things, but we should structure it to minimize disruptions and distortions to the economy (here, the operation of the business as a whole).

2. Organizing by IT silo

Many IT organizations have storage teams, server teams, networking teams, virtualization teams, teams for each operating system, teams for each piece of middleware (database servers, Java servers and web servers), teams for application delivery, mobile teams and teams for each application or set of applications. Having people who are domain experts in each area is a great thing. However, when each team has its own tools and no way to share information and metrics across silos, the result is terrible: it becomes impossible either to deploy new things with agility or to support things across the stack with excellent quality of service and reliability.

3. Organizing by IT layer

Everyone wants to do cloud. The natural way to organize your cloud computing initiative is by IaaS, PaaS and SaaS. So, the IaaS team owns all of the hardware, as well as the virtualization layer up to the hypervisor and the management of the hypervisor. The PaaS team owns all of the infrastructure software above the hypervisor and below the applications themselves: everything from the operating systems in the virtual servers through the runtimes (JVMs and PaaS frameworks), and quite possibly everything up to the containers (but not the contents of the containers). And the SaaS team owns the applications in the JVMs and everything in the containers.

This is a dramatic improvement on the siloed approach, but it falls into a trap. The trap is that the boundaries between these layers are abstraction layers such as the hypervisor, the JVM and the container, and each abstraction layer makes it very hard to see how things above it interact with things below it.

For example, it is hard to know how a transaction running in a JVM affects the operating system hosting that JVM, or the hardware beneath the hypervisor that hosts the VM in which the JVM runs. The key here is to pursue approaches to metric collection and monitoring that span these layers of abstraction, so that you have a full top-to-bottom view of the stack (otherwise known as an end-to-end view).
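As a minimal sketch of what a cross-layer view can mean in practice, assume each layer's monitoring tool can export timestamped samples already tagged with the VM they concern. Joining those samples on a shared key (here, VM name plus a one-minute bucket) puts the transaction, JVM, operating system and hypervisor readings side by side. The Sample record, the layer and metric names, the correlate_by_host helper and the sample values are all hypothetical illustrations, not any monitoring product's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    layer: str    # hypothetical layer label: "transaction", "jvm", "os", "hypervisor"
    host: str     # key shared across layers (assumed: the VM the sample concerns)
    minute: int   # one-minute time bucket
    metric: str
    value: float


def correlate_by_host(samples):
    """Group samples from every layer by (host, minute) so one row spans the whole stack."""
    view = defaultdict(dict)
    for s in samples:
        view[(s.host, s.minute)][f"{s.layer}.{s.metric}"] = s.value
    return dict(view)


# Illustrative values only: a slow transaction next to the readings beneath it.
samples = [
    Sample("transaction", "vm-01", 0, "response_ms", 850.0),
    Sample("jvm", "vm-01", 0, "gc_pause_ms", 420.0),
    Sample("os", "vm-01", 0, "cpu_pct", 97.0),
    Sample("hypervisor", "vm-01", 0, "cpu_ready_pct", 18.0),
]

for key, row in correlate_by_host(samples).items():
    print(key, row)
```

In a real environment the hard part is producing that shared key, for example mapping each VM to the physical host and hypervisor it currently runs on; the join itself is the easy step.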

4. Firing all of your smart domain experts

It is tempting for executive management to say, “I do not want to have to worry about that” and then think they have accomplished something when they replace internal experts who deeply understand applications and their supporting systems with some form of outsourcing. The problem is that the people to whom you outsource don't understand the particulars of your requirements and environment. And as soon as you train them to some minimal level, they get pulled off your account to go do something else.

This also applies to cloud services (see below). You cannot and should not outsource your infrastructure to any kind of cloud provider unless that provider is willing to give you availability, throughput and response time SLAs similar to what you expect of your internal environment.

5. Having a cloud-first strategy

It is one thing (and a great thing) to have a cloud strategy. However, it is dangerous to assume there should be a bias towards cloud in general or towards a specific type of cloud (for example, a public cloud-first strategy). A bias towards any particular execution environment puts that environment ahead of the needs of the business the application supports, and ahead of the application's need to deliver the required response time and throughput to its users and business constituents.

A proper cloud strategy should start with the requirements of the business that owns the application or business service implemented by the application(s). And it should, in particular, focus on the performance (response time or latency), throughput (work done per unit of time), and error rate of the applications and transactions that comprise that business service. Public cloud vendors refuse to guarantee either the latency or the throughput of their infrastructure. That makes it impossible for you to guarantee the latency or throughput of the applications or transactions running on public clouds.
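The three measures named above can be made concrete with a small sketch that summarizes one measurement window of transactions. It assumes each transaction record carries a latency and a success flag; the Txn type, the service_metrics function and the choice of a nearest-rank 95th percentile for response time are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    latency_ms: float  # response time of one transaction
    ok: bool           # False if the transaction returned an error


def service_metrics(txns, window_s):
    """Summarize one window: p95 latency, throughput and error rate."""
    if not txns or window_s <= 0:
        raise ValueError("need at least one transaction and a positive window")
    latencies = sorted(t.latency_ms for t in txns)
    # Nearest-rank 95th percentile, using integer ceiling to avoid float rounding.
    rank = (95 * len(latencies) + 99) // 100
    p95 = latencies[rank - 1]
    throughput = len(txns) / window_s                     # work done per second
    error_rate = sum(not t.ok for t in txns) / len(txns)  # fraction that failed
    return {"p95_latency_ms": p95, "throughput_tps": throughput, "error_rate": error_rate}


# Illustrative data: 20 transactions in a 10-second window, every tenth one failing.
txns = [Txn(latency_ms=10.0 * i, ok=(i % 10 != 0)) for i in range(1, 21)]
print(service_metrics(txns, window_s=10.0))
```

Tracking all three together matters because each can look healthy while another degrades: an environment can sustain throughput while its tail latency grows, which is exactly the kind of behavior an application owner would want an SLA to cover.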

Public cloud vendors also refuse to guarantee 100 percent uptime of an individual virtual server or instance in their environment. That means it is incumbent upon you and your application architects to design your applications to be stateless with no single points of failure. This is achievable for brand-new applications designed for the cloud, but it is effectively impossible to retrofit the existing estate of stateful applications to an “unreliable” cloud operating model.
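The reason statelessness matters can be shown with a short worked calculation. If each instance is up independently with probability a, a stateless service that can route any request to any of N interchangeable instances is unavailable only when all N are down, so its availability is 1 - (1 - a)^N. A stateful application pinned to a single instance gets only a. The sketch below assumes independent failures, which real clouds only approximate (instances sharing a zone, host or network can fail together), and the 0.99 figure is illustrative, not any vendor's SLA.

```python
def combined_availability(instance_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` interchangeable instances is up.

    Assumes instance failures are independent, which real environments only
    approximate; correlated failures make the true figure lower.
    """
    if not 0.0 <= instance_availability <= 1.0 or replicas < 1:
        raise ValueError("availability must be in [0, 1] and replicas >= 1")
    return 1.0 - (1.0 - instance_availability) ** replicas


# Illustrative per-instance availability of 99 percent.
for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

The gain only materializes if any instance can serve any request, which is why the redundancy has to be designed into the application rather than bolted onto an existing stateful one.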

A cloud strategy is a great and essential thing. But depending upon your business needs, you might be best served with a private cloud or an IaaS offering that guarantees your own dedicated bare metal execution environment and access to all of the bare metal utilization, latency, throughput and contention metrics.

Summary

IT operations is under tremendous pressure to simultaneously cut costs and improve agility and responsiveness across the stack—from the transaction to the physical infrastructure.

It is tempting to look at companies that run everything in a public cloud and think, “I wish my life were that simple.” But it is not simple to have no control over the infrastructure on which your critical application runs. And it is a dangerous delusion to think a public cloud provider can provide you with a higher quality of service than your own internal IT experts.

So, give this some thought before you outsource your IT operations expertise to an entity that has no domain expertise in your business services, will make no guarantees about the quality of its services, and will protest its innocence when presented with any kind of performance, throughput or reliability problem.

This article is published as part of the IDG Contributor Network.
