If you work in a big organization replete with financial and human resources, ITIL may be your best bet for structuring your application performance management. But if your organization is like most, and you don't have the wherewithal and time to implement full-blown ITIL, we suggest a much more practical and less onerous approach to managing application performance. At NetForecast we call it "ITIL-Lite" for APM. This approach consists of four APM best practices, which, when well executed, deliver demonstrable results. These are to understand, measure, and communicate about application performance--and to link application performance to the business.
Understand performance by learning about your applications and their requirements, application users and their requirements, and your infrastructure environment. The most basic understanding comes from watching users do their jobs--or have fun if that is what you deliver--interacting with an application. It is also valuable to talk to users about the experience. We know many technologists who play key roles supplying application services, yet never talk to users! Understanding users and how an application works is a must.
Measure any technical parameter that influences application performance or user satisfaction. You can never have too much data. Yes, you may not use it all--but you will have data to show changes. Performance management is about incremental improvements. Measure improvements early, even if they appear modest. This gives you the data to justify bigger changes for larger gains. The bottom line is that you must measure, measure, and then measure some more.
Communicate your findings. This means showing people your measurements and explaining the performance gains--or losses. Write your reports so non-geeks can understand them. You must communicate technical data differently to non-technical audiences than to your tech-savvy peers. We know many skilled application performance managers who never show their reports to anyone. Like gnomes, they hoard data for their own use--and only retain it to cover their backsides should a decision backfire.
Link performance to business needs. Performance reports shared outside of IT should be grounded in "what matters" to the business. Is management eager to see availability and/or utilization numbers? Not! What executives want is information like the amount of revenue made or lost, the number of orders processed, patients treated, users positively or adversely affected, etc. The business metric you choose must, above all, matter to your business, and it must have an associated business goal. Linking captures how well the application's performance supported that goal.
The goal of APM best practices is to improve application performance--and for the best results these best practices cannot stand alone. Each must be part of a continuous improvement process that ensures that your application performance supports your business needs as shown in the figure below.
The process begins by understanding your user and application needs. Then you must measure data that reflects your understanding. The data has little value unless you report it to the right people within your organization. Finally, the reports need to serve as input to link performance to key business needs. At that point, your IT and business groups engage in dialogue that improves your IT group's understanding of what is important and how to measure against the new objectives. After that another cycle starts.
Dialogue is vital because the business group injects contextual understanding about what really matters for applications users, and that understanding enables you to measure the right things and set thresholds that will help you optimize performance where it counts most.
Here's an example of how these best practices work in the real world. Picture this. You spend a morning looking over the shoulder of a web-based order entry system user we'll call Shirley as she enters customer order information. Around 11 AM you watch Shirley start to fidget, and she finally takes a coffee break when her data input task completion times (which you are timing with a stopwatch) exceed 10 seconds. After her break, you head back to your cube, and over the course of the afternoon and subsequent days you point your user response time tool at the order entry application users and correlate response time with active users, time of day, and general network traffic. The measurements show that during peak network usage times, order entry task completion times routinely exceed 10 seconds. You document your findings and send them to your manager and to Shirley's boss, noting that based on your observations Shirley's productivity is being hampered by poor application performance--and recommending that steps be taken to improve order entry system response times during times of network stress.
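The correlation step in this story can be sketched in a few lines. The following is a minimal illustration, not a real tool integration: the sample data is invented, and it assumes your response time tool can export per-task completion times tagged with the hour of day.

```python
from collections import defaultdict

# Hypothetical samples: (hour_of_day, task_completion_seconds).
# In practice these would come from your response time tool's export.
samples = [
    (9, 3.8), (9, 4.1), (10, 5.2), (10, 6.0),
    (11, 10.4), (11, 12.1), (11, 11.3),   # peak network usage
    (14, 4.5), (14, 3.9),
]

THRESHOLD = 10.0  # seconds -- the point where Shirley gives up


def hourly_averages(samples):
    """Average task completion time for each hour of the day."""
    buckets = defaultdict(list)
    for hour, seconds in samples:
        buckets[hour].append(seconds)
    return {hour: sum(v) / len(v) for hour, v in buckets.items()}


def problem_hours(samples, threshold=THRESHOLD):
    """Hours whose average completion time exceeds the threshold."""
    return sorted(h for h, avg in hourly_averages(samples).items()
                  if avg > threshold)


print(problem_hours(samples))  # -> [11], i.e. the 11 AM hour stands out
```

The same bucketing idea extends to correlating against active user counts or network traffic: add those fields to each sample and group on them instead of the hour.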
You ask Shirley's boss how many orders Shirley should reasonably be able to enter per hour, so you can translate that business goal into the application response times needed to meet it--using your gathered data, which shows users are most productive (active) when response time is below 4 seconds. You tell your boss that to meet that business goal, he'll have to open the departmental purse strings to install a QoS solution that prioritizes the web-based order entry application, and you convince him to fund a QoS pilot.
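Translating an orders-per-hour goal into a response-time budget is simple arithmetic. The numbers below are hypothetical stand-ins for what Shirley's boss and your own observations would supply:

```python
# Hypothetical inputs -- substitute your own numbers.
orders_per_hour_goal = 30        # target from Shirley's boss
tasks_per_order = 5              # screens/steps per order (observed)
keying_seconds_per_task = 18     # time the user spends typing (observed)

# Work backward from the goal to the response time the system must deliver.
seconds_per_order = 3600 / orders_per_hour_goal          # 120 s per order
seconds_per_task = seconds_per_order / tasks_per_order   # 24 s per task
response_budget = seconds_per_task - keying_seconds_per_task

print(response_budget)  # -> 6.0
```

With these assumed inputs, anything under a 6-second system response per task meets the goal, so the measured "most productive" level of 4 seconds fits comfortably inside the budget.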
You complete the pilot, measure the resulting order entry task times, and create a report showing that the pilot QoS deployment consistently reduces task response times during peak hours to four seconds. Your boss asks you to present your findings to the budget committee, which agrees that the business benefits justify deploying the production QoS solution. Shirley and her boss are delighted, and treat you to lunch to thank you.
During lunch they ask if you could reduce the task response times to two seconds. You respond by showing management a chart that compares performance for web-based order entry activity in Shirley's group with that of another group performing the same work closer to the data center. The closer group gets two-second response times but is no more productive. You conclude that improving order entry productivity will take a combination of better response time and a simpler entry process with fewer steps. Improving response time to two seconds by itself will be expensive now that QoS has done all it can, because you would need to upgrade WAN bandwidth. You are then invited to work with the order entry application group to show them that further productivity gains are now in their court.
Your tools helped solve the issue and educate the company, but you had to know how to apply the tools, where to look for answers, and how to communicate your insights to the company. That is the difference best practices make to your job--and to your success in it.