Our recent blog story generated substantial interest among Network World readers:
VoIP monitoring: The quest for call quality ubiquity
That story covered VoIP monitoring tools and why making VoIP enterprise-class will very quickly become a priority for network administrators.
At a high level, these tools enable IT to determine that something is wrong with VoIP calls or that VoIP calls are being dropped.
However, these tools do not take into account that VoIP is just one component of a very dynamic ecosystem.
There are other applications on this shared network that are contending for the same resources.
So all these monitoring solutions do is confirm what the end-user most likely has already reported – that there is a problem.
When there is a problem with the VoIP service, network operations will need more information on what has affected the equilibrium of these applications to determine the source of the problem that is affecting VoIP quality.
However, some of these conventional monitoring tools may not be sufficient for managing VoIP quality and ensuring end-user satisfaction.
Details about the various network components and applications, such as how they normally behave over the infrastructure and when they are adversely affected, are becoming mandatory information for identifying the complex problems that affect VoIP performance, because problems can originate from anything that uses the same infrastructure.
Knowing where to start in this complex web is the challenge for network operations, and it can often take them days or weeks to locate and identify the problem.
A new technology is now available to empower IT organizations to better understand their networks so that they can proactively detect and control problems associated with VoIP on the network infrastructure.
Called rapid problem identification (RPI), this agentless solution uses NetFlow-based information (and other flow technologies) to provide live views of the networks, applications, users and servers (including call managers) that share the network infrastructure.
RPI analyzes network flows from every endpoint, network and application to establish individual, dynamic profiles for tens of thousands of networking elements that are sharing the infrastructure.
It then leverages these profiles to compare against the live activity of the ecosystem to pinpoint the problem source of a reported issue.
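To make the profile-versus-live comparison concrete, here is a minimal Python sketch. It is an illustration only, not Xangati's algorithm: the `Profile` class, the `learn` and `check` helpers, the three-standard-deviation threshold and the sample bit rates are all assumptions introduced for this example.

```python
import math
from collections import defaultdict


class Profile:
    """Running mean and variance of one metric for one endpoint (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the running mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def deviates(self, value, threshold=3.0):
        """True if value is more than `threshold` standard deviations from the mean."""
        if self.n < 2:
            return False  # not enough history to judge
        sd = self.stddev()
        if sd == 0.0:
            return value != self.mean
        return abs(value - self.mean) / sd > threshold


profiles = defaultdict(Profile)  # one hypothetical profile per endpoint


def learn(endpoint, bit_rate):
    """Fold an observation taken during normal operation into the profile."""
    profiles[endpoint].update(bit_rate)


def check(endpoint, bit_rate):
    """Compare a live observation against the learned profile."""
    return profiles[endpoint].deviates(bit_rate)


for rate in (80, 82, 78, 81, 79):  # kbit/s during normal calls (made-up values)
    learn("phone-1", rate)

print(check("phone-1", 81))   # close to the learned range
print(check("phone-1", 400))  # far outside it
```

A real system would keep one such profile per metric and per time window, but the principle of learning what is normal first and then flagging departures from it is the same.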
For example, to detect problems with VoIP, the RPI technology analyzes the application behavior of all the IP phones, call managers, and voicemail systems deployed across the enterprise.
This behavior analysis includes traditional performance variables like packet rate and bit rate, but also includes affinities to specific applications, periods of time and other endpoints.
The next step for RPI is to correlate and analyze all the VoIP phone-specific issues with other symptoms experienced on the infrastructure to determine a single specific problem source.
The result is actionable information about the source of the core problem that allows for a quick problem resolution to get the VoIP service up and running to expected quality levels.
Traditional monitoring tools do not provide this level of analysis, resulting in a less than enterprise-class service.
Fortunately, the inventor of this new rapid problem identification (RPI) technology, Jagan Jagannathan, Ph.D., agreed to a question-and-answer session about Xangati, the company he founded, and the technology he invented.
Jagan is a veteran of Reactive Network Solutions, Xerox PARC, Sun Microsystems and SRI International and holds a Ph.D. in Computer Science from the University of Waterloo, Canada.
1. There are a number of solutions on the market that leverage NetFlow data, so what makes Xangati’s solution different?
The difference ultimately comes down to the problem area that a given product is trying to address, which then determines that product’s design goals.
The Xangati rapid problem identification (RPI) system is focused on enabling IT to find the problem source when there are complex performance and productivity issues affecting users, the network and/or the applications.
To do this, our product was designed to track the complex inter-relationships of all users/clients, servers, networks and applications in an IT infrastructure.
Our experience shows us that big problems arise when there are subtle changes in this ecosystem. Our solution, through targeted live and historical data, delivers actionable information to guide the IT user to the problem source in the shortest amount of time.
In this context, it should be noted that we leverage other ambient data, including SNMP, LDAP and rDNS, to help us concretely map clients, servers, interfaces and subnets.
Comparatively speaking, delivering RPI is a very different goal from that of other management products that also consume NetFlow.
These products are interface-centric and are specifically designed to provide details on the utilization of a WAN link as well as the breakdown of applications and users per link.
This is just a component of what we do, but it is the central focus for other products.
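For readers unfamiliar with the interface-centric view described here, the following Python sketch shows the kind of per-link application breakdown such products produce. The flow-record fields (`interface`, `app`, `bytes`) and the sample values are hypothetical and chosen only for illustration; they do not reflect any particular product's data model.

```python
from collections import defaultdict


def breakdown_by_link(flows):
    """Sum bytes per application for each interface (link)."""
    totals = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        totals[flow["interface"]][flow["app"]] += flow["bytes"]
    return {link: dict(apps) for link, apps in totals.items()}


def top_apps(flows, link, n=3):
    """Return the n heaviest applications on one link, largest first."""
    apps = breakdown_by_link(flows).get(link, {})
    return sorted(apps.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up flow summaries for two links.
sample_flows = [
    {"interface": "wan0", "app": "voip", "bytes": 1_200},
    {"interface": "wan0", "app": "backup", "bytes": 90_000},
    {"interface": "wan0", "app": "voip", "bytes": 800},
    {"interface": "wan1", "app": "web", "bytes": 5_000},
]

print(top_apps(sample_flows, "wan0"))
```

This answers "what is using this link?" but says nothing about how a given user's or server's behavior has changed, which is the distinction drawn above.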
View a higher resolution video demonstration of the Xangati RPI solution:
http://www.xangati.com/demo/demo_ent.html
2. Why do traditional management solutions have challenges with assisting IT in troubleshooting network and application performance and availability problems?
The question points to a frustration that I hear consistently when I meet with enterprises: almost every one of them can recount a very recent and vivid story of an extended firefight.
A large reason for this is that traditional management solutions have a bottom-up view of the world and are focused on the performance and availability management of a given IT silo: network, application, server, etc.
The issue is that there is no integrated understanding of how the elements in the different silos interact with each other and, according to Network World columnist Jim Metzler, performance problems transcend silos.
3. What does leveraging NetFlow as a primary data source allow you to do differently than traditional network management tools?
The utility of NetFlow data is that it provides sufficiently comprehensive information about interactions between different parts of a large and distributed IT infrastructure, without imposing a burden on the routers that generate the data and while consuming only nominal bandwidth.
It obviates the need for probes all over the infrastructure and the need for potentially multiple agents on endpoints, both of which are prohibitive in cost and maintenance.
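As a rough idea of what a collector receives from a router, the Python sketch below decodes a NetFlow export datagram. It assumes NetFlow version 5 specifically (the interview does not say which versions are used), and the `parse_v5` function and the dictionary keys are names made up for this example.

```python
import ipaddress
import struct

# NetFlow v5 layout: a 24-byte header followed by 48-byte flow records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_v5(datagram):
    """Yield one dict per flow record in a NetFlow v5 export datagram."""
    version, count, *_ = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    for i in range(count):
        f = RECORD.unpack_from(datagram, HEADER.size + i * RECORD.size)
        yield {
            "src": str(ipaddress.IPv4Address(f[0])),
            "dst": str(ipaddress.IPv4Address(f[1])),
            "packets": f[5],
            "bytes": f[6],
            "src_port": f[9],
            "dst_port": f[10],
            "protocol": f[12],  # IP protocol number, e.g. 6 = TCP, 17 = UDP
        }
```

Each record already names both endpoints, the ports and the traffic volume, which is why flow data can describe who is talking to whom without probes or endpoint agents.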
4. Who are the typical users of the Xangati RPI system and how are they using the solution in context of their daily workflow?
One of the primary users of the system is the network operations center (NOC) staff, which has the twin challenge of managing both applications and networks.
When a problem is escalated to the NOC staff, they can quickly drill into the problem area through the Xangati UI and get a live view of the activity in that realm.
With the situational awareness they gain from the UI, they can seamlessly navigate to more detailed and context-laden views.
In addition to the NOC, we have seen the service/help desk find significant success with the solution.
In their workflow, they can start down at the end-user level and literally see what that end-user’s networked application usage is at the exact moment the end-user is reporting a problem.
And at this level, the service desk rep can truly qualify, investigate and likely resolve the issue without an escalation.
This is quite a shift from what they were doing previously as the only solutions that were available to the help desk in the past were optimized for desktop support.
5. What are the areas of innovation you have focused your engineering team on in terms of cultivating unique intellectual property (IP)?
The framework for our IP started with the belief that you have to know about everything on your infrastructure with a great degree of granularity and specificity to catch complex problems.
That thinking led us down a path of creating a highly scalable platform with visibility into and awareness of the activity of each end-user.
Needless to say, we undertook this challenge and put great emphasis on scaling the system, which can support up to 100,000 endpoints.
And scaling the system has an added degree of difficulty because it extends in three dimensions all with a high degree of specificity for each infrastructure element:
1) Delivering live activity views.
2) Learning the normal application experience of each element over time.
3) Presenting fine-grained history reports at will.
Through these various mechanisms we enable a user of our system to have unparalleled access to critical troubleshooting information.
There is a substantial amount of back-end work that our engineering team has done over time to sustain this, and we continue to build upon it.
I should also point out that our UI framework is set up so that the data our system crunches is presented in an easy-to-use format that makes the information actionable.
This has proved to be an attractive aspect of our technology: the system is now simple enough to use that it can be embedded in the workflow of a help-desk support person.
6. You place a degree of importance on the concept of inter-relationships. Why is that, and what value does it provide to customers?
Inter-relationships are essential for an IT user to understand because networked applications are essentially an amalgamation of cross-silo elements: the application (which might actually be multiple applications, for example a web front end, an app server and a database back end), the network, clients and servers.
To understand what is normal, you have to understand the relationships within a networked application ecosystem and then across ecosystems.
And then if you hope to find the root cause of a complex problem, you will want to know where the relationships have changed between the ecosystem elements.
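One simple way to picture "where the relationships have changed" is as a comparison between two sets of observed conversations. The Python sketch below is a hedged illustration, not a description of the Xangati product: the `(client, server, application)` tuple shape, the `relationship_changes` function and the host names are all assumptions.

```python
def relationship_changes(baseline, live):
    """Compare two collections of (client, server, application) relationships."""
    baseline, live = set(baseline), set(live)
    return {
        "new": sorted(live - baseline),      # seen now, never seen before
        "missing": sorted(baseline - live),  # normally present, absent now
    }


# Made-up relationships learned during normal operation.
normal = [
    ("pc-17", "erp-db", "sql"),
    ("pc-17", "callmgr", "sip"),
    ("phone-1", "callmgr", "sip"),
]

# Made-up relationships observed while a problem is being reported.
current = [
    ("pc-17", "erp-db", "sql"),
    ("phone-1", "callmgr", "sip"),
    ("backup-srv", "erp-db", "backup"),
]

changes = relationship_changes(normal, current)
print(changes["new"])
print(changes["missing"])
```

In this toy data the unexpected backup traffic toward the database server shows up as a new relationship, which is the kind of change that points an investigator toward a likely problem source.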
7. Xangati puts a particular emphasis on the end-user experience of networked applications. Why is that?
The simple answer is that it is ultimately IT’s role to deliver a high-quality application experience to their business end-users.
Moreover, we emphasize it because it helps make the point that application experience is incredibly important for end-user productivity but is often not well understood.
This is surprising given the tremendous investments companies are making in networked applications.
Up to this point, the solutions to deliver visibility all the way down to the end-user have been very rudimentary.
As a result, when a user calls to complain about their application experience, it is a great challenge for IT because the help desk doesn’t have visibility into what that user is doing.
This is where the RPI Virtual Task Manager can be leveraged.
See this video clip for an example:
http://www.xangati.com/taskmanager/Wireless_hog4.html
8. If an enterprise already has a network management system in place, how do you see the Xangati RPI system fitting into the picture?
We see the RPI system as very complementary to traditional management system installations, for example HP OpenView and IBM Tivoli.
Those solutions provide effective manager-of-managers (MOM) capabilities, and our system can integrate with them by sending traps related to the problems it identifies.
The roles of these solutions and ours are very different.
The big solutions are ideal for providing comprehensive views of the up/down status of many disparate IT infrastructure components.
Our RPI system augments them by helping to find the complex problems that arise even when all the elements being looked at by a MOM are saying things are fine.
9. Can you provide some examples of the common kinds of problems your customers have identified with your solution?
The top issues we have seen are:
1) Unscheduled backups clogging the WAN for an ERP application.
2) Misconfigured VoIP call processes dragging down call center productivity.
3) Software-as-a-service (SaaS) application intermittently sluggish due to Internet video streaming.
4) Server cluster not properly load-balanced.
5) Hundreds of non-inventoried endpoints accessing centralized servers.
6) Unmapped/unknown activity by critical servers.
10. Where do you see the market evolving in terms of how Cisco routers can fuel the intelligence of management products like yours?
We think that the richer the data that routers and switches can provide, the better things are for IT.
In addition to NetFlow, Cisco solutions have NBAR for application recognition and IP SLA for latency and mean opinion score (MOS) measurement.
These technologies will have increasing value over time in the management of your infrastructure.
Since anything critical within your IT infrastructure leverages the network, what better data source than the network itself to fuel management products?
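For context on the mean opinion score mentioned above, the Python sketch below converts an E-model transmission rating (the R-factor) into an estimated MOS using the mapping published in ITU-T Recommendation G.107. It is offered only as background: the function name `r_factor_to_mos` is made up for this example, and nothing here is claimed to be how Cisco IP SLA or Xangati derives its scores.

```python
def r_factor_to_mos(r):
    """Map an E-model transmission rating R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0   # floor of the scale
    if r >= 100:
        return 4.5   # ceiling of the estimated scale
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (50, 70, 93.2):
    print(r, round(r_factor_to_mos(r), 2))
```

Higher R-factors map to higher scores, so impairments that lower R (delay, loss, codec choice) show up as a lower estimated MOS.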
Do YOU agree with Jagan that up to this point, the solutions to deliver visibility all the way down to the end-user have been very rudimentary?
Brad Reese cofounded BradReese.Com Cisco Refurbished, which enables affordable Cisco networks globally by assuring customer satisfaction with guaranteed one year warranties on both Cisco Repair as well as Refurbished Cisco.
Don't be shy, contact Brad Reese online or call him at 646-827-1130.