Before I look at a wide area and network performance analysis tool this week, I want to add another language to the list of worthy tools I suggested in Gearhead the week before last.
That language is Erlang, another functional programming language described by Wikipedia as "a general-purpose concurrent, garbage-collected programming language and runtime system."
Developed by Ericsson in 1986 and turned loose as open source in 1998, this language has gathered a lot of adherents and a recent book, "Learn You Some Erlang for Great Good!" by Fred Hebert (pub. No Starch), which is a great place to start exploring the topic.
Hebert says Erlang "is no silver bullet and will be particularly bad at things like image and signal processing, operating system device drivers, and other functions."
So what good is it? Erlang "will shine at things like large software for server use (for example, queue middleware, web servers, real-time bidding and distributed database implementations), doing some lifting coupled with other languages, higher-level protocol implementation, and so on," Hebert writes. "Areas in the middle will depend on you."
According to Wikipedia, in 1998 "the Ericsson AXD301 switch was announced, containing over a million lines of Erlang, and reported to achieve a reliability of nine '9's." Impressive.
I've only just started getting into this fascinating language (so many languages, so many books, so little time), but I'm going to give the book a Gearhead rating of 5 out of 5.
By the way, the publisher, No Starch, does something I love ... buy the book and you get a DRM-free ebook copy in ePub, Mobi, or PDF format thrown in.
No Starch recently released another good tome, "The Book of GIMP," which is a great way to learn about what is, in effect, the free, open source alternative to Photoshop called GIMP.
So, on to the main event for this week ...
Monitoring the performance of a network with WAN links is tricky. At the most basic level you want to know how fast data can get from point A on one subnetwork to point B on another subnet and, for that, you might consider using a tool to automatically ping remote nodes.
But while pinging tells you what the transit delay (latency) is from A to B, it doesn't tell you whether the bandwidth you think you should have between A and B is actually there. It also doesn't tell you about how well services such as Voice Over IP (VoIP) might work. Or not work.
A simplistic way to test available bandwidth is to run a load test ... just stop all other use of the link between A and B and upload and download files as fast as you can to see how much data the link can transfer. Voila! You'll have a snapshot of your connection, but it only tells you what things look like at that moment; a minute, an hour, or a day later the connection may not perform as well.
What you really need is a technique that can analyze a network connection in real-time and in-band.
<digression> By the way, what is this irritating habit that has sprung up in the computer industry where things like updates and patches are being described as "in-band" if they are released on a schedule and "out-of-band" if they aren't scheduled? Once again, it appears this linguistic offense can be laid at the feet of Microsoft and, sadly, it seems to have started to spread to the verbiage of all sorts of people and organizations. Mostly marketing types. Sigh.</digression>
Anyway, Packet Dispersion Analysis is such a technique. By sending a sequence of small packets of varying sizes and configurations it's possible to determine the latency, the available bandwidth ("headroom"), the bandwidth used, and how well certain protocols performed.
At first blush most networking professionals will go, "Ah, yes, I can see how that might work," but it turns out that getting useful data out of this method is not, in reality, quite that simple.
If you want to dig deeper into how this is done you might care to read "Packet Dispersion Techniques and Capacity Estimation", "What do packet dispersion techniques measure?", and "End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput" by Constantinos Dovrolis and others.
In conjunction with other researchers, Dovrolis developed two open source tools, Pathrate and Pathload, to perform these analyses.
The operation of the first tool, Pathrate, is "based on the dispersion of packet pairs and packet trains. To make a rather long story very short, we use many packet pairs (with packets of variable size) to uncover a set of possible 'capacity modes'. Then, we use long packet trains to estimate the so called 'Asymptotic Dispersion Rate' ... " after which it all gets very complicated.
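For the curious, the packet-pair idea quoted above can be sketched in a few lines of Python. To be clear, this is a toy illustration and not Pathrate's actual algorithm: the function names, the 1Mbps histogram bins, and the measurements are all made up for the example. The underlying arithmetic is simply that two packets sent back to back leave the narrowest link separated by the time that link needs to transmit one packet, so packet size divided by that gap (the "dispersion") gives one capacity estimate, and the most common estimates are the candidate "capacity modes."

```python
from collections import Counter


def capacity_estimates_mbps(pairs):
    """Turn (packet_size_bytes, dispersion_seconds) pairs into Mbps estimates."""
    return [size * 8 / gap / 1e6 for size, gap in pairs if gap > 0]


def capacity_modes(estimates, bin_width_mbps=1.0):
    """Group estimates into bins centred on multiples of bin_width_mbps.

    Returns (bin_centre_mbps, count) tuples, most populated bin first.
    """
    bins = Counter(round(e / bin_width_mbps) for e in estimates)
    return [(b * bin_width_mbps, n) for b, n in bins.most_common()]


# Hypothetical measurements: packet size in bytes, gap at the receiver in seconds.
pairs = [
    (1500, 0.00120),
    (1000, 0.00080),
    (1500, 0.00121),
    (800, 0.00064),
    (1500, 0.00240),  # cross traffic squeezed in between this pair
    (1200, 0.00097),
]

estimates = capacity_estimates_mbps(pairs)
modes = capacity_modes(estimates)
strongest_mbps, strongest_count = modes[0]
print(f"strongest mode: about {strongest_mbps:.0f} Mbps ({strongest_count} of {len(estimates)} pairs)")
```

The pair that got stretched by cross traffic lands in a bin of its own, which is exactly why a single pair can't be trusted and why the real tool goes on to use long packet trains as well.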
Pathload, the other tool, is based on the idea "that the one-way delays of a periodic packet stream show increasing trend when the stream rate is larger than the [available bandwidth]." The explanation goes on to say "the measurement algorithm is iterative and it requires the cooperation of both the sender and the receiver. Pathload is non-intrusive, meaning that it does not cause significant increases in the network utilization, delays, or losses. The tool has been verified experimentally, by comparing its results with SNMP utilization data from the path routers."
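The "increasing trend" idea can also be sketched in Python. Again, this is a simplified illustration under my own assumptions rather than Pathload's code: the two trend checks, their thresholds, the bisection loop, and the simulated path (with a pretend 37Mbps of available bandwidth) are all hypothetical. The sketch sends a stream at a given rate, checks whether the one-way delays keep climbing, and narrows the rate range accordingly.

```python
def pairwise_comparison(delays):
    """Fraction of consecutive samples in which the delay went up (0 to 1)."""
    ups = sum(1 for a, b in zip(delays, delays[1:]) if b > a)
    return ups / (len(delays) - 1)


def pairwise_difference(delays):
    """Net delay change divided by total absolute change (-1 to 1)."""
    total = sum(abs(b - a) for a, b in zip(delays, delays[1:]))
    return (delays[-1] - delays[0]) / total if total else 0.0


def shows_increasing_trend(delays, pct_threshold=0.66, pdt_threshold=0.55):
    """Both checks must agree; the thresholds are illustrative values only."""
    return (pairwise_comparison(delays) > pct_threshold
            and pairwise_difference(delays) > pdt_threshold)


def simulated_stream_delays(rate_mbps, available_mbps=37.0, n=50):
    """Stand-in for a real probe stream: one-way delays in milliseconds.

    The queue (and so the delay) grows only when the stream rate exceeds
    the available bandwidth; a little alternating jitter is added on top.
    """
    growth = max(0.0, rate_mbps - available_mbps) * 0.5
    return [20.0 + growth * i + (0.001 if i % 2 else -0.001) for i in range(n)]


def estimate_available_mbps(probe, low=0.0, high=100.0, resolution=0.5):
    """Bisect on the stream rate until the range is narrower than resolution."""
    while high - low > resolution:
        rate = (low + high) / 2
        if shows_increasing_trend(probe(rate)):
            high = rate  # delays climbed, so this rate is above what is available
        else:
            low = rate   # no trend, so the path absorbed this rate
    return low, high


low, high = estimate_available_mbps(simulated_stream_delays)
print(f"available bandwidth is between {low:.2f} and {high:.2f} Mbps")
```

Note that the answer comes back as a range rather than a single number, which fits the iterative narrowing described in the quote.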
Another tool, abget, is also available. It estimates "the available bandwidth of an end-to-end path using only one of the end hosts. The second host can be any TCP based web server ..."
So, we have a suite of tools with some serious academic chops. You could download and compile the C sources if you so pleased, but you're in the business of running a network, not hacking code, right? You would probably prefer to use one of the commercial tools from AppNeta that implement these techniques.
At the starter level AppNeta offers a free tool called SpeedCheckr, which runs as a service under Windows and as 32- and 64-bit apps under Linux, Mac and Windows. It will track the total, available, and utilized capacity, data loss, latency, and round trip time for up to three targets from a predefined list of eight servers: two on the West Coast and the rest on the East Coast.
SpeedCheckr's client-side service (called, somewhat confusingly, Sequencer) is responsible for generating the packet "trains" of 20 engineered data packets once per minute. These trains will use less than 0.5% of the available bandwidth, allowing SpeedCheckr and its big brother, PathView, to operate in production environments. The behavior of these trains is analyzed by the AppNeta servers and, for SpeedCheckr, the results for the last hour only are displayed on a Web-based console.
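To get a feel for how light that probing load is, here's a back-of-the-envelope calculation in Python. The packet size is my assumption (AppNeta's packet sizes aren't given here, so I've used a full-size 1,500-byte packet as a worst case), the load is averaged over the minute, and the two link speeds are just examples.

```python
def probe_load_bps(packets_per_train=20, packet_bytes=1500, trains_per_minute=1):
    """Average probing load in bits per second (packet_bytes is an assumed value)."""
    return packets_per_train * packet_bytes * 8 * trains_per_minute / 60


def share_of_link_percent(load_bps, link_bps):
    """Probing load as a percentage of a link's capacity."""
    return 100 * load_bps / link_bps


load = probe_load_bps()
t1_share = share_of_link_percent(load, 1_544_000)          # a T1 line
ethernet_share = share_of_link_percent(load, 10_000_000)   # a 10Mbps link

print(f"average probe load: {load:.0f} bps")
print(f"share of a T1: {t1_share:.2f}%")
print(f"share of a 10Mbps link: {ethernet_share:.2f}%")
```

Under those assumptions the average load works out to 4,000 bits per second, which stays under the 0.5% mark on any link faster than about 800Kbps.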
While SpeedCheckr provides some useful connectivity insight and performance measurements, if you want to do serious performance monitoring and analysis for hundreds of custom targets you're going to need AppNeta's PathView.
AppNeta's PathView uses Packet Dispersion Analysis to measure real-time network link capacity, latency and data loss.
PathView consists of a hardware appliance that implements the same functionality as the Windows Sequencer service. This device is added to your network on either a router span port or as a transparent in-line bridge between your network and the Internet. The appliance provides not only detailed service level monitoring and alerting in real-time but also comprehensive route analysis.
AppNeta's PathView also provides a detailed route analysis feature.
PathView also helps determine underutilized capacity and can detect 80 types of network performance problems. There's also an API for PathView allowing third-party developers to integrate PathView with existing network management systems or create new interfaces.
AppNeta's FlowView is an add-on that provides high granularity network traffic capture and filtering down to individual user level as well as performing detailed Quality of Service analyses.
But wait! There's more! AppNeta also offers AppView Web to slice and dice Web application performance (including the ability to determine whether a problem exists at the server application, network, or browser level) and AppView Voice which does the same kind of detailed analysis for VoIP.
Pricing is dependent on configuration and which modules are used, but the average remote site with support for PathView and FlowView will cost about $2,000 per year, a reasonable price for that level of detail and that simplicity of implementation.
The AppNeta system is one of the most compelling multi-site wide area and network performance management suites I've seen and gets a Gearhead rating of 5 out of 5.