Monitoring tools
The tag has been made and I'm back in the ring. That's right. Keith Pelczarski has left the building after several excellent columns on Web site design, all with only one major digression.
Now I am back in the batter's box, mixing metaphors and rambling about technical Web stuff. Writing of which, today's column is sponsored by the letter "M," as in Monitoring. Yup - I'm gonna address that pesky issue of figuring out what is working and what ain't on a Web site.
I receive e-mail on a pretty regular basis that goes something like this:
Dear Dwight,
You use NT and IIS. Therefore, you are a bonehead. I hate Microsoft. I hate Bill Gates. And I hate you. How the heck did you ever get your job with such a tiny brain? I hope you die a horrible, painful death very soon. BTW: I need to monitor several Web servers and was hoping you could give me some advice in that area.
Hugs and Kisses,
UnixKicksNTAss
P.S.: Did I mention that you are an idiot?
I love fan mail. And I'm always happy to accommodate my adoring fans. So, at the risk of sounding like another Network World columnist who possesses a frighteningly similar name, I'd like to outline the Fool's site monitoring strategy and talk about the tools we use.
There are a lot of tools out there that let you monitor servers. Before looking at any of them, you should ask yourself a question: Why?
Why do you want to monitor your site? What do you hope to accomplish? If your site is very small or your tolerance for downtime is very high, there may not be a compelling reason to monitor your site 24x7. We're all pressed for time and you may have better things to do with yours. However, if you're like the Fool, you depend on your site being up and speedy around the clock. And you want to know about problems as soon as they arise. When it comes to monitoring, we want to:
- Catch problems quickly.
- Limit site downtime.
- Track site performance.
We did a lot of research to locate a single tool that would allow us to accomplish all three goals. Unfortunately, we couldn't find a tool that would. However, we did find two tools that, when combined, do what we need:
Keynote
For more granular data, we use:
SiteScope
There are many tools in the market that let you monitor servers. We have found one that we think stands out. SiteScope from Freshwater Software is the best thing since sliced bread - we haven't found anything that comes close for the price. But besides price, we like its performance, flexibility and breadth of features. Let's take a closer look.
Some monitoring packages only ping a server to check its health. That's not really good enough, though, because a server could respond to pings, yet not be serving Web content correctly. Other tools make an HTTP request to check for Web server health. But a box serving up error messages might look perfectly healthy to such a tool.
SiteScope lets you monitor myriad server variables. Most important, it will parse Web pages and check for specific strings in the HTML. We use this capability extensively.
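To make the difference between "the server answered" and "the server answered correctly" concrete, here is a minimal Python sketch of a content check. It is not SiteScope's implementation. The names `CheckResult` and `check_page`, and the sample page text, are assumptions made up for illustration. Fetching the page is left out so that the logic stands on its own.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    reason: str


def check_page(status: int, body: str, required: list[str]) -> CheckResult:
    """Judge a fetched page by its HTTP status AND by its content."""
    # A ping-style or status-only check would stop after this test.
    if status != 200:
        return CheckResult(False, f"HTTP status {status}")
    # The content check: every expected string must appear in the HTML.
    for text in required:
        if text not in body:
            return CheckResult(False, f"missing text: {text!r}")
    return CheckResult(True, "ok")


# An error page served with a 200 status looks healthy to a status-only
# check, but fails once we look for text the real page should contain.
print(check_page(200, "<html>Database error</html>", ["Stock Quotes"]))
```

The point of the design is that the list of required strings is per page, so each monitored URL can be checked for something only a correctly built page would contain.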
At the Fool Web site we use several Web servers. We use SiteScope to monitor eight URLs for each Web server: the home page, a page of stock quotes, a stock portfolio page, a user-customized MyFool page, a message-board post, a page that lets users edit their portfolios, the Fool UK main page, and a UK quote page.
We also monitor the SMTP service on each server. SiteScope checks each of these every minute for every server we have, so if there is a problem (no response or slow response), we see it immediately.
We also combine this with a couple of tricks of our own, some using Active Server Pages (although any server-side scripting language would work).
As the scripts build our pages, they put an HTML comment at the bottom of each page, but only if they don't run into any problems.
The scripts can also modify our content pages, so if they run into any problems (being unable to connect to the database server, for example), they leave out that comment tag. SiteScope looks for this tag (and a couple of other things) when checking the pages. If the comment doesn't appear, SiteScope triggers an alert. It's simple yet effective - two very good qualities in monitoring solutions.
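The comment-tag trick can be sketched in a few lines. This is an illustration in Python rather than our ASP code, and the marker text `<!-- page-ok -->` is a hypothetical stand-in, not the actual comment we use. `render_page` and `page_is_healthy` are likewise made-up names.

```python
# Hypothetical marker; the real comment text is not shown in this column.
SENTINEL = "<!-- page-ok -->"


def render_page(build_body) -> str:
    """Build a page and append the sentinel only if nothing went wrong."""
    try:
        body = build_body()
    except Exception:
        # Serve a fallback page WITHOUT the sentinel, so the monitor
        # notices the problem even though the server still answers.
        return "<html><body>Sorry, please try again later.</body></html>"
    return f"<html><body>{body}\n{SENTINEL}\n</body></html>"


def page_is_healthy(html: str) -> bool:
    """What the monitor does: look for the sentinel in the fetched HTML."""
    return SENTINEL in html


def quotes_ok() -> str:
    return "<h1>Stock Quotes</h1>"


def quotes_db_down() -> str:
    raise ConnectionError("cannot reach database server")


print(page_is_healthy(render_page(quotes_ok)))       # True
print(page_is_healthy(render_page(quotes_db_down)))  # False
```

Because the marker is written last, its presence says the whole page was built, not just the first few lines of it.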
Sounds pretty simple, right? We're just checking pages and parsing the results.
We also log errors to our database server for just about every service on our site (quotes, portfolios, message boards, registration etc.). We make sure to log every 404 error. We have set up several ASP pages that query the errors database to check for the number of errors in the last x minutes. If it is above a certain threshold, we do not write the comment tag that SiteScope is looking for.
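Here is a rough sketch of that threshold check, again in Python rather than ASP. It assumes the error log can be reduced to a list of timestamps. The marker text, the function names and the default window and threshold values are all placeholders chosen for the example.

```python
from datetime import datetime, timedelta

# Hypothetical marker that the monitoring tool would search for.
OK_MARKER = "<!-- errors-ok -->"


def recent_error_count(error_times, now, window_minutes):
    """Count logged errors that fall inside the last `window_minutes`."""
    cutoff = now - timedelta(minutes=window_minutes)
    return sum(1 for t in error_times if cutoff <= t <= now)


def status_page(error_times, now, window_minutes=5, threshold=10):
    """Build the status page; omit the marker if errors exceed the threshold."""
    count = recent_error_count(error_times, now, window_minutes)
    lines = [f"errors in last {window_minutes} minutes: {count}"]
    if count <= threshold:
        lines.append(OK_MARKER)
    return "\n".join(lines)


now = datetime(1999, 6, 23, 12, 0)
errors = [now - timedelta(minutes=m) for m in (1, 2, 3, 30)]
print(status_page(errors, now, window_minutes=5, threshold=2))
```

With three errors in the window and a threshold of two, the printed page carries the count but no marker, which is exactly what trips the alarm.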
SiteScope checks these pages once a minute. If there are any problems, an alarm is tripped and we are notified of the specific problem. We are able to monitor the amount of free space on our database server (and several other things) in much the same way. By doing this we keep our fingers on the pulse of our Web site and stay on top of any problems. But wait, there's more.
In addition to checking pages, SiteScope records the results of each check, specifically the success or failure of the check and the time elapsed from request to delivery. It uses this data to automatically generate and then mail out daily, weekly and monthly reports that detail our uptime and the average speed of page requests. This gives us a good idea of what the site availability is and how quickly we are serving our customers. Here's an example of one of those reports:
A history report from SiteScope for 1:00 am 6/23/99 to 1:00 am 6/24/99 has been created for you.
Report Summary:
Name                         Uptime%  Error%  Warning%  Last
(Main Page) The Motley Fool  100.00   0.00    0.00      good
(Quotes) The Motley Fool     100.00   0.00    0.00      good
(Boards) The Motley Fool     100.00   0.00    0.00      good
(My) The Motley Fool         100.00   0.00    0.00      good

Name       Measurement      Max       Avg       Last
Main Page  round trip time  1.67 sec  0.65 sec  0.56 sec
Quotes     round trip time  5.27 sec  0.98 sec  1.12 sec
Boards     round trip time  5.98 sec  0.68 sec  0.56 sec
My         round trip time  10 sec    1.68 sec  1.22 sec
This is most helpful. In addition, we can look at the trends to see if we are improving, getting worse, or staying about the same.
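For the curious, the numbers in a report like this are simple to derive from the raw check records. The Python sketch below shows one plausible way to do it. It is not how SiteScope computes its reports, and it makes two assumptions: failed checks count toward the timing figures, and the separate warning state is ignored. `Check` and `summarize` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Check:
    ok: bool         # did the check succeed?
    seconds: float   # time from request to delivery


def summarize(checks):
    """Reduce a list of checks (oldest first) to report-style figures."""
    total = len(checks)
    if total == 0:
        raise ValueError("no checks recorded")
    up = sum(1 for c in checks if c.ok)
    times = [c.seconds for c in checks]
    return {
        "uptime_pct": round(100.0 * up / total, 2),
        "error_pct": round(100.0 * (total - up) / total, 2),
        "max_sec": max(times),
        "avg_sec": round(sum(times) / total, 2),
        "last_sec": times[-1],
    }


sample = [Check(True, 0.5), Check(True, 1.5), Check(False, 4.0), Check(True, 1.0)]
print(summarize(sample))
```

Running the same function over last week's and this week's records is enough to see whether the averages are trending up or down.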
SiteScope can be used for several other tasks. If you want to do some competitive research, you can monitor pages on your competitors' sites and include them in the reports. This will show you how you stack up. Yet another use is to compare hardware and software. Suppose you want to compare the speed of two servers. Simply set them up at your Web site and have SiteScope monitor them. The reports will show you which server is faster and more reliable. We actually did this when evaluating Akamai. We decided we wanted to compare Akamai to our Web servers and our Network Appliance Filer. We added all three to SiteScope. After a month, we had our results:
Name                     Uptime%  Error%  Warning%  Last
(Graphics) Web Server    99.14    0.85    0.00      good
(Graphics) NetApp Filer  99.23    0.76    0.00      good
(Graphics) Akamai        100.00   0.00    0.00      good

Name                     Measurement      Max       Avg       Last
(Graphics) Web Server    round trip time  3.25 sec  0.34 sec  0.29 sec
(Graphics) NetApp Filer  round trip time  33 sec    0.28 sec  0.23 sec
(Graphics) Akamai        round trip time  16 sec    0.10 sec  0.06 sec
This and the Keynote reports we ran convinced us to go with Akamai. It's hard to argue with 100% uptime over a month with an average response time of less than half of the alternatives'. The beauty here is it moves you from the world of "I think ..."/"I feel that ..." to hard numbers.
Another useful feature of SiteScope is its alert capabilities. You can set it up so that if x% of your monitors are in error for more than y minutes, SiteScope will send an e-mail alert. We have it set to send e-mail to our cell phones so that no matter where we are, we will know if there is a problem with our Web site. We have found that these alerts are often tripped by poor connectivity from Fool HQ to our Web site. That is actually a good thing in that SiteScope also helps us monitor our providers' networks.
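The "x% for y minutes" rule is easy to picture in code. The Python sketch below is an illustration only, not SiteScope's alert engine. It assumes one snapshot of all monitors per minute, and it reads "for more than y minutes" loosely as "every one of the last y snapshots is at or above x%". The function names are invented for the example.

```python
def error_pct(snapshot):
    """Percentage of monitors in error; snapshot is a list of booleans."""
    return 100.0 * sum(snapshot) / len(snapshot)


def should_alert(history, x_pct, y_minutes):
    """history: per-minute snapshots, oldest first (True = monitor in error)."""
    if y_minutes <= 0 or len(history) < y_minutes:
        return False  # not enough data yet to call it a sustained problem
    recent = history[-y_minutes:]
    # Empty snapshots are treated as "no evidence of trouble".
    return all(bool(s) and error_pct(s) >= x_pct for s in recent)


healthy = [False, False, False, False]
half_down = [True, True, False, False]

print(should_alert([healthy, half_down, half_down, half_down], 50, 3))  # True
print(should_alert([half_down, half_down, healthy], 50, 3))             # False
```

Requiring the condition to hold across several snapshots is what keeps a single slow response from paging everyone at 3 a.m.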
We even use SiteScope within Fool HQ to monitor our Exchange server, intranet server, and the CPU, RAM, and NIC usage on a couple of servers. This helps our internal operations techies monitor the health of our LAN. As you can tell, there are many uses for SiteScope.
So you are probably thinking, "Wow - that is a LOT of monitoring. You must have a big screen to display it all." Actually, the monitoring information window is quite small. Most of the Web techs at the Fool place the monitoring page on their desktop using NT 4.0's Active Desktop feature. This takes up a small amount of screen real estate and allows us all to keep a constant eye on our site performance. If there are any problems, we immediately see the red dots and can drill down to see what the problem is. The result is that we are able to quickly fix the problems that pop up from day to day.
OK, at this point, I sound like I am gushing - just a SiteScope shill. I should probably say that I have no financial interest in or relation to Freshwater Software. I don't usually get too excited about software tools. However, when I find something that just totally kicks butt and helps my team do its job better at an incredibly reasonable price, well ... I like to share the love. In the Gearhead spirit, I give SiteScope five jester bells out of five. It is truly an impressive product and one that can be extremely helpful in keeping your Web site up and running 24x7x365.
Until next time, y'all be Fool.

