If your network has between 1,000 and 10,000 devices and computers, you have a midsized network. Your servers, connections and other resources suffer the same problems as larger networks, but your budget for keeping the network healthy is less than what large enterprises enjoy.
The ideal network management product accurately discovers devices, computers and apps, uses compute resources frugally, graphically depicts network components, monitors the health of every device or computer, gleans its data from a variety of sources, and works with both IPv4 and IPv6.
It also supports all devices, cloud resources, wireless connections and virtual servers, can accept and use complex descriptions of thresholds, can send alerts via e-mail, pager or text to different individuals or groups depending on the problem, can escalate notifications when problems persist, can perform root cause analysis, and can correct some problems automatically. Plus, it integrates with help desk software and with other monitoring tools, produces useful, easy-to-understand and timely reports, and is highly scalable, reliable and easy to use.
In other words, it has to be able to do it all. We tested six such products in this review:
Paessler PRTG v12.4; Heroix Longitude v8.1; HP Intelligent Management Center (IMC) Standard and Enterprise v5.2; Ipswitch WhatsUp Gold (WUG) v16; SolarWinds Orion Network Performance Monitor (NPM) v10.4 and Server & Application Monitor (SAM) v5.2; and Argent Software Advanced Technology (AT) v3.1, including Argent Commander 2.0 and Argent Reports 2.0.
Argent Advanced Technology earns the Network World Clear Choice award, edging Heroix Longitude, which came in second. Advanced Technology gave us sophisticated thresholds, a responsive user interface, accurate device discovery, time-saving root cause analysis, helpful corrective actions and meaningful reports.
Here are the individual reviews:
Argent Software Advanced Technology (AT)
Argent's Advanced Technology (AT) is a highly scalable, feature-rich and mature monitoring tool. Its Web 2.0-based user interface is intuitive, responsive and productive. AT's discovery process is accurate, its thresholds can express sophisticated networking situations and it can flexibly send alert notifications to multiple administrators.
AT doesn't require agents, but it can minimize WAN link traffic by using them to collect and forward network status information. AT can automatically correct a wide range of network problems.
The clear, meaningful reports are perfect for tracking network failures, spotting imminent performance issues, showing SLA compliance and analyzing capacity planning trends. AT is neither the least nor the most expensive monitoring product. In short, Argent's Advanced Technology is a highly capable monitoring tool appropriate for networks of varying sizes.
Furthermore, the interface implements many features found in traditional desktop applications, such as context menus, modal dialogs, client-side validation and virtual controls. However, the browser-based interface lacks some features of AT's native Windows user interface, which Argent still includes with AT.
Argent Commander, an "umbrella" interface, is a customizable Web 2.0-based central console for managing moderately large to large networks. It shows real-time monitoring results, including server status, network health, critical key performance indicators and top X events, and it integrates with Active Directory for the sake of security. Argent Commander's Web interface supports multiple languages.
Argent AT's wealth of thoughtfully designed features served us well in testing. The discovery process used ICMP pings, SNMP queries, DNS lookups and other actions to accurately recognize, identify and harvest details from network devices. It quickly gave us a network map that enumerated our routers, switches, servers and clients. When we pointed AT at a particular router, it intelligently discerned the network links, nodes, devices and computers associated with that router.
AT organizes network status conditions into five categories: Acceptable Operation, Approaching Limit, At Limit, Exceeding Limit and Major Overload. The real-time monitoring and root cause analysis helped us quickly pinpoint network problems. And AT's sophisticated set of thresholds greatly facilitated our efforts to keep the network up and running. These thresholds let us specify abnormal traffic levels and unhealthy server behaviors by time of day and day of week.
Argent includes more than 2,000 pre-defined application- and device-specific rules in AT.
Without requiring agents deployed across the network, AT used SNMP, WMI and other protocols to find and diagnose our network's failures. In each case, AT zeroed in on the source of the problem (the root cause), which let us ignore the cascade of downstream linkage faults.
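The idea behind that kind of root-cause filtering can be illustrated with a short Python sketch. This is our own simplification, not Argent's algorithm: it assumes each device has at most one upstream neighbor, and the device names and the `upstream` map are hypothetical.

```python
def has_failed_ancestor(node, down, upstream):
    """Walk upstream from node; True if any device on the way is also down."""
    seen = {node}  # guards against accidental loops in the map
    parent = upstream.get(node)
    while parent is not None and parent not in seen:
        if parent in down:
            return True
        seen.add(parent)
        parent = upstream.get(parent)
    return False


def root_causes(down, upstream):
    """Keep only failed devices that have no failed device upstream of them."""
    return {n for n in down if not has_failed_ancestor(n, down, upstream)}


# Hypothetical topology: child -> the device it depends on.
upstream = {
    "switch1": "router1",
    "server1": "switch1",
    "server2": "switch1",
    "printer1": "router1",
}

down = {"switch1", "server1", "server2"}
print(root_causes(down, upstream))  # {'switch1'}
```

Here the two servers are reported as symptoms of the switch failure rather than as separate problems, which is the behavior that spared us the cascade of downstream alerts.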
AT can take corrective actions, either by running a program, running a script, restarting a failed Windows background service or rebooting a server. It can even issue SQL statements (to trigger, for example, the running of an Oracle process).
AT escalated its notifications when necessary and its broad range of corrective actions let us set up automatic fixes for many network failures.
Platform-neutral AT monitors a range of server operating systems. It has an abundance of application-specific modules for monitoring, for example, Oracle, Microsoft SQL Server, Exchange, BlackBerry Servers, Lotus Notes, Brocade, Checkpoint, Cisco, Compaq, Dell, HP, Intermec, Legato, Liebert, NetWare, Nokia, Nortel, Omnitronix, Sonic, WebSphere and WebLogic.
Argent for Exchange fully monitors Microsoft Exchange 2007 and Exchange 2010, and it has legacy support for Exchange 5.5, 2000 and 2003. Argent for Exchange also supports the complete set of Exchange's client-access protocols, including round-trip testing, MAPI interface, Outlook Anywhere, Outlook Web Access, and Outlook Web Services.
AT's True Round Trip Time measurement tested our Exchange server by actually sending and receiving real e-mails and noting elapsed times, giving us early warnings of potential Exchange faults and performance problems. On all servers, AT notified us of server CPU utilization, disk space, low memory and network adapter issues. On our Windows servers, it monitored Windows services, Active Directory and system registry health.
Argent for VMware pervasively monitors both V3.5 and V4.0 hosts, including Datacenter, Cluster, Resource Pool, ESX and ESXi. It tracked events and logs in our virtual machines, and it also managed patch updates. Argent for VMware monitored the guest operating systems with the same thoroughness AT did with physical servers. Argent offers a Citrix XenServer virtual machine monitoring tool, as well. Argent doesn't support Microsoft Hyper-V or Red Hat KVM.
AT's reports are especially clear and meaningful. They're useful, out of the box, for charting events, expressing service-level agreement compliance and detailing network devices, computers and applications. They're an excellent source of data for capacity planning and historical trending analysis.
Creating new custom reports with AT is easy, with a run-time version of the Crystal Reports design tool. It took us only a few drag-and-drop operations and a few mouse clicks to get the information we wanted for the time period(s) we chose. We gave the report a name, saved it in an Argent Reports Folder and scheduled it to run weekly thereafter.
We designed both Argent Graph and Argent Table custom reports. For a Graph-based report, we merely needed to specify the AT network metrics to include, the network nodes to report on and a few report options. We could choose to graph various metrics, including SNMP metrics, Linux/Unix metrics and VMware metrics.
The table-oriented reports we created showed alerts, change logs, events, file audit results, Exchange mailbox activity, Exchange traffic activity, node details and summaries, performance data, Top X traffic generators and SLA downtime data.
AT runs on Windows Server 2003 or Windows Server 2008 on a machine with at least 2 CPU cores, 2GB RAM and a 1.2 GHz CPU. Its device/event repository can be either Microsoft SQL Server or Oracle.
Paessler PRTG Network Monitor
PRTG Network Monitor is easy to use, simple to install and an excellent monitor of diverse devices. Unfortunately, its $10,800 price tag for an unlimited number of devices is steep. Paessler also offers PRTG on a per-sensor basis (about $400 for 100 sensors), but the fine granularity of what Paessler terms "sensors" means you'll need lots of them.
"Sensors" are PRTG's basic monitoring elements. One sensor monitors one particular metric on your network (one single aspect of a device). This metric might be a switch port's traffic, a server's CPU load or a server's free disk space. We found we initially needed about 10 to 20 sensors per server and one sensor per switch port. Our sensor count quickly mounted as we added sensors for ping time, the traffic of a network interface and the status of a toner cartridge. Budget-conscious customers will be relieved to know that disabling an active sensor returns it to the license pool so they can then activate a different sensor, if they wish.
PRTG offers more than 130 sensor types, with each instance counting as one sensor toward the licensed total. The available sensor types range from simple ping, SNMP and WMI monitors to specific server-type sensors for database, mail, file, Web and FTP servers.
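A quick back-of-the-envelope estimate shows how fast the count grows. The Python sketch below uses only the ratios we observed (10 to 20 sensors per server, one per switch port); the function name and the sample network size are hypothetical.

```python
def estimate_sensors(servers, switch_ports, per_server=(10, 20)):
    """Return a (low, high) sensor estimate from the ratios quoted above."""
    low_rate, high_rate = per_server
    low = servers * low_rate + switch_ports
    high = servers * high_rate + switch_ports
    return low, high


# Hypothetical network: 40 servers and 20 switches with 24 ports each.
print(estimate_sensors(servers=40, switch_ports=20 * 24))  # (880, 1280)
```

Even this modest example lands near a thousand sensors before any printers, ping checks or application sensors are added.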
We used Paessler's Amazon CloudWatch sensors to monitor Amazon AWS Cloud applications. Paessler's new Google Analytics sensor showed us all activity for a Web site. And we found PRTG's monitoring of VMware, Microsoft Hyper-V and Citrix XenServer virtual environments highly useful.
PRTG installs on nearly any Windows version (XP and later) in under two minutes.
PRTG's fault tolerant clustering feature coordinates the running of a master PRTG node and up to four failover (slave) PRTG nodes. The master and slaves continuously monitor the network, so failover was nearly instantaneous in our tests.
PRTG requires the installation of agents (Paessler terms them probes) on non-local networks. Remote probes monitor WAN connections, collect/forward remote network statistics and can be used to distribute the network monitoring workload.
PRTG has a responsive, intuitive AJAX-based Web interface. PRTG color-codes sensors to indicate at a glance which ones are up, down, paused or in a warning state. Hovering the mouse over a sensor displays a graph of live data. Clicking on a sensor drills down for detailed, pertinent and useful information about that sensor.
PRTG's initial network discovery was quick and accurate. Once we supplied credentials so it could access our servers and routers, PRTG populated its device database and began monitoring all the devices it found. During initial discovery, PRTG automatically added a sensor instance for each metric for each device and computer.
We were surprised when PRTG's generated sensor count exceeded our estimates by a wide margin. If you need to restrict your sensor count to a licensed limit, you'll spend considerable time after initial discovery deleting unwanted sensor instances.
We liked PRTG's ability to hierarchically group devices and computers into meaningful sets. We found we could easily move nodes or subnets from one group to another as we arranged our network view by criteria such as geographic location or business function. Each server or router inherited login credential and discovery schedule settings from the parent group, or, if we wished, could have its own settings.
Configuring PRTG's sensor views to show important network status information was a breeze. We could choose to see, for example, the top 10 sensors for uptime (or downtime), CPU usage, fastest Web site responses and available disk space. After we applied our sensor view configurations to each of our hierarchical groups, PRTG's console gave us exactly the picture of our network we wished. However, PRTG doesn't have a speedometer-style dashboard, nor does it show real-time graphs and charts of network activity.
PRTG used sensor state and threshold triggers to send notifications and alerts via e-mail notes, SNMP traps, SMS messages or syslog entries. We set threshold values for sensors at various levels of the group hierarchy for device status or speed changes, threshold breaches and traffic volume levels. While these unsophisticated threshold settings will be adequate for many networks, we were disappointed that we couldn't tell PRTG to warn us if, for example, a particular WAN link experienced greater than 20% utilization after midnight or on Saturdays and Sundays.
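The kind of time-aware rule we had in mind is easy to state in code. The Python sketch below is purely illustrative and is not something PRTG provides; it assumes "after midnight" means midnight to 6 a.m., and the names are our own.

```python
from datetime import datetime

# Assumption: "after midnight" covers 00:00 up to (not including) 06:00.
OFF_HOURS_END = 6
UTILIZATION_LIMIT = 20.0  # percent, taken from the example in the text


def in_off_hours(ts: datetime) -> bool:
    """True on Saturdays and Sundays, or between midnight and OFF_HOURS_END."""
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    return is_weekend or ts.hour < OFF_HOURS_END


def should_warn(ts: datetime, utilization_pct: float) -> bool:
    """Warn only when the link is busy during a period that should be quiet."""
    return in_off_hours(ts) and utilization_pct > UTILIZATION_LIMIT


print(should_warn(datetime(2013, 1, 5, 14, 0), 35.0))  # Saturday afternoon: True
print(should_warn(datetime(2013, 1, 9, 14, 0), 35.0))  # Wednesday afternoon: False
```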
Helpfully, PRTG understood device dependencies and stopped flooding us with downstream device alerts when, for instance, a switch port failed. PRTG was also smart enough to automatically pause a server's other sensors (CPU, disk space, etc.) when the server stopped responding to pings.
We could also keep an eye on our network while in meetings or away from the office. PRTG's handy iPRTG app for iPhone/iPad/iPod Touch (no Android support) gave us mobile access to the monitoring tool's sensor data and reports. iPRTG displayed current sensor statistics, the status of key sensors, a navigable "sensor tree" of grouped devices and computers, sensor alarms, monitor log entries and network maps. We could remotely pause or resume a sensor's monitoring, acknowledge an alarm, see sensor details and edit sensor settings.
Now called just PRTG, this tool was at one time known as the Paessler Router Traffic Grapher.
Heroix Longitude
Java-based Longitude is easy to use, installs quickly, doesn't require agents, has a middle-of-the-bunch price tag and monitors a veritable plethora of devices, computers and applications. Longitude's simplicity belies its sophistication. We found we could use Longitude not only to keep our network healthy, but also monitor SLA compliance and perform capacity planning analyses.
Longitude comprehensively tracks thousands of operational metrics that it uses in its alerts, reports and charts. It monitors a wide variety of operating systems and environments, software and network infrastructure. Longitude can even keep a watchful eye on user and business metrics.
Longitude's ability to monitor VMware is remarkable. We found that we could automatically collect physical and virtual performance metrics for VMs, hosts, resource pools, clusters, datastores and whole data centers. Longitude consolidated VMware-generated alarms for unified alerting and reporting. We monitored the effect of virtual machines on the physical hardware, and we could optionally take corrective action on any of the performance metrics. Longitude doesn't support Microsoft Hyper-V or Citrix XenServer.
Longitude uses WMI to monitor Windows servers and desktops, SSH to monitor Unix machines and SNMP (via SNMP V1 community strings) to monitor network devices.
We could make Longitude's alert thresholds exactly as complex and realistic as we wished. For example, we used what Heroix terms correlated events to tell Longitude to alert us if a combination of different event conditions occurred. We specified two Event Conditions (file server CPU usage exceeds 50% and file server network connection traffic below 10%) and then tied these conditions to a "runaway file server process error" correlated event. In a test, we deliberately ran a CPU-intensive program on the file server computer while no one was accessing its files. Longitude dutifully warned us that the file server was behaving strangely.
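In rough terms, a correlated event is a logical AND over several conditions. The following Python sketch models the file server test; the class names and metric keys are hypothetical and do not reflect Longitude's configuration syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class EventCondition:
    name: str
    test: Callable[[Metrics], bool]


@dataclass
class CorrelatedEvent:
    name: str
    conditions: List[EventCondition]

    def fires(self, metrics: Metrics) -> bool:
        # The correlated event fires only when every condition holds at once.
        return all(c.test(metrics) for c in self.conditions)


runaway = CorrelatedEvent(
    name="runaway file server process error",
    conditions=[
        EventCondition("CPU usage exceeds 50%", lambda m: m["cpu_pct"] > 50),
        EventCondition("network traffic below 10%", lambda m: m["net_pct"] < 10),
    ],
)

print(runaway.fires({"cpu_pct": 85.0, "net_pct": 2.0}))   # True
print(runaway.fires({"cpu_pct": 85.0, "net_pct": 40.0}))  # False
```

A busy CPU alone is unremarkable on a file server that is also moving data; it is the combination with an idle network connection that signals trouble.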
Longitude can send particular correlated event e-mail alerts to people other than network administrators. For instance, suppose the wireless access point in conference room 5 is not always reliable. In a test, we instructed Longitude to advise conference room users to avoid conference room 5 when other rooms' access points were working, but conference room 5's access point was not.
We were able to associate multiple actions with each alert (for example, sending both e-mail and text messages upon the occurrence of an alert), and we could even suppress events, if we wished, based on date, time, computer ID or the occurrence of a different event.
In addition to notifying you via an e-mail note when it detects a problem, Longitude can also send SMS pages and generate SNMP traps. However, perhaps the best notification is the one that doesn't happen: Longitude can initiate corrective actions at your behest.
For example, we told Longitude to run a batch file that deleted .TMP files and did other housecleaning chores on the file server whenever it detected a shortage of available disk space there. In another test, we told Longitude to run a script that restarted a Windows service (background process) when it detected that the service had stopped running. Longitude fixed problems nearly instantaneously, long before we could have attended to the problems manually.
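A housekeeping script of that sort does not need to be elaborate. Our actual batch file is not reproduced here; the following Python sketch shows the same idea, with a hypothetical `remove_tmp_files` helper and a throwaway directory for the demonstration.

```python
import tempfile
from pathlib import Path


def remove_tmp_files(root):
    """Delete every *.tmp file (any letter case) under root; return the count."""
    removed = 0
    # Build the list first so we are not deleting while still walking the tree.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".tmp":
            path.unlink()
            removed += 1
    return removed


def demo():
    """Exercise the cleanup in a temporary directory, never on real data."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        (base / "sub").mkdir()
        for name in ("a.tmp", "b.TMP", "keep.txt", "sub/c.tmp"):
            (base / name).write_text("x")
        removed = remove_tmp_files(base)
        survivors = sorted(p.name for p in base.rglob("*") if p.is_file())
    return removed, survivors


print(demo())  # (3, ['keep.txt'])
```

A monitoring tool would call a script like this as the corrective action for a low-disk-space alert; the directory to clean is a site-specific choice.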
Longitude's consolidation and filtering of event logs is a terrific time saver. Plowing through multiple server event logs to locate specific important events is not a fun way to spend the better part of an afternoon, but knowing that critical errors have occurred is key to maximizing uptime and availability. Using pre-built, modifiable filters, Longitude collects event log entries from multiple machines and shows you just the ones you need to see.
In a perfectly platform-neutral manner, Longitude displays the filtered, sorted Unix and Windows server log entries together in the same list. The result truly unifies your system management efforts.
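Conceptually, this kind of consolidation is a merge of per-machine logs followed by a filter. The Python sketch below illustrates the idea only; the `LogEntry` fields, host names and severity labels are hypothetical, and it makes no claim about how Longitude's filters work internally.

```python
import heapq
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: int  # seconds since some common reference point
    host: str
    severity: str
    message: str


def consolidate(streams, wanted=("error", "critical")):
    """Merge per-host logs (each already sorted by time) and keep wanted severities."""
    merged = heapq.merge(*streams, key=lambda e: e.timestamp)
    return [e for e in merged if e.severity in wanted]


unix_log = [
    LogEntry(100, "unix01", "info", "cron job finished"),
    LogEntry(130, "unix01", "error", "disk /var almost full"),
]
windows_log = [
    LogEntry(110, "win01", "critical", "service stopped unexpectedly"),
    LogEntry(140, "win01", "info", "user logged on"),
]

for entry in consolidate([unix_log, windows_log]):
    print(entry.timestamp, entry.host, entry.message)
# 110 win01 service stopped unexpectedly
# 130 unix01 disk /var almost full
```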
Impressively, Longitude automatically kept itself up-to-date (we could turn auto-update off, if we wished), and it also automatically performed maintenance functions on its network monitoring database.
We found Longitude to be the perfect SLA tracker for documenting the uptime and availability of our servers and applications. Besides monitoring the performance of individual servers, applications and devices, Longitude can take a higher-level view of the network via its SLA feature. Longitude rather neatly aggregated a group of our servers (some clustered, some not) to show, for instance, overall uptime for that group because they logically shared a particular workload.
In another test, when one of three related servers suffered downtime, but the two healthy servers continued to ensure application availability to the business community, Longitude on the one hand accurately and correctly noted the server's downtime on its dashboard and in its monitoring reports.
On the other hand, just as accurately and correctly, its SLA feature reported the overall availability of the shared three-server application as "good." When we tested with an SLA specifying that multiple resources (Web server, application server and database server) must all be available at the same time, Longitude unerringly reported SLA violations when one of the resources failed. Longitude's sophisticated SLA analysis understands the difference between individual server or application monitoring and measuring performance against the terms of an SLA.
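The distinction between those two SLA styles comes down to how per-resource status is combined. Here is a minimal Python sketch, assuming status is sampled at regular intervals; the function, mode names and sample data are hypothetical rather than Longitude's.

```python
def sla_availability(samples, mode):
    """Fraction of samples in which the service counted as available.

    samples: list of dicts mapping resource name -> True (up) / False (down)
    mode: "any" for redundant servers, "all" for a chain of required tiers
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    if not samples:
        raise ValueError("need at least one sample")
    combine = any if mode == "any" else all
    good = sum(1 for s in samples if combine(s.values()))
    return good / len(samples)


# Four polling intervals; server "b" is down during the third one.
samples = [
    {"a": True, "b": True, "c": True},
    {"a": True, "b": True, "c": True},
    {"a": True, "b": False, "c": True},
    {"a": True, "b": True, "c": True},
]

print(sla_availability(samples, "any"))  # 1.0
print(sla_availability(samples, "all"))  # 0.75
```

The same outage leaves a redundant three-server application fully available, yet counts as a violation when all three resources are required at once.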
We found Longitude's browser-based user interface intuitively easy to navigate and understand. Longitude displays at-a-glance real-time dashboards with pinpoint drill-down capabilities. Longitude's Event Monitor groups events by either device or application, and it can display additional information collected from Windows Event Logs, Syslogs, SNMP Traps and SLAs via its intuitive dashboard. It also scales well: when we simulated the monitoring of a large network, we found we could delegate network segments or specific administrator roles to multiple local Longitude administrators.
When it detects administrator access from an iPad, iPhone or Android device, Longitude displays a mobile app user interface. This interface showed us actual performance data when we received an alert notification, and we used the interface's dashboards to view both summary and detailed status information for servers, applications, devices and virtual machines.
The mobile app dashboard's color-coded pie charts gave us a quick, easy-to-understand picture of our network's health, and we used the dashboard to drill down to see specific data for computers, devices or applications we were curious about. On a mobile device, we could also view (and run) reports, see Event Monitor data or use Longitude's Real Time Performance Monitor.
Heroix Longitude runs on a Windows 2003 Server or Windows 2008 Server machine with at least a 2.4GHz P4 or Xeon processor and 2 GB RAM. Be aware that Longitude is written in Java, an interpreted bytecode language, and needs somewhat more horsepower than a native Windows application.
HP Intelligent Management Center (IMC)
Intelligent Management Center (IMC) was by far the most technically demanding of the products we reviewed. The download file was about three times the size of the other products (1.2GB vs. 400MB), the base platform Administrator Guide manual is a hefty 991 pages and IMC requires considerable database administrator expertise.
Furthermore, at $6,819 for just 100 devices, IMC is expensive.
HP IMC is a successor to the now defunct "OpenView" product and is intended for small- to midsized networks. For large networks, HP sells HP Automated Network Management Suite, which we reviewed last March.
HP got IMC in its acquisition of 3Com. Retooled slightly, IMC is actually better than Automated Network Management Suite at monitoring and managing non-HP hardware.
IMC is a mixture of native code and Java with versions that run on either Windows Server or Linux. In addition to monitoring computers and devices via SNMP, IMC manages device configurations (backup, restore and compare), checks (and can remediate) device configurations against policies that you establish and presents a unified virtual LAN/ACL interface for managing devices from different vendors (no more command line interface).
We noted that IMC supports some devices more fully than others. While it could poll all devices, it could perform configuration backups for many (but not all) and it could provide full component management (ACL, users, VLAN, etc.) for yet a smaller subset.
IMC's hierarchical model scales well, with each IMC server able to monitor and manage up to about 5,000 devices. Multiple IMC servers collaborate with each other; we only needed to use a single client browser window to administer all the IMC servers. Using its modular architecture, we added user access management, VPN management and traffic analysis. All modules integrated well and shared the same user interface.
IMC's browser-based interface shows a physical view of the network, a topology view and an alarm window. IMC also has windows for viewing reports as well as configuring and managing devices.
The primary window of IMC's browser-based interface displays "widgets," with each widget representing a network resource. IMC populates the primary window with widgets based on its highly accurate discovery of our network. We added, changed, moved and resized these widgets as we described the physical layout of the network to IMC. We quickly and easily added data centers, wiring closets and rack layouts. And we appreciated the ability to set up independent discovery schedules for different network domains.
IMC also automatically created a topology view of our network, based on the devices it discovered. The topology view shows the L2 and L3 links between devices and computers, and IMC gave us a real-time look at traffic conditions at the various links. IMC also displayed a performance view window, in which we could see the top N network traffic generators, note traffic bottlenecks and analyze trends.
We found root cause analysis and downstream event suppression, based on IMC's discovery of L2 and L3 network links, especially helpful when we were trying to understand a cascade of errors caused by the failure of an upstream device. IMC intelligently identified events associated with the root cause and discarded those events that were merely symptoms.
IMC's browser window of alarms used color codes to tell us about errors, warnings and informational events. IMC "recovers" alarms automatically if a device comes back online, or we could manually recover an alarm. Deleting an alarm removed it completely from IMC's database. From the alarm view, we quickly drilled down to view detailed alarm data and device details. IMC showed us recent monitoring information and device configuration features for the failing device.
IMC notifies administrators via e-mail notes or SMS/text messages, and it can forward alarms to other network management products in the form of SNMP traps.
IMC comes with about 200 pre-configured, customizable thresholds for measuring the availability, reachability, and performance of network devices. IMC's monitoring extends beyond devices to include IPSec VPNs, wireless LAN, QoS, VSM and RMON. Compiling a Management Information Base (MIB) into IMC to create one or more new alarm thresholds is somewhat technically challenging. However, the new alarm worked well in our tests.
Reports are useful and easy to understand. IMC used current monitoring data to give us a clear picture of our network's overall health, and it summarized historical data for network trend analysis and capacity planning.
IMC understands virtual environments. It monitored VMware, Microsoft Hyper-V and Red Hat KVM (but not Citrix XenServer) environments, and it gave us a remote console through which we could manage these virtual machines. IMC displayed informative maps of our virtual networks and systems. It automatically tracked our virtual machines' network access ports, and we used IMC to migrate images of virtual machines from one physical server to another.
The WLAN Manager gave us highly useful wireless LAN device configuration, topology, performance monitoring, RF heat mapping, and WLAN service reports, all integrated within the IMC browser-based user interface.
IMC downloaded its software patches from HP automatically, and it also received new firmware version releases for HP devices that we could apply when we wished.
Using the IMC mobile app for iPhone and Android, we easily tracked network performance and health remotely, via the Internet. These apps lacked some of the functions of the primary Web browser interface, such as configuration management. However, IMC's mobile app for iPhone or Android displayed the dashboards and alarms that an administrator who's away from the office would need in order to be aware of network problems. We noted that the IMC mobile app hadn't been updated in over a year, and at times the app seemed more like a "proof of concept" than a real adjunct to IMC.
IMC comes in two versions, Enterprise and Standard. The Enterprise platform can manage more nodes, it includes HP's Network Traffic Analyzer module and it's the tool of choice if you need to administer multiple IMC servers from a single location.
On Windows or Linux, HP suggests using a 2.0 GHz Pentium III or equivalent processor for fewer than 500 nodes, along with 2GB RAM and 50GB disk storage. For more than 500 nodes, HP suggests using a multiple-CPU machine. IMC also requires a database: Microsoft SQL Server on Windows, or either Oracle or MySQL on Linux. Note that IMC assumes it's the only application that accesses the database. Getting IMC to share an existing SQL Server database with other applications will likely require the help of an HP support person.
Ipswitch WhatsUp Gold (WUG)
WhatsUp Gold (WUG) looks simple, inexpensive and capable at first glance. However, a little experience with WUG reveals that, for all but small networks, it's a complex network monitor with some stringent system requirements, some shortcomings with respect to supported network devices and a price that increases dramatically as you add both nodes and features.
Ipswitch offers Standard, Premium and Distributed Editions of WUG. Optional plug-in modules include WhatsConfigured, Flow Monitor, WhatsVirtual, Flow Publisher, VoIP Monitor, Failover Manager and Scalability Pollers. All editions use Microsoft SQL Server as a network event and device repository, and they use Microsoft IIS as the underlying Web server for displaying WUG's user interface.
WUG's Standard Edition is an excellent, basic monitoring tool for not-too-large, uncomplicated networks. We found the Web browser-based main console to be intuitive. It shows network health in an at-a-glance set of dashboards, network maps and graphs.
The dashboard layout is instantly informative and meaningful. WUG organizes views of the network into separate windows for Wireless, Log Management, Flow Monitor, Devices, Inventory, Configuration and Reports, and each view is eminently customizable. We appreciated the appropriate-to-the-task network health data that each view showed us, and we liked the ability to filter results so we could focus on a specific device or a particular time period. Getting a quick, detailed device status display was simply a matter of hovering the mouse cursor over the device's entry in the Device Details view.
WUG's home page window contains a universal dashboard, which reveals just that network information an administrator generally needs to see most frequently. This data included number of devices, current health status, active monitoring, hardware performance, recent alerts and wireless activity. We easily customized WUG's primary display window by right-clicking to remove unwanted sections and dragging and dropping into the primary window the network metrics we wanted to see most often. Other views are similarly customizable. For example, we customized the device status dashboard, which shows a detailed view of a single device's health, by dragging and dropping relevant metrics onto the device status view.
Dealing with problems was straightforward via the Alert Center, which showed alerts, alert acknowledgements and notifications across the network. WUG's performance monitors revealed CPU, disk space, network interface and memory utilizations, along with ping latencies. WUG's network discovery produced a L2/L3 topology map that included asset/inventory data. WUG has the ability to associate JScript or VBScript commands with a particular alert, which let us, for example, restart a failed Windows Service.
The Standard Edition discovered, mapped and inventoried assets across our network. The discovery process automatically assigns a device type to each node, but WUG misidentified about a quarter of our devices. Fixing these errors was simple enough, just a matter of clicking on a device in WUG's browser-based interface and correcting its device type.
We liked that we could schedule the discovery process or perform it on demand. WUG's L2/L3 device discovery used ARP, SNMP, SSH and ICMP to note interconnections and dependencies, and, when the information was available via a query, inventory data such as serial number and OS version. WUG uses SNMP, WMI or the VMware API to obtain device data for the device status view.
The Log Management feature is rather helpful and a great time-saver. With it, we saw a consolidated view of syslog entries and Windows events for all the devices. We could easily pinpoint problems, produce reports (filtered by device or type of log entry, if we wished) and set up alerts to notify us of particular entries or issues.
We liked WUG's alert notification flexibility. We set up e-mail, pager, SMS, Web alarm, Windows popup, Klaxon sound and SNMP trap notifications, and we told WUG to restart Windows services and run external scripts when it detected easily-corrected failures. We could also tell WUG to escalate alert notifications with e-mails to additional people when problems persisted.
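Escalation of this kind amounts to widening the recipient list the longer an alert stays open. The Python sketch below shows one way to express it; the time steps and addresses are invented for illustration and are not WUG settings.

```python
# Hypothetical policy: (minutes the alert has been open, who gets added).
ESCALATION_STEPS = [
    (0, ["oncall@example.com"]),
    (30, ["team-lead@example.com"]),
    (120, ["it-manager@example.com"]),
]


def recipients(minutes_open, steps=ESCALATION_STEPS):
    """Return everyone who should be notified once an alert is this old."""
    notified = []
    for delay, people in steps:
        if minutes_open >= delay:
            notified.extend(people)
    return notified


print(recipients(5))   # ['oncall@example.com']
print(recipients(45))  # ['oncall@example.com', 'team-lead@example.com']
```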
We viewed several of the more than 200 useful, informative reports not only in the Web browser window but also in Excel, Acrobat PDF and CSV formats, and we could instruct WUG to e-mail these reports.
WUG's Premium and Distributed Editions add significant features, including real-time network monitoring and instant graphs, WMI application monitoring, PowerShell integration and HTTP/FTP synthetic transactions. These upscale editions also add monitoring of UPSs, printers, fans, power supplies and temperature, as well as wireless networks. However, we were disappointed that WUG's wireless network monitoring and mapping supported only Aruba Mobility and Cisco Aironet/Airespace controllers and access points. For these devices, we were able to map access points in WUG and could see real-time wireless network statistics.
The Premium and Distributed Editions (but not the Standard Edition) contain the Dashboard Manager for configuring custom views of network data, and they also have support for instant access to real-time data via WUG's Instant Info.
The Distributed Edition uniquely adds remote monitoring to what the Premium Edition offers: Remote IP Services, Remote SNMP Monitoring, Remote WMI Monitoring, Remote Device Dependencies, WAN Device, Port and Link Monitoring, Remote Alert summaries and Remote Reporting summaries.
The optional plug-in modules should, we think, be part of the basic product. For example, the WhatsConfigured plug-in collects and records existing configuration settings for each monitored device, and an administrator can use it to distribute configuration changes as well as issue alerts when changes are detected.
The Flow Monitor plug-in shows, in a single window, data gathered via Cisco NetFlow, sFlow, J-Flow, and Border Gateway Protocol (BGP) from switches, routers and other network devices. The result is an informative real-time view of LAN/WAN network traffic patterns and bandwidth utilizations. Flow Monitor also identifies the users, applications and protocols that are consuming the greatest bandwidth.
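Finding the biggest bandwidth consumers amounts to summing bytes per source and sorting. The Python sketch below shows the general idea only. It assumes flow records have already been reduced to hypothetical (source address, byte count) pairs, which is not the actual NetFlow or sFlow record layout, and it does not describe how Flow Monitor works internally.

```python
from collections import Counter

def top_talkers(flows, n=3):
    """Sum bytes per source and return the n largest as (source, bytes) pairs.

    flows: iterable of (source_address, byte_count) tuples (assumed format).
    """
    totals = Counter()
    for source, byte_count in flows:
        totals[source] += byte_count
    return totals.most_common(n)
```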
The WhatsVirtual plug-in component groups all VMware machines detected on the network for virtual machine-specific monitoring. WhatsVirtual gave us a separate, VM-appropriate view through which we mapped and monitored virtual machines. Unfortunately, WUG doesn't support Microsoft Hyper-V, Citrix XenServer or Red Hat KVM.
Support for mobile devices rests in the WUG Web server, not in a mobile app. When the Web server detects mobile device access via Mobile Safari, Microsoft Internet Explorer Mobile or Opera Mini, the server emits small-screen Web pages with content designed for a mobile interface. It was somewhat limited - we could choose just between device status information and reports. Helpfully, the mobile interface gave us a list of recent reports, and it let us identify our favorite reports.
To monitor 100 devices, WhatsUp Gold needs at least 2GB of RAM on a 2.4 GHz dual-core machine running Windows Server 2008 and Microsoft SQL Server 2008 Express Edition. For 2,500 devices, WUG needs at least 8GB on a 2.4 GHz quad-core machine running Windows Server 2008 and a separate, dedicated Microsoft SQL Server 2005 machine. For 20,000 devices, WUG requires at least 8GB on a 2.4 GHz eight-core machine plus a separate, dedicated Microsoft SQL Server 2005 eight-core fast machine with at least 32GB RAM.
SolarWinds Orion Network Performance Monitor (NPM) and Server & Application Monitor (SAM)
Network Performance Monitor (NPM) tracks network and server activity, while Server & Application Monitor (SAM) monitors the software running on the servers. NPM and SAM complement each other.
The combination of the two products alerted us to network problems, let us set up sophisticated thresholds, was agentless, gave us useful reports, displayed a helpful network map and let us delegate administrative subtasks. However, the NPM and SAM interfaces are not as responsive as we would've liked, the two products are pricey for larger installations and they don't have the ability to automatically correct problems by running scripts or restarting failed Windows services.
NPM uses the ICMP, SNMP, WMI and Syslog protocols to gather connectivity and performance data from routers, switches, access points and servers. NPM's root cause analysis used this data to correctly identify the true network problems we confronted it with.
However, NPM's root cause analysis relies heavily on what NPM terms group dependencies, which we found to be tedious to set up. We had to designate nodes as Parents and Children for NPM to know that, when a Parent failed, NPM should trigger a single alert for the Parent and report as "Unreachable" the (say) 50 Children connected to the Parent. Identifying Parents was easy, but designating the Children was a one-by-one, device-by-device process. NPM only let us tie a single Child or a single group to a Parent.
We would've liked the ability to associate multiple Children with a single Parent. Putting the dependent Children into groups and then associating the Parents with the Child groups was an alternate approach, but it was similarly time-consuming.
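The Parent/Child logic itself is simple, which is why the setup effort grated. The Python sketch below is our own illustration of dependency-based alert suppression in general, not NPM's implementation. The function name, the child-to-parent mapping and the node names are all hypothetical.

```python
def classify_nodes(parents, down):
    """Split failing nodes into real alerts and merely unreachable nodes.

    parents: dict mapping each child node to its parent (assumed structure).
    down:    set of nodes that are failing their polls.
    Returns (alerts, unreachable) as two sets.
    """
    alerts, unreachable = set(), set()
    for node in down:
        # Walk up the dependency chain looking for a failed ancestor.
        ancestor = parents.get(node)
        masked = False
        seen = set()
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                masked = True
                break
            seen.add(ancestor)  # guards against accidental loops
            ancestor = parents.get(ancestor)
        (unreachable if masked else alerts).add(node)
    return alerts, unreachable
```

If a switch and the two PCs behind it all stop responding, this yields one alert for the switch and marks the PCs as unreachable.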
We easily set complex and sophisticated NPM thresholds for alerts and notifications. We used the thresholds to describe dependencies for correlated events, sustained conditions and complex combinations of device states. Because NPM's complex thresholds accurately expressed real-world conditions we wanted to know about, we had a higher level of confidence that a real network problem had occurred when NPM triggered an alarm.
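To show what a sustained condition combined with a second condition looks like, here is a minimal Python sketch. It is not NPM's threshold syntax. The function names and the limits (90% CPU for three polls, 200ms latency for two polls) are assumptions chosen purely for illustration.

```python
def sustained_breach(samples, limit, min_consecutive):
    """True if `samples` exceeds `limit` for `min_consecutive` polls in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False

def should_alert(cpu_samples, latency_samples):
    # Hypothetical compound rule: alert only when CPU stays above 90% for
    # three polls AND latency stays above 200 ms for two polls.
    return (sustained_breach(cpu_samples, 90, 3)
            and sustained_breach(latency_samples, 200, 2))
```

A single CPU spike does not trigger this rule, which is the point: fewer false alarms mean more trust in the alarms that do fire.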
NPM's browser-based interface is intuitive to use, albeit a bit sluggish at times. Drilling down to node details displays basic device data. The next lower drill down level reveals specific data, such as RAM usage, disk usage or bytes in/bytes out. Unfortunately, the browser interface is incomplete. Some tasks, such as configuring alerts, can only be done via NPM's native Windows interface.
The dashboard provided a useful summary of our network's performance and availability status at a glance. Customizing NPM to show current alerts, recent events, node lists, network maps and other network status information was a simple process, and we liked the ability to hover the mouse cursor over a node to see a pop-up window containing key metrics for that node.
NPM's customizability extends beyond the dashboard. For instance, we set up the node details window to display exactly the device metrics we were interested in. Even NPM's charts can be tailored to suit specific needs. Merely clicking on a chart starts the customization process. Modifying a variety of charts to use custom data ranges, titles and data was hassle-free.
The ConnectNow feature automates network mapping. After we dragged and dropped devices onto NPM's network map, we clicked the ConnectNow button and NPM automatically mapped the connections between the devices.
Optional NPM modules include NetFlow Traffic Analyzer, VoIP Monitor for VoIP traffic analysis, Virtualization Manager and Wireless Network Monitor.
Virtualization Manager supports VMware and Hyper-V, but not Citrix XenServer or Red Hat KVM. Its real-time dashboard clearly showed virtual machine performance data, including CPU, RAM, disk and storage I/O contention. Performance alarms, based on thresholds we set, integrated nicely with NPM's alerts and notifications. We liked managing our vCenters, clusters and individual virtual machines through a single interface. We also found Virtualization Manager a useful aid for capacity planning.
We noted that NPM works especially well with virtual storage area network (VSAN), Fibre Channel and Cisco devices. For instance, it showed the traffic levels at each VSAN as well as the VSAN ports that were in use. Setting thresholds for VSAN alerts was similar to setting them for other nodes.
NPM's useful and meaningful reports are a snap to select and schedule. Exporting them as Acrobat PDF files is a breeze, and reports are highly customizable via the Report Writer module. However, while the browser-based Report Center offers many pre-built reports, it doesn't generate graphs. Report Center data appears in a table format.
The free NPM Mobile Monitor uses the Remote Desktop Protocol (RDP) and Virtual Network Computing (VNC) to access NPM's functions. When NPM triggered an alarm, the Mobile Monitor displayed the alert. We could then drill down to see the exact nature of the problem. As a nice touch, if you don't have the NPM Mobile Monitor loaded on your smartphone, NPM's Web server detects browser access from a mobile device and displays the user interface in a format suitable for a small screen.
A second, not-free Mobile Admin Monitor is also available. In addition to NPM, it administratively connects to more than 40 applications and operating systems, including Oracle, Microsoft Exchange, Microsoft SQL Server, Windows server, Remedy help desk, Microsoft System Center Mobile Device Manager and other network entities.
SAM (Server & Application Monitor) is a useful tool for inspecting the running services and processes on database, e-mail and other servers. SAM reveals dependent services, and it can help you understand how the failure of one network service affects other services and processes. SAM shows more detailed server activity than does NPM. For example, SAM's performance metrics gave us real-time data on database transaction rates and e-mail traffic loads.
SAM's network map shows, at a glance, the performance of a network's servers. When SAM sent us notifications of problems, based on thresholds we'd set, we drilled down from the network map to discover the nature of the problem. We used SAM's User Experience Monitors, which assess users' Quality of Experience (QoE), to help us determine that a particular server process was using too much CPU. We quickly and confidently realized that high CPU usage rather than high network traffic levels was causing a problem.
SAM monitors a plethora of software applications, tools and services.
NPM and SAM run on at least a 2GHz dual processor machine with 3 GB RAM and Windows 2003 Server or Windows 2008 Server. They also need SQL Server 2005 or 2008.
Network Management Tips: Establish a baseline
Your first step with any new network monitor is the establishment of baseline data. Well-run, highly available networks depend on network administrators and troubleshooters knowing what’s normal. When someone complains about performance and you find what you think might be the culprit bottleneck, a quick check of baseline data can help confirm your diagnosis. By comparing current performance data with baseline data, you can note a burgeoning latency or bandwidth problem before network users see a drop in response times. The analysis of baseline and current data can even suggest ideas for network improvements (such as load-balancing servers) that will help your network run more smoothly.
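One simple way to compare current readings with a baseline is to flag values that fall far outside the baseline's normal spread. The Python sketch below assumes a three-standard-deviation rule, which is a common rule of thumb rather than anything the products in this review prescribe. The function name and sample values are hypothetical.

```python
from statistics import mean, stdev

def deviates_from_baseline(baseline, current, sigmas=3.0):
    """True if `current` lies more than `sigmas` standard deviations
    from the mean of the `baseline` samples (needs at least two samples)."""
    mu = mean(baseline)
    sd = stdev(baseline)
    return abs(current - mu) > sigmas * sd

# Hypothetical baseline of response times in milliseconds.
baseline_ms = [20, 22, 19, 21, 20, 23, 21]
```

With this baseline, a 22ms reading is normal, while a 45ms reading would be flagged for a closer look.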
Network Management Tips: Set up a separate monitoring tool
If the machine running your network monitoring product dies or otherwise goes dark, you want to know right away. You should consider setting up a separate, simple, one-on-one monitor to make sure the network monitor machine is still alive.
We earnestly suggest that administrators of small- to medium-sized networks additionally acquire a free version of a network monitoring tool. (Many of the vendors in this review, for example, offer a free version for monitoring fewer than five nodes.) Use the free version just to keep an eye on the machine running the network monitoring product you buy. If you can, put the separate monitor on a subnet different from the primary network monitoring tool and give the secondary monitor a different Internet link and different e-mail server. You’ll definitely appreciate the extra peace of mind the secondary monitor gives you.
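If you prefer to roll your own watchdog, a few lines of script will do. The Python sketch below is one possible approach under stated assumptions: it treats the monitor as alive if a TCP connection to its Web port succeeds, and the host name and port are placeholders. The probe is passed in as a parameter so the retry logic can be exercised without touching the network.

```python
import socket

def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_monitor(host, port, probe=tcp_probe, retries=3):
    """Return True if any of `retries` probes succeeds."""
    return any(probe(host, port) for _ in range(retries))
```

Run check_monitor("nms.example.com", 443) from a scheduled task on a different subnet, and send a notification through a separate e-mail server when it returns False.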
NetXMS stacks up well
NetXMS thresholds are flexible and can accommodate many networking situations. Alerts can be SMS messages and e-mail notes, and the NetXMS root cause analysis function, which consists primarily of event correlation rules, correctly identified the true source of the network problems we confronted it with.
Impressively, NetXMS can initiate corrective actions when thresholds are exceeded. These actions include rebooting a device and restarting a process or service on a Windows or Linux machine. The interface, which is easy to use and responsive, even let us associate network infrastructure elements with business functions. This feature is a godsend when multiple network errors happen concurrently on a large network and a network troubleshooter needs to know which business areas are affected by the network errors.
We put NetXMS through many of the same paces as the competitors in this review. NetXMS fared surprisingly well in our testing. Layer 2/3 network discovery was accurate, and the tool gathered network status and health data via SNMP and, optionally, native NetXMS agents.
Nance runs Network Testing Labs and is the author of Network Programming in C, Introduction to Networking, 4th Edition and Client/Server LAN Programming. His e-mail address is email@example.com.
How We Did It
We evaluated each product's abilities in several different areas: Discovery and enumeration of devices and computers, support for a variety of device manufacturers and device types, global directory integration, graphical depiction of the network, monitoring of network node status (availability), performance and health, alerts and notifications when network problems occur, automated corrective actions, maintenance of trouble tickets (or integration with a help desk tool), support for virtualized environments, cloud support and the production of useful, informative reports. We expected the reports would help us establish baselines, show available and unavailable devices, track each device's availability history, identify trends, give us the basis for accurate capacity planning and help us spot conditions that could result in future network problems.
Our test environment consisted of six routed Fast Ethernet subnet domains with T-1, T-3 and DSL links to the Internet. We ran each network monitoring product's server component(s) on a four-socket HP Proliant computer. Our server software was variously Windows 2008 Server, Windows 2003 Server and Red Hat Enterprise Linux Server. The 150 client computers on our network were a mix of Windows XP, Windows 2003, Windows 2008, Windows 7, Windows Vista, Red Hat Linux and Macintosh platforms. Relational databases on the network were Oracle, Sybase Adaptive Server and Microsoft SQL Server. Our e-mail servers were Sendmail and Microsoft Exchange. Web servers on the network were Internet Information Server (IIS) and Apache.