After 10 years on the market, products should be better at reporting, usability and advanced correlation features
We deployed all of the SIEM (security information and event management) products in a live, production environment and ran them over the course of several months. We were both impressed by the depth of features that some of these tools have and frustrated by how far they still need to go.
"Thou shalt review thy logs!"
While it wasn't exactly on Moses's tablets, it's a commandment present in just about every IT standard, audit methodology and federal regulation an IT outfit has to document compliance with. Ticking off that particular checkbox on regulatory compliance forms forces IT to acknowledge that its systems and applications are generating event logs, that it is saving that data, and that it is reviewing it on an ongoing basis.
In reality, most IT personnel do turn to their logs at some point in time — usually after something bad has happened. But monitoring them 24/7? All entries? Every minute of every day of every week? Um…no. Unless of course you've deployed a Security Information and Event Management (SIEM) platform. In that case, ticking off the "yes" checkbox might be a little closer to the truth. SIEM platforms help get logging and event data from distributed points A, B and C to a centralized point D, help store it, monitor it, report on it, purge it when the time comes, and ultimately — so the pitch goes — provide the situational awareness necessary to effectively manage IT operational risk.
But do they deliver?
In a word: somewhat. It's a crowded market full of players that make many promises. Unfortunately, none of them completely deliver the whole package at this point in time. We currently track more than a dozen vendors that lay claim in the SIEM space and we invited a subset of them to participate in our test. CheckPoint, eIQ Networks, High Tower, Q1 Labs, NetIQ and TriGeo all agreed to participate, while ArcSight, Cisco and RSA all declined for a multitude of reasons. (Compare products.)
We deployed all of the products in a live, production environment and ran them over the course of several months. We were both impressed by the depth of features that some of these tools have and frustrated by how far they still need to go. User interfaces were clunky, reports were incomplete, data parsing problems are still around, and when it came to trying to figure out what the heck was going on in our Windows environment, most products left us scratching our heads. (One could argue, however, that this is as much Microsoft's fault as anyone else's.)
We found the products from Q1 Labs, High Tower and TriGeo to be consistently the most useful. In the end Q1 Labs' QRadar just barely came out on top. While its user interface could still use some work, it is the Swiss Army knife of the SIEM tools we tested, and it performed all of the tasks required by our testing reasonably well.
With that nod to the top scoring product, truth be told, if we could take High Tower's user interface, combine it with NetIQ's event manager grid tool, grab TriGeo's integration with Splunk for log aggregation, and pull in Q1 Labs' correlation engine, we would then have one heck of a product. In their current form, however, these products still show much room for improvement.
However, selecting the right SIEM product is almost entirely based on the use cases an organization is trying to fulfill. For example, if you're a midsize business without a dedicated team of security analysts, your needs and cost sensitivity will vary greatly from that of a large multi-national firm. You will most likely require a healthy amount of out-of-the-box functionality while heavy customization is probably not on the agenda.
Likewise, if your primary reason for deploying a SIEM tool is so you can click that "yes I review my logs" audit checkbox and you aren't looking at spending a lot of time on ticketing, workflow and advanced correlation logic, your needs aren't going to match that of a full-featured Security Operations Center (SOC). Some organizations might require a ticketing and workflow system to cut and paste event data into an incident "package," where others might simply need reports that show a set of metrics and pretty graphs. Perhaps the day will come when data storage, user interface, monitoring, event-reduction, ticketing, visualization and reporting mechanisms are all relatively comprehensive, but today the products remain heavily varied in coverage for those features.
If you're a small to midsize business it's hard to beat the easy-to-deploy, easy-to-use, simply priced and feature-rich products from TriGeo and High Tower. TriGeo has the better adhoc query mechanism, while High Tower's well-designed user interface makes using it a more enjoyable experience overall.
NetIQ's Security Manager will be attractive to larger customers that already use NetIQ's AppManager product on the IT operations side of the house. Its modular approach allows for both scalability and deployment customization. It is, however, a bit of a beast to deploy. And based on NetIQ's per-server pricing model, the larger your environment, the more you'll pay.
EIQ's SecureVue provides a good set of features for midsize businesses, has visualization components that are actually useful, and offers a very helpful ability to gather device configuration information for change control monitoring purposes. Its user interface, event reduction capabilities and reporting features could all use some work, however, and it's really expensive.
CheckPoint's offering is a relatively new entrant to the space and will undoubtedly make the short-list for existing CheckPoint customers, but lacks some of the features like asset weighting and cross-device reporting that can be found in the more mature products.
An evolving space
Tools in the SIEM space are not new by IT standards. Basic log parsing and alerting have existed for decades, and what many consider the first commercial SIEM products from companies such as NetForensics, Intellitactics and eSecurity (now Novell) came to market in the late 1990s. However, even a decade later the products are by no means fully mature.
Looking at the history of these products, you'll find that many started with very few components. Some had reporting engines without any real-time user interface. Others had real-time user interfaces but didn't have event reduction engines. Still more supported only firewalls and IDSs while others homed in on operating system-centric events.
Today the products have evolved to include common components regardless of the product's heritage. Those components include a data acquisition mechanism, a data storage and archiving system, an event parsing and normalization mechanism, a reporting mechanism, a query mechanism, and usually some sort of real-time analysis module. That said, our testing showed that the maturity of these modules varies greatly across products, and the forewarned buyer will give some thought to which features will be most important to their organization. (See related story.)
Starting with the data acquisition mechanism, all of the products provide (at a minimum) a syslog listener to receive incoming events. However, the maturity of the syslog listeners and the accompanying mechanisms that parse incoming event streams varies widely. For example, NetIQ's product is inflexible in how it receives Windows events, and its mechanisms for gathering syslog data are woefully green. We had to put the NetIQ listener for Cisco ASA devices on one system and another NetIQ listener for our Snort IDS on a different system because a single listener couldn't handle data streams from multiple device types. This approach would create a nightmare on a network comprising dozens of syslog-based device types. NetIQ says it's addressing this shortcoming in its next revision, due out later this year.
By comparison, the more mature listeners and parsers from CheckPoint, High Tower and Q1 Labs allow you to simply point your device – any device – to the appliance and the SIEM platform will automatically accept the feed, identify the format, and figure out which event came from which device of which type (for example, a syslog-based event from a Cisco ASA firewall vs. a Linux host). This is extremely helpful if you happen to have a centralized syslog implementation already in place as you can then "relay" all inbound syslog messages with something like the syslog-ng (Syslog Next-Generation) "spoof source" configuration directive. But even if you don't have a centralized syslog implementation in place being able to point all devices to a single syslog destination helps make device deployment simple.
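For readers with a centralized syslog tier already in place, a minimal syslog-ng relay along these lines illustrates the "spoof source" approach described above (the SIEM hostname is hypothetical):

```conf
# Listen for syslog arriving from the network on UDP 514.
source s_net { udp(ip(0.0.0.0) port(514)); };

# spoof_source(yes) rewrites the source IP of forwarded packets to the
# original sender's address, so the SIEM sees events as coming from the
# real devices rather than from the relay. Requires libnet support.
destination d_siem { udp("siem.example.com" port(514) spoof_source(yes)); };

log { source(s_net); destination(d_siem); };
```

With this in place, every device keeps pointing at the relay it already knows about, and the SIEM's auto-identification logic still sees each device's real address.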
Other data acquisition features of these products include support for protocols such as CheckPoint's OPSEC LEA, database scraping mechanisms for products from established security vendors such as ISS and McAfee, and proprietary agents that can run on hosts to acquire non-syslog based event data like that found in vulnerability scanner data and Windows event logs. The products from Q1 Labs and eIQ supported the widest assortment of security devices and platforms out of the box but organizations will want to gather their own compatibility requirements when compiling their SIEM evaluation short-lists.
All SIEM products we tested also offer a mechanism for data storage. Most have a general purpose relational database like Microsoft SQL Server or Oracle under the hood, but there's a growing trend toward using simplified, proprietary databases for large volume event storage. The compelling argument is that one doesn't need all of the features of a modern relational database, so it's best to go slim and purpose built for performance. Q1 Labs, for example, uses a proprietary database whereas High Tower uses an embedded MySQL deployment and NetIQ uses a combination of MS SQL and flat-files. We suspect that the proprietary approach will most likely win out in larger-scale deployments, but time will tell.
Another data storage issue to consider is size. Just a few years ago, packing a terabyte or more onto a single appliance was a substantial challenge, but with increases in average processor power, decreased storage costs, and the move to optimized databases, terabyte data stores are more commonplace in the SIEM world. While our testing did not push any of the databases past 512GB, it was evident that the more data you put into these systems, the longer your query times could be. But query times were also product-specific. For example, some of our adhoc queries using High Tower's product to search for text strings took minutes to return, while the Q1 Labs product was a lot snappier. High Tower acknowledged that this is a known issue and claims it is addressing it this summer.
With modern hardware most organizations won't see a huge performance problem under 1TB of data. However, for those organizations lured by the temptations of large data sets or complex query loads, we highly suggest embarking on performance tests that are more advanced than what we've done here, ideally before making any purchasing commitments.
In addition to storing normalized events in the database back-end, most products also offer the option of storing a copy of unmodified log entries. Some vendors even offer a hashing mechanism to help with evidence admissibility concerns. We've been unable to find an example case where log-based evidence was outright rejected based solely on the lack of a hash (none of the vendors we asked were able to cite a case, either), but it is a feature some organizations are still talking about as a possible SIEM requirement. Given all the other feature areas that are critical to meeting organizations' needs, this was not a top priority in our testing, but we aren't lawyers so you'll want to consult legal counsel regarding how to tackle the issue.
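None of the vendors detail their hashing schemes, but the general idea is simple enough to sketch. The following is a minimal illustration of chaining a digest through successive raw log entries so later tampering is detectable; it is our own example, not any vendor's actual implementation:

```python
import hashlib

def hash_chain(entries, seed=b"\x00" * 32):
    """Hash each raw log entry, mixing in the previous digest so that
    modifying or deleting any entry invalidates every digest after it."""
    digests = []
    prev = seed
    for entry in entries:
        h = hashlib.sha256(prev + entry.encode("utf-8")).digest()
        digests.append(h.hex())
        prev = h
    return digests

logs = ["Jan 1 00:00:01 fw1 accept tcp 10.0.0.5:443",
        "Jan 1 00:00:02 fw1 drop udp 10.0.0.9:53"]
chain = hash_chain(logs)
```

Re-hashing the same entries reproduces the same chain, while a tampered entry changes every digest from that point forward — which is exactly the integrity property an examiner would want to demonstrate.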
Finally, almost all of the products have a reporting and analysis engine of some sort – ranging from the extremely basic reports found in the High Tower product to the custom reporting engine in Q1 Labs to the more unique features found in TriGeo's integration of the log aggregation tool Splunk and the business intelligence analysis tool QlikView. We delve much deeper into reporting and the functions that go hand in hand with it – forensics tools and real-time analysis measures – later in this article.
The reality of deployments
Installation concerns can be a potential headache with just about any technology, but there are a few scenarios that really compound that pain in the SIEM world. The delivery model (software vs. appliance) does make a difference. Our High Tower deployment – an appliance – came up in about 20 minutes. High Tower had the simplest installation process, with Q1 Labs, TriGeo and eIQ not far behind. The appliance delivery model is probably the way to go for most SIEM deployments, and all of the products we tested, with the exception of NetIQ's, come with an appliance option. Because of its reliance on a healthy amount of software that must be pre-installed, it shouldn't come as a surprise that NetIQ takes the cake for the most overbearing installation process; it'll take a day just to get a dizzying array of Microsoft components up and going before you even start the NetIQ installation.
But there's another critical installation angle: getting the SIEM platform to accept and properly parse the output from devices in your environment. For our testing we used just shy of 30 devices (see How we did it). While this number may not sound very big (and it certainly isn't for a large enterprise), if you have to configure every device entry within a SIEM product, trust us, it's larger than you'd like it to be.
Products from Q1 Labs and High Tower really shine on the device provisioning front, as their platforms sport an auto-identification mechanism that detects events coming from new devices, makes an educated guess on device type based on parsing the inbound data, and then prompts the administrator to confirm the finding when they have time. While we had to make some modifications from time to time, we found the products' informed guesses to be correct most of the time. By comparison, in NetIQ's Security Manager we had to set up every single device. That was painful for us in our small environment, but it would be excruciatingly brutal for anything larger, and possibly unacceptable for many enterprise-class environments.
Support for specific devices is clearly a big deal in the SIEM world. If you want to receive, store and intelligently monitor security events from a wide range of devices and commercial software packages, your SIEM platform has to understand the output of all the sending devices. For example, if you deploy a remote access product and the SIEM doesn't understand the log format you won't be able to easily correlate or report on user authentication attempts. Fortunately in 2008 most firewall, IDS, IPS, and general networking devices are supported in most SIEM platforms. What gets a bit uglier is comprehensive support for operating systems and more specialized applications.
For example, Q1 Labs and NetIQ did a better job on the Windows front than most, but unfortunately all of the products we tested were confused at some point by the obnoxious range of Active Directory authentication events. Linux support was a bit better but still far from comprehensive. The lack of coverage isn't a show-stopper, but it is annoying and it does demonstrate the general immaturity of the SIEM space.
Another consideration when it comes to device support is the ability to monitor the events from custom applications. The push for SIEM systems to interface with applications that are less IT-security specific – take home-grown banking applications as an example -- will only continue to rise. If this prediction pans out, then two elements become more critical: first, flexible agents that can perform tasks such as scraping events from local logs and databases, and, second, extensible parsing mechanisms that can be customized for proprietary applications. Fortunately most of the products we tested are headed in this direction with NetIQ, CheckPoint and High Tower all shipping parsing development toolkits. This is a substantial change from just two years ago when one couldn't even see the parsing logic, much less use a toolkit to edit it.
We put parsing modifications to the test with the help of NetIQ when we modified one of the NetIQ Cisco ASA parsers to address a log parsing problem caused by a recent Cisco software update. The process is doable but keeping your regular expression skills sharp is highly advisable if you're going down the path of customizing any SIEM product; every product we tested relied on regular expressions for their parsing logic.
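To give a feel for what that regular-expression parsing logic looks like, here is a hedged sketch in the spirit of the parsers we edited; the expression and field names are our own illustration, not NetIQ's actual parser:

```python
import re

# Cisco ASA syslog messages carry a "%ASA-<severity>-<message id>:" tag.
ASA_RE = re.compile(r"%ASA-(?P<severity>\d)-(?P<msg_id>\d+): (?P<text>.*)")

def parse_asa(line):
    """Extract severity, message ID and message text from an ASA-style
    syslog line, or return None if the line doesn't match."""
    m = ASA_RE.search(line)
    if m is None:
        return None  # unparsed lines fall through to raw storage
    return {"severity": int(m.group("severity")),
            "msg_id": m.group("msg_id"),
            "text": m.group("text")}

event = parse_asa(
    "May 1 12:00:00 fw1 %ASA-6-106100: access-list inside permitted tcp")
```

When a vendor software update shifts a field or renames a tag, it is exactly this kind of pattern that has to be adjusted — hence our advice about keeping regex skills sharp.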
When it comes to transporting data, historically many security practitioners have taken a black-and-white stance in the agent vs. agentless debate. On one side of the argument, no one wants to deploy and maintain yet another agent in their computing environment. On the other side of the argument, agents can provide features such as bandwidth throttling, batch transfers and an improvement over the laughably insecure transport method of syslog over UDP.
It's no longer an either/or issue (even if some vendor documents suggest the black-and-white view isn't dead yet); in some scenarios you won't want an agent, and in others you might. For example, in our deployment we had an office in India that we wanted to tie into our logging infrastructure. We didn't have a ton of log data we needed to backhaul to North America, but had that been the case we would have wanted to either throttle or batch-transfer logs during local off-hours. Q1 Labs, NetIQ and TriGeo are shipping agents and High Tower has one in the works, but we wouldn't consider any of them to be as fully functional as they should be. If you're buying a SIEM within the next year, selecting a vendor that is expanding functionality in this area would be a wise move.
The smarts in SIEM
Three of the most critical subsystems of SIEM platforms are the reporting engine, the forensics and investigation system, and the real-time analysis or "monitoring" component. The use cases that drive the majority of SIEM product selections typically involve at least one of these subsystems, which is why they were at the center of our testing efforts.
On the analysis and monitoring front, the two areas we focused on were methods of event reduction and the usability of the interface that presents the filtered data. Event reduction is what most of the SIEM vendors tout as their initial value proposition. Anyone who has centralized and then attempted to review their event logs will tell you it's impossible to accomplish the task without some technology to separate the wheat from the chaff.
Even at rates of five to 10 events per second – which is quite low by enterprise standards – you're looking at numbers exceeding 400,000 events per day, a load that will crush even the most battle-hardened of security geeks. Ultimately a core function of SIEM is to watch over all of this data and provide the answer to the simplest of questions: what are the few important things that require a deeper look within the event log?
"Correlation" has long been the buzzword used around event reduction, and all of the products we tested contained a correlation engine of some sort. The engines vary in complexity, but they all allow for basic comparisons: if the engine sees A and also sees B or C, then it will go do X. Otherwise, file the event away in storage and move on to the next. We'd love to see someone attack the event reduction challenge with something creative like Bayesian filtering, but for now correlation-based event reduction appears to be the de facto standard.
But even on the correlation front not all engines have been created equal. Q1 Labs' product has the most powerful correlation language that is still intuitive to use. Q1 Labs employs a "building block" model that allows you to layer pieces of logic that are essentially written in English and assemble them into an alert rule. For example, one of the use cases we tackled was the monitoring of login attempts from foreign countries. We wanted to keep a particularly close watch on successful logins from countries in which we don't normally have employees. To do this, a few things had to be in place: we had to have authentication logs from the majority of systems that would receive external logins (IPsec and SSL VPN concentrators, Web sites, any externally exposed *NIX systems); we had to have the ability to extract usernames and IP addresses from these logs; and we had to have the ability to map an IP address to a country code. Not rocket science to do without a SIEM, but not entirely trivial, either.
Q1 Labs' QRadar had all of the functionality to do this, and we were able to build a multi-staged rule that essentially said, "If you see a successful login event from any device whose IP address does not originate from one of the following countries, generate an alert." Because of the normalization and categorization that occurs as events flow into the SIEM, it's possible to specify "successful login event" without getting into the nuances of Linux, Windows, IIS or VPN concentrators. This is the convenience that SIEM can offer.
Most modern SIEM products also ship with at least a minimal set of bundled correlation rules. For example, when we brought a new Snort IDS box online, there was a deluge of alerts, the majority of them false positives. Thanks to useful reduction logic, only one alert out of 6,000 actually appeared on our console across all of the products tested. That alert was based on a predefined correlation rule that looked for a combination of "attack" activity and a successful set of logins within a set period of time.
Our Q1 Labs dashboard claimed an ongoing data reduction ratio of 500,000:1 which sounds about right to us, but an apples-to-apples reduction comparison during our test was not possible because of the variances in supported devices across all platforms. Regardless of correlation method, the "alert vs. event" approach is essential for all products because it allowed us to look for SIEM-generated alerts (the wheat) and only dig into the raw event data (the chaff) when we needed to.
If you're an Apple fan or have any appreciation for good user interface design you're going to be disappointed by every product we tested. High Tower's product was - by far - the simplest and easiest to use: the user interface is laid out relatively cleanly and intuitively, the navigation bar allows you to hop between the logical functional groupings: Incidents, Cases, Assets, Rules, Reports and Administration, and the interface reacts the way you'd expect it to. One user interface, one tool and the designers adhered to basic user interface design principles such as anticipation and coherence (among others).
Unfortunately it went downhill from there: CheckPoint and NetIQ force you to use multiple applications, Q1 Labs' interface was useable only once you got familiar with it, TriGeo's interface was acceptable but not stellar, and eIQ violated a healthy share of basic design principles.
After using these products for a few months, we found that areas that were only slightly annoying at the first go grew more aggravating over time. For example, when performing basic searches on things such as usernames and IP addresses you have to walk through at least six menus (NetIQ); after a few dozen repetitions of meaningless steps it's hard not to get frustrated.
In comparison, if you're in the middle of creating an incident ticket and you can add a piece of event data with nothing more than a right click (High Tower), you're going to really appreciate that convenience. The bottom line is that if you're going to use a component in these products for any reasonable amount of time, you'll want an intuitive, easy-to-use interface that doesn't frustrate you.
On the reporting front, the two most common reporting scenarios we've seen typically involve scheduled reports that are relatively static and adhoc queries that are typically event driven or used in forensic situations. Static reports often include top-10 lists, incident summaries, items that exceed normal environment thresholds, general user access reports, and pieces of compliance checklists, to name a few. Adhoc queries are typically driven by an investigative action: looking for where user X logged in from, how many places we've seen IP address X, or taking a look at all login activity in a certain region over a certain period of time.
On the static reporting front NetIQ and TriGeo delivered the most polished reports, most likely because of wise choices in reporting technology (Microsoft SQL Reporting Services for NetIQ; QlikView and Splunk for TriGeo). Q1 Labs had the easiest-to-use report designer (although the reports didn't look as good), CheckPoint's were sufficient, with High Tower and eIQ holding up the rear. All of the products came with sample reports and provide the ability to design custom ones, and all of the products can schedule reports and deliver them via e-mail.
We used the adhoc querying mechanisms primarily to investigate suspicious activity triggered by correlated alerts, although we could certainly see how it would be useful for other efforts ranging from full investigations to basic troubleshooting. In most cases the adhoc reporting mechanisms were sufficient, and in one case above average: TriGeo (via the Splunk technology mentioned above) made searching for events easy to do, and also delivered some good reporting functionality. Q1 Labs was a close second.
A look ahead
When looking at the future of SIEM products, take into account the major changes that lie ahead for enterprise security management.
For starters, some of the pricing models are obnoxious given the value (or lack thereof) that these systems deliver. Few enterprises in this tight economy are going to be able to swallow the large numbers required for sizeable deployments.
On the reasonable side of the pricing spectrum are High Tower and Q1 Labs, which have gone the simple route by selling appliances that are based on nothing more than the approximate event-per-second (eps) loads that the SIEM platform will be experiencing; if you're looking at approximately X eps you buy appliance model A, if you're looking at Y eps you should look at appliance model B. Prices start at $18,000 and $19,000 respectively.
But other companies have gone more complex (read: potentially more expensive) routes. NetIQ and CheckPoint, for example, charge "per node," although what constitutes a node varies. TriGeo and eIQ charge for both the appliances and the number of nodes. And while there are advantages to NetIQ's model (such as a very low entry point), its pricing doesn't include all of the required hardware, operating system and software costs (Microsoft Windows and SQL Server Enterprise). Ultimately either the prices for large deployments will need to come down or functionality will have to substantially advance, or perhaps a little of both.
What part of IT wants SIEM?
But some higher-level organizational changes are clearly ahead, too, ones that may greatly affect which part of IT buys into the SIEM prospects.
As the roles within information security teams continue to evolve and many operational security functions settle into operational IT, there's little doubt that the demarcation point between traditional Network Operations Centers and their younger siblings, the SOCs, will eventually disappear. Time will tell which security functions will remain outside of operational IT, but few can debate that security can (and does) influence the classic IT tenets of performance and availability, and the eventual merger of the two is inevitable. This collision course has some potentially interesting ramifications, however. For starters, do classic network monitoring platforms start to include adequate security contexts, or do security platforms start including classic performance and availability monitoring?
We saw early signs of this collision years ago when products such as Tivoli attempted to provide security add-ons to an already established monitoring platform. It wasn't pretty then and historically the classic network monitoring platforms didn't have either the security "smarts" or the necessary capacity to address Infosec's differing problem-set, which is why few people remember failed attempts like the early versions of the Tivoli Risk Manager.
NetIQ is probably in the best position to gain ground here. Its AppManager suite is a well-established player in the IT operational realm, and even though we had our complaints about Security Manager, it's certainly on the path to being a contender in this product space. NetIQ already has a module for basic Security Manager and AppManager integration, but we unfortunately didn't get an opportunity to test it.
Another interesting dynamic is that the correlation engines found in some of the more advanced SIEM products can be re-purposed to tackle more business-facing security challenges.
For example, Joe Mcgee, the CTO from information security service provider Vigilant, has started customizing commercial SIEM products to help tackle fraudulent transactions in the online banking world. By doing some advanced mapping of custom applications, tracking user profiles, user behavior and items such as login locations, his company has been able to help clients reduce online fraud numbers. It's one thing to isolate that one IDS alert that matches a piece of vulnerability data, quite another to stop a fraudulent transaction before the company experiences an actual loss. The former provides a smidgen less of a headache for the average security analyst while the latter provides some tangible business value that even people outside of IT will understand. It's a no-brainer where decision makers should be looking to focus their energy. The more business-facing application of SIEM technology is a win for everyone; security, operational IT, vendors, and businesses all benefit. The question becomes, how soon can we get there?
After all of our installations, device provisioning, troubleshooting, struggling with expiring licensing keys, rule creations, customizations and late nights asking ourselves "where the heck did that come from?" one realization rose above everything else: not using technology to monitor event and log data is a bad risk management practice in 2008. But a close second was the realization that until the products become more mature across the board, your decision really should be use-case based: know which features you require first, second and third; pilot before you buy; and know that this space is still maturing…and will continue to do so for at least the next few years.
Shipley is the CTO of Neohapsis, an information risk management consulting firm. He would like to thank Apneet Jolly and Leigh Hollowell from DePaul University for their assistance during the testing. Shipley can be reached at firstname.lastname@example.org.