Datadog reduces the alert noise for IT support workers

Datadog wants to make it a little less painful for DevOps and operations teams monitoring application performance issues

The move to more complex and distributed applications has done wonders for organizational agility and the ability to innovate, but it has also had some flow-on effects for the people responsible for managing application uptime on a day-to-day basis.

Lots of disparate application components means lots of new potential sources of error, and people who carry a pager to be alerted to any issues suffer an increasing number of ill-timed calls.

A new offering from application monitoring vendor Datadog seeks to change this paradigm by offering a far more flexible alerting approach. Datadog’s new composite alert feature is intended to reduce alert noise for DevOps and operations teams. The idea is that these practitioners will spend less time on insignificant alerts and will be alerted only to major issues. As in the tale of the boy who cried wolf, fewer false alarms should result in a better response to the issues that matter.

How composite alerts work

Composite alerts allow customers to create customizable combinations of symptoms that often cause major outages, separating signal from noise and accurately identifying major issues within infrastructure and applications. This is a departure from the traditional approach in which alerting is based on singular thresholds for isolated metrics or events, often representing only a symptom of a potentially larger problem. A large portion of these alerts can be inconsequential, requiring intensive manual labor to determine if there is reason for serious concern. With composite alerts, DevOps teams can avoid unnecessary alerts by constraining the conditions that cause an alert to fire in the first place.

In practice, composite alerts can be set for any combination of performance indicators and can add a game-changing nuance to alerting conditions. As an example, an alert may be necessary when the message queues grow too long, but not when a service restarts, which can cause temporary queue growth and trigger false alarms. In this case, a composite alert can be created that triggers only when queue length crosses a threshold and the uptime for the service is greater than 10 minutes. Teams can then disable notifications from the original singular alert to reduce the total number of alerts they receive.
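To make the queue example concrete, here is a minimal sketch in plain Python of how such a rule could be evaluated. It does not use Datadog's API or configuration syntax. The names `ServiceSnapshot`, `queue_alert`, `uptime_condition` and `composite_alert` are hypothetical, and the queue threshold of 1,000 messages is an assumed value. Only the 10-minute uptime condition comes from the example above.

```python
from dataclasses import dataclass

# Assumed value for illustration; the article does not give a queue threshold.
QUEUE_LENGTH_THRESHOLD = 1000
# From the example above: the service must have been up for more than 10 minutes.
MIN_UPTIME_SECONDS = 10 * 60


@dataclass
class ServiceSnapshot:
    """One reading of the two indicators used in the example (hypothetical type)."""
    queue_length: int
    uptime_seconds: float


def queue_alert(snapshot: ServiceSnapshot) -> bool:
    """Traditional single-threshold alert: fires whenever the queue is too long."""
    return snapshot.queue_length > QUEUE_LENGTH_THRESHOLD


def uptime_condition(snapshot: ServiceSnapshot) -> bool:
    """True once the service has been up long enough to rule out a recent restart."""
    return snapshot.uptime_seconds > MIN_UPTIME_SECONDS


def composite_alert(snapshot: ServiceSnapshot) -> bool:
    """Fires only when both conditions hold at the same time."""
    return queue_alert(snapshot) and uptime_condition(snapshot)


# A service that restarted two minutes ago and is still draining its backlog.
just_restarted = ServiceSnapshot(queue_length=5000, uptime_seconds=120)
# A service that has been running for an hour with the same backlog.
long_running = ServiceSnapshot(queue_length=5000, uptime_seconds=3600)

print(queue_alert(just_restarted), composite_alert(just_restarted))
print(queue_alert(long_running), composite_alert(long_running))
```

In this sketch the single-threshold alert fires in both cases, while the composite rule stays quiet for the freshly restarted service and fires only for the long-running one, which is the case that deserves a page.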

In commenting on the release, and differentiating it from the approach that other vendors take, Datadog put it thusly:

Many companies have tried to solve this problem backwards. Instead of assertive rules on when alerts should be triggered, they take the approach of managing all the fired alerts by doing event aggregation and correlation. These approaches may have their place in monitoring traditional legacy applications and infrastructure, but in modern, dynamic applications, these approaches are very difficult to implement, and companies have to dedicate entire teams that play whack-a-mole because of the very high degree of change and agility.

Composite alerts seem to have found favor with beta customers. One customer, Segment, the analytics API and customer data platform, is especially bullish. It says this composite approach has become a crucial part of the way its operations teams work.

In a space that is becoming increasingly competitive, Datadog needs to find avenues to differentiate. Composite alerts, which are available immediately for all Datadog customers, should help it achieve this aim.
