This column is available in a weekly newsletter called IT Best Practices.
Logs are one of those things that a lot of people take for granted. Every piece of software, every device and every application generates its own logs, and they are often overlooked until something goes wrong and someone needs to dig into them to find the root cause. Companies that treat logs in this way are missing out on an opportunity to improve their business.
Logs have an interesting property that makes them quite valuable: they are the only common thread across a company's entire technology stack. It doesn't matter if it's network devices, security devices, operating systems or applications—all generate logs. Because of that, and with the proper tools, it's possible to look end-to-end in the infrastructure and the application stack using logs. The result is the ability to see what is happening from node to node, and from process to process.
One company with a log analysis and management tool is Loggly, which has a SaaS offering. Loggly takes customer log data into the cloud to make sense of it and to "reveal what matters" to each customer. What matters might be things like the root cause of an application error, trends that help with capacity planning, notifications about anomalies in systems, insight into security incidents such as a DDoS attack, and more.
Loggly takes streaming log data from whatever sources a company wants to provide. Obviously the more sources, the better the insight that can be derived. Loggly even recommends including logs from ephemeral systems—the servers that get spun up and down as they are needed. When those servers are taken down, they take their logs with them, making it impossible to go back and see what was happening in those systems if the logs aren't captured and preserved.
There are plenty of log management solutions, but Loggly claims it is different because its analysis of customer logs provides highly meaningful information for a wide variety of use cases. Some of the more common use cases include:
- Root cause analysis – This helps answer questions like "Why is performance slow?", "Why did our application go down?" and "Why is a particular device failing?"
- Error and exception reporting – This enumerates a company's top errors or exceptions that are affecting the business.
- Performance monitoring – A company can learn how its technology stack is performing.
- Transaction and request tracing – This is good for answering questions like "Why did a high-value transaction fail?" and "Which step of a transaction had a problem?"
- Trend analysis and planning – A company can get an understanding of its capacity usage as its business is growing.
- Tracking unusual activity – Anomaly detection capabilities automatically recognize and point out deviations from normal system behavior, helping to answer the question, "Is this activity pattern normal?"
- Security incident investigations – While not specifically a security solution, Loggly can help provide insight into security incidents, for example, to determine the type and source of a DDoS attack.
The goal is to help customers get the answers to these types of questions in just two clicks. Here's the recipe for how that's done:
A Loggly customer streams its log data to the cloud. The volume of data is enormous and the data itself tends to be unstructured and irregular. As the data comes in, Loggly identifies it, parses it and puts it into a semi-structured representation. This is done by a large number of rules built into the Loggly system. The vendor relies on standardized formats and serialization techniques such as JSON. The customer can also create rules that identify the data, parse it, tag it and put it into a structured format.
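To make the parsing step concrete, here is a minimal sketch of a single rule that turns one raw Apache access log line into a semi-structured JSON record. It is an illustration only, not Loggly's actual rule engine: the `parse_line` function, the `tag` field and the "unparsed" fallback are hypothetical names chosen for this example, and the pattern assumes Apache's Common Log Format.

```python
import json
import re

# Assumption: lines follow Apache's Common Log Format:
#   host ident user [time] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line, tag="apache"):
    """Return a dict for a recognized line, or a raw fallback record.

    Hypothetical helper for illustration; not part of any Loggly API.
    """
    match = CLF_PATTERN.match(line)
    if match is None:
        # Keep unrecognized lines instead of dropping them.
        return {"tag": "unparsed", "raw": line}
    event = match.groupdict()
    event["status"] = int(event["status"])
    # Apache writes "-" when no bytes were sent.
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    event["tag"] = tag
    return event


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
event = parse_line(sample)
print(json.dumps(event, indent=2))
```

A real system would hold many such rules, one per log format, and try them in turn. The fallback record shows one way to preserve lines that no rule recognizes so they can still be searched later.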
Once the data is structured, Loggly presents the customer with a dynamic catalog of its data. This catalog has three levels. The top level is the category level as automatically recognized by Loggly or defined by the customer. The category could be a technology, such as logs from Java, Apache web servers, Python, or Cisco. Or it could be something like production operation versus staging operation, or east traffic versus west traffic. However the customer chooses to have the data organized, Loggly coalesces the data into these categories at the highest level.
The next level is the field level. Suppose the category is technology, and the customer wants to view its Apache web logs under the Apache web server category. Loggly will list all of the unique fields that it has seen in the Apache web log. The level below that holds all of the unique values within a particular field. Say the customer clicks on the status code field. Loggly displays the various Apache web server status codes from the logs, along with a count of how many times each one occurred. The display might include 200s, which mean the request succeeded; some 404s, which are "not found" errors; and some 503s, which are "service unavailable" errors.
Here is a place where Loggly is revealing information that matters. Most people don't think that they have 503s, or service unavailable errors. But within two clicks (on the category level and then the field level) the customer gets a bird's-eye view of everything that this Apache web server has been doing and how it has been behaving. They can immediately see this 503 entry that they probably didn't expect to have. They didn't know to look for it, but Loggly revealed it, and now the customer can investigate the cause of the error.
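The three-level catalog described above (category, then field, then value counts) can be sketched as a nested counting structure. This is a rough illustration under stated assumptions, not how Loggly stores its data: `build_catalog` is a hypothetical helper, the events are made-up sample records, and the `tag` key is assumed to carry the category.

```python
from collections import Counter, defaultdict


def build_catalog(events):
    """Map category -> field -> Counter of values seen in that field.

    Hypothetical helper for illustration only.
    """
    catalog = defaultdict(lambda: defaultdict(Counter))
    for event in events:
        category = event.get("tag", "uncategorized")
        for field, value in event.items():
            if field == "tag":
                continue  # the tag is the category, not a field
            catalog[category][field][value] += 1
    return catalog


# Made-up sample events; real ones would come from parsed logs.
events = [
    {"tag": "apache", "status": 200, "path": "/"},
    {"tag": "apache", "status": 200, "path": "/about"},
    {"tag": "apache", "status": 404, "path": "/missing"},
    {"tag": "apache", "status": 503, "path": "/"},
    {"tag": "java", "level": "ERROR"},
]
catalog = build_catalog(events)

# First "click": choose a category and list its unique fields.
fields = sorted(catalog["apache"])
print(fields)

# Second "click": choose a field and list each value with its count.
status_counts = catalog["apache"]["status"]
for code, count in status_counts.most_common():
    print(code, count)
```

Because every value is counted as it arrives, an unexpected entry such as a single 503 shows up in the listing without anyone having to search for it.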
Loggly also can perform anomaly detection and notification. For example, suppose the customer is monitoring the average response time of a particular application. It's typically a value around 20 milliseconds, but there's a sudden spike to 140 milliseconds. Loggly can detect the anomaly and send an alert for someone to investigate the situation.
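One simple way to picture this kind of check is to compare each new measurement against a baseline of recent values. The sketch below flags a value that lies more than a chosen number of standard deviations from the baseline mean. It is an assumption made for illustration: the article does not say which detection method Loggly uses, and `is_anomaly`, the threshold of 3.0 and the baseline numbers are all hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(history, value, threshold=3.0):
    """Return True if value is more than `threshold` sample standard
    deviations away from the mean of `history`.

    Hypothetical helper; a stand-in for whatever method a real
    product uses.
    """
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any change counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Made-up response times in milliseconds, centered near 20 ms.
baseline_ms = [19, 21, 20, 22, 18, 20, 21, 19]

spike_flagged = is_anomaly(baseline_ms, 140)
normal_flagged = is_anomaly(baseline_ms, 21)
print(spike_flagged, normal_flagged)
```

With this baseline, the jump to 140 milliseconds is flagged while a reading of 21 milliseconds is not. A production system would also have to handle seasonality and slowly drifting baselines, which this sketch ignores.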
According to Loggly, its solution is well-suited to online businesses that operate largely in the cloud, such as mobile, social, e-commerce, digital advertising, gaming and SaaS companies. These types of businesses are delivering value over the Internet, and that is the same business model Loggly espouses.
This solution looks to be versatile. While most customers tend to use it for monitoring and troubleshooting, the use cases can get pretty creative. One customer uses Loggly to do search engine optimization by getting insights from its web service logs that it can't get out of Google Analytics. Customers in the gaming industry use the insight they get out of Loggly to optimize the flow of their games and to test how people use certain features of the games.