By now you have probably played with DAVIX, which I introduced two blog posts back. What about the other tools that I introduced in my previous blog post? I mentioned AfterGlow, a tool that you can use to quickly generate link graphs in order to analyze relationships, and Treemap, which helps analyze hierarchical data. It's now time to get a bit more formal and look at the process that you would go through to visualize some of your own data.
I like to start with a crisp definition of the problem at hand. Based on the problem, I can then go through the information visualization process that you can see in the figure below. The following list explains each of the steps in a bit more detail. I know it might look obvious, but I can tell you that every time I generate a graph, I follow every single one of these steps. It helps make sure that you get what you really want. So, here we go:
- Define the problem: Make sure that you are really crisp on what the problem is. Being clear on the problem from the beginning will save you a lot of headaches during the rest of the visualization process.
- Assess available data: Once you know what the problem is, make sure you have the data to solve it. In some cases you will need lookup tables or some additional information that's not already part of the original data. For example, if you are looking to find out where in the world your external network connections are coming from, you will need some geo-lookup table that maps IP addresses to geo locations.
- Filter: The next step is to filter your data. Don't filter too much. Anything that you filter out will not be available anymore. But also don't just leave all the data there. A lot of graphs are not very useful if you have too much information. A tip: When you go through the visualization process for the first time, use a small data set and then in a second round add the rest of the data back in. In some cases I slowly add more and more data until the graph starts looking overloaded.
- Normalize: Now you have to turn your unstructured data into something that you can extract individual fields from. Think of it as taking a character stream or a string and storing the individual parts in a database. You need to figure out where the source address or the username is located in the string and then extract it. This is generally done through a parser. I often write one from scratch in awk or Perl.
- Visual transformation: Now you are ready to generate your first graph. Use a graphing tool to visualize the parsed data. The choice of tool depends heavily on your use case.
- View transformation: The first graph will most likely not be the one that you are happy with. You might want to filter some more of the data, tweak this and that, and probably also start playing with color. You might assign color to specific data points, use red to highlight exceptions, etc. This step will likely involve a few iterations until you are happy with the output.
- Interpret and decide: Once you are happy with the output, you can use it to answer the question that you initially posed. Hopefully you will be able to do so. In some cases you will realize that you have to go back to step one and reassess your problem definition. In other cases you might take your visual output to someone else to help you interpret the output.
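To make the middle steps a bit more concrete, here is a minimal Python sketch of filtering, normalizing, and enriching a few log lines for the geo example from the list above. Everything specific in it is an assumption made up for illustration: the log format, the tiny `GEO_TABLE` lookup (with placeholder country labels), the `INTERNAL_NET` range, and all function names. It does a coarse filter on the raw lines first and a second filter once the fields are parsed, then writes comma-separated source and destination pairs, the kind of two-column input a link-graph tool can typically consume.

```python
import csv
import io
import ipaddress
import re

# Hypothetical firewall log, invented for this example.
SAMPLE_LOG = """\
Jan 12 10:01:02 fw1 ACCEPT src=203.0.113.7 dst=10.0.0.5 dport=22
Jan 12 10:01:03 fw1 DROP src=198.51.100.23 dst=10.0.0.5 dport=445
Jan 12 10:01:04 fw1 heartbeat ok
Jan 12 10:01:05 fw1 ACCEPT src=10.0.0.8 dst=10.0.0.5 dport=80
Jan 12 10:01:06 fw1 ACCEPT src=203.0.113.99 dst=10.0.0.6 dport=443
"""

# Assumption: our own network. Anything outside it counts as external.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical geo-lookup table with placeholder labels. A real one would
# come from an external data source that maps address ranges to locations.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "Country A",
    ipaddress.ip_network("198.51.100.0/24"): "Country B",
}

# Parser for the made-up log format above.
LINE_RE = re.compile(
    r"(?P<action>ACCEPT|DROP) src=(?P<src>\S+) dst=(?P<dst>\S+) dport=(?P<dport>\d+)"
)


def normalize(line):
    """Normalize: turn one raw log line into a dict of fields, or None."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def geo_lookup(ip):
    """Assess available data: map an IP address to a location label."""
    addr = ipaddress.ip_address(ip)
    for network, label in GEO_TABLE.items():
        if addr in network:
            return label
    return "unknown"


def build_edges(log_text):
    """Return (source location, destination IP) pairs for external sources."""
    edges = []
    for line in log_text.splitlines():
        # Filter (coarse): skip lines that carry no connection information.
        if "src=" not in line:
            continue
        record = normalize(line)
        if record is None:
            continue
        # Filter (on parsed fields): keep only connections from outside.
        if ipaddress.ip_address(record["src"]) in INTERNAL_NET:
            continue
        edges.append((geo_lookup(record["src"]), record["dst"]))
    return edges


def to_csv(edges):
    """Serialize the pairs as two-column CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(build_edges(SAMPLE_LOG)), end="")
```

The tip from the filter step applies here as well: start with a handful of lines like the sample, check that the output looks right, and only then feed in the full log.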
I hope this was useful. If you have questions about individual steps of the process, let me know. I would love to hear from you. If you want to read more about the information visualization process, I encourage you to read up on it in Chapter 4 of Applied Security Visualization. Next time we are going to look at how you can generate TSV files to generate your own treemaps.