Generating graphs that are easy to understand involves two important principles: expressiveness and effectiveness. Not following these principles results in graphs that are either confusing or simply wrong.
Two principles known as the Mackinlay criteria [7] can be used to further improve the legibility and efficiency of graphs.
[7] Mackinlay, J., “Automatic Design of Graphical Presentations,” Ph.D. Dissertation, Computer Science Dept., Stanford University, Stanford, California, 1986.
The first principle, Mackinlay’s expressiveness criterion, states the following:
A set of facts is expressible in a visual language if the sentences (i.e., the visualizations) in the language express all the facts in the set of data, and only the facts in the data.
This sounds very theoretical, but let’s look at an example. In Figure 1-5, the length of the bars in the graph does not encode facts from the underlying data. The graph therefore does not follow the expressiveness criterion. Although this example might look too obvious, keep this principle in mind when designing your own graphs. After you have generated a graph, think hard about what it really communicates.
Figure 1-5 This graph illustrates the violation of Mackinlay’s expressiveness principle. The graph does not encode the facts in the dataset. This data merely needed a tabular presentation.
The second Mackinlay criterion reads as follows:
A visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.
This ties directly back to the discussions throughout this chapter. By applying all the principles we have discussed so far, we will come up with more effective visualizations, based on Mackinlay’s effectiveness principle.
When creating graphs, you should pay attention to a few simple design guidelines to generate easy-to-read, efficient, and effective graphs. You should know and understand the following list of graph design principles:
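Reduce nondata ink.
Distinct attributes.
Gestalt principles.
Emphasize exceptions.
Show comparisons.
Annotate data.
Show causality.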
Try applying these principles to your graphs, and notice how the graphs not only improve aesthetically but also become much simpler to understand.
One of the most powerful lessons that I have learned comes from Edward Tufte. In his book, The Visual Display of Quantitative Information, he talks about the data-ink ratio. The data-ink ratio is defined as the amount of ink used to display the data in a graph, divided by the total amount of ink used to plot the entire graph. For example, take any bar chart. If the chart uses a bounding box, an excessive number of grid lines, or unnecessary tick marks on the axes, it increases the ink spent on nondata elements and lowers the data-ink ratio. Three-dimensional bars and background images are some of the worst offenders against this principle. Get rid of them. They do not make a graph more legible and do not help to communicate information more clearly. Reduce nondata ink. It is a simple principle, but it is very powerful. Figure 1-6 shows how a graph can look before and after reducing nondata ink. The right side of the figure shows the same data as the left side, but in a way that is much more legible.
Figure 1-6 An example illustrating the data-ink ratio and how increasing it, by reducing nondata ink, improves the legibility of a graph
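To make the data-ink ratio concrete, here is a minimal Python sketch that computes it before and after removing nondata elements. The ink amounts, the element names, and the helper functions `data_ink_ratio` and `reduce_nondata_ink` are illustrative assumptions, not measurements or code from Tufte's book.

```python
# Hypothetical ink budget for one bar chart, in arbitrary units
# (for example, the number of pixels each element covers).
# Every number here is an illustrative assumption, not a measurement.
chart_ink = {
    "bars": 600,          # data ink: the bars encode the values
    "bounding_box": 200,  # nondata ink
    "grid_lines": 300,    # nondata ink
    "tick_marks": 100,    # nondata ink
}
DATA_ELEMENTS = {"bars"}


def data_ink_ratio(ink_by_element, data_elements):
    """Return the ink spent on data divided by the total ink in the graph."""
    total_ink = sum(ink_by_element.values())
    if total_ink == 0:
        raise ValueError("the graph uses no ink at all")
    data_ink = sum(
        ink for element, ink in ink_by_element.items()
        if element in data_elements
    )
    return data_ink / total_ink


def reduce_nondata_ink(ink_by_element, removable):
    """Return a copy of the ink budget without the removable elements."""
    return {
        element: ink for element, ink in ink_by_element.items()
        if element not in removable
    }


before = data_ink_ratio(chart_ink, DATA_ELEMENTS)
cleaned = reduce_nondata_ink(chart_ink, {"bounding_box", "grid_lines"})
after = data_ink_ratio(cleaned, DATA_ELEMENTS)
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.50, after: 0.86
```

Dropping the bounding box and grid lines leaves the data ink untouched, so the ratio rises even though the chart carries exactly the same information.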
We briefly touched on the topic of perception in the preceding section. One perceptual principle relates to the number of different attributes used to encode information. If you have to display multiple data dimensions in the same graph, make sure not to exceed five distinct attributes to encode them. For example, if you are using shapes, do not use more than five shapes. If you are using hue (or color), keep the number of distinct colors low. Although the human visual system can identify many different colors, our short-term memory cannot retain more than about eight of them for a simple image.
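One way to respect the five-attribute limit is to give only the most frequent categories their own shape and fold the rest into a shared one. The following Python sketch does that; the shape names, the `assign_shapes` helper, and the sample protocol data are assumptions made for illustration.

```python
from collections import Counter

# Five shapes, matching the upper bound suggested in the text.
SHAPES = ["circle", "square", "triangle", "diamond", "cross"]


def assign_shapes(categories, shapes=SHAPES):
    """Map each category to a shape without exceeding the available shapes.

    The most frequent categories keep a shape of their own. If there are
    more categories than shapes, the remaining ones share the last shape.
    """
    counts = Counter(categories)
    ranked = [name for name, _ in counts.most_common()]
    if len(ranked) <= len(shapes):
        return {name: shapes[i] for i, name in enumerate(ranked)}
    own_shape = ranked[:len(shapes) - 1]
    mapping = {name: shapes[i] for i, name in enumerate(own_shape)}
    for name in ranked[len(shapes) - 1:]:
        mapping[name] = shapes[-1]  # folded into the shared "other" shape
    return mapping


# Hypothetical event log with seven distinct protocols.
events = (["tcp"] * 5 + ["udp"] * 4 + ["icmp"] * 3 + ["gre"] * 2
          + ["esp", "ah", "sctp"])
shape_for = assign_shapes(events)
print(shape_for)
```

Here the four most common protocols keep their own shapes, and the three rare ones share the fifth, so the graph never shows more than five distinct shapes.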
To reduce search time for viewers of a graph and to help them detect patterns and recognize important pieces of information, a school of psychology called Gestalt theory [8] is often consulted. Gestalt principles are a set of visual characteristics. They can be used to highlight data, tie data together, or separate it.
[8] Contrary to a few visualization books that I have read, Gestalt is not the German word for pattern. Gestalt is hard to translate. It is a word for the silhouette, the form, the body, or the looks of a thing.
The six Gestalt principles are presented in the following list and illustrated in Figure 1-7:
Proximity: Objects grouped together in close proximity are perceived as a unit. Based on the location, clusters and outliers can be identified.
Closure: Humans tend to perceive objects that are almost a closed form (such as an interrupted circle) as the full form. If you were to cover this line of text halfway, you would still be able to guess the words. This principle can be used to eliminate bounding boxes around graphs. A lot of charts do not need the bounding box; the human visual system “simulates” it implicitly.
Similarity: Be it color, shape, orientation, or size, we tend to group similar-looking elements together. We can use this principle to encode the same data dimensions across multiple displays. If you are using the color red to encode malicious IP addresses in all of your graphs, there is a connection that the visual system makes automatically.
Continuity: Elements that are aligned are perceived as a unit. Nobody would interpret every little line in a dashed line as its own data element. The individual lines make up a dashed line. We should remember this phenomenon when we draw tables of data. The grid lines are not necessary; just arranging the items is enough.
Enclosure: Enclosing data points with a bounding box, or putting them inside some shape, groups those elements together. We can use this principle to highlight data elements in our graphs.
Connection: Connecting elements groups them together. This principle is the basis for link graphs, which are a great way to display relationships in data.
Figure 1-7 Illustration of the six Gestalt principles. Each of the six images illustrates one of the Gestalt principles. They show how each of the principles can be used to highlight data, tie data together, and separate it.
A piece of advice for generating graphical displays is to emphasize exceptions. For example, use the color red to highlight important or exceptional areas in your graphs. By following this advice, you will refrain from overusing visual attributes that overload graphs. Stick to the basics, and make sure your graphs communicate what you want them to communicate.
Figure 1-8 This bar chart illustrates the principle of highlighting exceptions. The risk in the sales department is the highest, and this is the only bar that is colored.
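The idea behind Figure 1-8 can be sketched in a few lines of Python: pick one accent color for the exception and a neutral color for everything else. The department names, risk scores, and the `highlight_maximum` helper are hypothetical; the resulting mapping could then be handed to whatever plotting tool you use as the per-bar colors.

```python
def highlight_maximum(values, accent="red", neutral="lightgray"):
    """Return a color per label; only the largest value gets the accent."""
    if not values:
        return {}
    top = max(values, key=values.get)
    return {label: accent if label == top else neutral for label in values}


# Hypothetical risk scores per department.
risk_by_department = {"Engineering": 4, "Finance": 3, "Sales": 9, "HR": 2}
bar_colors = highlight_maximum(risk_by_department)
print(bar_colors)
```

Only the Sales bar receives the accent color, so the viewer's attention goes straight to the exception.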
A powerful method of showing and highlighting important data in a graph is to compare graphs. Instead of just showing the graph with the data to be analyzed, also show a graph that shows “normal” behavior or shows the same data, but from a different time (see Figure 1-9). The viewer can then compare the two graphs to immediately identify anomalies, exceptions, or simply differences.
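The comparison a viewer makes between two graphs can also be approximated in code, for example to decide which categories deserve a closer look. This Python sketch flags categories whose current value deviates from the baseline by more than a relative tolerance; the `compare_to_baseline` helper, the 50 percent tolerance, and the traffic counts are assumptions chosen for illustration.

```python
def compare_to_baseline(baseline, current, tolerance=0.5):
    """Return categories that deviate from the baseline.

    A category is reported when its relative change exceeds the tolerance,
    or when it appears in only one of the two data sets.
    The result maps each category to (baseline value, current value).
    """
    deviations = {}
    for key in sorted(set(baseline) | set(current)):
        normal = baseline.get(key, 0)
        observed = current.get(key, 0)
        if normal == 0:
            changed = observed != 0
        else:
            changed = abs(observed - normal) / normal > tolerance
        if changed:
            deviations[key] = (normal, observed)
    return deviations


# Hypothetical connection counts per service: a "normal" day and today.
normal_day = {"http": 100, "dns": 40, "ssh": 10}
today = {"http": 110, "dns": 38, "ssh": 45, "telnet": 7}
anomalies = compare_to_baseline(normal_day, today)
print(anomalies)  # {'ssh': (10, 45), 'telnet': (0, 7)}
```

The small fluctuations in http and dns stay below the tolerance, while the jump in ssh and the new telnet traffic stand out, just as they would when the two bar charts are placed side by side.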
Graphs without legends, axis labels, or units are not very useful. The only time this is acceptable is when you want the viewer to understand the data qualitatively and the exact units of measure or the exact values are not important. Even in those cases, however, a little bit of text is needed to convey what data is visualized and what the viewer is looking at. In some cases, the annotations can come in the form of a figure caption or a text bubble in the graph (see Figure 1-10). Annotate as much as needed, but not more. You do not want the graphs to be overloaded with annotations that distract from the real data.
Figure 1-9 Two bar charts. The left chart shows normal behavior. The right side shows a graph of current data. Comparing the two graphs shows immediately that the current data does not look normal.
Figure 1-10 The left side bar chart does not contain any annotations. It is impossible for a user to know what the data represents. The right side uses axis labels, as well as text to annotate the outlier in the chart.
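As a small illustration of the annotation advice, the following Python sketch renders a text bar chart with a title, units, and a note next to the outlier. The `render_bar_chart` helper, the host names, and the login counts are hypothetical.

```python
def render_bar_chart(values, title, unit, notes=None):
    """Render a horizontal text bar chart with a title, units, and notes.

    values maps a label to a non-negative integer; notes maps a label to a
    short annotation shown next to that bar.
    """
    notes = notes or {}
    width = max(len(label) for label in values)
    lines = [title]
    for label, value in values.items():
        line = f"{label.ljust(width)} | {'#' * value} {value} {unit}"
        if label in notes:
            line += f"  <- {notes[label]}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical data: failed logins per host during one day.
failed_logins = {"web01": 3, "web02": 4, "db01": 12, "mail01": 2}
chart = render_bar_chart(
    failed_logins,
    title="Failed logins per host (one day)",
    unit="logins",
    notes={"db01": "outlier: three times the next-highest host"},
)
print(chart)
```

The title says what is measured, the unit follows every value, and only the one bar that needs an explanation carries a note, which keeps the annotations from competing with the data.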
Whenever possible, make sure that your graphs show not only that something is wrong or that there seems to be an “exception,” but also give viewers a way to identify the root cause. This is not always possible in a single graph. In those cases, it might make sense to show a second graph that can be used to identify the root cause. This principle helps you use graphs to make decisions and act on findings (see Figure 1-11). A lot of visualizations are great at identifying interesting areas and outliers, but they do not help the viewer take action. Have you ever looked at a graph and asked yourself, “So what?” That is generally the reaction to graphs that do not show root causes.
Figure 1-11 This chart illustrates how causality can be shown in a chart. The number of servers failing per month is related to the temperature in the datacenter.
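Before plotting two series together as in Figure 1-11, it can help to check how strongly they move together. The Python sketch below computes the Pearson correlation coefficient between temperature and server failures. The monthly values are made up for illustration, and a high correlation only suggests a relationship worth showing; it does not prove that one series causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical monthly readings: datacenter temperature in degrees Celsius
# and the number of servers that failed in the same month.
temperature = [20, 21, 23, 26, 30, 31]
failures = [1, 1, 2, 4, 7, 8]
r = pearson(temperature, failures)
print(f"correlation: {r:.2f}")
```

A coefficient close to 1 indicates that failures rise with temperature in this sample, which is the kind of relationship a second, root-cause graph should make visible.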
By applying all the previously discussed principles, you will generate not just visually pleasing graphs and data visualizations, but also ones that are simple to read and ones that communicate information effectively.