Suzanne Widup graduated with honors from the MSIA program at Norwich University in 2007. The remainder of this article is entirely her own work, with minor edits.
* * *
In 2007, I was asked to develop a new information security metric for a research class – a metric that would quantify a risk factor that was either difficult to measure or had not been adequately studied. Given that many of the variables in information security are challenging to measure – such as the brand impact of a security incident – the goal was to give practitioners better tools for determining risk. With that in mind, I chose to focus on data-breach events. An academic literature search yielded Hasan and Yurcik's 2006 paper, "A Statistical Analysis of Disclosed Storage Security Breaches," which examined 209 incidents between January 2005 and June 2006 from several perspectives, including data and organizational types and breach vectors.
In looking at their paper, I was interested to see if breach vector trending held over a longer period of time. I also wanted to examine the individual incidents in more depth to determine if there was additional information to be gleaned. The results of the study were published in The Leaking Vault - Five Years of Data Breaches. This study is the largest of its kind to date, and covers 2,807 incidents from publicly disclosed sources between 2005 and 2009.
In presenting the findings from the study to various audiences, I have encountered a couple of questions repeatedly. First, "What are these good for?", referring to the statistics that show specific breach vectors broken down by industry or data type. I tell people these analyses help information security practitioners understand where their greatest risk resides – both in terms of the vector for an incident and the size of the breach in terms of records lost. The statistics can be used to spot the low-hanging fruit of risk, and to guide individuals in how best to reduce the likelihood of a breach, or the impact it would have. They can also be used in awareness programs to try to change common behaviors that put the organization's data at risk.
For example, the laptop vector stood out consistently over the course of the study as the incident leader. Laptops are most commonly stolen (95% of the time) rather than lost. They are frequently stolen from vehicles, so one application of the breach statistics would be to educate employees not to leave the machines in their vehicles. Another application of the same finding would be to look at data encryption on those laptops, so that even if a machine is stolen or lost, the data are not automatically subject to compromise.
Another finding from the study was that while the laptop vector accounted for 49% of all incidents, it did not lead in records disclosed. Since the number of records determines the scope of the breach event, this is an important finding to consider. Laptops accounted for only 6% of the records lost, whereas the hacking vector accounted for 45%, at 327 million records. The application here would be to look at both the perimeter controls and the detective abilities of an organization. The former are geared towards preventing the breach, and defense in depth is an important consideration. The latter are critical to reducing the damage from a breach by discovering and acting to contain it as soon as possible.
Although the impact of a breach is largely dependent on the number of records disclosed, a common problem with individual data breach incident reports is that the organization may not be legally required to disclose that figure. In fact, over the course of the study, 34% of the incidents listed no finite number for records disclosed. This poor reporting means the true number of records (and thus the number of people whose data were disclosed) is significantly understated. Although the known number of records involved is 721.9 million, a conservative estimate of the additional records that may have been disclosed adds another 7.6 million to that total, based on the median exposed per vector per year.
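The median-based estimate described above can be sketched in a few lines of Python. The data below are hypothetical (the study's actual dataset is not reproduced here); the sketch simply shows the mechanic: group known record counts by breach vector and year, then stand in each group's median for that group's incidents with no reported count.

```python
from statistics import median
from collections import defaultdict

# Hypothetical incidents: (year, vector, records_disclosed); None marks an
# incident whose report listed no finite number of records.
incidents = [
    (2009, "laptop", 1200), (2009, "laptop", 800), (2009, "laptop", None),
    (2009, "hacking", 250000), (2009, "hacking", None), (2009, "hacking", 90000),
]

# Collect the known counts per (vector, year) group.
known = defaultdict(list)
for year, vector, records in incidents:
    if records is not None:
        known[(vector, year)].append(records)

# The median of each group is a conservative stand-in for its unreported incidents.
medians = {key: median(vals) for key, vals in known.items()}

known_total = sum(r for _, _, r in incidents if r is not None)
estimated_extra = sum(
    medians.get((vector, year), 0)
    for year, vector, records in incidents
    if records is None
)
print(known_total, estimated_extra)
```

With the sample data, the two unreported incidents contribute the laptop median (1,000) and the hacking median (170,000) rather than being counted as zero, which is what keeps the estimate conservative without ignoring the missing data.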
The second question I have frequently encountered is "Hasn't this material already been published, since the data sources are from publicly disclosed events?" While each individual event has been gleaned from public sources, some come to light only through Freedom of Information Act requests by organizations like the Open Security Foundation. Some of the feedback on the study's publication is that the data are not new – they have been disclosed already. Certainly, each of the events has been publicly exposed in some form – but studying data breaches as a whole is akin to studying a disease's infection vectors. Even though the individual cases may be known, there is much to be learned by exploring trends in the overall phenomenon.
In the study, the vector and records disclosed findings are broken out by the type of organization and data, as well as the relationship between the organization and the data subject (such as employee, customer, patient). All of these are examined in the study, and practitioners can look at the findings and apply them to their specific situation. Recommendations are made to address the findings, and to help those who are responsible for assessing risk to put their efforts where they will do the most good. Without a study of the breach incidents as a whole, this type of trending would be impossible.
In the next article, Suzanne Widup explores key findings from her report.
* * *
Suzanne Widup, MSIA, has significant experience in workplace investigation, digital forensics, e-discovery and litigation support. Her background includes 16 years of security and Unix system administration, technical support, and software development. In addition, in what doesn't sound like much spare time, Suzanne is a certified Graduate Gemologist and a Graduate Jeweler, a certified Precious Metal Clay instructor, and the founder of the Yahoo Silk Painting group.