Initially, I set out to write this blog post about the security risks that come from misperceiving numerical data, and about the problems with conventional wisdom. However, my internet reading led me slightly off course, in pursuit of understanding some recent malware statistics.
Taking a break from exploit analysis and watching TV, I recently found myself surfing the web for some current security statistics, which is something IT managers should probably do once in a while. Typically, this kind of mundane data is reviewed only to learn about various IT trends (common attack vectors, frequent third-party exploits) and to aid decision-making tasks such as security resource allocation. Should we be concentrating more on UTM, end-point security, or extrusion protection?
It is also good practice in understanding the security statistics themselves.
At some point, I will go on one of my rants about how "information is data that is merely viewed, whereas knowledge is data that is understood", but for now I'll only scratch the surface. Do know that anyone can memorize and regurgitate numbers (or statistics, as they are often mislabeled), but far fewer can actually understand their meaning, relevance or truth.
This is the point where my train of thought was temporarily derailed. I was about to give some glorious examples of security data misrepresented by leading industry resources. Then I stumbled upon a few articles from Computerworld that didn't quite make sense.
On April 4th, Computerworld reported that "the total number of viruses will reach 1 million by year's end, according to security experts". Then, four days later, Computerworld stated that, according to Symantec, malware's million mark had been reached in the latter portion of 2007. How does a milestone number like that get overlooked for four months, resulting in a speculative article that is found to be an editorial error just four days later?
However, a larger question is: what is the true significance of breaking the 1M barrier? Is it, perhaps, just a nice big round number with an extra zero that makes for a good news story? Will any security strategies or mechanisms change now that wouldn't have changed when malware reached, say, 950,000? You probably know the answers.
On the other hand, it is a milestone, or at least a measurement. The real significance is the rate of change, and the dramatic increase in malware. Symantec's just-released Internet Security Threat Report, Volume XIII, gives an in-depth analysis of threat activity for the last six months of 2007. One of its most significant findings concerns malicious code trends. The study highlights the exponential growth of malware in 2007. Of a total of 1.1 million code threats, it reports that 711,912 were discovered in 2007. This would indicate that roughly 64% of all of these threats were from that one year alone. Has the internet really become that dangerous in a single year?
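For anyone who does want to take out the calculator, here is a minimal sketch of the arithmetic behind that percentage. It uses only the two figures quoted above; since the total is the rounded "1.1 million", the results are approximate, and the function names are my own, purely for illustration.

```python
# Figures quoted from the report as cited in this post. The total is the
# rounded "1.1 million", so every result below is approximate.
TOTAL_THREATS = 1_100_000   # cumulative code threats (rounded)
NEW_IN_2007 = 711_912       # threats discovered during 2007


def share_of_total(new: int, total: int) -> float:
    """Fraction of all known threats that were discovered in the period."""
    return new / total


def ratio_to_prior(new: int, total: int) -> float:
    """New threats in the period relative to everything known before it."""
    prior = total - new
    return new / prior


share = share_of_total(NEW_IN_2007, TOTAL_THREATS)
ratio = ratio_to_prior(NEW_IN_2007, TOTAL_THREATS)

print(f"Share of all threats discovered in 2007: {share:.1%}")
print(f"2007 discoveries vs. all prior years combined: {ratio:.2f}x")
```

The second number is the one that speaks to rate of change: it compares one year of discoveries against everything counted before it. Neither number says anything about how a "threat" was counted, which is where the real questions start.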
I don't know. But I do know that this type of statistical reporting is good for a security vendor's business. I also know that, when certain variables are left out, any data set can be skewed to produce favorable results. After reading the report, I had many unanswered questions. Symantec states that all previous reports were based upon "the number of malicious code reports received from enterprise and home users", and that the current report also examines "malicious code according to potential infections". How does this affect comparisons between reports?
When counting malware threats, how many are truly unique? How many are variant strains? And what are the actual criteria used to discriminate between the two? How have these criteria changed over time to keep up with evolving morphing engines capable of producing polymorphic and metamorphic malware? What about malware that is now appearing embedded in new devices, such as iPods or USB drives? Are there any relative adjustments based on the growth of US Internet usage (currently about 72% of the population)?
I'm not challenging the work of Symantec by any means. I am trying to get people to question numbers and statistics presented by any one vendor. It is important to question methodologies and inquire about absent variables. Ideally, access to the raw data allows you to perform your own statistical analyses and generate specific custom security metrics, if needed.
In the end, though, I don't think anyone has to take out a calculator to agree that malware is a growing problem.
I can be statistically analyzed at: firstname.lastname@example.org