Why antivirus software is slow

Even a bad AV technology can be valuable, because protection against, say, 30 percent of all threats is still a lot better than protection against 0 percent of all threats.

However, besides the lousy protection, there's still plenty not to like about old-school AV technology.

The average person may not know whether AV software really protects her or not, but she generally knows that it is slow. This is certainly the most common complaint I hear about the technology from average consumers.

So why is most AV so slow?

Let's start by looking at the time people notice it most - when their computers are starting up.

Yes, any software that's going to protect you proactively needs to load up when the computer starts, and that could take a bit of time.

But AV products seem to feel the need to check the files on your computer for signs of bad stuff, and that is often what takes up the time.

The idea behind scanning your computer for bad stuff on bootup is that there might be things on your machine that have only recently been determined to be bad.

So, maybe there's a screensaver you downloaded a week ago, but your AV company just decided today that it is bad.

Or, in some cases, you might have got bad stuff on the computer when the AV software wasn't running.

For instance, you might have a dual-boot machine, meaning you have a second operating system on the machine that can write to the same disk drive. Maybe you run Windows and Linux, and downloaded some Windows virus while running Linux (where you're unlikely to be running AV).

The typical thing for AV software to do is to look at each file on your filesystem, determining whether or not it's bad. With most AV software, that process of judging a single file is stupidly inefficient.

For instance, many vendors rely heavily on a technique called cryptographic signature matching, but do so in an unintelligent way.

First, let's look at what cryptographic signature matching is. AV vendors would like to do exact matching and say, "This file we're looking at is an exact digital copy of this bad file we saw yesterday."

However, they don't want to have to put every piece of malware ever seen on customers' computers - that would take up too much space and would put even more ammunition in the hands of the bad guys.

Instead, they use some cryptography (a cryptographic hash function, to use the technical term) that takes the file as an input and spits out a number that is a fixed size.

The interesting thing is that the number that comes out appears to be purely random, but every time they enter the same input, the same output pops out.

The numbers that pop out of this algorithm are big numbers - so big that, in practice, vendors won't ever see two different inputs that give the same output.

This algorithm lets AV vendors say, "If a file's cryptographic signature is 267,947,292,070,674,700,781,823,225,417,604,638,969, it is bad."

Now, they just have to store this number, not the whole file. The bad guy might like to try to produce bad software that gives the same results as popular good software.

For instance, he might try to produce software giving the same cryptographic signature as some version of Microsoft Word, hoping that it will make it harder for vendors to come up with a signature, because a cryptographic signature would give lots of false positives.

But the cryptography is the special sauce making this impossible. The number that pops out really is about as good as random, so the most plausible thing a bad guy could do here is write lots of new malware until one finally gives the same result as some legitimate file.

And, as you might guess, it would take too many tries to be practical, even if all of the bad guys in the world got together to work on the problem.
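
To make the idea concrete, here is a minimal Python sketch of the kind of function described above. The text doesn't say which algorithm AV vendors actually use; SHA-256 from Python's standard hashlib module is assumed here purely for illustration, and the name signature_of is hypothetical. SHA-256 produces a 256-bit number, so its outputs are longer than the example numbers in this article.

```python
import hashlib


def signature_of(data: bytes) -> int:
    """Return a fixed-size number derived from the file's bytes.

    Assumption: SHA-256 stands in for whatever algorithm a vendor uses.
    """
    digest = hashlib.sha256(data).digest()  # always 32 bytes, whatever the input size
    return int.from_bytes(digest, "big")


# The same input always gives the same number...
first = signature_of(b"contents of some file")
second = signature_of(b"contents of some file")

# ...while changing a single character gives an unrelated-looking number.
changed = signature_of(b"contents of some file!")
```

Here first and second are equal and changed is different, which is all the vendor needs: store the number, not the file.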

Now that we understand cryptographic signatures, let's look at how AV vendors can apply them to this problem.

What they'd like to do when looking at a file is determine its cryptographic signature, then look up the signature in a database to see if it's bad. And hopefully a database lookup will be blazingly fast.

In fact, there are well-known algorithms where this kind of lookup should indeed be essentially instantaneous. The lookup should be a heck of a lot faster than calculating the cryptographic signature.

Let's assume for the moment that that's what actually happens (often it is not). How long does it take to calculate a cryptographic signature?

Well, the cost is dominated by the amount of time it takes to read the file off your hard drive. Everything else that happens is almost irrelevant.

The fastest hard drives today can read about 125MB per second. If your AV software is going to scan, say, 40GB of files, you are going to spend at least 5 minutes of physical time (40GB at 125MB per second works out to 320 seconds) waiting while the disk is busy feeding data to the AV system, in an absolutely ideal world.
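
The arithmetic behind that estimate is easy to check. The sketch below uses the article's own figures; the function name min_scan_seconds is hypothetical, and 40GB is taken to mean 40,000MB. It only models the ideal case where the disk does nothing but stream data to the scanner.

```python
def min_scan_seconds(total_megabytes: float, disk_mb_per_second: float) -> float:
    """Best-case time to read the data once, ignoring seeks and other programs."""
    return total_megabytes / disk_mb_per_second


# Figures from the text: 40GB of files, 125MB per second.
seconds = min_scan_seconds(40 * 1000, 125)  # 320.0 seconds
minutes = seconds / 60                      # a little over 5 minutes
```

Any real scan will be slower than this, because the disk is also serving your other applications.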

In the meantime, when other programs try to access the disk, everything slows down.

Your other applications wait for a pause in the AV workload, and then there's a performance hit when the disk has to move around for the various applications.

If you're doing a whole system scan where you have to do a cryptographic signature of every file, the net result is that you can expect things to go very slowly.

But, for some AV systems, the story is much worse because there's a lot of additional work for every single file that gets scanned. Instead of just being able to ask, "Now that I processed this file, is its signature in the database?" and get an immediate answer, what typically happens is something more like this:

I just processed a file.

Its signature is 267,947,292,070,674,700,781,823,225,417,604,638,969. Let's call that signature S.

Is S equal to 221,813,778,319,841,458,802,559,260,686,979,204,948? If so, the file is malware.

Is S equal to 251,101,867,517,644,804,202,829,601,749,226,265,414? If so, the file is malware.

Is S equal to 311,677,264,076,308,212,862,459,632,720,079,837,243? If so, the file is malware.

...

Is S equal to 11,701,885,383,227,023,807,765,753,397,431,618,256? If so, the file is malware.

In one of these bad systems, the question is asked once for every piece of malware that has a cryptographic signature. This approach doesn't scale very well to today's malware problem. Let's see why.
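
The question-by-question process above amounts to a linear scan. Here is a rough Python sketch of it, not any vendor's actual code: the function name linear_scan is hypothetical, and the list holds only the four signatures spelled out in the walk-through (the elided ones are left out).

```python
def linear_scan(signature: int, known_bad: list[int]) -> tuple[bool, int]:
    """Ask "Is S equal to ...?" once per known-bad signature.

    Returns (is_malware, number_of_comparisons_made).
    """
    comparisons = 0
    for bad in known_bad:
        comparisons += 1
        if signature == bad:
            return True, comparisons
    return False, comparisons


# Only the signatures written out in the walk-through; the "..." is omitted.
KNOWN_BAD = [
    221_813_778_319_841_458_802_559_260_686_979_204_948,
    251_101_867_517_644_804_202_829_601_749_226_265_414,
    311_677_264_076_308_212_862_459_632_720_079_837_243,
    11_701_885_383_227_023_807_765_753_397_431_618_256,
]
S = 267_947_292_070_674_700_781_823_225_417_604_638_969

# S is not among the four listed, so every entry gets compared.
found, comparisons = linear_scan(S, KNOWN_BAD)
```

With this short list, comparisons comes out as 4. The number of comparisons for a file that matches nothing always equals the size of the list, which is the scaling problem.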

There are about 10,000 new pieces of malware created each day (most of them are automatically generated from other pieces of malware, to avoid detection).

Let's assume that an AV company can catch them all. Let's also assume that the company has been adding 10,000 signatures a day for only a year. That's 3,650,000 signatures.

If it takes a millionth of a second to process one signature (and it probably will take a few millionths), it would take 3.65 seconds to process all those signatures, and that cost is paid again for every single file that gets scanned.
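
Here is that estimate worked through in Python, using the figures from the text. The function names are hypothetical, and the files parameter is an added assumption so you can see what happens when the per-file cost is multiplied across a scan.

```python
def signature_count(per_day: int, days: int) -> int:
    """Total signatures accumulated at a steady daily rate."""
    return per_day * days


def check_seconds(signatures: int, microseconds_each: int, files: int = 1) -> float:
    """Time to compare every signature against each of `files` files."""
    total_microseconds = signatures * microseconds_each * files
    return total_microseconds / 1_000_000


total = signature_count(10_000, 365)   # 3,650,000 signatures after one year
per_file = check_seconds(total, 1)     # 3.65 seconds for a single file
```

At that rate, a scan of a hypothetical 1,000 files would spend about an hour on comparisons alone (check_seconds(total, 1, files=1000) gives 3,650 seconds), before counting the time to read anything off the disk.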

In reality, AV companies have other techniques they prefer to use if they don't have to use cryptographic signatures.

They'd like to be able to capture as many pieces of malware as they can with a single signature, and since they generally won't see all of the 10,000 new pieces of malware a day, they're going to focus their signature writing on the "most important" pieces of malware.

As you'd expect, big companies generally prioritize what their big corporate customers are sending them over stuff they get from smaller companies, and individuals are very likely to be ignored - even the biggest companies have only a few dozen analysts dealing with these kinds of issues at any given time.

With so much malware, cryptographic signatures are a really important technique. They are easy to write (automated systems in the backend can generate them), and they are easy to eliminate if they turn out to be wrong.

Certainly, if designed properly, cryptographic signatures can improve efficiency. (For technical people: one should clearly use hash table lookups or a similarly efficient data structure, but many AV systems still use tree-based algorithms, or even linear scans!) The stupid way in which they tend to be handled is an artifact of the way signatures have been done forever: this notion of one rule following another, following another.
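
For the technically inclined, the hash table approach might look something like the sketch below. Python's built-in set and frozenset types are hash tables, so they serve as a stand-in here; the names build_index and is_known_bad are hypothetical, and the signatures are simply the example numbers used in this article.

```python
def build_index(known_bad_signatures) -> frozenset:
    """Build the lookup structure once, when signatures are loaded."""
    return frozenset(known_bad_signatures)


def is_known_bad(signature: int, index: frozenset) -> bool:
    """One hash table lookup instead of one comparison per signature."""
    return signature in index


INDEX = build_index([
    221_813_778_319_841_458_802_559_260_686_979_204_948,
    251_101_867_517_644_804_202_829_601_749_226_265_414,
    311_677_264_076_308_212_862_459_632_720_079_837_243,
    11_701_885_383_227_023_807_765_753_397_431_618_256,
])

hit = is_known_bad(11_701_885_383_227_023_807_765_753_397_431_618_256, INDEX)
miss = is_known_bad(12_345, INDEX)
```

Here hit is True and miss is False. The point of the design is that a lookup takes roughly the same time on average whether the index holds four signatures or several million, so the per-file cost stops growing as the malware count grows.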

It worked well when there were only tens of thousands of pieces of malware in total, but it doesn't anymore.

AV vendors are starting to shift to smarter ways of dealing with cryptographic signatures.

But even when they do, they still have all the noncryptographic signatures. Again, with a traditional AV engine, vendors hope their regular signatures will capture most of the bad stuff.

So, as there's more bad stuff that avoids AV engines, they'd like to get signatures that will detect lots of pieces of malware, hopefully even stuff that hasn't been created yet.

As long as there's a big focus on traditional signatures for protection, there are going to be many signatures that can take a lot of time to run, even when vendors do a better job with cryptographic signatures.

Another reason signatures proliferate and performance decreases as malware grows is that AV vendors generally can't easily remove old signatures.

Vendors typically don't keep enough data to determine whether old signatures are unnecessary because of new signatures. Nor do they collect enough information to know when a signature can be removed because the malware it caught doesn't circulate anymore.

Removing old signatures might sound risky, but there is malware that wouldn't even work if you did manage to get it on your machine, just because of the way systems have evolved since the good old days of the DOS operating system.

How can I speed up Antivirus?

Now that we know a bit more about why AV is a dog, the issue becomes what the end user can do about it.

You can choose your AV product based on raw performance numbers, but performance isn't everything. And most products perform well enough when only doing on-access scans.

It's on-demand scans that people notice most, and I recommend that people turn this feature off.

There's generally no compelling reason to do a scan of your entire system, particularly if it's going to degrade performance.

You might worry that you aren't being protected at all, but AV software is most effective running on-access scans, meaning that the AV engine scans files right before you go to use them. Malware can't hurt your system if you don't run it, so who cares if it is lying dormant on your disk?

The only significant benefit of a full system scan is that you can find bad stuff before you accidentally give it to someone else.

However, almost no malware spreads that way these days, and even if it did, one would hope that the person you gave it to was also running some sort of effective host protection.

All in all, I don't think this case is worth slowing down your machine more than necessary.

Also, note that these full system scans usually occur at least once a day - whenever the AV system downloads new signatures.

For most people who leave their computer on all day, though, this may not have an impact, because the scan tends to run in the middle of the night.

A lot of these problems stem from the fact that most AV technologies were not built for scale.

See also: Why most Antivirus doesn't work (well)

Extracted from 'The Myths of Security' by John Viega

ISBN 978-0-596-52302-2

Copyright © 2009 John Viega. All rights reserved. Used with permission.

Published by O'Reilly Media Inc.

1005 Gravenstein Highway North, Sebastopol, CA 95472

This story, "Why antivirus software is slow" was originally published by PC Advisor (UK).
