Distil Networks uses device fingerprints to detect malicious web bots

Web applications are subject to click fraud, comment spam, content scraping, and more. Bot detection and mitigation can close these vulnerabilities.

This column is available in a weekly newsletter called IT Best Practices.

Who's that coming to your website? Is it friend or foe? Is it a customer wanting to buy your products, or someone or something wanting to steal your web content? Is it a community member that wants to post a relevant comment, or a spammer intent on planting junk links and content in your open comments section? Is it a real person clicking on an ad, or a web bot driving up fraudulent clicks?

Web applications are increasingly being subjected to automated threats such as click fraud, comment spam, content scraping, abusive account creation, and more. These and other illicit or unwanted activities are described in detail in the OWASP Automated Threat Handbook for Web Applications.

This article is about one vendor’s approach to defeating unwanted web traffic, whether it's automated or human-driven. I should point out that there are desirable and highly useful web bots too, such as the crawlers search engines use to find and index content, and chat bots that fetch information and bring it into the chat rooms where people meet. Any solution designed to defeat malicious bots has to let the good ones through.

In the past few years, web bots have become quite sophisticated. They have gone from simple scripts running on a single server to distributed bots known as advanced persistent bots, or APBs. A bot counts as advanced if it can load JavaScript, hold onto cookies, or load external resources, and as persistent if it can randomize its IP address, headers and user agent. These bots work harder to evade detection than earlier bots did, and the majority of bots are now considered advanced or persistent.

According to the web application defense company Distil Networks, 73% of bots use more than one IP address for a single attack on a website, and 20% of them use more than a hundred. Traditional tools that block individual IP addresses can't keep up with such evasive tactics, and this is one way bots are becoming more persistent. Distil Networks also points out that the majority of bots today can load JavaScript. Moreover, bots now mimic human behavior by doing things like pausing between page requests and moving the mouse. These tactics make them harder to detect, and they also throw off the analytics tools used to measure the effectiveness of websites and their content.

Distil uses a combination of techniques to detect APBs. The first step is to make sure a web visitor, human or bot, is actually who they say they are. When a browser request comes in, Distil interrogates the headers to see whether the visitor is lying about their identity. Suppose the request says, "I am Chrome running on a PC." Distil checks whether the client is running the right JavaScript engine, whether the headers are formatted the right way, whether the TCP packets make sense for the operating system it claims to be running, and whether the browser is multi-threaded in the right fashion. In short, it checks that everything about the browser is consistent with that claim.

Using this basic interrogation approach, Distil says it can catch the intercepting proxies that hackers use to automate malicious or unwanted requests. By weeding out requests that lie about their true identity, Distil claims it can eliminate 70% to 80% of threats at the outset. That's the easy part.
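The kind of consistency check described above can be illustrated with a minimal Python sketch. It is not Distil's implementation: the `ObservedRequest` fields, the `inconsistencies` helper, the header list and the TTL table are all hypothetical, chosen only to show how a request claiming to be Chrome on Windows gets caught when its TCP/IP traits, headers or JavaScript behavior don't match the claim. The TTL values reflect common operating system defaults (128 for Windows, 64 for Linux and macOS), but real fingerprinting weighs many more signals than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical summary of what a server can observe about one request.
@dataclass
class ObservedRequest:
    user_agent: str
    headers: Dict[str, str]       # header names lower-cased
    observed_ttl: int             # IP time-to-live as received by the server
    executed_js_challenge: bool   # did the client run a JavaScript check?

# Common default initial TTLs by OS family. A coarse heuristic: real TCP/IP
# fingerprinting inspects many more packet fields than this one.
DEFAULT_TTL = {"windows": 128, "macos": 64, "linux": 64}

# Headers a mainstream desktop browser normally sends when loading a page
# (an illustrative assumption, not an exhaustive or vendor-specific list).
BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def claimed_os(user_agent: str) -> Optional[str]:
    """Infer the OS family the User-Agent string claims to be."""
    ua = user_agent.lower()
    if "windows nt" in ua:
        return "windows"
    if "mac os x" in ua:
        return "macos"
    if "linux" in ua:
        return "linux"
    return None


def estimate_initial_ttl(observed: int) -> int:
    """Each router hop decrements the TTL, so round up to a common start value."""
    for start in (32, 64, 128, 255):
        if observed <= start:
            return start
    return 255


def inconsistencies(req: ObservedRequest) -> List[str]:
    """Return the reasons a request's claimed identity does not add up."""
    problems = []
    os_name = claimed_os(req.user_agent)
    if os_name is not None:
        ttl = estimate_initial_ttl(req.observed_ttl)
        if ttl != DEFAULT_TTL[os_name]:
            problems.append(f"initial TTL {ttl} is unusual for {os_name}")
    if "chrome/" in req.user_agent.lower():
        missing = [h for h in BROWSER_HEADERS if h not in req.headers]
        if missing:
            problems.append("claims Chrome but lacks: " + ", ".join(missing))
        if not req.executed_js_challenge:
            problems.append("claims Chrome but did not execute JavaScript")
    return problems


CHROME_ON_WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                     "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
honest = ObservedRequest(CHROME_ON_WINDOWS,
                         {"accept": "text/html", "accept-language": "en-US",
                          "accept-encoding": "gzip"},
                         observed_ttl=117, executed_js_challenge=True)
# Same User-Agent, but a Linux/macOS-style TTL, sparse headers and no JavaScript.
liar = ObservedRequest(CHROME_ON_WINDOWS, {"accept": "*/*"},
                       observed_ttl=52, executed_js_challenge=False)

print(inconsistencies(honest))       # []
print(len(inconsistencies(liar)))    # 3
```

Each failed check is reported separately, so a scoring layer could weigh them rather than rejecting on the first mismatch.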

Then there are bots that have become so advanced that they automate an actual browser. In essence, they are a man-in-the-browser. Distil uses machine learning to distinguish these bots from real visitors.

Fundamentally, a bot's browsing pattern is going to look different from legitimate traffic. Distil profiles dozens of metrics to reveal anomalies. For example, what time of day are visitors coming in? Where did they come from? What was the previous site they visited? What was their entry point to your site? How did they navigate through your site, and which pages did they go through? By profiling these and other bits of information, Distil has learned that bots tend to be either really random or really systematic. Those patterns help Distil identify what isn't real.
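One way to turn "really random or really systematic" into numbers is sketched below. It assumes just two hypothetical features rather than the dozens of metrics Distil profiles: how regular the gaps between requests are (the coefficient of variation, standard deviation divided by mean), and how often each page view follows a link from the previous page. The function names, toy site map and thresholds are illustrative assumptions; a machine-learning system would learn such boundaries from data instead of hard-coding them.

```python
import statistics
from typing import Dict, List, Set


def timing_cv(timestamps: List[float]) -> float:
    """Coefficient of variation (stdev / mean) of the gaps between requests."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean > 0 else 0.0


def link_coherence(pages: List[str], links: Dict[str, Set[str]]) -> float:
    """Fraction of page-to-page steps that follow a link on the previous page."""
    steps = list(zip(pages, pages[1:]))
    followed = sum(1 for src, dst in steps if dst in links.get(src, set()))
    return followed / len(steps)


def classify_session(timestamps: List[float], pages: List[str],
                     links: Dict[str, Set[str]],
                     cv_floor: float = 0.1, coherence_floor: float = 0.2) -> str:
    """Label a session; both thresholds are hypothetical, not tuned values."""
    if len(timestamps) < 3 or len(pages) != len(timestamps):
        return "insufficient data"
    if timing_cv(timestamps) < cv_floor:
        return "systematic"   # near-constant pacing between requests
    if link_coherence(pages, links) < coherence_floor:
        return "random"       # jumps between pages that don't link to each other
    return "no anomaly"


# A toy site map: page -> pages it links to.
SITE_LINKS = {
    "/": {"/products", "/about"},
    "/about": {"/"},
    "/products": {"/products/1", "/products/2"},
    "/products/1": {"/products"},
    "/products/2": {"/products"},
}
browse = ["/", "/products", "/products/1", "/products", "/products/2"]
jumpy = ["/about", "/products/2", "/", "/products/1", "/about"]

print(classify_session([0, 2, 4, 6, 8], browse, SITE_LINKS))        # systematic
print(classify_session([0, 3, 11.5, 13, 40], browse, SITE_LINKS))   # no anomaly
print(classify_session([0, 3, 11.5, 13, 40], jumpy, SITE_LINKS))    # random
```

A bot that paces requests evenly trips the first check even if it follows links perfectly, while one with human-like timing but incoherent navigation trips the second.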

Most security solutions today use an access control list (ACL) or other blocking mechanism based on a single IP address. When a bad actor is discovered, the solution blocks their IP address. As mentioned, APBs often use multiple IP addresses. Blocking one of them does little good when the bad guy just changes to a different IP.

In contrast, Distil blocks bad actors based on a digital fingerprint, which is composed of a browser and the machine it is running on. Even if the bad actor shifts IP addresses, a request with the same fingerprint can still be identified as bad. This increases the burden of obfuscation on the bad guys. Once Distil identifies a fingerprint as belonging to a malicious actor, that knowledge is shared across Distil's customer network so the fingerprint is blocked for all customers.
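The difference between IP-based and fingerprint-based blocking can be sketched as follows. This is a simplified illustration, not Distil's method: the attribute names in the hypothetical `bot` dictionary and the `SharedBlocklist` class are assumptions, and the fingerprint here is simply a SHA-256 hash of those attributes. Because the blocklist is shared, a bot reported by one site is still recognized by another after it switches IP addresses.

```python
import hashlib
import json
from typing import Dict, Set


def fingerprint(attrs: Dict[str, str]) -> str:
    """Stable SHA-256 hash over browser and machine attributes (key order ignored)."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SharedBlocklist:
    """One blocklist consulted by many sites, so a bad actor reported once
    is recognized everywhere. Tracks IPs and fingerprints side by side."""

    def __init__(self) -> None:
        self.bad_ips: Set[str] = set()
        self.bad_fingerprints: Set[str] = set()

    def report(self, ip: str, attrs: Dict[str, str]) -> None:
        self.bad_ips.add(ip)
        self.bad_fingerprints.add(fingerprint(attrs))

    def blocked_by_ip(self, ip: str) -> bool:
        return ip in self.bad_ips

    def blocked_by_fingerprint(self, attrs: Dict[str, str]) -> bool:
        return fingerprint(attrs) in self.bad_fingerprints


# Hypothetical attributes; a real product would collect many more signals.
bot = {"user_agent": "Chrome/120 on Windows", "screen": "1920x1080",
       "timezone": "UTC", "gpu": "SwiftShader", "fonts": "Arial,Helvetica"}

shared = SharedBlocklist()
shared.report("203.0.113.5", bot)       # customer A flags the bot

# The same bot returns to customer B from a new IP address.
print(shared.blocked_by_ip("198.51.100.7"))      # False: the IP ACL misses it
print(shared.blocked_by_fingerprint(bot))        # True: the fingerprint still matches
```

The trade-off is that an exact hash breaks if the bot changes even one attribute, which is why practical fingerprinting tends to match on many signals with some tolerance rather than on a single digest.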

Some hackers have moved beyond attacking web applications and have started to go after the API calls that power the web app, or they have figured out how the native mobile apps powered by that API talk to it. The API calls have access to the same database, the same infrastructure and the same data that the website does, so hackers simply program against the API to get what they want. Distil Networks addresses API security from three different angles: web, server-to-server and mobile APIs. The company says the solution acts as an automatic shield against API hijacking, scraping and abuse.

Automated web bots are used against companies in so many ways that the use cases are practically unlimited, and some of them are surprisingly sophisticated. For example, one hedge fund regularly sends bots to the website of a publicly traded company to gather information about product inventories. By gathering the same information every week, it can deduce how well the products are selling. In other words, if there were 100 widgets in inventory last week and there are only 70 this week, then, assuming no restock, 30 widgets must have been sold in that timeframe. This gives the hedge fund approximate sales data that it can use to decide whether to buy or sell the company's stock ahead of its earnings call.
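The hedge fund's arithmetic is simple enough to write down. The sketch below, with a hypothetical function name, turns weekly inventory snapshots into estimated weekly sales, assuming stock only goes down through sales. A week in which inventory rises is treated as a restock and counted as zero sales, so the estimate can undercount.

```python
from typing import List


def estimated_units_sold(weekly_inventory: List[int]) -> List[int]:
    """Week-over-week drops in a product's listed inventory.

    Assumes drops come only from sales. Any increase is treated as a restock
    and counted as zero sales for that week, so this can undercount.
    """
    return [max(prev - cur, 0)
            for prev, cur in zip(weekly_inventory, weekly_inventory[1:])]


print(estimated_units_sold([100, 70]))               # [30]
print(estimated_units_sold([100, 70, 55, 120, 90]))  # [30, 15, 0, 30]
```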

It's stunningly clever for the hedge fund, but it probably leaves the public company feeling like its financial data was stolen, which in a sense, it was. Bot detection and mitigation is the way to close this and many other types of web vulnerabilities.

Copyright © 2016 IDG Communications, Inc.