Machine learning sets new standard in network-based application whitelisting

Using machine learning to automate the manual process of gaining application visibility brings network visibility and analytics to a new level. By harnessing the power of the semantic Web through machine learning, emerging solutions let you act faster and proactively defend against cyber threats.

The timing is good, because new applications are introduced at incredible rates: More than 300 applications are created each day, and the Apple App Store adds some 40,000 apps per month. While everyone benefits from this diversity, the applications pose a cyber security challenge.

Security teams must dedicate skilled analysts to monitor and analyze their organizations’ network activity to determine if the applications are white-listed or authorized. Unauthorized applications must be blocked from entering the enterprise network, since they might consume bandwidth, lower productivity, compromise critical data or pose potential security threats.

Even if the impact of the applications is minimal, it is critical that security organizations have complete visibility. Security organizations that employ static methods (that is, manual reverse engineering) struggle to keep pace with identifying and classifying new applications, especially given the rate at which new applications are being introduced. They must use machine learning to discover applications, extract signatures and create white lists. Security analysts can then have the incisive intelligence necessary to act early, without using costly and scarce resources.

Manual reverse engineering: DPI's Achilles' heel

While deep packet inspection (DPI) solutions have historically held a place in the cyber security solution landscape, a reliance on analysts to manually reverse-engineer applications is quickly becoming the DPI Achilles' heel. DPI solutions provide visibility and enforce traffic policies only on flows for which a packet payload signature exists. In a recent study of a Fortune 50 gateway, DPI solutions provided detailed visibility into 19% of network traffic and coarse visibility into 64% of the traffic captured (i.e., HTTP); the remaining 17% was classified as unknown.

So when faced with unidentified traffic, DPI solutions must employ manual reverse engineering, which requires weeks of investigative work by skilled analysts to identify and generate a signature from a new application, and then appropriately classify it. In the meantime, the unidentified application continues to run on the network, compromising security and operational efficiency.

Innovation Changes the Game

Fortunately, machine learning has emerged to enable automated discovery of all enterprise applications, signature extraction and white list generation. Machine learning provides the ability to add context to Internet traffic based on a superior understanding of relationships among data. And machine learning can accomplish this without the use of highly skilled, highly paid analysts.

Of course, application signature detection, decoding and classification using machine learning are not as simple as replacing humans with machines. The machines must be powered with the advanced analytics needed for auto-discovery, which aims to identify the nature of any unknown traffic on the network at any instant. The network traffic might be unknown for a variety of reasons, including:

• A never-before-seen network protocol or user application

• An evolution of a known protocol or application

• The emergence of new internal modes of a known application

By automatically and incrementally learning those signatures, their nature, and their associated evolution over time using advanced analytics, machine learning-based solutions provide security organizations with a clear understanding of the five “Ws”:

• "What" (type of application)

• "Why" (purpose of the application)

• "Who" (the application owner/users)

• "Where" (network addresses involved with these applications)

• "When" (point at which new control policies are required to be enforced)  

Breaking it Down: The Operational Life Cycle

With this background in mind, we can walk through a machine learning-based application discovery, signature extraction and white list generation solution's operational life cycle.

The process must begin by focusing on all the network traffic sessions for which security analysts want better visibility. Next, that traffic must be grouped cohesively to extract accurate signatures. Because the traffic seen on the wire does not lend itself to grouping based on protocols and applications (since they might not be known yet), multiple levels of filtering and clustering are required to create cohesive groups that can be used to generate reliable signatures.

Guaranteeing a high level of cohesiveness translates to more precise and reliable signatures. Once the solution processes each group independently using advanced statistical algorithms, it can extract precise signatures and their corresponding protocol/application labels (that is, names).
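
The grouping and extraction steps described above can be illustrated with a deliberately simplified sketch. It is not the statistical method the article refers to: it swaps in a plain heuristic that groups sessions by transport protocol, destination port and leading payload bytes, then takes the longest common byte prefix of each group as a candidate signature. The session fields (proto, dst_port, payload), the function names and the thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def common_prefix(payloads):
    """Return the longest byte prefix shared by every payload in the group."""
    shortest = min(payloads, key=len)
    for i, byte in enumerate(shortest):
        if any(p[i] != byte for p in payloads):
            return shortest[:i]
    return shortest


def group_sessions(sessions, key_len=2):
    """Two-level grouping (hypothetical): a coarse filter on protocol and
    port, refined by the first key_len payload bytes."""
    groups = defaultdict(list)
    for s in sessions:
        key = (s["proto"], s["dst_port"], s["payload"][:key_len])
        groups[key].append(s["payload"])
    return groups


def extract_signatures(sessions, min_group=2, min_len=4):
    """Keep a candidate signature only for groups that are large enough
    and whose shared prefix is long enough to be meaningful."""
    signatures = {}
    for key, payloads in group_sessions(sessions).items():
        if len(payloads) < min_group:
            continue
        sig = common_prefix(payloads)
        if len(sig) >= min_len:
            signatures[key] = sig
    return signatures


# Made-up sessions for an unknown application on TCP port 9000.
sessions = [
    {"proto": "tcp", "dst_port": 9000, "payload": b"XAPP/1 hello"},
    {"proto": "tcp", "dst_port": 9000, "payload": b"XAPP/1 data"},
    {"proto": "tcp", "dst_port": 9000, "payload": b"XAPP/1 bye"},
    {"proto": "udp", "dst_port": 5353, "payload": b"\x00\x01zz"},
]
print(extract_signatures(sessions))
```

The more cohesive each group is, the longer and more specific the shared prefix tends to be, which is the intuition behind the emphasis on cohesiveness.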

If validation is necessary, once these signatures (which now become a proxy for the identity of the protocol or the application) are known, the security analyst can assess, offline, the validity of the extracted signatures, apply any modifications if required, and run batteries of coverage and collision tests. When the security analyst is satisfied with the outcome, the signature can then be approved, and associated control policies can be exported to the DPI system in place.
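
As a rough illustration of what offline coverage and collision tests might look like, the sketch below checks a candidate signature against two labeled samples. The definitions are assumptions, since the article does not specify them: coverage is taken to be the share of the target application's payloads the signature matches, and a collision is any payload from other traffic that the signature also matches. Prefix matching, the function names and the 0.95 threshold are likewise hypothetical.

```python
def matches(signature, payload):
    """Assumed matching rule: the payload starts with the signature bytes."""
    return payload.startswith(signature)


def coverage(signature, target_payloads):
    """Share of the target application's payloads that the signature matches."""
    if not target_payloads:
        return 0.0
    hits = sum(matches(signature, p) for p in target_payloads)
    return hits / len(target_payloads)


def collisions(signature, other_payloads):
    """Payloads from other traffic that the signature wrongly matches."""
    return [p for p in other_payloads if matches(signature, p)]


def approve(signature, target_payloads, other_payloads, min_coverage=0.95):
    """Approve only if coverage is high enough and nothing collides."""
    return (coverage(signature, target_payloads) >= min_coverage
            and not collisions(signature, other_payloads))


# Made-up validation samples.
signature = b"XAPP/1 "
target = [b"XAPP/1 hello", b"XAPP/1 bye", b"XAPP/2 new", b"XAPP/1 x"]
other = [b"GET / HTTP/1.1", b"XAPP/1 spoof"]

print(coverage(signature, target))
print(collisions(signature, other))
print(approve(signature, target, other))
```

In this made-up sample the signature would be sent back for modification rather than approved, because it misses one variant of the target application and also matches a payload from other traffic.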

Now that the signatures are known and the protocol or the application has been identified, the organization can use the information in a variety of ways — to set policy, gain visibility or prevent user access to those applications. The result is a more efficient and secure network, unaffected by the risks unauthorized applications can cause.

To maintain the integrity of networks without limiting productivity or adding operational expenses, administrators and security analysts need a replacement for the slow manual reverse-engineering process, which can take weeks per application. The only option that can meet the security threshold, while maintaining the flexibility and productivity afforded by modern-day devices and applications, is the use of machine learning to automate application discovery, signature extraction and white list generation. This approach delivers the visibility, context and control required to keep networks secure.
