Training AI to Recognize Cyber Security Threats

Posted by Geoff Stoker on Jan 31, 2019 11:23:42 AM



In 2012, a team of researchers supported US forces in Afghanistan by identifying where insurgents were hiding explosive weapons,[1] allowing soldiers to thwart potential attacks before they could happen.  However, the researchers were not typical intelligence analysts.  Rather than manually poring over reports, they leveraged powerful artificial intelligence algorithms designed to model threat actors.

This project, known as “SCARE”, was one of several efforts funded by the Department of Defense in the wake of the Iraq war to counter asymmetric threats through computational means.  The result: a new sub-field of artificial intelligence called “Security Informatics.”

Security Informatics holds a lot of promise for cybersecurity.  Think of how Google’s PageRank algorithm changed search.  The search engine market was a crowded one in the 1990s, but Google, not even first to market, made an outsized impact because of its algorithms.

Enter CYR3CON, the first company to commercialize Security Informatics.  Its CEO, Paulo Shakarian, previously led the group applying SCARE to combat operations in Afghanistan.  “The key is to model the threat actor using computational means,” Shakarian explains.  “Standard AI and machine learning techniques used for simpler tasks, like predicting advertising views, simply won’t withstand an active adversary.”

Talking to Shakarian, you quickly realize that larger quantities of data, more threat intel feeds, and continually hiring analysts are not the right way to predict adversary activity.  Nearly 30,000 CVEs (vulnerability disclosures) have been released in the last two years, exceeding the total of the prior four.  Yet hackers use only 2.6% of vulnerabilities in the exploits they write, and the distribution is highly skewed: this past fall, a single Microsoft Office vulnerability accounted for 37% of malware downloads.[2]

Yet a 2017 study by MIT Lincoln Labs showed that, using data from NIST (i.e., CVSS scoring) and open sources like social media, exploitation in the wild could not be predicted in any useful way.[3] This is where CYR3CON’s “Priority” product steps in.  Validated by multiple peer-reviewed studies and winner of a Defense Innovation Award,[4] CYR3CON Priority is the first commercial product that leverages the science of Security Informatics – and it does so in a big way.

Priority has achieved more than a 20-fold improvement in precision over any existing approach.  It rank-orders vulnerabilities by the probability that they will be used in an attack; at the high end, it often identifies vulnerabilities that are 3,000% more likely than average to be exploited.  The solution goes beyond counting discussions: it also considers multiple categories of indicators, including the content of hacker discussions (deepweb, darkweb, social media, certain surface web sites), the reputation and social influence of the hackers involved, and various types of meta-data (technical vulnerability information, open sources, vendor information, proof-of-concept source code, etc.).
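To make the idea concrete, here is a minimal sketch of how multiple indicator streams might be combined into a single exploit-probability score for ranking.  The feature names, weights, and logistic form are invented for illustration; they are assumptions, not CYR3CON’s actual model.

```python
import math

def exploit_score(mentions, reputation, has_poc, cvss,
                  weights=(0.4, 1.2, 1.5, 0.3), bias=-6.0):
    """Combine several indicator categories into a probability-like score
    via a simple logistic model.  Features and weights are hypothetical:
    darkweb mention count, reputation of the hackers discussing it,
    proof-of-concept availability, and the CVSS base score."""
    z = (bias
         + weights[0] * math.log1p(mentions)   # dampen raw discussion counts
         + weights[1] * reputation             # who is talking matters, not just how many
         + weights[2] * (1.0 if has_poc else 0.0)
         + weights[3] * cvss)
    return 1.0 / (1.0 + math.exp(-z))

# Two made-up CVEs: one heavily discussed by reputable actors with PoC code,
# one barely mentioned.
vulns = {
    "CVE-A": exploit_score(mentions=50, reputation=2.0, has_poc=True, cvss=9.8),
    "CVE-B": exploit_score(mentions=2, reputation=0.1, has_poc=False, cvss=7.5),
}
ranked = sorted(vulns, key=vulns.get, reverse=True)  # patch-priority order
```

The point of the sketch is the shape of the approach: several weak signals, weighted and fused into one rank-ordering, rather than a single count of mentions or a static severity score.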

“The key here is to have multiple indicators and train the machine learning algorithm on data concerning real-world exploits in the wild,” Shakarian explains.  “Predicting exploits is like finding a needle in a haystack – but you need to take action, or you get what we had with Equifax, where they decided poorly.”  What Shakarian says makes sense: decisions on patching need to account for the threat.  Quantifying that threat empirically, across multiple streams of indicators, is a challenging task – that is what the MIT study illustrated.  Sure, an experienced analyst can manually read the information available for a particular CVE, but that stops being practical when an enterprise has thousands or tens of thousands of open vulnerabilities.
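The needle-in-a-haystack point can be made concrete with a toy precision calculation.  All numbers below are illustrative assumptions (scaled loosely to the ~2.6% exploitation rate cited above), not figures from the studies mentioned:

```python
def precision_recall(predicted, actual):
    """Standard precision/recall over sets of flagged vs. truly exploited CVEs."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical scenario: out of 1,000 disclosed CVEs, 26 are ever
# exploited in the wild (the "needles").
actual_exploited = set(range(26))

# A naive "patch the 100 highest-CVSS" policy that happens to catch 5 of them:
naive_flagged = set(range(5)) | set(range(100, 195))

# A threat-trained model flagging only 30 CVEs, 20 of them real positives:
model_flagged = set(range(20)) | set(range(500, 510))

p_naive, _ = precision_recall(naive_flagged, actual_exploited)  # 5/100 = 0.05
p_model, _ = precision_recall(model_flagged, actual_exploited)  # 20/30 ≈ 0.67
```

With positives this rare, a severity-only policy wastes most of its patching effort on vulnerabilities no one will ever exploit, which is exactly why precision, not data volume, is the metric that matters here.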

Nuanced? Yes.  But valuable?  Highly.  Sometimes solving major cybersecurity problems requires a leap out of the box, and leveraging a new type of AI built from the ground up to predict the activities of threat actors makes a lot of sense.  CYR3CON has an ambitious roadmap that includes tools for other security functions like endpoint, vendor risk, and operations – it will be exciting to see how these capabilities shape the industry.





Topics: Threat Intelligence, Information Technology, Information Security, machine learning, Artificial Intelligence