There is much discussion about machine learning applied to cybersecurity, and many believe it will ultimately revolutionize the industry. As IT infrastructure grows more complicated, and as intelligence, network traffic, and log data become harder to handle even with the best analytical tools, machine learning may be not just "a solution". It may be the only solution.
There is a problem with many machine learning techniques, though: many act as a "black box". How can one trust the results of such a system?
But wait, what about the false positive rate? Let's say the system has a false positive rate of 10%. Shouldn't we just trust it most of the time?
The issue is more nuanced than that. Suppose we buy the system with the 10% false positive rate. Over its first three months it actually does much better, with only a 5% false positive rate. Then month 4 arrives, and the false positive rate for that month jumps to 15%.
So, did it break? Or should we have expected this, since the system had been performing above specification for so long?
It could be either. If something fundamentally changed, then the jump in false positives is serious: the whole system could become useless. If nothing fundamental changed and it was simply an "off month", then we should expect the false positive rate to return to the original specification.
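One way to start telling an off month from a real change is a simple statistical check: under the 10% spec, how surprising is the number of false positives we actually saw? A minimal sketch in Python, using only the standard library; the alert counts here are hypothetical, and the 1% significance threshold is an assumed operational choice, not a universal rule:

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least
    k false positives among n alerts if the true rate really is p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical month-4 numbers: 600 alerts, of which 90 (15%) were false,
# against a specified false positive rate of 10%.
spec_rate = 0.10
alerts, false_alerts = 600, 90

p_value = binom_sf(false_alerts, alerts, spec_rate)
if p_value < 0.01:  # assumed significance threshold
    print(f"p = {p_value:.2e}: unlikely to be chance; investigate for drift")
else:
    print(f"p = {p_value:.2e}: consistent with normal variation around spec")
```

With these numbers the expected count is 60 false positives, so 90 is far out in the tail and the check flags it for investigation. The same test run on a month with, say, 65 false positives would come back as ordinary variation, which is exactly the distinction we want.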
The key is knowing the difference. The problem is that many systems do not offer that level of transparency. However, various methods can help, and as machine learning evolves, we can expect many of these solutions to trickle into the market.