Anomaly detection optimization using big data and deep learning to reduce false-positive

  • 13 Oct 2020
  • Published Research - Informatics & Communication

Researchers

Khloud Al Jallad, Mohamad Aljnidi and Mohammad Said Desouki

Published in

Journal of Big Data, Volume 7, Article number 68, August 2020.


Abstract

Anomaly-based Intrusion Detection Systems (IDS) have been a hot research topic because they can detect new threats, whereas signature-based IDS detect only threats whose signatures they have memorized. Interest has grown further as advanced technologies have increased both the number of hacking tools and the potential impact of an attack. The main problem of any anomaly-based model is its high false-positive rate, which is why anomaly-based IDS are not commonly applied in practice. Anomaly-based models classify an unseen pattern as a threat even when it may be normal but simply absent from the training dataset. This type of problem is called overfitting: the model is unable to generalize. Optimizing anomaly-based models with a big training dataset that includes all possible normal cases may be an optimal solution, but it cannot be applied in practice. Although we can increase the number of training samples to cover many more normal cases, we still need a model with a greater ability to generalize. In this research paper, we propose applying a deep model instead of traditional models because it has a greater ability to generalize. Thus, we expect to obtain fewer false positives by using big data and a deep model. We compared machine learning and deep learning algorithms in optimizing anomaly-based IDS by decreasing the false-positive rate. We ran an experiment on the NSL-KDD benchmark and compared our results with one of the most widely used traditional learning classifiers in IDS optimization. The experiment shows a 10% lower false-positive rate when using deep learning instead of traditional learning.
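The abstract's central metric is the false-positive rate. This is the fraction of normal traffic that the IDS wrongly flags as an attack: FPR = FP / (FP + TN). Below is a minimal Python sketch of that standard definition. It is not the authors' code. The function name `false_positive_rate` and the label encoding (0 = normal, 1 = attack) are illustrative assumptions.

```python
def false_positive_rate(y_true, y_pred, positive=1):
    """Return FP / (FP + TN): the share of truly normal samples flagged as attacks.

    Assumes binary labels where `positive` marks an attack (hypothetical encoding).
    Returns 0.0 when there are no normal samples, to avoid division by zero.
    """
    # False positives: truly normal samples that the model labels as attacks.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    # True negatives: truly normal samples that the model correctly labels as normal.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Four normal samples (0) and two attacks (1); one normal sample is misflagged.
print(false_positive_rate([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0]))  # 0.25
```

Comparing two classifiers then means computing this value for each one on the same test set, such as the NSL-KDD test split. The abstract does not state whether its reported 10% reduction is absolute or relative, so none is assumed here.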

Keywords: Intrusion detection systems (IDS), Security intelligence optimization, Unknown threats, Big data, NSL-KDD dataset, False positive.

Link to read full paper

https://doi.org/10.1186/s40537-020-00346-1