Article

A Survey on Data-Driven Learning for Intelligent Network Intrusion Detection Systems

ELECTRONICS (2022)

Journal

ELECTRONICS

Volume 11, Issue 2

Publisher

MDPI

DOI: 10.3390/electronics11020213

Keywords

imbalanced learning; adversarial learning; generative models; generative adversarial networks; oversampling; intrusion detection systems; machine learning; deep learning

Funding

DoD Center of Excellence in AI and Machine Learning (CoE-AIML) at Howard University [W911NF-20-2-0277]
U.S. Army Research Laboratory, Microsoft Research, and the US National Science Foundation [1828811, CNS/SaTC 2039583]
Division of Human Resource Development
Directorate for Education and Human Resources [1828811]; Funding Source: National Science Foundation


Automated Summary

The study highlights the limitations of current public datasets for training effective intelligent intrusion detection systems, emphasizing the importance of utilizing dynamically generated data in an adversarial setting. It suggests that training models using imbalanced and adversarial learning is crucial for enhancing the efficacy and performance of intrusion detection systems.

Abstract

An effective anomaly-based intelligent IDS (AN-Intel-IDS) must detect both known and unknown attacks. Hence, there is a need to train AN-Intel-IDS using dynamically generated, real-time data in an adversarial setting. Unfortunately, the public datasets available to train AN-Intel-IDS are ineluctably static, unrealistic, and prone to obsolescence. Further, the need to protect private data and conceal sensitive data features has limited data sharing, thus encouraging the use of synthetic data for training predictive and intrusion detection models. However, synthetic data can be unrealistic and potentially biased. On the other hand, real-time data are realistic and current; however, they are inherently imbalanced due to the uneven distribution of anomalous and non-anomalous examples. In general, non-anomalous or normal examples are more frequent than anomalous or attack examples, thus leading to a skewed distribution. While imbalanced data are predominant in intrusion detection applications, they can lead to inaccurate predictions and degraded performance. Furthermore, the lack of real-time data produces potentially biased models that are less effective in predicting unknown attacks. Therefore, training AN-Intel-IDS using imbalanced and adversarial learning is instrumental to their efficacy and high performance. This paper investigates imbalanced learning and adversarial learning for training AN-Intel-IDS using a qualitative study. It surveys and synthesizes generative-based data augmentation techniques for addressing the uneven data distribution and generative-based adversarial techniques for generating synthetic yet realistic data in an adversarial setting using rapid review, structured reporting, and subgroup analysis.
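
The imbalance problem described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the toy data (990 normal flows, 10 attack flows), the placeholder feature vectors, and the helper names `majority_baseline_metrics` and `random_oversample` are all assumptions made for illustration. The sketch shows that a detector which always predicts "normal" scores high accuracy while detecting no attacks, and it uses plain random oversampling as the simplest rebalancing baseline. The generative (for example, GAN-based) augmentation techniques the survey covers would replace the duplication step with synthesized samples.

```python
import random
from collections import Counter


def majority_baseline_metrics(labels):
    """Accuracy and attack recall of a detector that always predicts 'normal' (0)."""
    predictions = [0] * len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    attacks = sum(1 for y in labels if y == 1)
    detected = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    accuracy = correct / len(labels)
    recall = detected / attacks if attacks else 0.0
    return accuracy, recall


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class has equal size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy traffic: 990 normal flows (label 0) and 10 attack flows (label 1).
labels = [0] * 990 + [1] * 10
samples = [[float(i)] for i in range(len(labels))]  # placeholder feature vectors

accuracy, recall = majority_baseline_metrics(labels)
balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)

print(f"always-normal baseline: accuracy={accuracy:.2f}, attack recall={recall:.2f}")
print(f"class counts after oversampling: {dict(balanced_counts)}")
```

Random oversampling only repeats existing attack examples, so it adds no new information about unseen attacks; that limitation is one motivation for the generative approaches the abstract mentions.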
