Article

A Survey on Data-Driven Learning for Intelligent Network Intrusion Detection Systems

Journal

Electronics
Volume 11, Issue 2

Publisher

MDPI
DOI: 10.3390/electronics11020213

Keywords

imbalanced learning; adversarial learning; generative models; generative adversarial networks; oversampling; intrusion detection systems; machine learning; deep learning

Funding

  1. DoD Center of Excellence in AI and Machine Learning (CoE-AIML) at Howard University [W911NF-20-2-0277]
  2. U.S. Army Research Laboratory, Microsoft Research, and the U.S. National Science Foundation [1828811, CNS/SaTC 2039583]
  3. Division of Human Resource Development
  4. Direct For Education and Human Resources [1828811] Funding Source: National Science Foundation


The study highlights the limitations of current public datasets for training effective intelligent intrusion detection systems, emphasizing the importance of utilizing dynamically generated data in an adversarial setting. It suggests that training models using imbalanced and adversarial learning is crucial for enhancing the efficacy and performance of intrusion detection systems.
An effective anomaly-based intelligent IDS (AN-Intel-IDS) must detect both known and unknown attacks. Hence, there is a need to train AN-Intel-IDS using dynamically generated, real-time data in an adversarial setting. Unfortunately, the public datasets available to train AN-Intel-IDS are ineluctably static, unrealistic, and prone to obsolescence. Further, the need to protect private data and conceal sensitive data features has limited data sharing, thus encouraging the use of synthetic data for training predictive and intrusion detection models. However, synthetic data can be unrealistic and potentially biased. On the other hand, real-time data are realistic and current; however, they are inherently imbalanced due to the uneven distribution of anomalous and non-anomalous examples. In general, non-anomalous (normal) examples are more frequent than anomalous (attack) examples, leading to a skewed distribution. Although imbalanced data are predominant in intrusion detection applications, they can lead to inaccurate predictions and degraded performance. Furthermore, the lack of real-time data produces potentially biased models that are less effective in predicting unknown attacks. Therefore, training AN-Intel-IDS using imbalanced and adversarial learning is instrumental to their efficacy and high performance. This paper investigates imbalanced learning and adversarial learning for training AN-Intel-IDS through a qualitative study. Using rapid review, structured reporting, and subgroup analysis, it surveys and synthesizes generative-based data augmentation techniques for addressing the uneven data distribution, as well as generative-based adversarial techniques for generating synthetic yet realistic data in an adversarial setting.
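The abstract's point that a skewed class distribution degrades detection, and that data augmentation can rebalance it, can be illustrated with a small sketch. This is not the survey's method. It assumes a toy two-feature dataset with a 95:5 normal-to-attack split and uses two hypothetical helpers: `accuracy_and_recall`, which shows that a model always predicting "normal" reaches 95% accuracy while detecting no attacks, and `smote_like_oversample`, which substitutes SMOTE-style interpolation (a classic oversampler, not a generative model) for the generative augmentation techniques the survey reviews.

```python
import random
from typing import List, Tuple

# Hypothetical helpers for illustration only; not taken from the surveyed paper.
# Labels: 0 = normal traffic, 1 = attack.


def accuracy_and_recall(preds: List[int], labels: List[int]) -> Tuple[float, float]:
    """Return (overall accuracy, recall on the attack class)."""
    correct = sum(p == y for p, y in zip(preds, labels))
    attacks = sum(1 for y in labels if y == 1)
    caught = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    recall = caught / attacks if attacks else 0.0
    return correct / len(labels), recall


def smote_like_oversample(
    minority: List[List[float]], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating between a random
    minority sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed toy data: 950 normal and 50 attack records with two features each.
    normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(950)]
    attack = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
    labels = [0] * len(normal) + [1] * len(attack)

    # A "classifier" that always predicts normal looks accurate but misses every attack.
    acc, rec = accuracy_and_recall([0] * len(labels), labels)
    print(acc, rec)  # 0.95 0.0

    # Oversample the attack class until both classes have the same size.
    synthetic = smote_like_oversample(attack, len(normal) - len(attack))
    print(len(normal), len(attack) + len(synthetic))  # 950 950
```

Because each synthetic point lies on a segment between two real attack records, the added examples stay inside the attack region rather than duplicating records verbatim; the generative models surveyed in the paper aim to go further by learning the attack distribution itself.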
