Article

Resampling imbalanced data for network intrusion detection datasets

Journal

JOURNAL OF BIG DATA
Volume 8, Issue 1

Publisher

SPRINGERNATURE
DOI: 10.1186/s40537-020-00390-x

Keywords

Oversampling; Undersampling; Resampling; Imbalanced Data; Network Intrusion Detection Systems; SMOTE; ADASYN; Artificial Neural Networks; Macro precision; Macro recall

Funding

  1. Askew Institute of the University of West Florida


This research investigates how resampling affects the performance of Artificial Neural Network multi-class classifiers. Oversampling increases training time, while undersampling decreases it. When the data is extremely imbalanced, both oversampling and undersampling significantly increase recall. When the imbalance is moderate, resampling has little overall effect, but resampling, especially oversampling, still allows more of the minority data (attacks) to be detected.
Machine learning plays an increasingly significant role in building Network Intrusion Detection Systems. However, machine learning models trained on imbalanced cybersecurity data cannot effectively recognize the minority classes, and hence the attacks. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This research examines the influence of resampling on the performance of Artificial Neural Network multi-class classifiers. Five resampling methods were applied to the benchmark cybersecurity datasets KDD99, UNSW-NB15, UNSW-NB17, and UNSW-NB18: random undersampling, random oversampling, random undersampling combined with random oversampling, random undersampling with the Synthetic Minority Oversampling Technique (SMOTE), and random undersampling with the Adaptive Synthetic Sampling Method (ADASYN). Macro precision, macro recall, and macro F1-score were used to evaluate the results. Four patterns were found. First, oversampling increases the training time and undersampling decreases it. Second, if the data is extremely imbalanced, both oversampling and undersampling increase recall significantly. Third, if the data is not extremely imbalanced, resampling does not have much of an impact. Fourth, with resampling, mostly oversampling, more of the minority data (attacks) were detected.
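To make the resampling and evaluation ideas in the abstract concrete, here is a minimal Python sketch of random undersampling, random oversampling, and the macro-averaged metrics. It is an illustration only, not the authors' implementation: the function names (`random_undersample`, `random_oversample`, `macro_scores`), the `target` parameter, and the toy labels are all hypothetical. Macro F1 is assumed here to be the unweighted mean of the per-class F1 scores. SMOTE and ADASYN are not shown; they generate synthetic minority points rather than duplicating existing ones.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(y):
    """Group row indices by class label."""
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    return groups


def random_undersample(X, y, target, seed=0):
    """Randomly drop rows so that no class keeps more than `target` rows."""
    rng = random.Random(seed)
    keep = []
    for idx in _indices_by_class(y).values():
        keep.extend(rng.sample(idx, target) if len(idx) > target else idx)
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, target, seed=0):
    """Randomly duplicate rows so that every class has at least `target` rows."""
    rng = random.Random(seed)
    keep = list(range(len(y)))
    for idx in _indices_by_class(y).values():
        if len(idx) < target:
            # Sample with replacement from the existing rows of this class.
            keep.extend(rng.choices(idx, k=target - len(idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Each metric is computed per class and then averaged with equal weight,
    so a rare attack class counts as much as the majority class.
    """
    labels = set(y_true) | set(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy, made-up data: 95 "normal" rows and 5 "attack" rows.
X = [[i] for i in range(100)]
y = ["normal"] * 95 + ["attack"] * 5

# Random undersampling followed by random oversampling, to 20 rows per class.
X_u, y_u = random_undersample(X, y, target=20)
X_b, y_b = random_oversample(X_u, y_u, target=20)
print(Counter(y), Counter(y_b))

# A classifier that always predicts "normal" has high accuracy on this data,
# but its macro recall exposes that the attack class is never detected.
print(macro_scores(y, ["normal"] * 100))
```

The sketch also hints at why the training-time pattern is plausible: undersampling shrinks the training set, while oversampling enlarges it.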

