Article

Semi-supervised machine learning framework for network intrusion detection

JOURNAL OF SUPERCOMPUTING (2022)

Journal

JOURNAL OF SUPERCOMPUTING

Volume 78, Issue 11, Pages 13122-13144

Publisher

SPRINGER

DOI: 10.1007/s11227-022-04390-x

Keywords

Network intrusion detection; Fisher score; Information gain; PCA; Tri-LightGBM

Categories

Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

National Natural Science Foundation of China [U1804263, 61877010]
Natural Science Foundation of Fujian Province China [2021J01616, 2020J01130167, 2021J01625]
Joint Straits Fund of Key Program of the National Natural Science Foundation of China [U1705262]

Abstract

This paper proposes a semi-supervised machine learning framework for network intrusion detection, which combines multi-strategy feature filtering, PCA, and an improved Tri-LightGBM model based on stratified sampling to enhance detection accuracy and classification performance.

Network intrusion detection plays an important role as a tool for managing and identifying potential threats, but it faces various challenges. Redundant features and the difficulty of labeling data are long-standing problems in network traffic detection. In this paper, we propose a semi-supervised machine learning framework based on multi-strategy feature filtering, principal component analysis (PCA), and an improved Tri-Light Gradient Boosting Machine (Tri-LightGBM) based on stratified sampling. The multi-strategy feature filtering method uses Fisher score and information gain to select features that discriminate well between categories and are more relevant to the category labels. We then apply PCA to convert multiple features into comprehensive features, which serve as the input to the Tri-LightGBM model. Tri-LightGBM can exploit unlabeled data cooperatively and maintain a large disagreement among the base learners. Moreover, we propose stratified sampling based on labeled categories to reduce the probability that the samples selected during the model update process belong to the same category. Thus, the Tri-LightGBM based on stratified sampling can compensate for the classification errors caused by the imbalance of the dataset. The framework is evaluated on two intrusion detection evaluation datasets, namely UNSW-NB15 and CIC-IDS-2017. The evaluation results show that the multi-strategy feature filtering method can increase accuracy, recall, precision, and F-measure by up to 0.5% and reduce the false-positive rate by up to 0.5%. Furthermore, the precision for minority categories can be increased by about 1-2%.
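To make two of the ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common textbook form of the Fisher score (between-class scatter divided by within-class scatter, computed per feature) and a simple per-class cap for stratified sampling; the abstract does not give the exact formulas used in the paper. The names `fisher_score`, `stratified_sample`, and `per_class` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature (assumed textbook form):
    between-class scatter divided by within-class scatter."""
    by_class = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)
    overall_mean = fmean(values)
    # Between-class scatter: class sizes times squared distance of class mean.
    between = sum(len(g) * (fmean(g) - overall_mean) ** 2 for g in by_class.values())
    # Within-class scatter: class sizes times population variance inside the class.
    within = sum(len(g) * pvariance(g) for g in by_class.values())
    return between / within if within > 0 else float("inf")


def stratified_sample(indices, labels, per_class, seed=0):
    """Draw at most `per_class` indices from each label (hypothetical cap),
    so that a majority class cannot fill the whole sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    chosen = []
    for y in sorted(by_class):
        group = by_class[y]
        chosen.extend(rng.sample(group, min(per_class, len(group))))
    return chosen


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 1]
    informative = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # separates the two classes
    noisy = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]        # barely related to the labels
    print(fisher_score(informative, y), fisher_score(noisy, y))

    imbalanced = [0] * 10 + [1] * 2
    print(stratified_sample(range(12), imbalanced, per_class=2))
```

A feature that separates the classes well scores much higher than one that does not, which is the property a filter would rank on. The sampler returns the same number of indices for the majority and minority class whenever both have at least `per_class` members.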
