Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data

Publisher

SPRINGER
DOI: 10.1007/s10586-021-03249-9

Keywords

Distributed; Concept drift; Ensemble; Streaming

Network security is crucial in the digital age, and research continues to focus on evolving mechanisms for secure communication. This paper presents distributed, machine-learning-based ensemble techniques for detecting concept drift and attacks in network traffic, and reports high accuracy on three datasets. Machine learning, coupled with new distributed tools and frameworks, offers a promising way to cope with the increasing pace of network-based attacks.
Ever since the internet became part of everyday life, providing network security has been considered of utmost importance. Over the years, the research community and industry have devoted a great deal of time and energy to developing better and more secure mechanisms for communication on the internet. Among the many fields of study, the most prominent and continually evolving one has been the analysis of network traffic for attack detection and mitigation. The advent of new technologies has increased the pace of network-based attacks, so novel or modified approaches are needed to cope with these trends. Distributed machine learning, supported by new tools and frameworks such as the Resilient Distributed Dataset (RDD) structure in Apache Spark, provides immense scope for growth in this direction. Moreover, the dynamic nature of present-day network traffic, known as concept drift, requires studying solutions from a different angle. In this paper, we therefore work on distributed, machine-learning-based ensemble techniques to detect the presence of concept drift in network traffic and to detect network-based attacks. The work has three parts. First, two classifiers, Random Forest and Logistic Regression, are used as level-0 learners, and a Support Vector Machine is used as the level-1 learner. Second, concept drift is handled with sliding-window-based K-means clustering. Third, ensemble-based techniques are applied to detect attacks in the traffic. The experiments were performed on three datasets: the NSL-KDD dataset, the CIDDS-2017 dataset and a generated Testbed dataset. The tests were conducted on different machines while varying the number of executor cores, in order to study time latency in a distributed environment. The SVM-based blending model achieved an accuracy of 93% on NSL-KDD, 98% on CIDDS-2017 and 97% on the Testbed dataset.
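
The abstract does not say how the sliding-window K-means step decides that drift has occurred, so the following is only a minimal single-machine sketch of one plausible reading, not the authors' method. It assumes that drift is flagged when the centroids of the current window move farther than a threshold from those of a reference window. The function names, the `step` and `threshold` parameters, and the centroid-shift criterion are all illustrative assumptions; the distributed Spark setting is not modelled.

```python
import math
import random
from collections import deque


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = cluster mean; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def centroid_shift(reference, current):
    """Largest distance from any reference centroid to its nearest current centroid."""
    return max(min(math.dist(r, c) for c in current) for r in reference)


def detect_drift(stream, window_size=50, step=25, k=2, threshold=1.0):
    """Yield stream indices at which drift is flagged.

    Assumed criterion (not taken from the paper): the window's centroids have
    moved more than `threshold` away from the reference window's centroids.
    """
    window = deque(maxlen=window_size)  # oldest records fall out as new ones arrive
    reference = None
    for i, record in enumerate(stream):
        window.append(record)
        # Re-cluster only on full windows, once every `step` records.
        if len(window) < window_size or (i + 1 - window_size) % step:
            continue
        current = kmeans(list(window), k)
        if reference is None:
            reference = current  # first full window defines the baseline concept
        elif centroid_shift(reference, current) > threshold:
            yield i
            reference = current  # re-anchor on the new concept
```

Calling `list(detect_drift(records, window_size=50, step=25, k=2, threshold=2.0))` on an iterable of numeric feature tuples returns the indices at which the window's clusters have moved. A suitable `threshold` depends on the feature scale, so in practice it would have to be tuned per dataset.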
