Article

Improving malware detection using big data and ensemble learning

Journal

COMPUTERS & ELECTRICAL ENGINEERING
Volume 86

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compeleceng.2020.106729

Keywords

Apache Spark; Big data; Ensemble learning; Malware detection; Stacking; Weighted voting

Malware detection and classification play a critical role in computer and network security. Although many machine learning models have been used to detect malicious binaries, the performance of ensemble methods has not been investigated extensively. Moreover, the sheer volume of malware has made detection a big data problem, forcing security researchers and practitioners to deploy big data technologies to manage, store, analyze, and visualize malware data. In this paper, the authors design two methods based on ensemble learning and big data to improve the performance of malware detection at large scale. The first method uses the weighted voting strategy of ensemble learning, and the second selects an optimal set of base classifiers for stacking. The proposed methods are implemented using Apache Spark, a popular big data processing framework, and their performance is evaluated on a dataset of 198,350 Windows files comprising 100,200 malicious and 98,150 benign samples. The experimental results indicate that the proposed approach is effective, as it improves generalization performance in detecting new malware. (C) 2020 Elsevier Ltd. All rights reserved.
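
To make the weighted voting idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the weights are derived, and the paper uses Apache Spark, which this sketch does not. The function name `weighted_vote`, the labels, and the weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most total weight.

    predictions: one predicted label per base classifier
    weights: one non-negative weight per base classifier (hypothetical values,
             for example a validation accuracy for each classifier)
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, pick the label that sorts first so the result is deterministic.
    return min(totals, key=lambda label: (-totals[label], label))


# Assumed example: one strong classifier disagrees with two weaker ones.
votes = ["malicious", "benign", "benign"]
weights = [0.95, 0.40, 0.45]
decision = weighted_vote(votes, weights)
```

With these assumed weights, the single high-weight classifier outvotes the two low-weight ones, whereas an unweighted majority vote would side with the two weaker classifiers. That difference is the motivation for weighting votes in an ensemble.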
