Journal
COMPUTERS & ELECTRICAL ENGINEERING
Volume 86
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compeleceng.2020.106729
Keywords
Apache Spark; Big data; Ensemble learning; Malware detection; Stacking; Weighted voting
Malware detection and classification play a critical role in computer and network security. Although many machine learning models have been used to detect malicious binaries, the performance of ensemble methods has not been investigated extensively. Moreover, the massive volume of malware has made its detection a big data problem, forcing security researchers and practitioners to deploy big data technologies to manage, store, analyze, and visualize malware data. In this paper, the authors design two methods based on ensemble learning and big data to improve the performance of large-scale malware detection. The first method is based on the weighted voting strategy of ensemble learning, and the second selects an optimal set of base classifiers for stacking. The proposed methods are implemented using Apache Spark, a popular big data processing framework, and their performance is evaluated on a dataset of 198,350 Windows files comprising 100,200 malicious and 98,150 benign samples. The experimental results validate the effectiveness of the proposed methods, which improve generalization performance in detecting new malware. (C) 2020 Elsevier Ltd. All rights reserved.
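The abstract does not say how the voting weights are derived. A common choice, assumed here purely for illustration, is to weight each base classifier by its accuracy on a held-out validation set. The plain-Python sketch below is not the authors' Spark implementation; the helper names `accuracy` and `weighted_vote`, the label convention (1 = malicious, 0 = benign), the tie-breaking rule, and the toy data are all hypothetical.

```python
from typing import List, Sequence

# Label convention assumed for this sketch: 1 = malicious, 0 = benign.


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def weighted_vote(base_preds: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[int]:
    """Combine binary predictions from several base classifiers.

    Each classifier adds its weight to the class it predicts for a sample;
    the class with the larger total wins. Ties are resolved in favour of
    'malicious' (a hypothetical, security-conservative choice).
    """
    if not base_preds or len(base_preds) != len(weights):
        raise ValueError("need one weight per base classifier")
    n_samples = len(base_preds[0])
    combined = []
    for i in range(n_samples):
        malicious = sum(w for preds, w in zip(base_preds, weights) if preds[i] == 1)
        benign = sum(w for preds, w in zip(base_preds, weights) if preds[i] == 0)
        combined.append(1 if malicious >= benign else 0)
    return combined


if __name__ == "__main__":
    # Toy validation data (made up for illustration, not from the paper).
    y_val = [1, 0, 1, 1, 0]
    val_preds = [
        [1, 0, 1, 1, 0],  # classifier A
        [1, 1, 0, 1, 0],  # classifier B
        [0, 1, 0, 0, 0],  # classifier C
    ]
    # Assumption: weight = validation accuracy of each base classifier.
    weights = [accuracy(p, y_val) for p in val_preds]
    print(weights)  # → [1.0, 0.6, 0.2]

    # Predictions of the same three classifiers on three unseen samples.
    test_preds = [
        [1, 0, 1],  # classifier A
        [0, 1, 1],  # classifier B
        [0, 1, 0],  # classifier C
    ]
    print(weighted_vote(test_preds, weights))  # → [1, 0, 1]
```

In this toy run, a plain majority vote would label the first test sample benign (two of three classifiers predict 0), but the larger weight of the most accurate classifier flips it to malicious. This is the kind of behaviour weighted voting is meant to capture; how the paper assigns its weights may differ.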