Improving malware detection using big data and ensemble learning

COMPUTERS & ELECTRICAL ENGINEERING (2020)

Journal

COMPUTERS & ELECTRICAL ENGINEERING

Volume 86

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.compeleceng.2020.106729

Keywords

Apache Spark; Big data; Ensemble learning; Malware detection; Stacking; Weighted voting

Abstract

Malware detection and classification play a critical role in computer and network security. Although many machine learning models have been used to detect malicious binaries, the performance of ensemble methods has not been investigated extensively. Moreover, the massive volume of malware has made detection a big data problem, forcing security researchers and practitioners to deploy big data technologies to manage, store, analyze, and visualize malware data. In this paper, the authors design two methods based on ensemble learning and big data for improving the performance of malware detection at a large scale. The first method is based on the weighted voting strategy of ensemble learning, and the second method chooses an optimal set of base classifiers for stacking. The proposed methods are implemented using Apache Spark, a popular big data processing framework, and their performance is tested and evaluated on a dataset of 198,350 Windows files, comprising 100,200 malicious and 98,150 benign samples. The experimental results validate the effectiveness of the proposed approach, which improves generalization performance in detecting new malware. © 2020 Elsevier Ltd. All rights reserved.
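
The abstract does not say how the weighted voting strategy assigns its weights, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes binary labels (1 for malicious, 0 for benign) and, as one common choice, weights each base classifier by the log-odds of its validation accuracy. The function names `log_odds_weights` and `weighted_vote` and the accuracy figures are hypothetical, and the sketch uses plain Python rather than Apache Spark to stay self-contained.

```python
import math
from typing import List, Sequence


def log_odds_weights(val_accuracies: Sequence[float]) -> List[float]:
    """Turn validation accuracies into voting weights.

    Assumption (not from the paper): weight = log(acc / (1 - acc)),
    so more accurate classifiers get a larger say in the vote.
    """
    weights = []
    for acc in val_accuracies:
        if not 0.0 < acc < 1.0:
            raise ValueError("accuracy must lie strictly between 0 and 1")
        weights.append(math.log(acc / (1.0 - acc)))
    return weights


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Combine binary predictions (1 = malicious, 0 = benign).

    Each classifier adds its weight to the class it predicts;
    the class with the larger total wins. Ties go to benign (0).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    malicious = sum(w for p, w in zip(predictions, weights) if p == 1)
    benign = sum(w for p, w in zip(predictions, weights) if p == 0)
    return 1 if malicious > benign else 0


# Hypothetical validation accuracies for three base classifiers.
weights = log_odds_weights([0.95, 0.60, 0.55])

# One strong classifier says "malicious", two weak ones say "benign".
label = weighted_vote([1, 0, 0], weights)
```

With these illustrative numbers, the single high-accuracy classifier outweighs the two weaker ones, so `label` is 1 even though an unweighted majority vote would return 0. That difference is the motivation for weighting votes rather than counting them.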
