Article

Minimized feature overhead malware detection machine learning model employing MRMR-based ranking

Journal

Publisher

WILEY
DOI: 10.1002/cpe.6992

Keywords

feature overhead; feature selection; machine learning malware detection; maximum relevance minimum redundancy; static malware detection

To improve the speed and efficiency of malware detection, a machine learning model with optimized preprocessing, feature selection, and classifier parameter tuning is proposed. Experimental results demonstrate that the model achieves excellent performance on various metrics while reducing feature overhead.
Given the huge volume of data to be analyzed, minimizing overhead plays a key role in fast and efficient malware detection. We propose a machine learning (ML) malware detection model with preprocessing to limit the feature overhead. Portable-executable (PE) header information, which retains meaningful and distinctive information, is used to classify benign and malware files. The dataset is preprocessed by applying transformation, outlier detection and filling, and smoothing techniques. A maximum relevance minimum redundancy (MRMR)-based feature selection method assigns a rank and score to each feature, retaining the most relevant and least redundant information. Based on the obtained ranking, multiple feature subsets are created and evaluated with support vector machine (SVM) and k-nearest neighbors (k-NN) classifiers under parameter tuning. The proposed ML model, which integrates data preprocessing, feature selection, and an SVM classifier with a polynomial kernel, achieves the best performance. It eliminates 63.8% of the feature overhead while keeping accuracy above 99.1% on the benchmark datasets. To examine the robustness of the proposed model, new balanced and imbalanced datasets are created using new malware. The test results are encouraging with accuracy and specificity above 96.68%, 97.65%, and 91.57%, respectively. Notably, the proposed model was not trained on the newly created datasets.
