Article

A cost analysis of machine learning using dynamic runtime opcodes for malware detection

COMPUTERS & SECURITY (2019)

Journal

COMPUTERS & SECURITY

Volume 85, Pages 138-155

Publisher

ELSEVIER ADVANCED TECHNOLOGY

DOI: 10.1016/j.cose.2019.04.018

Keywords

Malicious code; Network security; Machine learning; Computer security; Malware

Category

Computer Science, Information Systems

Funding

EPSRC [CSIT 2 EP/N508664/1]
EPSRC [EP/K003445/1, EP/R007187/1, EP/K004379/1, EP/N508664/1] Funding Source: UKRI

Abstract

The ongoing battle between malware distributors and those seeking to prevent the onslaught of malicious code has, so far, favored the former. Anti-virus methods are faltering with the rapid evolution and distribution of new malware, with obfuscation and detection evasion techniques exacerbating the issue. Recent research has monitored low-level opcodes to detect malware. Such dynamic analysis reveals the code at runtime, allowing the true behaviour to be examined. While previous research uses machine learning techniques to accurately detect malware using dynamic runtime opcodes, the underpinning datasets have been poorly sampled and inadequate in size. Further, the datasets are always of fixed size, and no attempt, to our knowledge, has been made to examine the cost of retraining malware classification models on datasets which grow continually. In the literature, researchers discuss the explosion of malware, yet opcode analyses have used fixed-size datasets, with no regard for how such models will cope with retraining on escalating datasets. The research presented here examines this problem and makes several novel contributions to the current body of knowledge. First, the performance of 23 machine learning algorithms is investigated with respect to the largest run trace dataset in the literature. Second, following an extensive hyperparameter selection process, the performance of each classifier is compared on both accuracy and computational cost (CPU time). Lastly, the cost of retraining and testing updatable and non-updatable classifiers, both parallelized and non-parallelized, is examined with simulated escalating datasets. This provides insight into how implemented malware classifiers would perform, given simulated dataset escalation. We find that parallelized RandomForest, using 4 cores, provides the optimal performance, with high accuracy and low training and testing times. © 2019 Elsevier Ltd. All rights reserved.
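
To illustrate the retraining-cost question raised in the abstract, the following is a minimal sketch, not the authors' experimental setup. It contrasts a non-updatable regime (rebuild the model from scratch on all data each time a batch arrives) with an updatable one (fold in only the new batch). Everything here is an assumption made for illustration: the tiny multinomial naive Bayes model (chosen only because it is naturally updatable; the paper evaluates 23 algorithms), the opcode list, the synthetic opcode frequencies, and all names such as `UpdatableNB`, `make_batch`, and `simulate`.

```python
import math
import random
import time
from collections import Counter

# Hypothetical opcode vocabulary and made-up per-class frequencies (illustration only).
OPCODES = ["mov", "push", "pop", "call", "xor", "jmp"]
WEIGHTS = {"benign": [6, 4, 4, 3, 1, 2], "malware": [3, 2, 2, 3, 6, 4]}


class UpdatableNB:
    """Tiny multinomial naive Bayes over opcode counts that supports incremental updates."""

    def __init__(self):
        self.class_counts = Counter()   # label -> number of traces seen
        self.opcode_counts = {}         # label -> Counter of opcode occurrences
        self.vocab = set()
        self.samples_seen = 0           # training samples processed by this model

    def update(self, traces, labels):
        """Fold a batch of labelled traces into the existing counts."""
        for trace, label in zip(traces, labels):
            self.class_counts[label] += 1
            self.opcode_counts.setdefault(label, Counter()).update(trace)
            self.vocab.update(trace)
            self.samples_seen += 1

    def predict(self, trace):
        """Return the label with the highest Laplace-smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.opcode_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for op in trace:
                score += math.log((counts[op] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def make_batch(rng, size, trace_len=40):
    """Generate synthetic labelled opcode traces (stand-in for real run traces)."""
    labels = [rng.choice(["benign", "malware"]) for _ in range(size)]
    traces = [rng.choices(OPCODES, weights=WEIGHTS[y], k=trace_len) for y in labels]
    return traces, labels


def simulate(batches):
    """Compare full retraining against incremental updating as the dataset grows."""
    incremental = UpdatableNB()
    scratch = UpdatableNB()
    seen_traces, seen_labels = [], []
    full_cost = 0
    full_cpu = inc_cpu = 0.0
    for traces, labels in batches:
        seen_traces += traces
        seen_labels += labels

        # Non-updatable regime: rebuild the model on everything seen so far.
        start = time.process_time()
        scratch = UpdatableNB()
        scratch.update(seen_traces, seen_labels)
        full_cpu += time.process_time() - start
        full_cost += scratch.samples_seen

        # Updatable regime: process only the newly arrived batch.
        start = time.process_time()
        incremental.update(traces, labels)
        inc_cpu += time.process_time() - start
    return scratch, incremental, full_cost, incremental.samples_seen, full_cpu, inc_cpu


rng = random.Random(0)
batches = [make_batch(rng, 100) for _ in range(5)]
scratch, incremental, full_cost, inc_cost, full_cpu, inc_cpu = simulate(batches)
print(f"full retraining: {full_cost} samples processed, {full_cpu:.4f}s CPU")
print(f"incremental:     {inc_cost} samples processed, {inc_cpu:.4f}s CPU")
```

With five batches of 100 traces, full retraining processes 100 + 200 + 300 + 400 + 500 = 1,500 samples in total, while the incremental model processes 500, and both end with identical counts. The gap grows quadratically with the number of batches, which is the kind of escalation cost the abstract refers to. CPU times are measured with `time.process_time()` and will vary by machine.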
