4.6 Article

Effective method for detecting malicious PowerShell scripts based on hybrid features

Journal

NEUROCOMPUTING
Volume 448, Issue -, Pages 30-39

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.03.117

Keywords

PowerShell; Abstract syntax tree; Scripts detection; Machine learning

Funding

  1. National Natural Science Foundation of China [61902265, U20B2045]
  2. National Key Research and Development Program of China [2016QY13Z2302, 2018YFB0804103]
  3. Sichuan Science and Technology Program [2020YFG0047, 2020YFG0374]
  4. Sichuan University Postdoc Research Foundation [2019SCU12068]
  5. Guangxi Key Laboratory of Cryptography and Information Security [GCIS201921]
  6. Fundamental Research Funds for the Central Universities

The article introduces a detection model for malicious PowerShell scripts based on hybrid features, which analyzes the differences in text characters, functions, tokens, and nodes of the abstract syntax tree to classify malicious and benign samples. The model achieves high accuracy even in a complex dataset.
Network attacks are rampant on the Internet, and hackers' attack methods are constantly changing. PowerShell is a programming language based on the command line and the .NET framework, with powerful functions and good compatibility. Hackers therefore often use malicious PowerShell scripts to attack victims in APT attacks. When these scripts are executed, hackers can control the victim's computer or leave a backdoor on it. This paper proposes a detection model for malicious PowerShell scripts based on hybrid features, built on an analysis of the differences between malicious and benign samples in text characters, functions, tokens, and abstract syntax tree nodes. First, the PowerShell script is embedded with FastText. Then the textual features, token features, and node features of the PowerShell code extracted from the abstract syntax tree are added. Finally, the hybrid features of the scripts are classified by a Random Forest classifier. In the experiment, malicious scripts are inserted into benign scripts to weaken the features of the malicious samples at the level of abstract syntax tree nodes and tokens, which makes the scripts more complex. Even on such a complex data set, the proposed hybrid-feature model still achieves an accuracy of 97.76% in fivefold cross-validation. Moreover, the accuracy of the proposed model on the original scripts is 98.93%, which indicates that the proposed model can classify complex scripts. (c) 2021 Elsevier B.V. All rights reserved.
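
The hybrid-feature idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the exact textual or token features, so the features below (script length, character entropy, special-character ratio, and rough token counts) are assumptions chosen for illustration, and the function names and the regex tokenizer are hypothetical. The FastText embedding is passed in as a precomputed vector, and the abstract syntax tree node features are omitted because they would require the PowerShell parser. The resulting vector is the kind of input one might hand to a Random Forest classifier.

```python
import math
import re
from collections import Counter

# Hypothetical token pattern: a rough approximation for illustration,
# not the real PowerShell tokenizer.
CMDLET_RE = re.compile(r"[A-Za-z]+-[A-Za-z]+")
TOKEN_RE = re.compile(r"\$\w+|[A-Za-z]+-[A-Za-z]+|\w+|[^\w\s]")


def char_entropy(text):
    """Shannon entropy of the script text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def textual_features(script):
    """Assumed character-level features: length, entropy, special-char ratio."""
    length = len(script)
    special = sum(1 for ch in script if not ch.isalnum() and not ch.isspace())
    ratio = special / length if length else 0.0
    return [float(length), char_entropy(script), ratio]


def token_features(script):
    """Assumed token-level features: token, variable, and cmdlet-like counts."""
    tokens = TOKEN_RE.findall(script)
    variables = sum(1 for t in tokens if t.startswith("$"))
    cmdlets = sum(1 for t in tokens if CMDLET_RE.fullmatch(t))
    return [float(len(tokens)), float(variables), float(cmdlets)]


def hybrid_features(script, embedding=()):
    """Concatenate a precomputed embedding with textual and token features."""
    return list(embedding) + textual_features(script) + token_features(script)
```

Calling `hybrid_features("$a = Get-Process", embedding=[0.1, 0.2])` yields an eight-element vector: the two embedding values followed by three textual and three token features. Heavily obfuscated scripts tend to show higher character entropy and special-character ratios, which is one reason such features are commonly paired with an embedding.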
