Article

Classification of ransomware families with machine learning based on N-gram of opcodes

Publisher

ELSEVIER
DOI: 10.1016/j.future.2018.07.052

Keywords

Ransomware classification; Static analysis; Opcode; Machine learning; N-gram

Funding

  1. NSFC [61773229, 61771273, 61202358]
  2. National High-tech R&D Program of China [2015AA016102]
  3. Guangdong Natural Science Foundation [2018A030313422]
  4. R&D Program of Shenzhen [JCYJ20160531174259309, JCYJ20170307153032483, JCYJ20160331184440545, JCYJ20170307153157440]
  5. Interdisciplinary Research Project of Graduate School at Shenzhen of Tsinghua University [JC2017005]

Ransomware is a special type of malware that locks victims' screens and/or encrypts their files to extort a ransom, causing great damage to users. Mapping ransomware into families helps identify variants of a known sample and reduces analysts' workload. However, ransomware that fingerprints its environment can evade dynamic analysis. To overcome this shortcoming, we propose what is, to the best of our knowledge, the first approach to classifying ransomware based on static analysis. First, opcode sequences from ransomware samples are transformed into N-gram sequences. Then, term frequency-inverse document frequency (TF-IDF) is calculated for each N-gram to select the feature N-grams that best discriminate between families. Finally, the vectors composed of the TF values of the feature N-grams serve as feature vectors and are fed to five machine-learning methods to perform ransomware classification. Six evaluation criteria are employed to validate the model. Thorough experiments on real datasets demonstrate that our approach achieves a best Accuracy of 91.43%. Furthermore, the average F1-measure for the WannaCry ransomware family is up to 99%, and the Accuracy of binary classification is up to 99.3%. The proposed method can detect and classify ransomware that fingerprints its environment. In addition, we find that feature N-grams of different lengths require different feature dimensions to achieve similar classifier performance. (C) 2018 Elsevier B.V. All rights reserved.
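
The three feature-extraction steps named in the abstract (opcode N-grams, TF-IDF-based selection, TF feature vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy opcode sequences, the use of relative frequency for TF, the natural-log IDF, and the rule of ranking each N-gram by its highest TF-IDF score across samples are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Slide a window of length n over one sample's opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(ngrams):
    """Relative frequency of each N-gram within a single sample (assumed TF definition)."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def select_feature_ngrams(samples, n, k):
    """Keep the k N-grams with the highest TF-IDF score seen in any sample.

    Assumption: an N-gram's score is its maximum TF-IDF over all samples;
    the paper may rank or aggregate differently (for example, per family).
    """
    tfs = [term_frequencies(opcode_ngrams(s, n)) for s in samples]
    doc_freq = Counter(g for tf in tfs for g in tf)
    num_docs = len(samples)
    scores = {}
    for tf in tfs:
        for g, value in tf.items():
            idf = math.log(num_docs / doc_freq[g])
            scores[g] = max(scores.get(g, 0.0), value * idf)
    # Sort by descending score, breaking ties by the N-gram itself.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:k]


def feature_vector(opcodes, features, n):
    """TF value of each selected feature N-gram for one sample."""
    tf = term_frequencies(opcode_ngrams(opcodes, n))
    return [tf.get(g, 0.0) for g in features]


# Hypothetical opcode sequences standing in for disassembled samples.
samples = [
    ["push", "mov", "call", "mov", "call"],
    ["push", "mov", "call", "xor", "ret"],
    ["xor", "xor", "ret", "push", "mov"],
]
features = select_feature_ngrams(samples, n=2, k=3)
vectors = [feature_vector(s, features, n=2) for s in samples]
```

In this toy data the bigram `("push", "mov")` occurs in every sample, so its IDF is zero and it is never selected, which is the discriminative effect the TF-IDF step is meant to provide. The resulting `vectors` would then be passed to a classifier of choice; that step is omitted here to keep the sketch dependency-free.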
