Article

CANDYMAN: Classifying Android malware families by modelling dynamic traces with Markov chains

Journal

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2018.06.006

Keywords

Android malware; Dynamic analysis; Classification; Deep Learning; Markov chains

Funding

  1. Comunidad Autónoma de Madrid [S2013/ICE-3095]
  2. Spanish Ministry of Science and Education and Competitivity (MINECO)
  3. European Regional Development Fund (FEDER) [TIN2014-56494-C4-4-P, TIN2017-85727-C4-3-P]
  4. Justice Programme of the European Union [723180 - RiskTrack - JUST-2015-JCOO-AG/JUST-2015-JCOO-AG-1]

Malware writers usually focus on the platforms most widely used by ordinary users, with the aim of attacking as many devices as possible. For this reason, Android has been heavily attacked for years. Efforts to combat Android malware concentrate mainly on detection, in order to prevent malicious software from being installed on a target device. However, it is equally important to automatically classify the type, or family, of a malware sample, in order to establish which actions are necessary to mitigate the damage caused. In this paper, we present CANDYMAN, a tool that classifies Android malware families by combining dynamic analysis and Markov chains. The dynamic analysis process extracts representative information from a malware sample in the form of a sequence of states, while a Markov chain models the transition probabilities between the states of the sequence, which are then used as features in the classification process. To evaluate the proposed method, the resulting feature space is used to train classical Machine Learning algorithms (including methods for imbalanced learning) and Deep Learning algorithms over a dataset of malware samples from different families. Starting from a collection of 5,560 malware samples grouped into 179 different families (extracted from the Drebin dataset), a selection based on a minimum number of relevant and valid samples yielded a final set of 4,442 samples grouped into 24 different malware families. The experimental results indicate a precision of 81.8% on this dataset.
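The idea of turning a state sequence into Markov-chain features can be illustrated with a minimal sketch. This is not the CANDYMAN implementation: the state names, the `transition_features` helper, and the choice of a first-order chain flattened row by row are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def transition_features(trace, states):
    """Flatten the first-order transition matrix of `trace` into a feature vector.

    Hypothetical helper: each feature is the estimated probability of moving
    from one state to another, in row-major order over `states`.
    """
    # Count every consecutive (source, destination) pair in the trace.
    pair_counts = Counter(zip(trace, trace[1:]))
    # Count how often each state appears as a source (last element has no successor).
    source_counts = Counter(trace[:-1])

    features = []
    for src, dst in product(states, repeat=2):
        total = source_counts[src]
        # A state never seen as a source gets an all-zero row.
        features.append(pair_counts[(src, dst)] / total if total else 0.0)
    return features


# Hypothetical state alphabet and dynamic trace, for illustration only.
STATES = ["file_open", "net_send", "sms_send"]
trace = ["file_open", "net_send", "net_send", "sms_send", "file_open", "net_send"]

features = transition_features(trace, STATES)
print(features)
```

With three states the vector has nine entries, one per ordered pair of states. Vectors of this kind, one per sample, could then be passed to any standard classifier.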
