4.7 Article

Semi-supervised two-phase familial analysis of Android malware with normalized graph embedding

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 218

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.106802

Keywords

Android malware; Normalized graph embedding; Familial analysis; Semi-supervised learning

Funding

  1. National Natural Science Foundation of China [61672421]
  2. Ministry of Education of the People's Republic of China [2020KJ010801]

The article introduces GSFDroid, a system for the familial analysis of Android malware. The system extracts graph-based features and uses Graph Convolutional Networks to embed them into a low-dimensional space, which improves the efficiency of downstream analytics tasks. A simple graph feature normalization method standardizes the embedded APK features, and the system can effectively cluster new malware samples from unknown families.
With the widespread use of smartphones, Android malware poses serious threats to their security. Given the explosive growth of Android malware variants, detecting malware families is crucial for identifying new security threats, triaging, and building reference datasets. Building behavior profiles of Android applications (apps) with holistic graph-based features helps to retain program semantics and resist obfuscation. Low-dimensional feature representations are more effective, since they reduce computational cost and improve the efficiency of downstream analytics tasks. To achieve this goal, we design and develop GSFDroid, a practical system for the familial analysis of Android malware. We first use graph-based features that contain structural information to analyze app behavior. Then, we employ Graph Convolutional Networks (GCNs) to embed nodes into a continuous, low-dimensional space, which improves the efficiency of downstream analytics tasks. Note that, because of the random initialization and propagation strategy of the GCN, the distributions of the learned APK feature vectors are not aligned or centered, and their differing scales can harm the performance of downstream tasks. Inspired by the z-score, we propose a simple graph feature normalization to standardize the embedded APK features. Finally, instead of fully supervised or unsupervised learning, we propose a two-phase familial analysis method that fuses a semi-supervised classifier with a clustering operation on the samples to which the classifier assigns high uncertainty scores. Promising experimental results on real-world datasets demonstrate that our approach significantly outperforms state-of-the-art approaches and can effectively cluster new malware samples from unknown families. © 2021 Elsevier B.V. All rights reserved.
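The abstract does not give the exact normalization formula or the uncertainty score, so the following is only a minimal sketch of the two ideas under stated assumptions: per-dimension z-score standardization of the embedded APK vectors, and Shannon entropy of the classifier's class probabilities as a stand-in uncertainty score. The function names, the `eps` term, and the threshold are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def zscore_normalize(embeddings, eps=1e-8):
    """Standardize each embedding dimension to zero mean and unit variance.

    `embeddings` is a list of equal-length feature vectors, one per APK.
    `eps` (an assumption) avoids division by zero for constant dimensions.
    """
    columns = list(zip(*embeddings))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(x - mu) / (sigma + eps) for x, mu, sigma in zip(row, mus, sigmas)]
        for row in embeddings
    ]


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_uncertainty(class_probs, threshold):
    """Phase 1: keep confident predictions; route the rest to phase 2.

    Returns (confident, uncertain), where `confident` holds
    (sample_index, predicted_family) pairs and `uncertain` holds the
    indices of samples whose entropy exceeds `threshold`. The uncertain
    samples would then be handed to a clustering step (not shown here).
    """
    confident, uncertain = [], []
    for idx, probs in enumerate(class_probs):
        if entropy(probs) > threshold:
            uncertain.append(idx)
        else:
            label = max(range(len(probs)), key=probs.__getitem__)
            confident.append((idx, label))
    return confident, uncertain
```

In this sketch, a sample whose predicted family distribution is nearly uniform is treated as a candidate from an unknown family and left for clustering, while a sharply peaked prediction is accepted as is. Any standard clustering algorithm could be applied to the normalized vectors of the uncertain samples.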
