Proceedings Paper

Android Malware Detection Through a Pre-trained Model for Code Understanding

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-031-21333-5_105

Keywords

Android; Malware; Pre-trained model; Embedding; CodeT5

This study uses the pre-trained CodeT5 language model to generate context- and semantics-aware embeddings that better represent the behavior of Android applications. It shows how these embeddings can be used to train a recurrent neural network for malware detection and reports promising results.
Despite the many approaches proposed for detecting malicious applications on platforms such as Android, malware continuously evolves to evade detection and reach users. Detection engines are likewise continuously improved to catch the most recent malware. Most of these tools rely on signatures or on machine learning models trained on thousands of features, such as API calls, permissions, or the results of taint analysis, using classification algorithms such as decision trees, ensemble methods, or deep learning. However, because such features are extracted from limited datasets and do not capture the real semantics (goals and intentions) of a malicious sample, they lead to biased models. In this paper, we conduct an initial study of context- and semantics-aware embeddings generated with the CodeT5 pre-trained language model as a better representation of the behavior of Android applications. After a sample is decompiled to Java, embeddings can be generated from chunks of its source code, yielding a rich representation of the sample. We show how these embeddings can be used to train a recurrent neural network for malware detection, with promising results.
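
The pipeline described in the abstract (decompiled Java, then per-chunk embeddings, then a recurrent network over the resulting sequence) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the whitespace tokenization, the window size and stride, the names `chunk_source`, `toy_embed`, `embed_sample` and `rnn_score`, and the untrained Elman-style recurrent cell. In particular, `toy_embed` is only a hashed bag-of-tokens placeholder so that the example runs without downloading a model; in a real setting it would be replaced by a function that returns CodeT5 encoder embeddings for each chunk.

```python
import math
import random
import zlib
from typing import Callable, List

Vector = List[float]


def chunk_source(source: str, max_tokens: int = 64, stride: int = 48) -> List[str]:
    """Split source code into overlapping windows of whitespace-separated tokens.

    The window size and stride are illustrative assumptions.
    """
    if max_tokens <= 0 or stride <= 0:
        raise ValueError("max_tokens and stride must be positive")
    tokens = source.split()
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the source
        start += stride
    return chunks


def toy_embed(chunk: str, dim: int = 8) -> Vector:
    """Placeholder embedder (NOT CodeT5): L2-normalised hashed bag of tokens."""
    vec = [0.0] * dim
    for tok in chunk.split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_sample(source: str,
                 embed: Callable[[str], Vector] = toy_embed,
                 max_tokens: int = 64,
                 stride: int = 48) -> List[Vector]:
    """Turn one decompiled sample into a sequence of chunk embeddings."""
    return [embed(c) for c in chunk_source(source, max_tokens, stride)]


def rnn_score(sequence: List[Vector], hidden: int = 4, seed: int = 0) -> float:
    """Forward pass of a tiny Elman-style RNN with random, untrained weights.

    Shows only how a sequence of embeddings is consumed; the returned value
    is not a meaningful malware probability until the weights are trained.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one embedding")
    rng = random.Random(seed)
    dim = len(sequence[0])
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    h = [0.0] * hidden
    for x in sequence:
        # New hidden state from the current embedding and the previous state.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_h[j], h))
            )
            for j in range(hidden)
        ]
    logit = sum(w * hi for w, hi in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid over the final state
```

Passing the embedder in as a callable keeps the chunking and sequence logic independent of the model, so a CodeT5-backed function can be swapped in for `toy_embed` without changing the rest of the sketch.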
