☆ 3.8 Proceedings Paper

Android Malware Detection Through a Pre-trained Model for Code Understanding

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING & AMBIENT INTELLIGENCE (UCAMI 2022) (2023)

Journal

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING & AMBIENT INTELLIGENCE (UCAMI 2022)

Volume 594, Issue -, Pages 1055-1060

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

DOI: 10.1007/978-3-031-21333-5_105

Keywords

Android; Malware; Pre-trained model; Embedding; CodeT5

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This study utilizes CodeT5 pre-trained language model to generate context and semantic aware embeddings for a better representation of the behavior of Android applications. It shows how these embeddings can be used to train a recurrent neural network for malware detection tasks, and presents promising results.

Despite the large number of approaches proposed for detecting malicious applications targeting platforms such as Android, malware continuously evolves in order to avoid its detection and reach the users. Likewise, malware detection engines are continuously improved, trying to detect the most modern malware. Most of these detection tools employ signatures or machine learning models, trained on thousands of features, such as API calls, permissions or using taint analysis, among many others, and using machine learning classification algorithms such as decision trees, ensemble methods or deep learning. However, the use of these features leads to biased models due to the use of limited datasets, without considering the real semantics (goals and intentions) of the malicious sample. In this paper, we conduct an initial study of the use of context and semantic aware embeddings generated with the CodeT5 pre-trained language model for a better representation of the behaviour of Android applications. After decompiling a sample to Java, it is possible to generate embeddings from chunks of the source code, generating a rich representation of the sample. We show how these embeddings can be used to train a recurrent neural network for malware detection tasks, evidencing promising results.

Android Malware Detection Through a Pre-trained Model for Code Understanding

Journal

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING & AMBIENT INTELLIGENCE (UCAMI 2022)

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Android Malware Detection Through a Pre-trained Model for Code Understanding

Journal

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING & AMBIENT INTELLIGENCE (UCAMI 2022)

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper