Article

Why an Android App Is Classified as Malware: Toward Malware Classification Interpretation

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3423096

Keywords

Android malware; interpretability; machine learning; interpretable AI

Funding

  1. Singapore Ministry of Education Academic Research Fund Tier 1 [2018-T1-002069]
  2. National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program [NRF2018NCR-NCR005-0001]
  3. Singapore National Research Foundation under NCR Award [NSOE003-0001, NRF2018NCR-NSOE004-0001]
  4. NRF Investigatorship [NRFI06-2020-0022]
  5. Research Grants Council of the Hong Kong Special Administrative Region, China [CUHK 14210717]
  6. National Natural Science Foundation of China [62002084]

Abstract

This article introduces a novel and interpretable machine learning-based approach (XMal) for Android malware detection and analysis. XMal not only classifies malware accurately, but also explains the classification results and describes the malicious behaviors, addressing a gap in existing research.
Machine learning (ML)-based approaches are considered among the most promising techniques for Android malware detection and have achieved high accuracy by leveraging commonly used features. In practice, however, most ML classifiers provide only a binary label to mobile users and app security analysts, whereas stakeholders in both academia and industry are more interested in why an app is classified as malicious. This question belongs to the research area of interpretable ML, but within a specific domain (i.e., mobile malware detection). Although several interpretable ML methods have been proposed to explain final classification results in many cutting-edge AI-based research fields, until now no study has interpreted why an app is classified as malware or unveiled the domain-specific challenges. In this article, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) that classifies malware with high accuracy and explains the classification result at the same time. (1) The first (classification) phase of XMal hinges on a multi-layer perceptron (MLP) and an attention mechanism, and also pinpoints the key features most relevant to the classification result. (2) The second (interpretation) phase automatically produces natural language descriptions that interpret the core malicious behaviors within apps. We evaluate the behavior description results through a human study and an in-depth quantitative analysis. Moreover, we compare XMal with existing interpretable ML-based methods (i.e., Drebin and LIME) to demonstrate its effectiveness, and find that XMal reveals malicious behaviors more accurately. Our experiments also show that XMal can interpret why some samples are misclassified by ML classifiers. Our study offers a glimpse into interpretable ML through the lens of Android malware detection and analysis.
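
To make the two-phase design concrete, below is a minimal, hypothetical sketch (written in PyTorch; the abstract does not prescribe any particular framework or implementation) of (1) an attention-weighted MLP whose attention scores expose the key features behind a classification, and (2) a greatly simplified, template-based mapping from the top-weighted features to behavior descriptions. All class names, feature names, and the feature-to-description table are illustrative assumptions, not the authors' code; XMal itself generates richer natural language descriptions.

# Minimal sketch of an attention + MLP malware classifier and a
# template-based behavior describer (assumes binary permission/API features).
import torch
import torch.nn as nn

class AttentionMLPClassifier(nn.Module):
    """Phase 1 (sketch): an attention layer scores each input feature,
    and an MLP classifies the re-weighted feature vector."""
    def __init__(self, num_features: int, hidden_dim: int = 128):
        super().__init__()
        # One importance score per input feature, normalized with softmax.
        self.attention = nn.Sequential(
            nn.Linear(num_features, num_features),
            nn.Softmax(dim=-1),
        )
        self.mlp = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # benign vs. malware logits
        )

    def forward(self, x: torch.Tensor):
        weights = self.attention(x)      # per-feature importance scores
        weighted = x * weights           # re-weight the raw feature vector
        logits = self.mlp(weighted)
        return logits, weights           # weights expose the "key features"

# Phase 2 (greatly simplified): map top-weighted features to canned
# descriptions; the table below is a hypothetical example.
FEATURE_DESCRIPTIONS = {
    "SEND_SMS": "sends SMS messages in the background",
    "READ_CONTACTS": "collects the user's contact list",
    "getDeviceId": "reads the device identifier (IMEI)",
}

def describe(feature_names, weights, top_k=3):
    """Turn the top-k most important features into a short behavior summary."""
    ranked = sorted(zip(feature_names, weights), key=lambda p: -float(p[1]))
    parts = [FEATURE_DESCRIPTIONS.get(name, f"uses {name}")
             for name, _ in ranked[:top_k]]
    return "The app " + ", and ".join(parts) + "."

In this sketch the attention weights play the role of the "key features most relevant to the classification result," and the describe function stands in for the interpretation phase; the actual approach learns both phases on real feature sets rather than a hand-written lookup table.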
