Article

Malware Detection Inside App Stores Based on Lifespan Measurements

Journal

IEEE ACCESS
Volume 9, Pages 119967-119976

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3107903

Keywords

Malware; Internet; Engines; Machine learning; Ecosystems; Static analysis; Feature extraction; app stores; Google Play malware; Android malware; malware detection; potentially harmful apps

This paper proposes a solution based on machine learning algorithms to detect PHAs in application markets, using the lifespan of applications in Google Play as a criterion to avoid the bias of antivirus engines. The solution has shown a 90% accuracy score and offers a complementary method to existing machine learning models for detecting PHAs.
Potentially Harmful Apps (PHAs), like any other type of malware, are a problem. Even though Google tries to maintain a clean app ecosystem, the Google Play Store is still one of the main vectors for spreading PHAs. In this paper, we propose a solution based on machine learning algorithms to detect PHAs inside application markets. Because application markets are one of the main entry vectors, a solution capable of detecting PHAs that have been submitted, or are being submitted, to those markets is needed. The proposed solution can detect PHAs inside an application market and can be used as a filtering method to automatically block the publication of novel PHAs. It is based on static analysis of applications; although several static analysis solutions have been developed, the innovation of this system lies in its training and in the creation of its dataset. We have created a new dataset that uses the lifespan of an application inside Google Play as its criterion: the shorter the time an application is active inside an application market, the higher the probability that it is a PHA. This criterion was adopted to avoid the use, and the bias, of antivirus engines for detecting malware. By using lifespan as the criterion, we created a new detection method that does not replicate any existing antivirus engine. Experimental results show that this solution obtains a 90% accuracy score on a dataset of 91,203 applications published on the Google Play Store. Although this accuracy is lower than that of other machine learning models focused on detecting PHAs, this is a complementary and different method. The presented work can be combined with other static and dynamic machine learning models, since its training is drastically different, being based on lifespan measurements.
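
The lifespan criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give a cutoff, so the 90-day `SHORT_LIFESPAN_DAYS` threshold is purely hypothetical, as are the `AppRecord` fields, the function names, and the choice to treat apps that are still listed as benign.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical cutoff: the abstract does not state the threshold used.
SHORT_LIFESPAN_DAYS = 90


@dataclass
class AppRecord:
    """Illustrative record for one app observed in a market (assumed fields)."""
    package: str
    published: date
    removed: Optional[date]  # None if the app is still listed when observed


def lifespan_days(app: AppRecord, observed: date) -> int:
    """Days the app was active, up to its removal or the observation date."""
    end = app.removed if app.removed is not None else observed
    return (end - app.published).days


def label_by_lifespan(app: AppRecord, observed: date,
                      threshold: int = SHORT_LIFESPAN_DAYS) -> int:
    """Return 1 (suspected PHA) for a short-lived removed app, else 0.

    Assumption of this sketch: apps still listed are labeled 0.
    """
    if app.removed is None:
        return 0
    return 1 if lifespan_days(app, observed) < threshold else 0
```

Labels produced this way could then serve as training targets for a classifier over static-analysis features, with no antivirus verdicts involved in the labeling step.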
