Characterizing Android apps' behavior for effective detection of malapps at large scale

X Wang, W Wang, Y He, J Liu, Z Han, X Zhang
Future Generation Computer Systems, 2017, Elsevier
Abstract
Android malicious applications (malapps) have surged in number and grown increasingly sophisticated, posing a serious threat to users. Characterizing, understanding, and detecting Android malapps at a large scale is therefore a major challenge. In this work, we aim to discover discriminatory and persistent features, extracted from Android APK files, for automated malapp detection at a large scale. To achieve this goal, we first extract a very large number of features from each app and categorize them into two groups: app-specific features and platform-defined features. These feature sets are then fed into four classifiers (Logistic Regression, linear SVM, Decision Tree, and Random Forest) to detect malapps. Second, we evaluate the persistence of app-specific and platform-defined features, that is, how well each set sustains classification performance across two data sets collected in different time periods. Third, we comprehensively analyze the relevant features selected by the Logistic Regression classifier to identify the contribution of each feature set. We conduct extensive experiments on large real-world app sets consisting of 213,256 benign apps collected from six app markets, 4,363 benign apps from the Google Play market, and 18,363 malapps. The experimental results and our analysis provide insight into which discriminatory features are most effective for characterizing malapps and thus for building an effective and efficient malapp detection system. With the selected discriminatory features, the Logistic Regression classifier achieves the best true positive rate, 96%, at a false positive rate of 0.06%.
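The abstract does not specify the feature encoding, the training procedure, or how each feature set's contribution is measured. The Python below is a minimal, dependency-free sketch of the general idea under stated assumptions. It is not the authors' pipeline, which compares four classifiers on hundreds of thousands of apps.

The sketch makes these assumptions:

- Each app is encoded as a 0/1 vector over feature names.
- The `perm:` and `app:` prefixes, and every feature name, are hypothetical stand-ins for platform-defined and app-specific features.
- The classifier is a plain L2-regularized logistic regression trained by batch gradient descent.
- The helper names (`vectorize`, `train_logreg`, `group_weight`, and so on) are illustrative.

The code reports the true positive rate and false positive rate quoted in the abstract. It also sums absolute weights per feature group, as one simple way to compare the two sets.

```python
import math
from typing import List, Sequence, Set, Tuple


def vectorize(apps: Sequence[Set[str]], vocab: Sequence[str]) -> List[List[float]]:
    """Encode each app's feature set as a 0/1 vector over a fixed vocabulary."""
    index = {feature: i for i, feature in enumerate(vocab)}
    rows = []
    for features in apps:
        row = [0.0] * len(vocab)
        for feature in features:
            if feature in index:  # features outside the vocabulary are ignored
                row[index[feature]] = 1.0
        rows.append(row)
    return rows


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on mean logistic loss plus an L2 penalty (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label an app 1 (malapp) when its predicted probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]


def tpr_fpr(y_true, y_pred) -> Tuple[float, float]:
    """TPR = TP / (TP + FN) over malapps; FPR = FP / (FP + TN) over benign apps."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def group_weight(vocab, w, prefix):
    """Sum of |weight| over features whose name starts with `prefix` (an assumed measure)."""
    return sum(abs(wj) for f, wj in zip(vocab, w) if f.startswith(prefix))


# Hypothetical toy data: "perm:" = platform-defined, "app:" = app-specific.
apps = [
    {"perm:SEND_SMS", "perm:READ_PHONE_STATE", "app:com.fake.payload"},  # malapp
    {"perm:SEND_SMS", "perm:INTERNET"},                                   # malapp
    {"perm:READ_PHONE_STATE", "perm:INTERNET", "app:com.fake.payload"},  # malapp
    {"perm:INTERNET"},                                                    # benign
    {"perm:INTERNET", "perm:CAMERA"},                                     # benign
    {"perm:CAMERA", "app:com.example.ui"},                                # benign
]
labels = [1, 1, 1, 0, 0, 0]

vocab = sorted(set().union(*apps))
X = vectorize(apps, vocab)
w, b = train_logreg(X, labels)
tpr, fpr = tpr_fpr(labels, predict(X, w, b))
weights = dict(zip(vocab, w))
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
print("platform-defined |w| sum:", round(group_weight(vocab, w, "perm:"), 3))
print("app-specific |w| sum:", round(group_weight(vocab, w, "app:"), 3))
```

The toy rates above are measured on the training apps themselves. A realistic evaluation would score held-out apps. To probe persistence, one could train on the earlier of two time-period data sets and test on the later one. In practice, a library trainer such as scikit-learn's `LogisticRegression` would replace the hand-written gradient descent. It is spelled out here only to keep the sketch self-contained.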