Abstract
Smartphones are becoming increasingly ubiquitous. Like recommended best practices for personal computers, users are encouraged to install antivirus and intrusion detection software on their mobile devices. However, even with such software these devises are far from being fully protected. Given that application stores are the source of most applications, malware detection on these platforms is an important issue. Based on our intuition, which suggests that an application’s suspicious behavior will be noticed by some users and influence their feedback, we present an approach for analyzing user reviews in mobile application stores for the purpose of detecting malicious apps. The proposed method transfers an application’s text reviews to numerical features in two main steps: (1) extract domain-phrases based on external domain-specific textual corpus on computer and network security, and (2) compute three statistical features based on domain-phrases occurrences. We evaluated the proposed methods on 2,506 applications along with their 128,863 reviews collected from “Amazon AppStore”. The results show that proposed method yields an AUC of 86% in the detection of malicious applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Statista: Number of smartphones sold to end users worldwide from 2007 to 2016 (2017). https://www.statista.com/statistics/263437/global-smartphone-sales-to-end-users-since-2007/. Accessed Aug 2017
Statista: Number of available apps in the Apple App Store from July 2008 to January 2017 (2017). https://www.statista.com/statistics/263795/number-of-available-apps-in-the-apple-app-store/. Accessed Aug 2017
Statista: Number of available applications in the Google Play Store from December 2009 to June 2017 (2017). https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/. Accessed Aug 2017
Statista: Number of available apps in the Amazon Appstore from March 2011 to April 2016 (2017). https://www.statista.com/statistics/307330/number-of-available-apps-in-the-amazon-appstore/. Accessed Aug 2017
Check Point: Viking horde: a new type of android malware on Google Play (2017). https://blog.checkpoint.com/2016/05/09/viking-horde-a-new-type-of-android-malware-on-google-play/. Accessed Aug 2017
Check Point: Dresscode android malware discovered on Google Play (2017). https://blog.checkpoint.com/2016/08/31/dresscode-android-malware-discovered-on-google-play/. Accessed Aug 2017
Wang, K., Stolfo, S.J.: Anomalous payload-based network intrusion detection. In: Jonsson, E., Valdes, A., Almgren, M. (eds.) RAID 2004. LNCS, vol. 3224, pp. 203–222. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30143-1_11
Blair-Goldensohn, S., Hannan, K., McDonald, R., Neylon, T., Reis, G.A., Reynar, J.: Building a sentiment summarizer for local service reviews. In: WWW Workshop on NLP in the Information Explosion Era, vol. 14, pp. 339–348 (2008)
Portier, K., Greer, G.E., Rokach, L., Ofek, N., Wang, Y., Biyani, P., Yu, M., Banerjee, S., Zhao, K., Mitra, P., et al.: Understanding topics and sentiment in an online cancer survivor community. JNCI Monogr. 47, 195–198 (2013)
Hadad, T., Sidik, B., Ofek, N., Puzis, R., Rokach, L.: User feedback analysis for mobile malware detection. In: ICISSP, pp. 83–94 (2017)
Xie, L., Zhang, X., Seifert, J.P., Zhu, S.: pBMDS: a behavior-based malware detection system for cellphone devices. In: Proceedings of the Third ACM Conference on Wireless Network Security, pp. 37–48. ACM (2010)
Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware detection system for Android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 15–26. ACM (2011)
Shabtai, A., Tenenboim-Chekina, L., Mimran, D., Rokach, L., Shapira, B., Elovici, Y.: Mobile malware detection through analysis of deviations in application network behavior. Comput. Secur. 43, 1–18 (2014)
Aung, Z., Zaw, W.: Permission-based Android malware detection. Int. J. Sci. Technol. Res. 2, 228–234 (2013)
Zhang, Y., Yang, M., Xu, B., Yang, Z., Gu, G., Ning, P., Wang, X.S., Zang, B.: Vetting undesirable behaviors in Android apps with permission use analysis. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pp. 611–622. ACM (2013)
Rastogi, V., Chen, Y., Jiang, X.: Droidchameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 329–334. ACM (2013)
Yang, Z., Yang, M., Zhang, Y., Gu, G., Ning, P., Wang, X.S.: Appintent: analyzing sensitive data transmission in Android for privacy leakage detection. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pp. 1043–1054. ACM (2013)
Zheng, M., Sun, M., Lui, J.C.: Droid analytics: a signature based analytic system to collect, extract, analyze and associate Android malware. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 163–171. IEEE (2013)
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: 2007 Twenty-Third Annual Computer Security Applications Conference, ACSAC 2007, pp. 421–430. IEEE (2007)
Ofek, N., Rokach, L., Mitra, P.: Methodology for connecting nouns to their modifying adjectives. In: Gelbukh, A. (ed.) CICLing 2014. LNCS, vol. 8403, pp. 271–284. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54906-9_22
Katz, G., Ofek, N., Shapira, B.: Consent: context-based sentiment analysis. Knowl.-Based Syst. 84, 162–178 (2015)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177. ACM (2004)
Ofek, N., Shabtai, A.: Dynamic latent expertise mining in social networks. IEEE Internet Comput. 18, 20–27 (2014)
Choi, Y., Cardie, C.: Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 793–801. Association for Computational Linguistics (2008)
Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., Pouliquen, B., Belyaeva, J.: Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013)
Ye, Q., Zhang, Z., Law, R.: Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Syst. Appl. 36, 6527–6535 (2009)
Mullen, T., Collier, N.: Sentiment analysis using support vector machines with diverse information sources. In: EMNLP, vol. 4, pp. 412–418 (2004)
Ofek, N., Katz, G., Shapira, B., Bar-Zev, Y.: Sentiment analysis in transcribed utterances. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9078, pp. 27–38. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18032-8_3
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Baron, N.S.: Language of the internet. In: The Stanford Handbook for Language Engineers, pp. 59–127 (2003)
Onix: Onix text retrieval toolkit (2016). http://www.lextek.com/manuals/onix/stopwords1.html. Accessed Apr 2016
Twitter: Twitter dictionary: a guide to understanding twitter lingo (2016). http://www.webopedia.com/quick_ref/Twitter_Dictionary_Guide.asp. Accessed Apr 2016
Porter, M.F.: An algorithm for suffix stripping. Program 14, 130–137 (1980)
Dunham, K.: Mobile Malware Attacks and Defense. Syngress, Maryland Heights (2008)
Syngress, E.O., O’Farrell, N.: Hackproofing Your Wireless Network (2002)
Shostack, A.: Threat Modeling: Designing for Ssecurity. Wiley, Hoboken (2014)
Nayak, U., Rao, U.H.: The InfoSec Handbook: An Introduction to Information Security. Apress, New York City (2014)
Bosworth, S., Kabay, M.E.: Computer Security Handbook. Wiley, Hoboken (2002)
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-2002 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528. ACM (2003)
De Marneffe, M.C., MacCartney, B., Manning, C.D., et al.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC, vol. 6, pp. 449–454 (2006)
Ranveer, S., Hiray, S.: Comparative analysis of feature extraction methods of malware detection. Int. J. Comput. Appl. 120, 1–7 (2015)
VirusTotal: Virustotal, a free online service that analyzes files and urls enabling the identification of viruses, worms, trojans and other kinds of malicious content. https://www.virustotal.com/. Accessed Aug 2017
Šrndic, N., Laskov, P.: Detection of malicious pdf files based on hierarchical document structure. In: Proceedings of the 20th Annual Network & Distributed System Security Symposium (2013)
Nissim, N., Cohen, A., Moskovitch, R., Shabtai, A., Edry, M., Bar-Ad, O., Elovici, Y.: ALPD: Active learning framework for enhancing the detection of malicious pdf files. In: 2014 IEEE Joint Conference on Intelligence and Security Informatics Conference (JISIC), pp. 91–98. IEEE (2014)
Quinlan, J.R.: C4. 5: Programming for Machine Learning, p. 38. Morgan Kauffmann, Burlington (1993)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)
Walker, S.H., Duncan, D.B.: Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167–179 (1967)
Harris, Z.S.: Distributional structure. Word 10, 146–162 (1954)
Amazon: Amazon appstore (2017). http://www.amazon.com/mobile-apps/b?node=2350149011. Accessed Aug 2017
WEKA. http://www.cs.waikato.ac.nz/ml/weka/. Accessed Aug 2017
Bishop, C.M.: Pattern recognition. Mach. Learn. 128, 1–58 (2006)
Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol. 14, pp. 1137–1145 (1995)
Singh, Y., Kaur, A., Malhotra, R.: Comparative analysis of regression and machine learning methods for predicting fault proneness models. Int. J. Comput. Appl. Technol. 35, 183–193 (2009)
Oommen, T., Baise, L.G., Vogel, R.M.: Sampling bias and class imbalance in maximum-likelihood logistic regression. Math. Geosci. 43, 99–120 (2011)
Hong, L., Davison, B.D.: Empirical study of topic modeling in Twitter. In: Proceedings of the First Workshop on Social Media Analytics, pp. 80–88. ACM (2010)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Hadad, T., Puzis, R., Sidik, B., Ofek, N., Rokach, L. (2018). Application Marketplace Malware Detection by User Feedback Analysis. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2017. Communications in Computer and Information Science, vol 867. Springer, Cham. https://doi.org/10.1007/978-3-319-93354-2_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-93354-2_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93353-5
Online ISBN: 978-3-319-93354-2
eBook Packages: Computer ScienceComputer Science (R0)