Application Marketplace Malware Detection by User Feedback Analysis

Hadad, Tal; Puzis, Rami; Sidik, Bronislav; Ofek, Nir; Rokach, Lior

doi:10.1007/978-3-319-93354-2_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 867)

Included in the following conference series:

International Conference on Information Systems Security and Privacy

Abstract

Smartphones are becoming increasingly ubiquitous. In line with recommended best practices for personal computers, users are encouraged to install antivirus and intrusion detection software on their mobile devices. However, even with such software, these devices are far from fully protected. Because application stores are the source of most applications, malware detection on these platforms is an important issue. Based on the intuition that an application’s suspicious behavior will be noticed by some users and will influence their feedback, we present an approach for analyzing user reviews in mobile application stores in order to detect malicious apps. The proposed method converts an application’s text reviews into numerical features in two main steps: (1) extracting domain phrases from an external domain-specific textual corpus on computer and network security, and (2) computing three statistical features based on domain-phrase occurrences. We evaluated the proposed method on 2,506 applications and their 128,863 reviews collected from “Amazon AppStore”. The results show that the proposed method yields an AUC of 86% in the detection of malicious applications.
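
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how domain phrases are selected or which three statistical features are computed, so the frequency threshold, the stopword list, the function names (`extract_domain_phrases`, `review_features`) and the three statistics below are all assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "it", "my", "this", "on", "in"}

def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_domain_phrases(security_corpus, min_count=2):
    """Step 1 (assumed rule): keep non-stopword unigrams that occur
    at least min_count times in the security-domain corpus."""
    counts = Counter(t for t in tokenize(security_corpus) if t not in STOPWORDS)
    return {term for term, n in counts.items() if n >= min_count}

def review_features(reviews, phrases):
    """Step 2 (assumed features): three simple statistics over
    domain-phrase occurrences in one application's reviews."""
    hits = [sum(1 for t in tokenize(r) if t in phrases) for r in reviews]
    n = len(reviews)
    if n == 0:
        return {"total_hits": 0, "flagged_ratio": 0.0, "mean_hits": 0.0}
    return {
        "total_hits": sum(hits),                              # all phrase occurrences
        "flagged_ratio": sum(1 for h in hits if h > 0) / n,   # share of reviews with a hit
        "mean_hits": sum(hits) / n,                           # average hits per review
    }

# Toy data, invented for this example only.
corpus = "Malware can steal data. A virus or malware may spread spam. Spam and virus alerts."
phrases = extract_domain_phrases(corpus)
reviews = [
    "Great game, love it",
    "This app is malware and sends spam",
    "Virus warning after install",
    "Fun",
]
features = review_features(reviews, phrases)
print(sorted(phrases), features)
```

A feature vector of this kind, computed per application, could then be passed to any standard classifier and evaluated with AUC, as the abstract reports.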

Author information

Authors and Affiliations

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beersheba, Israel
Tal Hadad, Rami Puzis, Bronislav Sidik, Nir Ofek & Lior Rokach

Corresponding authors

Correspondence to Tal Hadad, Rami Puzis, Bronislav Sidik, Nir Ofek or Lior Rokach.

Editor information

Editors and Affiliations

Consiglio Nazionale delle Ricerche, Pisa, Italy
Paolo Mori
Plymouth University, Plymouth, United Kingdom
Steven Furnell
MODESTE/ESEO, Angers, France
Olivier Camp

Cite this paper

Hadad, T., Puzis, R., Sidik, B., Ofek, N., Rokach, L. (2018). Application Marketplace Malware Detection by User Feedback Analysis. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2017. Communications in Computer and Information Science, vol 867. Springer, Cham. https://doi.org/10.1007/978-3-319-93354-2_1

DOI: https://doi.org/10.1007/978-3-319-93354-2_1
Published: 09 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93353-5
Online ISBN: 978-3-319-93354-2
eBook Packages: Computer Science, Computer Science (R0)
