A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps

Authors: Jathushan Rajasegaran, Naveen Karunanayake, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon

WWW '19: The World Wide Web Conference

Pages 3165 - 3171

https://doi.org/10.1145/3308558.3313427

Published: 13 May 2019

Abstract

Counterfeit apps impersonate existing popular apps in attempts to misguide users. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation. In this paper, we propose a novel approach that combines content embeddings and style embeddings generated from pre-trained convolutional neural networks to detect counterfeit apps. We present an analysis of approximately 1.2 million apps from the Google Play Store and identify a set of potential counterfeits for the top 10,000 apps. Under conservative assumptions, we found 2,040 potential counterfeits that contain malware in a set of 49,608 apps that showed high similarity to one of the top 10,000 popular apps in the Google Play Store. We also find 1,565 potential counterfeits that request at least five more dangerous permissions than the original app and 1,407 potential counterfeits that include at least five extra third-party advertisement libraries.
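
The idea of combining content and style embeddings can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the network, layers, distance metric, or weighting, so everything below is an assumption. The sketch takes a convolutional feature map as a plain `(C, H, W)` array (in practice this would come from a pre-trained CNN), uses global average pooling as a stand-in content embedding, uses the Gram matrix familiar from neural style transfer as a stand-in style embedding, and mixes the two cosine distances with a hypothetical weight `alpha`.

```python
import numpy as np


def content_embedding(fmap: np.ndarray) -> np.ndarray:
    """Global-average-pool a (C, H, W) feature map into a length-C vector."""
    return fmap.mean(axis=(1, 2))


def style_embedding(fmap: np.ndarray) -> np.ndarray:
    """Return the upper triangle of the Gram matrix of a (C, H, W) feature map.

    The Gram matrix records channel co-activations and discards spatial
    layout, which is why it is often used as a style descriptor.
    """
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)
    gram = flat @ flat.T / (h * w)
    # The Gram matrix is symmetric, so the upper triangle holds all of it.
    return gram[np.triu_indices(c)]


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance in [0, 2]; returns 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0
    return 1.0 - float(a @ b) / denom


def combined_distance(fmap_a: np.ndarray, fmap_b: np.ndarray,
                      alpha: float = 0.5) -> float:
    """Weighted mix of content and style distances.

    alpha is a hypothetical mixing weight chosen for illustration only.
    """
    d_content = cosine_distance(content_embedding(fmap_a),
                                content_embedding(fmap_b))
    d_style = cosine_distance(style_embedding(fmap_a),
                              style_embedding(fmap_b))
    return alpha * d_content + (1.0 - alpha) * d_style


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic non-negative "feature maps" standing in for CNN activations.
    original = np.abs(rng.normal(size=(8, 6, 6)))
    near_copy = original + 0.01 * rng.normal(size=original.shape)
    unrelated = np.abs(rng.normal(size=(8, 6, 6)))
    print("near copy:", combined_distance(original, near_copy))
    print("unrelated:", combined_distance(original, unrelated))
```

In this toy setup a slightly perturbed copy of a feature map should score a much smaller combined distance than an independently drawn one, which is the property a similarity-based counterfeit search relies on.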

Information & Contributors

Information

Published In

WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN:9781450366748

DOI:10.1145/3308558

Editors: Ling Liu (Georgia Tech, USA), Ryen White (Microsoft Research, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19: The Web Conference

May 13-17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
