Early Detection of Spam Mobile Apps

Authors: Suranga Seneviratne, Aruna Seneviratne, Mohamed Ali Kaafar, Anirban Mahanti, Prasant Mohapatra

WWW '15: Proceedings of the 24th International Conference on World Wide Web

Pages 949 - 959

https://doi.org/10.1145/2736277.2741084

Published: 18 May 2015

Abstract

The increased popularity of smartphones has attracted a large number of developers to various smartphone platforms. As a result, app markets are also populated with spam apps, which reduce users' quality of experience and increase the workload of app market operators. Apps can be "spammy" in multiple ways: they may lack a specific functionality, carry an unrelated description or unrelated keywords, or be published several times in similar form and across diverse categories. Market operators maintain anti-spam policies, and apps are removed through continuous human intervention. Through a systematic crawl of a popular app market and by identifying a set of removed apps, we propose a method to detect spam apps solely using app metadata available at the time of publication. We first propose a methodology to manually label a sample of removed apps according to a set of checkpoint heuristics that reveal the reasons behind removal. This analysis suggests that approximately 35% of the apps being removed are very likely to be spam apps. We then map the identified heuristics to several quantifiable features and show how well these features distinguish spam apps. Finally, we build an Adaptive Boost classifier for early identification of spam apps using only the metadata of the apps. Our classifier achieves an accuracy of over 95%, with precision varying between 85% and 95% and recall varying between 38% and 98%. By applying the classifier to a set of apps present in the app market during our crawl, we estimate that at least 2.7% of them are spam apps.
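
To make the classification step in the abstract concrete, the following is a minimal sketch of adaptive boosting (AdaBoost) over decision stumps, written in plain Python. It is not the authors' implementation: the abstract does not list the features or parameters used, so the two features here (`description_word_count` and `apps_by_same_developer`), the toy data, and every function name are hypothetical and chosen only for illustration.

```python
import math

def train_stump(X, y, w):
    """Return the (error, feature, threshold, sign) rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The rule predicts `sign` when feature j >= thr, otherwise `-sign`.
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=5):
    """Train `rounds` stumps, reweighting the samples after each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clip the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps: +1 means spam, -1 means not spam."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: [description_word_count, apps_by_same_developer].
X = [
    [12, 40], [20, 35], [15, 50], [30, 25],   # labelled spam (+1)
    [150, 2], [200, 5], [90, 1], [120, 30],   # labelled not spam (-1)
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

model = adaboost_fit(X, y)
print([adaboost_predict(model, row) for row in X])
print(adaboost_predict(model, [25, 45]), adaboost_predict(model, [180, 3]))
```

On this toy data a single stump on the description length already separates the two classes, so the ensemble labels the short-description app `[25, 45]` as spam and the long-description app `[180, 3]` as not spam. Real app metadata would be noisier, and this is where combining many weak rules would be expected to matter.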

Index Terms

Information systems

Published In

WWW '15: Proceedings of the 24th International Conference on World Wide Web

May 2015

1460 pages

ISBN:9781450334693

General Chairs:
Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France),
Stefano Leonardi (Sapienza University of Rome, Italy),
Alessandro Panconesi (Sapienza University of Rome, Italy)

Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Conference

WWW '15

Sponsor:

IW3C2

WWW '15: 24th International World Wide Web Conference

May 18 - 22, 2015

Florence, Italy

Acceptance Rates

WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
