DOI: 10.1145/2736277.2741084
Early Detection of Spam Mobile Apps

Published: 18 May 2015

Abstract

The increased popularity of smartphones has attracted a large number of developers to the various smartphone platforms. As a result, app markets are also populated with spam apps, which degrade users' quality of experience and increase the workload of app market operators. Apps can be "spammy" in several ways: they may lack a specific functionality, carry an unrelated description or unrelated keywords, or be one of several similar apps published repeatedly and across diverse categories. Market operators maintain anti-spam policies, and apps are removed through continuous human intervention. Through a systematic crawl of a popular app market and by identifying a set of removed apps, we propose a method to detect spam apps using only the app metadata available at the time of publication. We first propose a methodology to manually label a sample of removed apps according to a set of checkpoint heuristics that reveal the reasons behind removal. This analysis suggests that approximately 35% of the removed apps are very likely to be spam apps. We then map the identified heuristics to several quantifiable features and show how well these features distinguish spam apps. Finally, we build an Adaptive Boost classifier for early identification of spam apps using only the metadata of the apps. Our classifier achieves an accuracy of over 95%, with precision varying between 85% and 95% and recall varying between 38% and 98%. By applying the classifier to a set of apps present in the app market during our crawl, we estimate that at least 2.7% of them are spam apps.
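
To make the classification step concrete, the following is a minimal sketch of adaptive boosting over decision stumps, together with the accuracy, precision, and recall measures quoted in the abstract. It is not the authors' implementation. The two features (description word count and number of categories a developer published in), the toy data, and every function name are assumptions chosen for illustration; the abstract does not list the actual features.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Predict `sign` when feature j is below thr, otherwise -sign.
                pred = [sign if row[j] < thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] < thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {+1, -1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # a perfect stump leaves nothing to reweight
        # Raise the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall), treating +1 (spam) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical metadata rows: [description_word_count, categories_published_in]
X = [[12, 6], [20, 5], [8, 7], [150, 1], [220, 2], [180, 1]]
y = [1, 1, 1, -1, -1, -1]  # +1 = spam, -1 = not spam (toy labels)

ensemble = adaboost_fit(X, y)
predictions = [adaboost_predict(ensemble, row) for row in X]
print(metrics(y, predictions))
```

The wide gap between precision and recall reported in the abstract is the kind of trade-off these three measures expose: a classifier can be highly accurate overall while still missing a large share of the positive (spam) class.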

    Published In

    WWW '15: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1460 pages
    ISBN:9781450334693

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. android
    2. mobile apps
    3. spam
    4. spam apps

    Qualifiers

    • Research-article

    Acceptance Rates

    WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
