DOI: 10.1145/3543507.3583377
research-article

The Chameleon on the Web: an Empirical Study of the Insidious Proactive Web Defacements

Published: 30 April 2023

ABSTRACT

Web defacement is one of the major promotional channels for online underground economies. It regularly compromises benign websites and injects fraudulent content to promote illicit goods and services, inflicting significant harm on the websites' reputations and revenues and potentially leading to legal ramifications. In this paper, we uncover proactive web defacements, in which the involved web pages (i.e., landing pages) proactively deface themselves within the browser using JavaScript (i.e., control scripts). Proactive web defacements have not yet received attention from research communities, anti-hacking organizations, or law-enforcement officials. To detect them, we designed a practical tool, PACTOR. It runs in the browser, intercepts JavaScript API calls that manipulate web page content, takes snapshots of the rendered HTML source code immediately before and after each intercepted call, and detects proactive web defacements by visually comparing every two consecutive snapshots. Our two-month empirical study of 2,454 incidents of proactive web defacements, conducted with PACTOR, shows that they can evade existing URL safety-checking tools and effectively promote the ranking of their landing pages using legitimate content and keywords. We also investigated the vendor network of proactive web defacements and reported all the involved domains to law-enforcement officials and URL safety-checking tools.
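The detection idea summarized in the abstract (hook the calls that mutate page content, snapshot the page immediately before and after each call, compare the two snapshots) can be illustrated with a small sketch. It is not PACTOR's implementation and rests on loudly stated assumptions: `FakeDocument`, its `write`/`set_body` methods, `install_hooks`, `flag_defacements`, and the 0.5 threshold are all hypothetical. A real tool would patch browser JavaScript APIs and compare rendered snapshots visually; here a word-level Jaccard similarity of the visible text is swapped in as a simple stand-in for that visual comparison.

```python
import functools
import re

# Hypothetical cut-off: a mutation whose before/after similarity falls below
# this value is flagged. Illustrative only, not taken from the paper.
SIMILARITY_THRESHOLD = 0.5

_TAG_RE = re.compile(r"<[^>]+>")


def visible_words(html: str) -> set:
    """Crude stand-in for rendering: strip tags, lower-case, split into words."""
    return set(_TAG_RE.sub(" ", html).lower().split())


def similarity(before_html: str, after_html: str) -> float:
    """Word-level Jaccard similarity, substituted here for a visual comparison."""
    a, b = visible_words(before_html), visible_words(after_html)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class FakeDocument:
    """Hypothetical stand-in for the browser DOM; method names are illustrative."""

    def __init__(self, html: str):
        self.html = html

    def write(self, markup: str) -> None:      # loosely mirrors document.write
        self.html += markup

    def set_body(self, markup: str) -> None:   # loosely mirrors body.innerHTML = ...
        self.html = markup


def _wrap(original, name, log):
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        before = self.html                      # snapshot immediately before the call
        result = original(self, *args, **kwargs)
        log.append((name, before, self.html))   # snapshot immediately after the call
        return result
    return wrapper


def install_hooks(cls, method_names, log):
    """Replace each content-mutating method on cls with a snapshotting wrapper."""
    for name in method_names:
        setattr(cls, name, _wrap(getattr(cls, name), name, log))


def flag_defacements(log, threshold=SIMILARITY_THRESHOLD):
    """Return the names of calls whose consecutive snapshots differ too much."""
    return [name for name, before, after in log
            if similarity(before, after) < threshold]


# Usage: a page served with legitimate content that later rewrites itself.
log = []
install_hooks(FakeDocument, ["write", "set_body"], log)
doc = FakeDocument("<h1>City Library</h1><p>Opening hours and events</p>")
doc.write("<p>New event added</p>")                              # minor edit
doc.set_body("<h1>Online casino bonus</h1><p>Sign up now</p>")   # wholesale swap
flagged = flag_defacements(log)
print(flagged)  # ['set_body']
```

Hooking the class rather than individual objects loosely mirrors how an in-browser hook would override methods on a shared prototype, so every object is covered without further setup. A text-only measure such as this one would miss changes that alter only styling or images, which is one plausible reason to compare snapshots visually instead.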


    Published in

      ACM Conferences
      WWW '23: Proceedings of the ACM Web Conference 2023
      April 2023
      4293 pages
      ISBN: 9781450394161
      DOI: 10.1145/3543507

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 April 2023


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 1,899 of 8,196 submissions, 23%
    Article Metrics

      • Downloads (last 12 months): 173
      • Downloads (last 6 weeks): 6
