ABSTRACT
Web defacement is one of the major promotional channels for online underground economies. Attackers regularly compromise benign websites and inject fraudulent content to promote illicit goods and services. Such defacement inflicts significant harm on websites’ reputations and revenues and may lead to legal ramifications. In this paper, we uncover proactive web defacements, where the involved web pages (i.e., landing pages) proactively deface themselves within browsers using JavaScript (i.e., control scripts). Proactive web defacements have not yet received attention from research communities, anti-hacking organizations, or law-enforcement officials. To detect proactive web defacements, we designed a practical tool, PACTOR. It runs in the browser and intercepts JavaScript API calls that manipulate web page content. It takes snapshots of the rendered HTML source code immediately before and after the intercepted API calls and detects proactive web defacements by visually comparing every pair of consecutive snapshots. Our two-month empirical study, using PACTOR, of 2,454 incidents of proactive web defacements shows that they can evade existing URL safety-checking tools and effectively promote the ranking of their landing pages using legitimate content/keywords. We also investigated the vendor network of proactive web defacements and reported all the involved domains to law-enforcement officials and URL safety-checking tools.
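The snapshot-and-compare idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not PACTOR's implementation: PACTOR runs in the browser and hooks JavaScript APIs, whereas this Python example wraps ordinary functions. The names `SnapshotRecorder`, `changed_fraction`, `render`, `tweak_pixel`, and `overwrite_page` are hypothetical, a snapshot is modeled as a grid of pixel values, and the plain pixel-difference ratio with a threshold of 0.5 is a stand-in for whatever visual comparison the tool actually uses.

```python
from typing import Callable, List

# Assumption: a snapshot is a grayscale pixel grid of the rendered page.
Snapshot = List[List[int]]


def changed_fraction(before: Snapshot, after: Snapshot) -> float:
    """Fraction of pixels that differ between two equally sized snapshots."""
    total = sum(len(row) for row in before)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_b, row_a in zip(before, after)
        for px_b, px_a in zip(row_b, row_a)
        if px_b != px_a
    )
    return differing / total


class SnapshotRecorder:
    """Records a snapshot right before and right after each intercepted call."""

    def __init__(self, render: Callable[[], Snapshot]) -> None:
        self._render = render
        self.snapshots: List[Snapshot] = []

    def intercept(self, mutate: Callable[..., None]) -> Callable[..., None]:
        """Wrap a content-mutating function so its calls are bracketed by snapshots."""
        def wrapper(*args, **kwargs) -> None:
            self.snapshots.append(self._render())  # snapshot before the call
            mutate(*args, **kwargs)
            self.snapshots.append(self._render())  # snapshot after the call
        return wrapper

    def suspicious_pairs(self, threshold: float = 0.5) -> List[int]:
        """Indices i where snapshots i and i+1 differ by at least `threshold`."""
        return [
            i
            for i in range(len(self.snapshots) - 1)
            if changed_fraction(self.snapshots[i], self.snapshots[i + 1]) >= threshold
        ]


# Toy "page": a 4x4 pixel grid that the mutating functions edit in place.
page = [[0] * 4 for _ in range(4)]


def render() -> Snapshot:
    return [row[:] for row in page]  # copy so later edits do not alter old snapshots


def tweak_pixel(value: int) -> None:
    page[0][0] = value  # small, benign-looking change


def overwrite_page(value: int) -> None:
    for row in page:
        for i in range(len(row)):
            row[i] = value  # wholesale replacement of the visible content


recorder = SnapshotRecorder(render)
recorder.intercept(tweak_pixel)(1)
recorder.intercept(overwrite_page)(9)
suspicious = recorder.suspicious_pairs()
```

In this toy run, only the pair of snapshots bracketing `overwrite_page` crosses the threshold. A real detector would need a perceptual comparison and a tuned threshold rather than exact pixel equality.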
Index Terms
- The Chameleon on the Web: an Empirical Study of the Insidious Proactive Web Defacements