research-article

Open access

AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention

Authors:

Hyoungshick Kim

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 1902 - 1913

https://doi.org/10.1145/3589334.3645698

Published: 13 May 2024

Abstract

Conventional ad blocking and tracking prevention tools often fall short in addressing web content manipulation. Machine learning approaches have been proposed to enhance detection accuracy, yet aspects of practical deployment have frequently been overlooked. This paper introduces AdFlush, a novel machine learning model for real-world browsers. To develop AdFlush, we evaluated the effectiveness of 883 features, ultimately selecting 27 key features for optimal performance. We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAGraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98, surpassing the performance of AdGraph and WebGraph, which recorded F1 scores between 0.81 and 0.87. A six-month longitudinal study confirmed that AdFlush maintains an F1 score above 0.97 without the need for retraining, indicating that its detection performance holds up over time.
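For readers unfamiliar with the metric used throughout the abstract, the F1 score is the harmonic mean of precision and recall. The sketch below shows how such a score could be computed from a confusion matrix; the function name and the counts are hypothetical illustrations and are not taken from the paper's dataset or code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp: ad/tracker requests correctly blocked (true positives)
    fp: benign requests wrongly blocked (false positives)
    fn: ad/tracker requests missed (false negatives)
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen for illustration only (not from the paper).
example = f1_score(tp=980, fp=20, fn=20)
print(round(example, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier needs both high precision and high recall to reach a score near 0.98.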

Index Terms

AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention
1. Security and privacy
  1. Software and application security
    1. Web application security

Published In

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)

Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)

Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
