DOI: 10.1145/3589334.3645698
research-article
Open access

AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention

Published: 13 May 2024 Publication History

Abstract

Conventional ad-blocking and tracking-prevention tools often fall short in addressing web content manipulation. Machine learning approaches have been proposed to enhance detection accuracy, yet the practical aspects of deployment have frequently been overlooked. This paper introduces AdFlush, a novel machine learning model for real-world browsers. To develop AdFlush, we evaluated the effectiveness of 883 features, ultimately selecting 27 key features for optimal performance. We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98 and thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAGraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations: it showed superior resilience, with F1 scores ranging from 0.89 to 0.98, whereas AdGraph and WebGraph recorded F1 scores between 0.81 and 0.87. A six-month longitudinal study confirmed that AdFlush maintains an F1 score above 0.97 without retraining, underscoring its effectiveness over time.

Supplemental Material

MP4 File
Supplemental video

    Information

    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ad blocking
    2. machine learning
    3. web security
    4. web tracking

    Conference

    WWW '24
    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
