Where are you taking me? Understanding Abusive Traffic Distribution Systems

Authors: Brian Kondracki, Nick Nikiforakis, Nicolas Christin

WWW '21: Proceedings of the Web Conference 2021

Pages 3613 - 3624

https://doi.org/10.1145/3442381.3450071

Published: 03 June 2021

Abstract

Illicit website owners frequently rely on traffic distribution systems (TDSs) operated by less-than-scrupulous advertising networks to acquire user traffic. While researchers have described a number of case studies on various TDSs or the businesses they serve, we still lack an understanding of how users are differentiated in these ecosystems, and of how different illicit activities frequently leverage the same advertisement networks and, subsequently, the same malicious advertisers. We design ODIN (Observatory of Dynamic Illicit ad Networks), the first system to study cloaking, user differentiation and business integration at the same time in four different types of traffic sources: typosquatting, copyright-infringing movie streaming, ad-based URL shortening, and illicit online pharmacy websites.

ODIN performed 874,494 scrapes over two months (June 19, 2019–August 24, 2019), posing as six different types of users (e.g., mobile, desktop, and crawler) and accumulating over 2TB of data. We observed 81% more malicious pages compared to using only the best performing crawl profile by itself. Three of the traffic sources we study redirect users to the same traffic broker domain names up to 44% of the time and all of them often expose users to the same malicious advertisers. Our experiments show that novel cloaking techniques could decrease by half the number of malicious pages observed. Worryingly, popular blacklists do not just suffer from the lack of coverage and delayed detection, but miss the vast majority of malicious pages targeting mobile users. We use these findings to design a classifier, which can make precise predictions about the likelihood of a user being redirected to a malicious advertiser.
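
To make the multi-profile comparison concrete, the sketch below shows one plausible way to compute a figure like the reported 81% gain: compare the union of malicious pages seen across all crawl profiles with the count seen by the single best profile. The abstract does not spell out the exact computation, so this definition is an assumption, and the function name, profile names, and page identifiers are hypothetical toy data rather than ODIN's.

```python
from typing import Dict, Set


def coverage_gain(pages_by_profile: Dict[str, Set[str]]) -> float:
    """Percent increase of the union of all profiles over the best single profile.

    Assumed definition (not taken from the paper):
        100 * (|union of all profiles| - |best profile|) / |best profile|
    """
    if not pages_by_profile:
        raise ValueError("need at least one profile")
    # Size of the largest set observed by any single profile.
    best = max(len(pages) for pages in pages_by_profile.values())
    if best == 0:
        raise ValueError("no profile observed any malicious page")
    # Distinct malicious pages observed by at least one profile.
    union = set().union(*pages_by_profile.values())
    return 100.0 * (len(union) - best) / best


# Hypothetical toy observations: page identifiers seen per crawl profile.
observed = {
    "desktop": {"a", "b", "c", "d"},
    "mobile": {"c", "d", "e", "f", "g"},
    "crawler": {"a"},
}
gain = coverage_gain(observed)
print(f"{gain:.0f}% more malicious pages than the best single profile")
```

In this toy data the best profile sees five pages and the union holds seven, so combining profiles yields a 40% gain. The further the profiles' observations diverge, for example under cloaking or user differentiation, the larger this number becomes.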

Published In

WWW '21: Proceedings of the Web Conference 2021

April 2021

4054 pages

ISBN:9781450383127

DOI:10.1145/3442381

Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21

Sponsor:

SIGWEB

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
