DOI: 10.1145/1718487.1718540
research-article

SBotMiner: large scale search bot detection

Published: 04 February 2010

Abstract

In this paper, we study search bot traffic from search engine query logs at a large scale. Although bots that generate search traffic aggressively can be easily detected, a large number of distributed, low-rate search bots are difficult to identify and are often associated with malicious attacks. We present SBotMiner, a system for automatically identifying stealthy, low-rate search bot traffic from query logs. Instead of detecting individual bots, our approach captures groups of distributed, coordinated search bots. Using sampled data from two different months, SBotMiner identifies over 123 million bot-related pageviews, accounting for 3.8% of total traffic. Our in-depth analysis shows that a large fraction of the identified bot traffic may be associated with various malicious activities such as phishing attacks or vulnerability exploits. This finding suggests that detecting search bot traffic holds great promise for detecting and stopping attacks early.

      Published In

      WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
      February 2010
      468 pages
      ISBN:9781605588896
      DOI:10.1145/1718487
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. botnet detection
      2. click fraud
      3. search bot
      4. search log analysis
      5. web search

      Acceptance Rates

      Overall Acceptance Rate 498 of 2,863 submissions, 17%

      Cited By

      • (2021) Denial-of-Service (DoS) Attack and Botnet. Research Anthology on Combating Denial-of-Service Attacks, pp. 49-73. DOI: 10.4018/978-1-7998-5348-0.ch003. Online publication date: 2021
      • (2021) AdSherlock: Efficient and Deployable Click Fraud Detection for Mobile Applications. IEEE Transactions on Mobile Computing, 20:4, pp. 1285-1297. DOI: 10.1109/TMC.2020.2966991. Online publication date: 1-Apr-2021
      • (2021) Exploiting the Community Structure of Fraudulent Keywords for Fraud Detection in Web Search. Journal of Computer Science and Technology, 36:5, pp. 1167-1183. DOI: 10.1007/s11390-021-0218-2. Online publication date: 30-Sep-2021
      • (2021) Clickedroid: A Methodology Based on Heuristic Approach to Detect Mobile Ad-Click Frauds. Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, pp. 853-864. DOI: 10.1007/978-981-15-7533-4_68. Online publication date: 20-Feb-2021
      • (2020) Detecting and Understanding Online Advertising Fraud in the Wild. IEICE Transactions on Information and Systems, E103.D:7, pp. 1512-1523. DOI: 10.1587/transinf.2019ICP0008. Online publication date: 1-Jul-2020
      • (2019) PathMarker: protecting web contents against inside crawlers. Cybersecurity, 2:1. DOI: 10.1186/s42400-019-0023-1. Online publication date: 20-Feb-2019
      • (2019) Clicktok. Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks, pp. 105-116. DOI: 10.1145/3317549.3323407. Online publication date: 15-May-2019
      • (2019) Botnet Detection Techniques and Research Challenges. 2019 International Conference on Recent Advances in Energy-efficient Computing and Communication (ICRAECC), pp. 1-6. DOI: 10.1109/ICRAECC43874.2019.8995028. Online publication date: Mar-2019
      • (2019) Precise and Robust Detection of Advertising Fraud. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pp. 776-785. DOI: 10.1109/COMPSAC.2019.00115. Online publication date: Jul-2019
      • (2019) Exploring Non-Human Traffic in Online Digital Advertisements: Analysis and Prediction. Computational Collective Intelligence, pp. 663-675. DOI: 10.1007/978-3-030-28374-2_57. Online publication date: 9-Aug-2019
