
SBotMiner: large scale search bot detection

Authors:

Qifa Ke

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

Pages 421 - 430

https://doi.org/10.1145/1718487.1718540

Published: 04 February 2010

Abstract

In this paper, we study search bot traffic in search engine query logs at a large scale. Although bots that generate search traffic aggressively are easy to detect, a large number of distributed, low-rate search bots are difficult to identify and are often associated with malicious attacks. We present SBotMiner, a system for automatically identifying stealthy, low-rate search bot traffic from query logs. Instead of detecting individual bots, our approach captures groups of distributed, coordinated search bots. Using sampled data from two different months, SBotMiner identifies over 123 million bot-related pageviews, accounting for 3.8% of total traffic. Our in-depth analysis shows that a large fraction of the identified bot traffic may be associated with various malicious activities such as phishing attacks or vulnerability exploits. This finding suggests that detecting search bot traffic holds great promise for detecting and stopping attacks early.

Index Terms

SBotMiner: large scale search bot detection
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

February 2010

468 pages

ISBN: 9781605588896

DOI: 10.1145/1718487

General Chairs: Brian D. Davison (Lehigh University, USA), Torsten Suel (Polytechnic Institute of NYU, USA)

Program Chairs: Nick Craswell (Microsoft, USA), Bing Liu (University of Illinois, Chicago, USA)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WSDM'10: Third ACM International Conference on Web Search and Data Mining

February 4 - 6, 2010

New York, New York, USA

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

Total Citations: 41
Total Downloads: 861
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Cited By

Arora A, Yadav S, Sharma K (2021). Denial-of-Service (DoS) Attack and Botnet. Research Anthology on Combating Denial-of-Service Attacks, pages 49-73. Online publication date: 2021.
https://doi.org/10.4018/978-1-7998-5348-0.ch003
Cao C, Gao Y, Luo Y, Xia M, Dong W, Chen C, Liu X (2021). AdSherlock: Efficient and Deployable Click Fraud Detection for Mobile Applications. IEEE Transactions on Mobile Computing, 20:4, pages 1285-1297. Online publication date: 1-Apr-2021.
https://doi.org/10.1109/TMC.2020.2966991
Yang D, Li Z, Wang X, Salamatian K, Xie G (2021). Exploiting the Community Structure of Fraudulent Keywords for Fraud Detection in Web Search. Journal of Computer Science and Technology, 36:5, pages 1167-1183. Online publication date: 30-Sep-2021.
https://doi.org/10.1007/s11390-021-0218-2
Keserwani P, Jha V, Govil M, Pilli E (2021). Clickedroid: A Methodology Based on Heuristic Approach to Detect Mobile Ad-Click Frauds. Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, pages 853-864. Online publication date: 20-Feb-2021.
https://doi.org/10.1007/978-981-15-7533-4_68
Kanei F, Chiba D, Hato K, Yoshioka K, Matsumoto T, Akiyama M (2020). Detecting and Understanding Online Advertising Fraud in the Wild. IEICE Transactions on Information and Systems, E103.D:7, pages 1512-1523. Online publication date: 1-Jul-2020.
https://doi.org/10.1587/transinf.2019ICP0008
Wan S, Li Y, Sun K (2019). PathMarker: protecting web contents against inside crawlers. Cybersecurity, 2:1. Online publication date: 20-Feb-2019.
https://doi.org/10.1186/s42400-019-0023-1
Nagaraja S, Shah R (2019). Clicktok. Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks, pages 105-116. Online publication date: 15-May-2019.
https://dl.acm.org/doi/10.1145/3317549.3323407
Sudhakar Kumar S (2019). Botnet Detection Techniques and Research Challenges. 2019 International Conference on Recent Advances in Energy-efficient Computing and Communication (ICRAECC), pages 1-6. Online publication date: Mar-2019.
https://doi.org/10.1109/ICRAECC43874.2019.8995028
Kanei F, Chiba D, Hato K, Akiyama M (2019). Precise and Robust Detection of Advertising Fraud. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pages 776-785. Online publication date: Jul-2019.
https://doi.org/10.1109/COMPSAC.2019.00115
Almahmoud S, Hammo B, Al-Shboul B (2019). Exploring Non-Human Traffic in Online Digital Advertisements: Analysis and Prediction. Computational Collective Intelligence, pages 663-675. Online publication date: 9-Aug-2019.
https://doi.org/10.1007/978-3-030-28374-2_57
