research-article

Search engine click spam detection based on bipartite graph propagation

Authors: Xin Li, Min Zhang, Yiqun Liu, Shaoping Ma, Yijiang Jin, Liyun Ru
Affiliation (all authors): Tsinghua University, Beijing, China

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, Pages 93–102. https://doi.org/10.1145/2556195.2556214

Published: 24 February 2014

ABSTRACT

Search engines have become an important part of how people find information in daily life. Most search engines use click information as an important factor in document ranking. As a result, some websites cheat to obtain a higher rank by fraudulently increasing clicks to their pages, a practice referred to as "click spam". Based on an analysis of the features of fraudulent clicks, this paper proposes a novel automatic click spam detection approach that consists of three parts: (1) modeling user sessions as a sequence of triples, which, to the best of our knowledge, is the first session model to take into account not only the user action but also the action objective and the time interval between actions; (2) applying a user-session bipartite graph propagation algorithm that uses known cheating users to find more cheating sessions; and (3) applying a pattern-session bipartite graph propagation algorithm to obtain cheating session patterns, which raises the precision and recall of click spam detection. Experiments on real-world log data from a Chinese commercial search engine, containing approximately 80 million user clicks per day, show that 2.6% of all clicks were detected as spam with a precision of up to 97%.
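The abstract does not give the propagation rule itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a session is a list of (action, objective, time interval) triples, that a "session" node in the bipartite graph stands for a session signature several users may share, and that a few known cheating users are available as seeds. The names `Action`, `propagate`, `damping`, and `iterations` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    """One element of a session's triple sequence (field names are hypothetical)."""
    kind: str           # user action, e.g. "query" or "click"
    objective: str      # action objective, e.g. the query text or the clicked URL
    gap_seconds: float  # time interval since the previous action


def propagate(edges, seed_users, iterations=10, damping=0.5):
    """Spread a spam score over a user-session bipartite graph.

    edges      -- iterable of (user, session) pairs
    seed_users -- set of users assumed to be known cheaters
    Returns (user_score, session_score), two dicts with values in [0, 1].
    This averaging rule is an illustrative assumption, not the paper's rule.
    """
    user_sessions = defaultdict(set)
    session_users = defaultdict(set)
    for user, session in edges:
        user_sessions[user].add(session)
        session_users[session].add(user)

    user_score = {u: 1.0 if u in seed_users else 0.0 for u in user_sessions}
    session_score = {s: 0.0 for s in session_users}

    for _ in range(iterations):
        # A session is as suspicious as the average of the users who produced it.
        for s, users in session_users.items():
            session_score[s] = sum(user_score[u] for u in users) / len(users)
        # A non-seed user inherits a damped average of its sessions' scores.
        for u, sessions in user_sessions.items():
            if u in seed_users:
                continue  # seed labels stay clamped at 1.0
            mean = sum(session_score[s] for s in sessions) / len(sessions)
            user_score[u] = damping * mean
    return user_score, session_score


# Example session in the triple representation.
example_session = [
    Action("query", "cheap flights", 0.0),
    Action("click", "http://example.com/deal", 1.5),
]

# Tiny graph: u1 is a seed cheater, u2 shares a session with u1, u3 is isolated.
example_edges = [("u1", "s1"), ("u2", "s1"), ("u2", "s2"), ("u3", "s3")]
user_score, session_score = propagate(example_edges, seed_users={"u1"})
```

In this toy graph the score flows from the seed user to the shared session, then to the second user and that user's other session, while the isolated user and session stay at zero. A second pass over a pattern-session graph, as the abstract describes, could reuse the same function with patterns in place of users, again as an assumption.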

Index Terms

1. Information systems
  1. Information retrieval

Published in
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN:9781450323512
DOI:10.1145/2556195
General Chairs: Ben Carterette (University of Delaware, USA), Fernando Diaz (Microsoft Research, USA)
Program Chairs: Carlos Castillo (Qatar Computing Research Institute, Qatar), Donald Metzler (Google, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
click spam
frequent sequential patterns
label propagation
user session model
Acceptance Rates
WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%