Dr. Searcher and Mr. Browser: a unified hyperlink-click graph

Authors: Barbara Poblete, Carlos Castillo, Aristides Gionis

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Pages 1123 - 1132

https://doi.org/10.1145/1458082.1458231

Published: 26 October 2008

Abstract

We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink and click graphs. The hyperlink graph expresses link structure among Web pages, while the click graph is a bipartite graph of queries and documents denoting users' searching behavior extracted from a search engine's query log.
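As a rough illustration of the union described above, the following minimal sketch merges a hyperlink edge list and a query-document click edge list into a single adjacency map. The function name `build_unified_graph`, the `("doc", ...)` and `("query", ...)` node labels, the choice to make click edges traversable in both directions, and the toy data are all assumptions made for illustration; the abstract does not say how edges are directed or weighted.

```python
from collections import defaultdict


def build_unified_graph(hyperlinks, clicks):
    """Union of a hyperlink graph and a click graph (illustrative only).

    hyperlinks: iterable of (source_url, target_url) directed links.
    clicks: iterable of (query, clicked_url) pairs from a query log.
    Returns a dict mapping each node to the set of its out-neighbours.
    """
    graph = defaultdict(set)
    for src, dst in hyperlinks:
        graph[("doc", src)].add(("doc", dst))
        graph[("doc", dst)]  # touch the key so the target exists as a node
    for query, url in clicks:
        # Assumption: a click edge can be walked in both directions.
        graph[("query", query)].add(("doc", url))
        graph[("doc", url)].add(("query", query))
    return dict(graph)


# Hypothetical toy data: two hyperlinks and one query with two clicked pages.
hyperlinks = [("a.com", "b.com"), ("b.com", "c.com")]
clicks = [("web graphs", "a.com"), ("web graphs", "c.com")]
unified = build_unified_graph(hyperlinks, clicks)
```

Tagging each node with its type keeps a query string from colliding with a URL that happens to have the same text.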

Our main motivation is to model, in a unified way, the two main activities of users on the Web, searching and browsing, and at the same time to analyze the effects of random walks on this new graph. The intuition behind this task is to measure how the combination of link structure and usage data provides information beyond what each structure contains on its own.

Our experimental results show that both hyperlink and click graphs have strengths and weaknesses when their stationary distribution scores are used for ranking Web pages. Furthermore, our evaluation indicates that the unified graph always generates consistent and robust scores that closely follow the best result obtained from either individual graph, even when applied to "noisy" data. We believe that the unified Web graph has several useful properties for improving current Web document ranking, as well as for generating new rankings of its own. In particular, stationary distribution scores derived from random walks on the combined graph can be used as an indicator of whether structural or usage data are more reliable in different situations.
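To make the notion of stationary distribution scores concrete, here is a minimal sketch that approximates the stationary distribution of a random walk by power iteration. It is a PageRank-style walk with uniform restarts, which is an assumption on our part: the abstract does not state the exact walk, the restart probability, or the treatment of nodes without out-links. The function name, the `damping` value of 0.85, and the toy graph are likewise hypothetical.

```python
def stationary_distribution(graph, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random walk with restarts.

    graph: dict mapping node -> list of out-neighbours; every neighbour
    must also appear as a key. Returns a dict of node -> score summing to 1.
    """
    nodes = sorted(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Probability mass on nodes without out-links is spread uniformly.
        dangling = sum(scores[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        updated = {node: base for node in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * scores[v] / len(out)
                for w in out:
                    updated[w] += share
        scores = updated
    return scores


# Hypothetical unified graph: one query node plus three documents.
toy_graph = {
    "q:web graphs": ["a.com", "c.com"],
    "a.com": ["b.com", "q:web graphs"],
    "b.com": ["c.com"],
    "c.com": ["q:web graphs"],
}
scores = stationary_distribution(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Running the same function on a hyperlink-only graph and a click-only graph, and then comparing the resulting rankings, would be one simple way to explore the kind of comparison the abstract describes.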

Index Terms

Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN:9781595939913

DOI:10.1145/1458082

General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA

Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM08: Conference on Information and Knowledge Management

October 26 - 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
