DOI: 10.1145/1458082.1458231
research-article

Dr. Searcher and Mr. Browser: a unified hyperlink-click graph

Published: 26 October 2008

Abstract

We introduce a unified graph representation of the Web that includes both structural and usage information. We model this graph as a simple union of the Web's hyperlink and click graphs. The hyperlink graph expresses the link structure among Web pages, while the click graph is a bipartite graph of queries and documents that captures users' search behavior, as extracted from a search engine's query log.
Our main motivation is to model, in a unified way, the two main activities of users on the Web, searching and browsing, and at the same time to analyze the effects of random walks on this new graph. The intuition behind this task is to measure how the combination of link structure and usage data provides information beyond that contained in either structure alone.
Our experimental results show that both hyperlink and click graphs have strengths and weaknesses when their stationary distribution scores are used for ranking Web pages. Furthermore, our evaluation indicates that the unified graph always generates consistent and robust scores that closely follow the best result obtained from either individual graph, even when applied to "noisy" data. We believe that the unified Web graph has several useful properties for improving current Web document ranking, as well as for generating new rankings of its own. In particular, stationary distribution scores derived from random walks on the combined graph can be used as an indicator of whether structural or usage data are more reliable in different situations.
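
To make the idea of a union graph and its stationary distribution concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify edge weights, the direction of click edges, or how the random walk is made ergodic, so the choices here are assumptions: click edges are added in both directions between a query and a clicked document, all edges are unweighted, and the walk uses PageRank-style teleportation with a hypothetical damping factor of 0.85. The function names `union_graph` and `stationary_scores` and the toy data are illustrative only.

```python
from collections import defaultdict


def union_graph(hyperlinks, clicks):
    """Merge hyperlink edges and click edges into one adjacency map.

    hyperlinks: iterable of (source_page, target_page) directed edges.
    clicks: iterable of (query, clicked_page) pairs; added in both
            directions here, which is an assumption of this sketch.
    """
    adj = defaultdict(set)
    for src, dst in hyperlinks:
        adj[src].add(dst)
        adj[dst]  # touch the key so the target exists as a node
    for query, page in clicks:
        adj[query].add(page)
        adj[page].add(query)
    return dict(adj)


def stationary_scores(adj, damping=0.85, iterations=100):
    """Approximate the stationary distribution by power iteration.

    Uses uniform teleportation and spreads the mass of dangling nodes
    uniformly (both are assumptions, not taken from the paper).
    """
    nodes = sorted(adj)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * score[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling node: distribute its mass over all nodes.
                share = damping * score[v] / n
                for w in nodes:
                    nxt[w] += share
        score = nxt
    return score


# Toy data (hypothetical): three pages and one query.
hyperlinks = [("a", "b"), ("b", "c")]
clicks = [("q1", "c"), ("q1", "a")]

graph = union_graph(hyperlinks, clicks)
scores = stationary_scores(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Running the same `stationary_scores` function on the hyperlink edges alone, the click edges alone, and their union gives three score vectors that can be compared, which is the kind of comparison the abstract describes.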



      Published In

      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN:9781595939913
      DOI:10.1145/1458082

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. random-walks
      2. search engine queries
      3. structure mining
      4. usage mining
      5. web graphs

      Qualifiers

      • Research-article

      Conference

      CIKM08: Conference on Information and Knowledge Management
      October 26 - 30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Citations

      Cited By

      • (2018) Social Search. Social Information Access, pp. 213-276. DOI: 10.1007/978-3-319-90092-6_7. Online publication date: 3-May-2018
      • (2017) A Risk-Scoring Feedback Model for Webpages and Web Users Based on Browsing Behavior. ACM Transactions on Intelligent Systems and Technology 8:4, pp. 1-21. DOI: 10.1145/2928274. Online publication date: 6-May-2017
      • (2016) Query intent inference via search engine log. Knowledge and Information Systems 49:2, pp. 661-685. DOI: 10.1007/s10115-015-0915-7. Online publication date: 1-Nov-2016
      • (2013) Intent-Based browse activity segmentation. Proceedings of the 35th European conference on Advances in Information Retrieval, pp. 242-253. DOI: 10.1007/978-3-642-36973-5_21. Online publication date: 24-Mar-2013
      • (2012) Measuring website similarity using an entity-aware click graph. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 1697-1701. DOI: 10.1145/2396761.2398500. Online publication date: 29-Oct-2012
      • (2012) Survey on web spam detection. ACM SIGKDD Explorations Newsletter 13:2, pp. 50-64. DOI: 10.1145/2207243.2207252. Online publication date: 1-May-2012
      • (2012) Employing document dependency in blog search. Journal of the American Society for Information Science and Technology 63:2, pp. 354-365. DOI: 10.1002/asi.21687. Online publication date: 1-Feb-2012
      • (2011) Page importance computation based on Markov processes. Information Retrieval 14:5, pp. 488-514. DOI: 10.1007/s10791-011-9164-x. Online publication date: 3-Mar-2011
      • (2010) Graph structures and algorithms for query-log analysis. Proceedings of the Programs, proofs, process and 6th international conference on Computability in Europe, pp. 126-131. DOI: 10.5555/1876420.1876434. Online publication date: 30-Jun-2010
      • (2010) Visual-semantic graphs. Proceedings of the 19th ACM international conference on Information and knowledge management, pp. 1553-1556. DOI: 10.1145/1871437.1871670. Online publication date: 26-Oct-2010