DOI: 10.1145/1148170.1148224

Building implicit links from content for forum search

Published: 06 August 2006

ABSTRACT

The objective of Web forums is to create a shared space for open communication and discussion of specific topics and issues. The tremendous amount of information behind forum sites is not yet fully utilized. Most links between forum pages are automatically created, which means that link-based ranking algorithms cannot be applied efficiently. In this paper, we propose a novel ranking algorithm that introduces content information into link-based methods as implicit links. The basic idea is derived from a more focused random surfer: the surfer is more likely to jump to a page that is similar to the one he is currently reading. In this manner, we can introduce content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be efficiently computed based on an automatically generated topic hierarchy. Unlike topic-sensitive PageRank, our method needs to compute only a single PageRank score for each page. Another contribution of this paper is a very efficient algorithm for automatically generating a topic hierarchy and mapping each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that the content-based link graph is also important compared with the hyperlink graph.
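
The "focused random surfer" idea can be illustrated with a small sketch. This is not the FGRank algorithm itself: the abstract does not give its formulas, and the topic hierarchy used there for efficiency is omitted here. The sketch only assumes a PageRank-style iteration in which the random jump from page i lands on page j with probability proportional to a content similarity sim[i][j]. The function name, the dense similarity matrix, and the damping value of 0.85 are all illustrative assumptions.

```python
def content_biased_pagerank(links, sim, damping=0.85, iters=100):
    """PageRank-style scores with a similarity-biased random jump.

    links[i] lists the pages that page i links to.
    sim[i][j] is a non-negative content similarity (hypothetical input).
    Assumption: a page without out-links always takes the biased jump.
    """
    n = len(links)

    def normalize(row):
        # Turn a similarity row into jump probabilities; uniform if all zero.
        total = sum(row)
        return [v / total for v in row] if total > 0 else [1.0 / n] * n

    jump = [normalize(row) for row in sim]
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            out = links[i]
            follow = damping if out else 0.0
            # Follow one of the explicit hyperlinks, chosen uniformly.
            for j in out:
                nxt[j] += rank[i] * follow / len(out)
            # Otherwise jump to a page chosen by content similarity.
            jump_mass = rank[i] * (1.0 - follow)
            for j in range(n):
                nxt[j] += jump_mass * jump[i][j]
        rank = nxt
    return rank


# Toy graph: pages 0 and 1 link to each other; page 2 has no in-links.
links = [[1], [0], [0]]
uniform_sim = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # reduces to a uniform jump
content_sim = [[1, 0, 1], [0, 1, 1], [1, 1, 1]]   # pages 0 and 1 resemble page 2

uniform = content_biased_pagerank(links, uniform_sim)
biased = content_biased_pagerank(links, content_sim)
print(uniform[2], biased[2])
```

In this toy graph, page 2 receives no hyperlinks, so under a uniform jump it gets only the baseline jump share. When pages 0 and 1 are similar to it in content, the biased jump acts as an implicit link and raises its score, while each page still ends up with a single score.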


        • Published in

          SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
          August 2006
          768 pages
          ISBN:1595933697
          DOI:10.1145/1148170

          Copyright © 2006 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 792 of 3,983 submissions, 20%
