DOI: 10.1145/1148170.1148224

Building implicit links from content for forum search

Published: 06 August 2006


The objective of Web forums is to create a shared space for open communication and discussion of specific topics and issues. The tremendous amount of information behind forum sites is not yet fully utilized. Most links between forum pages are generated automatically, so link-based ranking algorithms cannot be applied effectively. In this paper, we propose a novel ranking algorithm that introduces content information into link-based methods as implicit links. The basic idea derives from a more focused random surfer, who is more likely to jump to a page similar to the one currently being read. This lets us introduce content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be computed efficiently from an automatically generated topic hierarchy. Unlike topic-sensitive PageRank, our method needs to compute only a single PageRank score for each page. Another contribution of this paper is a very efficient algorithm for automatically generating a topic hierarchy and mapping each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and that the content-based link graph is also important alongside the hyperlink graph.
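The focused-surfer idea in the abstract can be illustrated with a minimal sketch. This is not the paper's FGRank, which relies on a topic hierarchy to keep the computation cheap; it only shows how content similarity can serve as the jump (personalization) distribution in a PageRank-style power iteration. The function name, the dense similarity matrix, the toy data, and the damping value of 0.85 are all assumptions made for illustration.

```python
import numpy as np

def content_biased_pagerank(adj, sim, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for a surfer whose jumps favour similar pages.

    adj[i, j] = 1 if page i links to page j (explicit hyperlinks).
    sim[i, j] >= 0 is a content similarity between pages i and j,
    playing the role of an implicit link. Both matrices are n x n.
    NOTE: hypothetical helper, not the algorithm from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    sim = np.asarray(sim, dtype=float).copy()
    n = adj.shape[0]
    np.fill_diagonal(sim, 0.0)  # a jump should land on a different page

    def row_normalize(m):
        # Rows with no mass fall back to a uniform distribution.
        sums = m.sum(axis=1, keepdims=True)
        safe = np.where(sums > 0, sums, 1.0)
        return np.where(sums > 0, m / safe, 1.0 / n)

    link_step = row_normalize(adj)  # follow a hyperlink
    jump_step = row_normalize(sim)  # jump to a similar page
    transition = damping * link_step + (1.0 - damping) * jump_step

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = rank @ transition
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Toy forum: pages 0 and 1 link to each other; page 2 links to 0
# and has no inbound hyperlinks, so only implicit links reach it.
links = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [1, 0, 0]])
# Assumed similarities: in `close`, page 2 resembles pages 0 and 1.
close = np.array([[0.0, 0.1, 0.9],
                  [0.1, 0.0, 0.9],
                  [0.9, 0.9, 0.0]])
far = np.array([[0.0, 0.9, 0.1],
                [0.9, 0.0, 0.1],
                [0.1, 0.1, 0.0]])

rank_close = content_biased_pagerank(links, close)
rank_far = content_biased_pagerank(links, far)
```

In this toy setup, page 2 receives rank only through similarity-driven jumps, so comparing `rank_close[2]` with `rank_far[2]` shows how the implicit links change the score of a page that the hyperlink graph alone would barely reach.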





Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. PageRank
  2. categorization
  3. clustering
  4. forum search
  5. hierarchy generation




SIGIR06: The 29th Annual International SIGIR Conference
August 6 - 11, 2006
Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%



Cited By

  • (2021) Content and link-structure perspective of ranking webpages. Computer Science Review 40:C. DOI: 10.1016/j.cosrev.2021.100397. Online publication date: 1-May-2021
  • (2019) Uncovering Hidden Links Between Images Through Their Textual Context. Enterprise Information Systems (370-395). DOI: 10.1007/978-3-030-26169-6_18. Online publication date: 28-Jul-2019
  • (2018) Web Forum Retrieval and Text Analytics. Foundations and Trends in Information Retrieval 12:1 (1-163). DOI: 10.1561/1500000062. Online publication date: 3-Jan-2018
  • (2017) Conflict in Comments. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (655-666). DOI: 10.1145/3025453.3025902. Online publication date: 2-May-2017
  • (2016) Measuring Similarity Similarly. ACM Transactions on Intelligent Systems and Technology 8:1 (1-28). DOI: 10.1145/2890510. Online publication date: 26-Sep-2016
  • (2016) HICC. Knowledge and Information Systems 46:2 (343-367). DOI: 10.1007/s10115-015-0823-x. Online publication date: 1-Feb-2016
  • (2016) Identifying the role of individual user messages in an online discussion and its use in thread retrieval. Journal of the Association for Information Science and Technology 67:2 (276-288). DOI: 10.1002/asi.23373. Online publication date: 1-Feb-2016
  • (2014) Discovering High-Quality Threaded Discussions in Online Forums. Journal of Computer Science and Technology 29:3 (519-531). DOI: 10.1007/s11390-014-1446-5. Online publication date: 17-May-2014
  • (2014) A Unified Fusion Framework for Time-Related Rank in Threaded Discussion Communities. Trends and Applications in Knowledge Discovery and Data Mining (513-524). DOI: 10.1007/978-3-319-13186-3_46. Online publication date: 26-Nov-2014
  • (2014) Exploiting Near-Duplicate Relations in Organizing News Archives. International Journal of Intelligent Systems 29:7 (597-614). DOI: 10.1002/int.21647. Online publication date: 1-Jul-2014
