Exploiting Thread Structures to Improve Smoothing of Language Models for Forum Post Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

Forum data has many unique characteristics that make forum post retrieval different from traditional document retrieval and web search, raising interesting research questions about how to optimize its accuracy. In this paper, we study how to exploit the naturally available raw thread structures of forums to improve retrieval accuracy in the language modeling framework. Specifically, we propose and study two different schemes for smoothing the language model of a forum post based on the thread containing the post, and we explore several variants of each scheme that exploit thread structures in different ways. We also create a human-annotated test data set for forum post retrieval and evaluate the proposed smoothing methods on it. The experimental results show that the proposed methods for leveraging forum threads to improve the estimation of document language models are effective, and that they outperform existing smoothing methods on the forum post retrieval task.
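The abstract does not spell out the two smoothing schemes, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Jelinek-Mercer style linear interpolation of three unigram models (the post, its thread, and the whole collection) and ranks posts by query log-likelihood. The function names, the interpolation weights `lam_post` and `lam_thread`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: relative term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, post_lm, thread_lm, coll_lm,
                  lam_post=0.5, lam_thread=0.3):
    """Interpolate post, thread, and collection models (assumed weights)."""
    lam_coll = 1.0 - lam_post - lam_thread
    return (lam_post * post_lm.get(word, 0.0)
            + lam_thread * thread_lm.get(word, 0.0)
            + lam_coll * coll_lm.get(word, 0.0))


def score(query, post, thread, coll_lm, **weights):
    """Query log-likelihood under the smoothed post model.

    Every query word must occur somewhere in the collection;
    otherwise its probability is 0 and the logarithm is undefined.
    """
    post_lm = mle(post)
    thread_lm = mle([w for p in thread for w in p])
    return sum(
        math.log(smoothed_prob(w, post_lm, thread_lm, coll_lm, **weights))
        for w in query
    )


# Toy data: one two-post thread about a driver problem, one unrelated thread.
thread = [
    ["wifi", "driver", "crashes", "after", "update"],
    ["try", "reinstalling", "the", "driver"],
]
other = [["best", "pizza", "in", "town"]]
coll_lm = mle([w for p in thread + other for w in p])

query = ["wifi", "driver"]
reply_score = score(query, thread[1], thread, coll_lm)
pizza_score = score(query, other[0], other, coll_lm)
```

In this toy setup the reply never mentions "wifi", yet it still receives probability mass for that word from the first post of its thread, so it scores above the unrelated post. That is the intuition behind thread-based smoothing; which parts of the thread to draw on and how to weight them are exactly the design choices the paper studies.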

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Duan, H., Zhai, C. (2011). Exploiting Thread Structures to Improve Smoothing of Language Models for Forum Post Retrieval. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_35

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)