Positional Translation Language Model for Ad-Hoc Information Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Abstract

Most existing language modeling approaches rest on the term independence hypothesis. Two main directions have been investigated to go beyond this assumption. The first uses proximity features, which capture the degree to which search terms appear close to each other in a document. The second uses semantic relationships between words. Previous studies have shown that both types of information, term proximity features and semantic relationships between words, are useful for improving retrieval performance. Intuitively, using them in combination should improve retrieval performance further. Based on this idea, this paper proposes a positional translation language model that explicitly incorporates both types of information in a unified way under the language modeling framework. First, we present a proximity-based method to estimate word-word translation probabilities. Then, we define a translation document model for each position of a document and use these document models to score the document. Experimental results on standard TREC collections show that the proposed model achieves significant improvements over state-of-the-art models, including the positional language model and translation language models.
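The abstract gives no formulas, so the following Python sketch is only one plausible reading of the two steps, not the paper's actual model. Everything specific in it is an assumption made for illustration: the Gaussian proximity kernel, the window size, Jelinek-Mercer smoothing with weight `lam`, scoring a document by its best position, and all function names.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, sigma=2.0):
    # Assumed proximity kernel: weight decays with the distance between positions.
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def estimate_translation_probs(docs, sigma=2.0, window=5):
    # Step 1 (assumed form): p(w | u) from proximity-weighted co-occurrence counts.
    weights = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for i, u in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                weights[u][doc[j]] += gaussian_kernel(i - j, sigma)
    probs = {}
    for u, row in weights.items():
        total = sum(row.values())
        probs[u] = {w: c / total for w, c in row.items()}
    return probs


def collection_model(docs):
    # Maximum-likelihood unigram model over the whole collection, used for smoothing.
    counts = defaultdict(int)
    for doc in docs:
        for w in doc:
            counts[w] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def positional_model(doc, i, sigma=2.0):
    # p(u | D, i): every occurrence of u propagates a kernel-weighted count to position i.
    counts = defaultdict(float)
    for j, u in enumerate(doc):
        counts[u] += gaussian_kernel(i - j, sigma)
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}


def score(query, doc, trans, collection, sigma=2.0, lam=0.5):
    # Step 2 (assumed form): build a translation model at each position,
    # p_t(q | D, i) = sum_u p(q | u) * p(u | D, i), smooth it with the
    # collection model, and keep the log-likelihood of the best position.
    best = float("-inf")
    for i in range(len(doc)):
        plm = positional_model(doc, i, sigma)
        log_likelihood = 0.0
        for q in query:
            p_t = sum(trans.get(u, {}).get(q, 0.0) * p_u for u, p_u in plm.items())
            p_c = collection.get(q, 1e-9)  # small floor for unseen query terms
            log_likelihood += math.log((1 - lam) * p_t + lam * p_c)
        best = max(best, log_likelihood)
    return best


if __name__ == "__main__":
    docs = [
        ["cheap", "flight", "ticket", "deals"],
        ["stock", "market", "news", "today"],
    ]
    trans = estimate_translation_probs(docs)
    coll = collection_model(docs)
    query = ["flight", "ticket"]
    for doc in docs:
        print(doc, score(query, doc, trans, coll))
```

In this toy setup the first document should score higher for the query, since its positions carry translation mass for both query terms while the second document relies on smoothing alone. Other kernels, smoothing methods, or ways of aggregating position scores would be equally compatible with the abstract.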




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tu, X., Luo, J., Li, B., He, T., Gu, J. (2014). Positional Translation Language Model for Ad-Hoc Information Retrieval. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_12

  • DOI: https://doi.org/10.1007/978-3-319-06605-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06604-2

  • Online ISBN: 978-3-319-06605-9

  • eBook Packages: Computer Science, Computer Science (R0)
