DOI: 10.1145/2505515.2507864
Poster

Exploiting proximity feature in statistical translation models for information retrieval

Published: 27 October 2013 Publication History

Abstract

A main challenge in applying translation language models to information retrieval is estimating the 'true' probability that a query could be generated as a translation of a document. State-of-the-art methods rely on document-based word co-occurrences to estimate word-to-word translation probabilities. However, these methods ignore how close together the co-occurring words are. Intuitively, this proximity can be exploited to estimate more accurate translation probabilities, since two words that occur closer together are more likely to be related. In this paper, we study how to explicitly incorporate proximity information into the existing translation language model, and propose a proximity-based translation language model, called TM-P, with three variants. In our TM-P models, a new concept, the proximity-based word co-occurrence frequency, is introduced to model the proximity of word co-occurrences; it is then used to estimate translation probabilities. Experimental results on standard TREC collections show that our TM-P models achieve significant improvements over the state-of-the-art translation models.
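The abstract does not give the TM-P formulas, so the following is only a rough sketch of the general idea under stated assumptions. It replaces flat document-level co-occurrence counts with distance-weighted counts, normalises them into translation probabilities p_t(w | u), and plugs them into a Dirichlet-smoothed translation language model of the form p(w | d) = sum_u p_t(w | u) p(u | d), as in Berger and Lafferty (1999). The Gaussian kernel, `sigma`, `window`, the fixed `self_weight` for self-translation, and all function names are hypothetical choices for illustration; they are not the three TM-P variants described in the paper.

```python
import math
from collections import Counter, defaultdict


def proximity_cooccurrence(docs, sigma=2.0, window=10):
    """Proximity-weighted co-occurrence counts c(u, w) over a list of token lists.

    Assumption (illustrative, not the paper's TM-P definition): a pair of
    occurrences at distance d, 1 <= d <= window, contributes the Gaussian
    weight exp(-d^2 / (2 * sigma^2)) instead of a flat count of 1.
    """
    counts = defaultdict(Counter)
    for tokens in docs:
        for i, u in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                d = abs(i - j)
                counts[u][tokens[j]] += math.exp(-d * d / (2.0 * sigma * sigma))
    return counts


def translation_probs(counts, self_weight=0.5):
    """Turn counts into translation probabilities p_t(w | u).

    Each row is normalised to sum to 1. A fixed self_weight (hypothetical
    parameter) is reserved for p_t(u | u) so that a word always "translates"
    to itself with substantial probability.
    """
    probs = {}
    for u, row in counts.items():
        total = sum(row.values())
        probs[u] = {w: (1.0 - self_weight) * c / total for w, c in row.items()}
        probs[u][u] = probs[u].get(u, 0.0) + self_weight
    return probs


def score(query, doc, probs, coll_prob, mu=1000.0):
    """Query log-likelihood under a Dirichlet-smoothed translation language model."""
    tf = Counter(doc)
    n = len(doc)
    log_p = 0.0
    for q in query:
        # p_tm(q | d) = sum_u p_t(q | u) * p_ml(u | d); unseen words translate only to themselves.
        p_tm = sum(probs.get(u, {u: 1.0}).get(q, 0.0) * c / n for u, c in tf.items())
        # Dirichlet smoothing with the collection model p(q | C); tiny floor avoids log(0).
        p = (n * p_tm + mu * coll_prob.get(q, 1e-9)) / (n + mu)
        log_p += math.log(p)
    return log_p
```

On toy documents such as `["car", "engine", "repair", "shop"]`, `["car", "dealer", "price", "engine"]` and `["flower", "garden", "soil"]`, the query `["repair"]` gets a nonzero translation component for the second document, which never contains `repair`, because `car` and `engine` occur near `repair` in the first. The `window` cap keeps the pair counting roughly linear in document length instead of quadratic.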

    Information

    Published In

    CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
    October 2013
    2612 pages
    ISBN:9781450322638
    DOI:10.1145/2505515
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2013

    Author Tags

    1. information retrieval
    2. proximity information
    3. translation language models

    Qualifiers

    • Poster

    Conference

    CIKM'13
    CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
    October 27 - November 1, 2013
    San Francisco, California, USA

    Acceptance Rates

    CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 190
    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025
