DOI: 10.1145/2484028.2484137
short-paper

Flat vs. hierarchical phrase-based translation models for cross-language information retrieval

Published: 28 July 2013

Abstract

Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice of model manifests interesting tradeoffs in terms of effectiveness, efficiency, and model compactness.


    Published In

    SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
    July 2013
    1188 pages
    ISBN:9781450320344
    DOI:10.1145/2484028

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. query translation
    2. scfg

    Qualifiers

    • Short-paper

    Conference

    SIGIR '13

    Acceptance Rates

    SIGIR '13 Paper Acceptance Rate 73 of 366 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2024) Information Retrieval of Marathi Query from a Linguistic Perspective. Innovations in Cybersecurity and Data Science, pp. 717-728. DOI: 10.1007/978-981-97-5791-6_52. Online publication date: 13-Dec-2024
    • (2022) A Novel Cross Language Neural Retrieval Model. 2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA), pp. 894-903. DOI: 10.1109/ICDSCA56264.2022.9988147. Online publication date: 28-Oct-2022
    • (2021) Cross-lingual Language Model Pretraining for Retrieval. Proceedings of the Web Conference 2021, pp. 1029-1039. DOI: 10.1145/3442381.3449830. Online publication date: 19-Apr-2021
    • (2014) Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval. ACM Transactions on Information Systems, 32:4, pp. 1-32. DOI: 10.1145/2644807. Online publication date: 28-Oct-2014
