short-paper

Flat vs. hierarchical phrase-based translation models for cross-language information retrieval

Authors:

Jimmy Lin

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pages 813 - 816

https://doi.org/10.1145/2484028.2484137

Published: 28 July 2013

Abstract

Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice of model manifests interesting tradeoffs in terms of effectiveness, efficiency, and model compactness.

Index Terms

Flat vs. hierarchical phrase-based translation models for cross-language information retrieval
1. Information systems
  1. Information retrieval

Published In

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

July 2013

1188 pages

ISBN:9781450320344

DOI:10.1145/2484028

General Chairs:
Gareth J.F. Jones (Dublin City University, Ireland)
Páraic Sheridan (Dublin City University, Ireland)

Program Chairs:
Diane Kelly (University of North Carolina, Chapel Hill, USA)
Maarten de Rijke (University of Amsterdam, The Netherlands)
Tetsuya Sakai (Microsoft Research Asia, China)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR '13

Sponsor:

SIGIR

SIGIR '13: The 36th International ACM SIGIR conference on research and development in Information Retrieval

July 28 - August 1, 2013

Dublin, Ireland

Acceptance Rates

SIGIR '13 Paper Acceptance Rate 73 of 366 submissions, 20%

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

V. Manwar and A. Manwar. Information Retrieval of Marathi Query from a Linguistic Perspective. Innovations in Cybersecurity and Data Science, pages 717-728, 2024. Online publication date: 13-Dec-2024.
https://doi.org/10.1007/978-981-97-5791-6_52

J. Dai and D. Zhou. A Novel Cross Language Neural Retrieval Model. 2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA), pages 894-903, 2022. Online publication date: 28-Oct-2022.
https://doi.org/10.1109/ICDSCA56264.2022.9988147

P. Yu, H. Fei, and P. Li. Cross-lingual Language Model Pretraining for Retrieval. Proceedings of the Web Conference 2021, pages 1029-1039, 2021. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449830

F. Ture and J. Lin. Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval. ACM Transactions on Information Systems, 32(4):1-32, 2014. Online publication date: 28-Oct-2014.
https://dl.acm.org/doi/10.1145/2644807
