A study of statistical models for query translation: finding a good unit of translation

Authors:

Jian-Yun Nie

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 194 - 201

https://doi.org/10.1145/1148170.1148207

Published: 06 August 2006

Abstract

This paper presents a study of three statistical query translation models that use different units of translation. We begin with a review of a word-based translation model that uses co-occurrence statistics to resolve translation ambiguities. The translation selection problem is then formulated within the framework of graphical models, which we use to discuss the modeling assumptions and limitations of the co-occurrence model and to motivate the search for better translation units. Then, two other models that use larger, linguistically motivated translation units (i.e., noun phrases and dependency triples) are presented. For each model, the modeling and training methods are described in detail. All query translation models are evaluated using TREC collections. Results show that larger translation units lead to more specific models that usually achieve better translation and cross-language information retrieval results.
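
The idea of using co-occurrence statistics to resolve translation ambiguity can be illustrated with a minimal Python sketch. This is not the paper's model: the function names (`cohesion`, `select_translations`), the toy candidate lists, and the similarity scores are all hypothetical, and the exhaustive search is only a stand-in for whatever selection procedure the paper actually uses. The sketch assumes that each query word has a small set of candidate translations and that a symmetric co-occurrence score is available for pairs of target-language words.

```python
from itertools import combinations, product


def cohesion(words, sim):
    """Sum of pairwise co-occurrence scores for one candidate translation set."""
    return sum(sim.get(frozenset(pair), 0.0) for pair in combinations(words, 2))


def select_translations(candidates, sim):
    """Pick one translation per query word so that total cohesion is maximal.

    candidates: one list of target-language options per source query word.
    sim: dict mapping frozenset({w1, w2}) to a co-occurrence score.
    Exhaustive search over all combinations; practical for short queries only.
    """
    return max(product(*candidates), key=lambda combo: cohesion(combo, sim))


# Toy data invented for illustration (assumption, not taken from the paper).
candidates = [["bank", "shore"], ["interest", "hobby"], ["rate", "speed"]]
sim = {
    frozenset({"bank", "interest"}): 0.9,
    frozenset({"interest", "rate"}): 0.8,
    frozenset({"bank", "rate"}): 0.6,
    frozenset({"shore", "speed"}): 0.1,
    frozenset({"hobby", "speed"}): 0.2,
}

best = select_translations(candidates, sim)
print(best)
```

In this toy setting each word is scored only against the other selected words, which hints at why a word-level unit can be too coarse: the score ignores phrase boundaries and syntactic relations, the kind of structure that larger units such as noun phrases and dependency triples are meant to capture.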


Index Terms

A study of statistical models for query translation: finding a good unit of translation
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair:
Efthimis N. Efthimiadis, University of Washington

Program Chairs:
Susan Dumais, Microsoft Research, Redmond
David Hawking, CSIRO ICT Centre, Canberra, Australia
Kalervo Järvelin, University of Tampere, Finland

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

Taan A, Khan S, Raza A, Hanif A, Anwar H (2021) Comparative Analysis of Information Retrieval Models on Quran Dataset in Cross-Language Information Retrieval Systems. IEEE Access 9, 169056-169067. Online publication date: 2021
https://doi.org/10.1109/ACCESS.2021.3126168

Zhang J, Huang Y, Jiang Q, Lu W, Shen H (2018) Query translation based on visual information. 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), 563-567. Online publication date: Mar-2018
https://doi.org/10.1109/ICACI.2018.8377521

Bosca A, Casu M, Dragoni M, Di Francescomarino C (2014) Using Semantic and Domain-Based Information in CLIR Systems. The Semantic Web: Trends and Challenges, 240-254. Online publication date: 2014
https://doi.org/10.1007/978-3-319-07443-6_17

Gurrutxaga A, Leturia I, Saralegi X, Vicente I (2013) Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making. Building and Using Comparable Corpora, 51-75. Online publication date: 14-Dec-2013
https://doi.org/10.1007/978-3-642-20128-8_3

Zhou D, Truran M, Brailsford T, Wade V, Ashman H (2012) Translation techniques in cross-language information retrieval. ACM Computing Surveys 45:1, 1-44. Online publication date: 7-Dec-2012
https://dl.acm.org/doi/10.1145/2379776.2379777

Ye Z, Huang J, He B, Lin H (2012) Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval. Journal of the American Society for Information Science and Technology 63:12, 2474-2487. Online publication date: 1-Dec-2012
https://dl.acm.org/doi/10.1002/asi.22696

Lin X, Jiang D (2011) English-to-Chinese Translation for Technical Terms Based on Improved Mutual Information. Advances in Computer, Communication, Control and Automation, 519-525. Online publication date: 2011
https://doi.org/10.1007/978-3-642-25541-0_66

Nie J (2010) Cross-Language Information Retrieval. Synthesis Lectures on Human Language Technologies 3:1, 1-125. Online publication date: Jan-2010
https://doi.org/10.2200/S00266ED1V01Y201005HLT008

Noh T, Park S, Yoon H, Lee S, Park S, Allan J, Aslam J, Sanderson M, Zhai C, Zobel J (2009) An automatic translation of tags for multimedia contents using folksonomy networks. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 492-499. Online publication date: 19-Jul-2009
https://dl.acm.org/doi/10.1145/1571941.1572026

Guo J, Xu G, Cheng X, Li H, Allan J, Aslam J, Sanderson M, Zhai C, Zobel J (2009) Named entity recognition in query. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 267-274. Online publication date: 19-Jul-2009
https://dl.acm.org/doi/10.1145/1571941.1571989
