Fast Document Translation for Cross-Language Information Retrieval

McCarley, J. Scott; Roukos, Salim

doi:10.1007/3-540-49478-2_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1529)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

We describe a statistical algorithm for machine translation intended to provide translations of large document collections at speeds far in excess of traditional machine translation systems, and of sufficiently high quality to perform information retrieval on the translated document collections. The model is trained from a parallel corpus and is capable of disambiguating senses of words. Information retrieval (IR) experiments on a French-language dataset from a recent cross-language information retrieval evaluation yield results superior to those obtained by participants in the evaluation, and confirm the importance of word sense disambiguation in cross-language information retrieval.

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY, 10598, USA
J. Scott McCarley & Salim Roukos

Editor information

Editors and Affiliations

Computing Research Lab, New Mexico State University, Box 30001 / 3CRL, Las Cruces, NM, 88003, USA
David Farwell
SYSTRAN Inc., 7855 Fay Avenue, Suite 300, P.O. Box 907, La Jolla, CA, 92037, USA
Laurie Gerber
Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292-6695, USA
Eduard Hovy

Cite this paper

McCarley, J.S., Roukos, S. (1998). Fast Document Translation for Cross-Language Information Retrieval. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_14

DOI: https://doi.org/10.1007/3-540-49478-2_14
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive
