Abstract
We describe a statistical algorithm for machine translation intended to translate large document collections at speeds far in excess of traditional machine translation systems, and with sufficiently high quality to perform information retrieval on the translated collections. The model is trained from a parallel corpus and is capable of disambiguating word senses. Information retrieval (IR) experiments on a French-language dataset from a recent cross-language information retrieval evaluation yield results superior to those obtained by participants in the evaluation, and confirm the importance of word sense disambiguation in cross-language information retrieval.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McCarley, J.S., Roukos, S. (1998). Fast Document Translation for Cross-Language Information Retrieval. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science(), vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_14
DOI: https://doi.org/10.1007/3-540-49478-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive