An English-Hindi Statistical Machine Translation System

Udupa U., Raghavendra; Faruquie, Tanveer A.

doi:10.1007/978-3-540-30211-7_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Statistical methods for natural language translation have recently become popular and have met with reasonable success. In this paper we describe an English-Hindi statistical machine translation system based on IBM Models 1, 2, and 3. We present experimental results on an English-Hindi parallel corpus of 150,000 sentence pairs. We propose two new algorithms for the transfer of fertility parameters from Model 2 to Model 3. Our algorithms have a worst-case time complexity of O(m³), improving on the exponential-time algorithm proposed in the classical paper on IBM Models. When the maximum fertility of a word is small, our algorithms are O(m²) and hence very efficient in practice.
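
The abstract does not spell out the two transfer algorithms, so the following is only an illustrative sketch of why a polynomial-time transfer is plausible, not the authors' method. It assumes that, under a Model 2 style posterior, each target position aligns to a given source word independently, so the word's fertility is a sum of independent Bernoulli variables whose distribution can be built up by dynamic programming. The function name `fertility_distribution`, the list `p` of per-position alignment probabilities, and the cap `max_fertility` are all hypothetical names introduced for this example.

```python
from typing import List


def fertility_distribution(p: List[float], max_fertility: int) -> List[float]:
    """Fertility distribution of one source word (illustrative sketch).

    p[j] is the assumed probability that target position j aligns to the
    source word, with positions treated as independent.
    Returns dist with dist[k] = P(fertility == k) for k < max_fertility and
    dist[max_fertility] = P(fertility >= max_fertility).
    """
    # Before any target position is considered, fertility is 0 with certainty.
    dist = [1.0] + [0.0] * max_fertility
    for pj in p:
        new = [0.0] * (max_fertility + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            # Position j does not align to the word: fertility is unchanged.
            new[k] += mass * (1.0 - pj)
            # Position j aligns to the word: fertility grows by one,
            # with everything at or above the cap lumped into the last bucket.
            new[min(k + 1, max_fertility)] += mass * pj
        dist = new
    return dist
```

For two target positions that each align with probability 0.5, `fertility_distribution([0.5, 0.5], 2)` returns `[0.25, 0.5, 0.25]`. The loop costs O(m · max_fertility) per source word for m target positions. Repeating it for every source word in a sentence of comparable length gives a cubic bound when the cap is m and a quadratic one when the cap is a small constant. Reading m as the sentence length is an assumption here, since the abstract does not define it.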

References

Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The mathematics of Statistical Machine Translation: Parameter estimation. Computational Linguistics 19(2), 263–311 (1993)
Berger, A., Della Pietra, S., Della Pietra, V.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996)
Brown, P.F., Della Pietra, V.J., de Souza, P.V., Lai, J.C., Mercer, R.L.: Class based n-gram models for natural language. Computational Linguistics 18(4) (1992)
Berger, A., Brown, P., Della Pietra, S., Della Pietra, V., Gillette, J., Lafferty, J., Mercer, R., Printz, H., Ures, L.: The Candide system for machine translation. In: Proceedings of the ARPA Human Language Technology Workshop (1994)
Baum, L.E.: An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)
Knight, K.: Decoding complexity in word replacement translation models. Computational Linguistics 25(4) (1999)
Jelinek, F.: A fast sequential decoding algorithm using a stack. IBM Research Journal 13 (1969)
Brown, R.D.: Example-based Machine Translation in the Pangloss System. In: International Conference on Computational Linguistics (COLING 1996), Copenhagen, Denmark (August 1996)
Sinha, R.M.K., Sivaraman, K., Agrawal, A., Jain, R., Srivastava, R., Jain, A.: ANGLABHARTI: A Multilingual Machine Aided Translation Project on Translation from English to Hindi. In: IEEE International Conference on Systems, Man and Cybernetics, Vancouver, Canada (1995)
Tillman, C., Vogel, S., Ney, H., Zubiaga, A.: A DP-based search using monotone alignments in statistical translation. In: Proc. ACL (1997)
Tillman, C.: Word Re-Ordering and Dynamic Programming based Search Algorithm for Statistical Machine Translation. Ph.D. Thesis (2001)
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: Bleu: A Method for Automatic Evaluation of Machine Translation. IBM Research Report, RC22176, W0109-022 (2001)
Doddington, G.: Automatic Evaluation of Machine Translation Quality using Ngram Co-occurrence Statistics. In: Human Language Technology: Notebook Proceedings, pp. 128–132 (2002)

Author information

Authors and Affiliations

IBM India Research Lab, New Delhi, 110016, India
Raghavendra Udupa U. & Tanveer A. Faruquie

Editor information

Editors and Affiliations

Behavior Design Corporation, IV Science-Based Industrial Park Hsinchu, 2F, No.5, Industry E. Rd, Taiwan
Keh-Yih Su
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033; JST CREST, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012
Jun’ichi Tsujii
Pohang University of Science and Technology (POSTECH), AITrc, Republic of Korea
Jong-Hyeok Lee
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Udupa U., R., Faruquie, T.A. (2005). An English-Hindi Statistical Machine Translation System. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_27

DOI: https://doi.org/10.1007/978-3-540-30211-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)
