Abstract
This paper describes a system that aligns text at the word level in a Hindi-Punjabi parallel corpus. Word alignment means identifying correspondences between words in source-language and target-language sentences. The previous aligner was based on a length-based estimation approach and produced alignment errors for multi-word units and sometimes for one-to-one alignments. This improved version uses several techniques to raise accuracy: Boundary Detection, Dictionary-Lookup (DL), Nearest-align-Neighbor (NAN), and a scoring-based minimum distance function. Automatic word alignment of a Hindi-Punjabi corpus is very useful for automatically developing a Hindi-Punjabi dictionary. The accuracy of the previous version was claimed to be approximately 89.5%, but rigorous testing found it to be 65%. With the above techniques implemented, the improved system achieved 99.09% accuracy for one-to-one word alignment and 80% for multi-word alignment.
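The abstract names the techniques but does not specify them, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes a bilingual dictionary that maps each source word to a set of target translations, resolves dictionary matches by picking the closest position (a stand-in for the minimum distance scoring), and places any remaining word relative to its nearest already-aligned neighbor. The function name `align_words`, the dictionary format, and the romanized toy tokens are all hypothetical; boundary detection and multi-word units are not handled.

```python
from typing import Dict, List, Set, Tuple


def align_words(
    source: List[str],
    target: List[str],
    dictionary: Dict[str, Set[str]],
) -> List[Tuple[int, int]]:
    """Return one-to-one (source_index, target_index) pairs.

    Hypothetical sketch: dictionary lookup first, then a
    nearest-aligned-neighbor fallback for the leftover words.
    """
    alignment: Dict[int, int] = {}
    used: Set[int] = set()

    # Pass 1: dictionary lookup. Among unused target words that the
    # dictionary lists as translations, take the positionally closest one.
    for i, word in enumerate(source):
        translations = dictionary.get(word, set())
        candidates = [
            j for j, t in enumerate(target)
            if j not in used and t in translations
        ]
        if candidates:
            j = min(candidates, key=lambda c: abs(c - i))
            alignment[i] = j
            used.add(j)

    # Pass 2: nearest aligned neighbor. For each unaligned source word,
    # find the closest aligned source word, project the same offset onto
    # the target side, and take the free target position nearest to it.
    for i in range(len(source)):
        if i in alignment:
            continue
        free = [j for j in range(len(target)) if j not in used]
        if not free:
            break  # no target words left to assign
        if alignment:
            k = min(alignment, key=lambda a: abs(a - i))
            expected = alignment[k] + (i - k)
        else:
            expected = i  # no anchors yet: assume parallel word order
        j = min(free, key=lambda c: abs(c - expected))
        alignment[i] = j
        used.add(j)

    return sorted(alignment.items())


# Toy example with romanized tokens; "jata" is missing from the dictionary
# and is placed by the neighbor fallback.
hindi = ["ram", "ghar", "jata", "hai"]
punjabi = ["ram", "ghar", "janda", "hai"]
toy_dictionary = {"ram": {"ram"}, "ghar": {"ghar"}, "hai": {"hai"}}
pairs = align_words(hindi, punjabi, toy_dictionary)
print(pairs)
```

In this sketch the dictionary pass supplies anchor points, and the fallback relies on the two languages having broadly parallel word order, which is an assumption of the example rather than a claim made in the abstract.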
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jindal, K., Goyal, V. (2012). Improved Algorithm for Automatic Word Alignment for Hindi-Punjabi Parallel Corpus. In: Kannan, R., Andres, F. (eds) Data Engineering and Management. ICDEM 2010. Lecture Notes in Computer Science, vol 6411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27872-3_39
DOI: https://doi.org/10.1007/978-3-642-27872-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27871-6
Online ISBN: 978-3-642-27872-3
eBook Packages: Computer Science, Computer Science (R0)