Identification of Bilingual Suffix Classes for Classification and Translation Generation

Kavitha, Karimbi Mahesh; Gomes, Luís; Lopes, José Gabriel Pereira

doi:10.1007/978-3-319-12027-0_13


Karimbi Mahesh Kavitha, Luís Gomes & José Gabriel Pereira Lopes

Conference paper
First Online: 12 November 2014

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Abstract

We examine the possibility of learning bilingual morphology from the translation forms in an existing, manually validated bilingual translation lexicon. The objective is to evaluate how bilingual stem- and suffix-based features affect the performance of an existing Support Vector Machine (SVM)-based classifier trained to classify automatically extracted word-to-word translations. We first induce bilingual stem and suffix correspondences by considering the longest sequence common to orthographically similar translations. Clusters of stem pairs characterised by identical suffix pairs are then formed and used to generate out-of-vocabulary translations that are similar to, but different from, the previously existing translations, thereby completing the existing lexicon. Using the bilingual stem and suffix correspondences induced from the augmented lexicon, we derive five new features that reflect the (non)existence of morphological coverage (agreement) between a term and its translation. Specifically, we examine and evaluate the use of suffix classes and bilingual stem and suffix correspondences as features for selecting correct word-to-word translations from among the automatically extracted ones. With training data of approximately 35.8K word translations for the English–Portuguese language pair, we identified around 6.4K unique stem pairs and 0.25K unique suffix pairs. Experimental results show that the newly added features improved word-to-word classification accuracy by 9.11%, leading to an overall improvement in classifier accuracy of 2.15% when all translations (single- and multi-word) were considered.
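The abstract only outlines the induction and generation steps, so here is a minimal Python sketch of how they could fit together. It is not the authors' implementation. Several details are hypothetical: lexicon entries are grouped into "families" by their first four letters on each side, and a family's bilingual stem pair is taken as the longest common prefix of its English forms and of its Portuguese forms, computed with the standard-library `os.path.commonprefix`. Only families with at least three forms are used to induce suffix classes. The names `group_families`, `induce_classes`, `generate`, `strip_suffix`, `PREFIX_KEY`, `MIN_FAMILY`, `MIN_STEM` and the toy `SAMPLE` lexicon are all illustrative. Stem pairs sharing an identical set of suffix pairs form one cluster (a suffix class). An incomplete family is completed when a single stem pair explains all of its entries under some class.

```python
from collections import defaultdict
from os.path import commonprefix

# Hypothetical thresholds for illustration; not taken from the paper.
PREFIX_KEY = 4   # entries sharing their first 4 letters on both sides form a family
MIN_FAMILY = 3   # only families with at least 3 forms are used to induce classes
MIN_STEM = 3     # reject stem pairs shorter than this on either side

# Hypothetical toy lexicon of (English, Portuguese) word translations.
SAMPLE = [
    ("declare", "declarar"), ("declares", "declara"), ("declaring", "declarando"),
    ("ensure", "assegurar"), ("ensures", "assegura"), ("ensuring", "assegurando"),
    ("compare", "comparar"), ("compares", "compara"),
]


def group_families(lexicon):
    """Group translation pairs that share a short prefix on both sides."""
    families = defaultdict(list)
    for en, pt in lexicon:
        families[(en[:PREFIX_KEY], pt[:PREFIX_KEY])].append((en, pt))
    return list(families.values())


def strip_suffix(word, suffix):
    """Remove a suffix (possibly empty) known to end `word`."""
    return word[:len(word) - len(suffix)]


def induce_classes(families):
    """Map each suffix class (a frozenset of suffix pairs) to its stem pairs."""
    classes = defaultdict(list)
    for fam in families:
        if len(fam) < MIN_FAMILY:
            continue
        # Assumed stem pair: longest common prefix on each language side.
        en_stem = commonprefix([en for en, _ in fam])
        pt_stem = commonprefix([pt for _, pt in fam])
        if min(len(en_stem), len(pt_stem)) < MIN_STEM:
            continue
        suffixes = frozenset((en[len(en_stem):], pt[len(pt_stem):]) for en, pt in fam)
        classes[suffixes].append((en_stem, pt_stem))
    return dict(classes)


def generate(families, classes):
    """Propose unseen translations by completing small families with a matching class."""
    known = {pair for fam in families for pair in fam}
    proposals = set()
    for fam in families:
        if len(fam) >= MIN_FAMILY:
            continue
        for suffixes in classes:
            # Stem pairs that explain every entry of the family under this class.
            candidates = None
            for en, pt in fam:
                stems = {(strip_suffix(en, se), strip_suffix(pt, sp))
                         for se, sp in suffixes
                         if en.endswith(se) and pt.endswith(sp)}
                candidates = stems if candidates is None else candidates & stems
            for en_stem, pt_stem in candidates or set():
                if min(len(en_stem), len(pt_stem)) < MIN_STEM:
                    continue
                # Attach every suffix pair of the class; keep only unseen pairs.
                for se, sp in suffixes:
                    pair = (en_stem + se, pt_stem + sp)
                    if pair not in known:
                        proposals.add(pair)
    return proposals


if __name__ == "__main__":
    fams = group_families(SAMPLE)
    classes = induce_classes(fams)
    for suffixes, stems in classes.items():
        print(sorted(suffixes), stems)
    print(generate(fams, classes))
```

On the toy sample, the declare/declarar and ensure/assegurar families fall into the same suffix class {("e", "r"), ("es", ""), ("ing", "ndo")}. The two-entry compare/comparar family is explained by the stem pair ("compar", "compara") under that class, so the sketch proposes the unseen pair ("comparing", "comparando"). A plain longest-common-prefix split keeps the Portuguese thematic vowel *-a-* inside the stem. A real system would need a more careful segmentation and a validation step before adding proposals to the lexicon.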



Author information

Authors and Affiliations

CITI (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2829-516, Caparica, Portugal
Karimbi Mahesh Kavitha, Luís Gomes & José Gabriel Pereira Lopes
ISTRION BOX-Translation and Revision, Lda., Parkurbis, 6200-865, Covilhã, Portugal
Luís Gomes & José Gabriel Pereira Lopes
Department of Computer Applications, St. Joseph Engineering College, Vamanjoor, Mangalore, 575 028, India
Karimbi Mahesh Kavitha


Corresponding author

Correspondence to Karimbi Mahesh Kavitha.

Editor information

Editors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana L.C. Bazzan
Pontificia Universidad Católica (PUC), Santiago de Chile, Chile
Karim Pichara


About this paper

Cite this paper

Kavitha, K.M., Gomes, L., Lopes, J.G.P. (2014). Identification of Bilingual Suffix Classes for Classification and Translation Generation. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence – IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_13


DOI: https://doi.org/10.1007/978-3-319-12027-0_13
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)
