Exploring the Relevance of Bilingual Morph-Units in Automatic Induction of Translation Templates

Mahesh, Kavitha Karimbi; Gomes, Luís; Lopes, José Gabriel Pereira

doi:10.1007/978-3-030-03928-8_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11238)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

To tackle the problem of out-of-vocabulary (OOV) words and improve bilingual lexicon coverage, the relevance of bilingual morph-units is explored in inducing translation patterns that cover unigram-to-n-gram and n-gram-to-unigram translations. The approach relies on inducing translation templates from bilingual stems learnt from automatically acquired bilingual translation lexicons. By generalising the templates using bilingual suffix clusters, new translations are automatically suggested.
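
The generalisation step described above can be pictured with a minimal sketch. It is not the authors' implementation: it only assumes that a bilingual suffix cluster is a set of (English suffix, Portuguese suffix) pairs and that each bilingual stem may be combined with every pair in the cluster. The suffix pairs and the stem 'declar' come from the chapter's notes; the stem 'confirm', the seed lexicon, and the function name `suggest_translations` are hypothetical, chosen for illustration.

```python
from itertools import product

# Verb suffix pairs (EN suffix, PT suffix) as listed in the chapter's notes.
SUFFIX_CLUSTER = [("", "ar"), ("e", "ar")]

# Bilingual stems (EN stem, PT stem). 'declar' appears in the notes;
# 'confirm' is a hypothetical addition for illustration.
BILINGUAL_STEMS = [("declar", "declar"), ("confirm", "confirm")]

# Hypothetical seed lexicon: translations that are already known.
KNOWN_LEXICON = {("declare", "declarar")}


def suggest_translations(stems, suffix_pairs, known):
    """Combine each bilingual stem with each bilingual suffix pair and
    return the resulting word pairs that are not already in the lexicon."""
    suggestions = []
    for (en_stem, pt_stem), (en_suf, pt_suf) in product(stems, suffix_pairs):
        pair = (en_stem + en_suf, pt_stem + pt_suf)
        if pair not in known and pair not in suggestions:
            suggestions.append(pair)
    return suggestions


if __name__ == "__main__":
    for en, pt in suggest_translations(BILINGUAL_STEMS, SUFFIX_CLUSTER, KNOWN_LEXICON):
        print(f"{en} <-> {pt}")
```

Unrestricted combination over-generates: alongside a plausible candidate such as confirm ↔ confirmar, the sketch also proposes non-words such as 'declar' and 'confirme' on the English side, so any real use would need a filtering or validation step, which is not modelled here.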

Notes

1. Translations not present in the existing lexicon.
2. DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory
   Europarl - http://www.statmt.org/europarl/
   OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/
3. Not in the bilingual lexicon that was used for aligning the parallel texts.
4. Word-to-word translations taken from the lexicon discussed in Sect. 3.1.
5. Translations that are lexically similar.
6. A suffix cluster may or may not correspond to a single part of speech such as noun or adjective; in some cases the same suffix cluster aggregates nouns, adjectives and adverbs.
7. Verb - (‘’,‘ar’) and (‘e’,‘ar’).
8. $\$_{2511}$ represents the stem ‘declar’ in English and $\$T_{2511\#8}$ represents its translation in Portuguese, which is ‘declar’ as well.
9. 13830 contract $\leftrightarrow$ 13830#2 contra, 13830 contract $\leftrightarrow$ 13830#1 contrat and 13831 buyout $\leftrightarrow$ 13831#3 compra.
10. Masculine plural.

Acknowledgements

K. M. Kavitha and Luís Gomes acknowledge the Research Fellowships from FCT/MCTES (Ref. nos. SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively) and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support and SJEC for the partial financial assistance provided.

Author information

Authors and Affiliations

NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Lisbon, Portugal
Kavitha Karimbi Mahesh, Luís Gomes & José Gabriel Pereira Lopes
Department of Computer Science and Engineering, St Joseph Engineering College Vamanjoor, Mangaluru, 575 028, India
Kavitha Karimbi Mahesh

Corresponding author

Correspondence to Kavitha Karimbi Mahesh.

Editor information

Editors and Affiliations

Universidad Nacional del Sur, Bahía Blanca, Buenos Aires, Argentina
Guillermo R. Simari
University of Madeira, Funchal, Portugal
Eduardo Fermé
Universidad Nacional de Piura, Castilla-Piura, Peru
Flabio Gutiérrez Segura
Universidad Nacional de Trujillo, Trujillo, Peru
José Antonio Rodríguez Melquiades

Cite this paper

Mahesh, K.K., Gomes, L., Lopes, J.G.P. (2018). Exploring the Relevance of Bilingual Morph-Units in Automatic Induction of Translation Templates. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. Lecture Notes in Computer Science (LNAI), vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_34

DOI: https://doi.org/10.1007/978-3-030-03928-8_34
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)
