Abstract
Learning bilingual suffixation operations from the near translation forms in an existing bilingual lexicon lets us extend the lexicon's coverage and thereby handle out-of-vocabulary (OOV) entries. To this end, we identify bilingual stems, their bilingual morphological extensions (bilingual suffixes) and, subsequently, clusters of bilingual suffixes, using known translation forms seen in an existing bilingual translation lexicon. We rely on clustering to enable safer translation generalisations. The degree of co-occurrence between two bilingual morphological extensions with respect to their common bilingual stems determines whether they fall in the same cluster. Results are discussed for the language pairs English-Portuguese (EN-PT) and English-Hindi (EN-HI).
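The clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy EN-PT observations are hand-segmented for illustration, the Dice coefficient stands in for the unspecified degree of co-occurrence, the 0.5 threshold is arbitrary, and the function names (`stems_by_suffix`, `cooccurrence`, `cluster_suffixes`, `generate`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy, hand-segmented EN-PT observations (illustrative only, not from the paper).
# Each item is (bilingual stem, bilingual suffix); both are (source, target) pairs.
OBSERVATIONS = [
    (("declar", "declar"), ("e", "ar")),
    (("declar", "declar"), ("ed", "ado")),
    (("declar", "declar"), ("ing", "ando")),
    (("accus", "acus"), ("e", "ar")),
    (("accus", "acus"), ("ed", "ado")),
    (("accus", "acus"), ("ing", "ando")),
    (("nation", "nacion"), ("al", "al")),
    (("nation", "nacion"), ("ally", "almente")),
    (("norm", "norm"), ("al", "al")),
    (("norm", "norm"), ("ally", "almente")),
]


def stems_by_suffix(observations):
    """Index each bilingual suffix by the set of bilingual stems it attaches to."""
    index = defaultdict(set)
    for stem, suffix in observations:
        index[suffix].add(stem)
    return index


def cooccurrence(stems_a, stems_b):
    """Dice coefficient over shared stems (an assumed scoring choice)."""
    total = len(stems_a) + len(stems_b)
    return 2 * len(stems_a & stems_b) / total if total else 0.0


def cluster_suffixes(observations, threshold=0.5):
    """Single-link grouping: merge two suffixes whose score reaches the threshold."""
    index = stems_by_suffix(observations)
    parent = {suffix: suffix for suffix in index}

    def find(suffix):
        # Union-find lookup with path halving.
        while parent[suffix] != suffix:
            parent[suffix] = parent[parent[suffix]]
            suffix = parent[suffix]
        return suffix

    for a, b in combinations(index, 2):
        if cooccurrence(index[a], index[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for suffix in index:
        clusters[find(suffix)].add(suffix)
    # Largest clusters first; suffixes sorted inside each cluster.
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


def generate(stem, cluster):
    """Propose translation pairs by attaching every suffix in a cluster to a stem."""
    return [(stem[0] + src, stem[1] + tgt) for src, tgt in cluster]


if __name__ == "__main__":
    clusters = cluster_suffixes(OBSERVATIONS)
    for cluster in clusters:
        print(cluster)
    # A stem not in the toy data, assumed to belong to the verb-like cluster.
    print(generate(("compar", "compar"), clusters[0]))
```

On this toy data the verb-like suffixes and the adjective/adverb suffixes end up in separate clusters because they share no bilingual stem. Restricting generation to suffixes within one cluster is what makes the generalisation safer: a stem attested with one suffix of a cluster is only extended with that cluster's other suffixes.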
Notes
- 1. Note the null suffix in EN corresponding to gender and number suffixes in HI.
- 2. Translations that are lexically similar.
- 3. A suffix cluster may or may not correspond to a single part of speech, such as noun or adjective; in some cases the same suffix cluster aggregates nouns, adjectives and adverbs.
- 4.
- 5. The second line in each row shows the transliterations of the HI terms.
- 6.
- 7. EMILLE Corpus - http://www.emille.lancs.ac.uk/
- 8. DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory; Europarl - http://www.statmt.org/europarl/; OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/
- 9. In Table 4, only two bilingual suffixes are shown per cluster, although the original clusters contain varying numbers of bilingual suffixes, ranging from 2 to 15 for EN-PT and from 2 to 5 for EN-HI.
Acknowledgements
K.M. Kavitha and Luís Gomes acknowledge the Research Fellowships from FCT/MCTES with Ref. nos. SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively, and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support, SJEC for providing the financial assistance to participate in MIKE 2015, and ISTRION BOX - Translation&Revision, Lda., for providing the data and valuable consultation.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kavitha, K.M., Gomes, L., Lopes, J.G.P. (2015). Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_57
DOI: https://doi.org/10.1007/978-3-319-26832-3_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science (R0)