Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon

Kavitha, K. M.; Gomes, Luís; Lopes, José Gabriel P.

doi:10.1007/978-3-319-26832-3_57


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9468)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration


Abstract

By learning bilingual suffixation operations from the near translation forms in an existing bilingual lexicon, we can improve the lexicon's coverage and hence deal with out-of-vocabulary (OOV) entries. From this perspective, we identify bilingual stems, their bilingual morphological extensions (bilingual suffixes) and, subsequently, clusters of bilingual suffixes, using known translation forms seen in an existing bilingual translation lexicon. We rely on clustering to enable safer translation generalisations. The degree of co-occurrence between two bilingual morphological extensions with reference to common bilingual stems determines whether they should fall in the same cluster. Results are discussed for the language pairs English-Portuguese (EN-PT) and English-Hindi (EN-HI).
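The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the common-prefix heuristic for finding bilingual stems, the Jaccard overlap used as the co-occurrence score, the threshold values, and the union-find grouping are all assumptions made for illustration (the chapter's notes point to the CLUTO toolkit, which is not used here).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy EN-PT lexicon; real input would be a large translation lexicon.
LEXICON = [
    ("declared", "declarado"), ("declaring", "declarando"),
    ("accused", "acusado"), ("accusing", "acusando"),
    ("national", "nacional"), ("nationally", "nacionalmente"),
]


def common_prefix(a, b):
    """Return the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def induce_suffixes(lexicon, min_stem=3):
    """Map each bilingual suffix to the set of bilingual stems it attaches to.

    Assumption: two entries share a bilingual stem when both their source
    sides and their target sides share a prefix of at least `min_stem` chars.
    """
    stems_of = defaultdict(set)
    for (s1, t1), (s2, t2) in combinations(lexicon, 2):
        ps, pt = common_prefix(s1, s2), common_prefix(t1, t2)
        if len(ps) < min_stem or len(pt) < min_stem:
            continue
        stem = (ps, pt)
        stems_of[(s1[len(ps):], t1[len(pt):])].add(stem)
        stems_of[(s2[len(ps):], t2[len(pt):])].add(stem)
    return stems_of


def cluster_suffixes(stems_of, threshold=0.5):
    """Group bilingual suffixes whose stem sets overlap strongly.

    The co-occurrence score is the Jaccard overlap of the two stem sets;
    suffixes scoring at or above `threshold` are merged with union-find.
    """
    suffixes = sorted(stems_of)
    parent = {s: s for s in suffixes}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(suffixes, 2):
        shared = stems_of[a] & stems_of[b]
        union = stems_of[a] | stems_of[b]
        if len(shared) / len(union) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in suffixes:
        clusters[find(s)].add(s)
    # Singleton groups carry no generalisation, so they are dropped.
    return [c for c in clusters.values() if len(c) > 1]


if __name__ == "__main__":
    for cluster in cluster_suffixes(induce_suffixes(LEXICON)):
        print(sorted(cluster))
```

On this toy input, the bilingual suffixes ("ed", "do") and ("ing", "ndo") end up in one cluster because they attach to the same two bilingual stems, while ("", "") and ("ly", "mente") form a separate cluster. A known pair such as "accused"/"acusado" would then license generating "accusing"/"acusando" if the latter were missing from the lexicon, which is the kind of coverage gain the abstract describes.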


Notes

1. Note the null suffix in EN corresponding to gender and number suffixes in HI.
2. Translations that are lexically similar.
3. A suffix cluster may or may not correspond to a part of speech such as noun or adjective; in some cases the same suffix cluster aggregates nouns, adjectives and adverbs.
4. http://glaros.dtc.umn.edu/gkhome/views/cluto
5. The second line in each row shows the transliterations for HI terms.
6. http://sanskritdocuments.org/hindi/dict/eng-hin_unic.html; www.dicts.info; www.hindilearner.com
7. EMILLE Corpus - http://www.emille.lancs.ac.uk/
8. DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory; Europarl - http://www.statmt.org/europarl/; OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/
9. In Table 4, only two bilingual suffixes are shown per cluster, although the original clusters contain a varying number of bilingual suffixes, ranging from 2 to 15 for EN-PT and from 2 to 5 for EN-HI.


Acknowledgements

K.M. Kavitha and Luís Gomes acknowledge the Research Fellowships from FCT/MCTES (Refs. SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively) and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support, SJEC for providing the financial assistance to participate in MIKE 2015, and ISTRION BOX - Translation & Revision, Lda., for providing the data and valuable consultation.

Author information

Authors and Affiliations

NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
K. M. Kavitha, Luís Gomes & José Gabriel P. Lopes
ISTRION BOX-Translation & Revision, Lda., Parkurbis, 6200-865, Covilhã, Portugal
Luís Gomes & José Gabriel P. Lopes
Department of Computer Applications, St. Joseph Engineering College, Vamanjoor, Mangaluru, 575 028, India
K. M. Kavitha

Authors

K. M. Kavitha
Luís Gomes
José Gabriel P. Lopes

Corresponding author

Correspondence to K. M. Kavitha.

Editor information

Editors and Affiliations

Norwegian Univ. of Science & Technology, Trondheim, Norway
Rajendra Prasath
Intl Inst of Info Tech Hyderabad, Hyderabad, India
Anil Kumar Vuppala
V.H.N.S.N.College (Autonomous), Virudhunagar, Tamil Nadu, India
T. Kathirvalavakumar


About this paper

Cite this paper

Kavitha, K.M., Gomes, L., Lopes, J.G.P. (2015). Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_57


DOI: https://doi.org/10.1007/978-3-319-26832-3_57
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)
