Skip to main content

Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon

  • Conference paper
  • First Online:
Mining Intelligence and Knowledge Exploration (MIKE 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9468))

Abstract

By learning bilingual suffixation operations from translations using an existing bilingual lexicon with near translation forms we can improve its coverage and hence deal with the OOV entries. From this perspective, we identify bilingual stems, their bilingual morphological extensions (bilingual suffixes) and subsequently clusters of bilingual suffixes using known translation forms seen in an existing bilingual translation lexicon. We rely on clustering to enable safer translation generalisations. The degree of co-occurrence between two bilingual morphological extensions with reference to common bilingual stems determines if each of them should fall in the same cluster. Results are discussed for language pairs English-Portuguese (EN-PT) and English-Hindi (EN-HI).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Note the null suffix in EN corresponding to gender and number suffixes in HI.

  2. 2.

    Translations that are lexically similar.

  3. 3.

    A suffix cluster may or may not correspond to Part-of-Speech such as noun or adjective but there are cases where the same suffix cluster aggregates nouns, adjectives and adverbs.

  4. 4.

    http://glaros.dtc.umn.edu/gkhome/views/cluto

  5. 5.

    2\(^{nd}\) line in each row shows the transliterations for HI terms.

  6. 6.

    http://sanskritdocuments.org/hindi/dict/eng-hin\(\_\)unic.html/ www.dicts.info www.hindilearner.com

  7. 7.

    EMILLE Corpus - http://www.emille.lancs.ac.uk/

  8. 8.

    DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory Europarl - http://www.statmt.org/europarl/ OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/

  9. 9.

    In the Table 4, only two bilingual suffixes are shown per cluster although the original clusters contains varying number of bilingual suffixes ranging from 2 to 15 for EN-PT and from 2 to 5 for EN-HI.

References

  1. Karimbi Mahesh, K., Gomes, L., Lopes, J.G.P.: Identification of bilingual segments for translation generation. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds.) IDA 2014. LNCS, vol. 8819, pp. 167–178. Springer, Heidelberg (2014)

    Google Scholar 

  2. Lindén, K.: Assigning an inflectional paradigm using the longest matching affix. In: Mitään ongelmia, E., Wiberg, M., Koura, A. (eds.) Juhlakirja Juhani Reimanille 50-vuotispäiväksi 23.1.2008. Turku 2008 (2008)

    Google Scholar 

  3. Desai, S., Pawar, J., Bhattacharyya, P.: A framework for learning morphology using suffix association matrix. In: WSSANLP-2014, pp. 28–36 (2014)

    Google Scholar 

  4. Dasgupta, S., Ng, V.: Unsupervised word segmentation for bangla. In: Proceedings of ICON, pp. 15–24 (2007)

    Google Scholar 

  5. Da Silva, J.F., Lopes, G.P.: Extracting multiword terms from document collections. In: Proceedings of the VExTAL: Venezia per il Trattamento Automatico delle Lingue, pp. 22–24 (1999)

    Google Scholar 

  6. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computat. Linguist. 19(2), 263–311 (1993)

    Google Scholar 

  7. Lardilleux, A., Lepage, Y.: Sampling-based multilingual alignment. Proc. RANLP 2009, 214–218 (2009)

    Google Scholar 

  8. Aires, J., Lopes, G.P., Gomes, L.: Phrase translation extraction from aligned parallel corpora using suffix arrays and related structures. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS, vol. 5816, pp. 587–597. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  9. Gomes, L., Pereira Lopes, J.G.: Measuring spelling similarity for cognate identification. In: Antunes, L., Pinto, H.S. (eds.) EPIA 2011. LNCS, vol. 7026, pp. 624–633. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  10. Kavitha, K.M., Gomes, L., Lopes, J.G.P.: Bilingually motivated segmentation and generation of word translations using relatively small translation data sets. In: Proceedings of the PACLIC29 (Accepted) (2015)

    Google Scholar 

Download references

Acknowledgements

K.M. Kavitha and Luís Gomes acknowledge the Research Fellowship by FCT/MCTES with Ref. nos., SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively, and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009) that provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support, SJEC for providing the financial assistance to participate in MIKE 2015, and ISTRION BOX - Translation&Revision, Lda., for providing the data and valuable consultation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to K. M. Kavitha .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kavitha, K.M., Gomes, L., Lopes, J.G.P. (2015). Learning Clusters of Bilingual Suffixes Using Bilingual Translation Lexicon. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science(), vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_57

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26832-3_57

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26831-6

  • Online ISBN: 978-3-319-26832-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics