DOI: 10.1145/3371158.3371219
short-paper

A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages

Published: 15 January 2020

ABSTRACT

Cross-lingual word embeddings have become ubiquitous across NLP tasks. The existing literature primarily evaluates the quality of cross-lingual word embeddings on the task of Bilingual Lexicon Induction (BLI) and reports very high accuracies for European languages. In this paper, we report BLI accuracy for Indian languages using cross-lingual word embeddings generated by two unsupervised mapping-based approaches, VecMap and MUSE, on a dataset created from the linked Indian WordNet (IndoWordNet). We also compare these approaches with a simple baseline in which the embeddings for all languages are trained with fastText on the combined corpora of 11 Indian languages. Our experiments show that existing cross-lingual word embedding approaches achieve low BLI accuracy on cognate words. Given the high cognate overlap among several Indian languages, this is a serious limitation of existing approaches.
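
To make the evaluation protocol concrete, the following is a minimal sketch (not the authors' code, nor the VecMap or MUSE implementations) of two ingredients the abstract relies on: an orthogonal mapping from a source embedding space into a target space, and BLI accuracy measured as precision@1. Assumptions: embeddings are dense row matrices, a seed dictionary of row-aligned translation pairs is already available (the unsupervised methods broadly induce such a seed without parallel data and then refine an orthogonal mapping of this kind), and retrieval uses plain cosine nearest neighbour rather than the CSLS criterion introduced with MUSE. The function names and the toy data are hypothetical.

```python
import numpy as np


def normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.maximum(norms, 1e-12)


def procrustes(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||x @ W - z||_F; row i of x and z is a seed translation pair."""
    # Closed-form solution: W = U V^T, where U S V^T is the SVD of x^T z.
    u, _, vt = np.linalg.svd(x.T @ z)
    return u @ vt


def bli_precision_at_1(src_emb: np.ndarray, tgt_emb: np.ndarray,
                       w: np.ndarray, test_pairs: list[tuple[int, int]]) -> float:
    """Share of (source_row, gold_target_row) pairs whose nearest target word after mapping is the gold word."""
    mapped = normalize(src_emb @ w)
    tgt = normalize(tgt_emb)
    hits = 0
    for s, t in test_pairs:
        # Cosine similarity of the mapped source word against every target word.
        scores = tgt @ mapped[s]
        if int(np.argmax(scores)) == t:
            hits += 1
    return hits / len(test_pairs)


if __name__ == "__main__":
    # Hypothetical toy data: the "source" space is an exact rotation of the "target" space.
    rng = np.random.default_rng(0)
    tgt = rng.standard_normal((50, 20))
    q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
    src = tgt @ q.T
    w = procrustes(src[:30], tgt[:30])       # learn the mapping from 30 seed pairs
    test = [(i, i) for i in range(30, 50)]   # held-out pairs: row i translates to row i
    print(bli_precision_at_1(src, tgt, w, test))
```

Under the joint-training baseline described in the abstract, all languages already share one fastText space, so evaluation would presumably correspond to taking `w` as the identity matrix and restricting retrieval to the target-language vocabulary.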

References

  1. Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 789–798.
  2. Pushpak Bhattacharyya. 2017. IndoWordNet. In The WordNet in Indian Languages. Springer, 1–18.
  3. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
  4. Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word Translation Without Parallel Data. In Proceedings of ICLR 2018.
  5. Anoop Kunchukuttan, Ratish Puduppully, and Pushpak Bhattacharyya. 2015. Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. 81–85.
  6. Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2019. A Survey of Cross-lingual Word Embedding Models. Journal of Artificial Intelligence Research 65 (2019), 569–631.

  • Published in

    ACM Other conferences
    CoDS COMAD 2020: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
    January 2020
    399 pages
    ISBN: 9781450377386
    DOI: 10.1145/3371158

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • short-paper
    • Research
    • Refereed limited

    Acceptance Rates

    CoDS COMAD 2020 paper acceptance rate: 78 of 275 submissions (28%). Overall acceptance rate: 197 of 680 submissions (29%).
