
Improving Vector Space Word Representations Via Kernel Canonical Correlation Analysis

Published: 21 July 2018

Abstract

Cross-lingual word embeddings represent the vocabularies of two or more languages in a single continuous vector space and are widely used in natural language processing tasks. A state-of-the-art way to generate cross-lingual word embeddings is to learn a linear mapping, under the assumption that the vector representations of similar words in different languages are related by a linear relationship. However, this assumption does not always hold, especially for substantially different languages. We therefore propose to use kernel canonical correlation analysis (KCCA) to capture the non-linear relationship between the word embeddings of two languages. By extensively evaluating the learned word embeddings on three tasks (word similarity, cross-lingual dictionary induction, and cross-lingual document classification) across five language pairs, we demonstrate that our approach outperforms previous linear methods on all three tasks, especially for language pairs with substantial typological differences.

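As a rough illustration of the kernelized-mapping idea (this is a minimal sketch, not the authors' implementation; the RBF kernel, regularization constant, and all variable names below are assumptions), the following Python/NumPy code solves one common regularized dual formulation of kernel CCA for two sets of aligned word vectors, e.g., embeddings of translation pairs:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def center_kernel(K):
    """Double-center a kernel matrix (removes the feature-space mean)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(X, Y, n_components=2, gamma=0.1, reg=1e-3):
    """Regularized kernel CCA in the dual (a common simplified formulation).

    X, Y: aligned samples of shape (n, d1) and (n, d2).
    Returns dual coefficients alpha, beta; the training projections into the
    shared space are Kx @ alpha and Ky @ beta.
    """
    n = X.shape[0]
    Kx = center_kernel(rbf_kernel(X, X, gamma))
    Ky = center_kernel(rbf_kernel(Y, Y, gamma))
    I = np.eye(n)
    # Solve (Kx + reg*I)^{-1} Ky (Ky + reg*I)^{-1} Kx  alpha = rho^2 alpha
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    alpha = vecs[:, order].real
    rho = np.sqrt(np.clip(vals[order].real, 1e-12, None))
    beta = np.linalg.solve(Ky + reg * I, Kx @ alpha) / rho
    return alpha, beta

# Toy usage: random vectors standing in for aligned bilingual embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))               # source-language vectors
Y = np.tanh(X @ rng.normal(size=(8, 6)))   # non-linearly related target vectors
alpha, beta = kcca(X, Y, n_components=2)
```

Unlike a linear mapping, the learned projections are non-linear functions of the input embeddings (through the kernel), which is what allows a KCCA-based approach to relate typologically distant languages more flexibly; the kernel choice and regularization strength are hyperparameters that would need tuning in practice.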


    • Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4
December 2018
193 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3229525

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 July 2018
      • Accepted: 1 March 2018
      • Revised: 1 November 2017
      • Received: 1 June 2017
Published in TALLIP Volume 17, Issue 4


      Qualifiers

      • research-article
      • Research
      • Refereed
