Abstract
Cross-lingual word embeddings are representations for vocabularies of two or more languages in one common continuous vector space and are widely used in various natural language processing tasks. A state-of-the-art way to generate cross-lingual word embeddings is to learn a linear mapping, with an assumption that the vector representations of similar words in different languages are related by a linear relationship. However, this assumption does not always hold true, especially for substantially different languages. We therefore propose to use kernel canonical correlation analysis to capture a non-linear relationship between word embeddings of two languages. By extensively evaluating the learned word embeddings on three tasks (word similarity, cross-lingual dictionary induction, and cross-lingual document classification) across five language pairs, we demonstrate that our proposed approach achieves essentially better performances than previous linear methods on all of the three tasks, especially for language pairs with substantial typological difference.
- Shotaro Akaho. 2006. A kernel method for canonical correlation analysis. CoRR abs/cs/0609071.Google Scholar
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2289--2294.Google ScholarCross Ref
- Francis Bond and Ryan Foster. 2013. Linking and extending an open multilingual wordnet. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1352--1362.Google Scholar
- Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT: Web inventory of transcribed and translated talks. In Proceedings of the 16 Conference of the European Association for Machine Translation (EAMT’12). 261--268.Google Scholar
- Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 334--343.Google Scholar
- Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. 2010. cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proceedings of the ACL 2010 System Demonstrations. Association for Computational Linguistics, 7--12. Google ScholarDigital Library
- Manaal Faruqui and Chris Dyer. 2014. Improving vector space word representations using multilingual correlation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 462--471.Google ScholarCross Ref
- Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web (WWW’01). ACM, New York, NY, 406--414. Google ScholarDigital Library
- Yoav Freund and Robert E. Schapire. 1999. Large margin classification using the perceptron algorithm. Mach. Learn. 37, 3 (1999), 277--296. Google ScholarDigital Library
- Kenji Fukumizu, Francis R. Bach, and Arthur Gretton. 2007. Statistical consistency of kernel canonical correlation analysis. J. Mach. Learn. Res. 8 (May 2007), 361--383. Google ScholarDigital Library
- Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15), David Blei and Francis Bach (Eds.). JMLR Workshop and Conference Proceedings, 748--756. Google ScholarDigital Library
- Stephan Gouws and Anders Søgaard. 2015. Simple task-specific bilingual word embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1386--1390.Google ScholarCross Ref
- Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2015. Cross-lingual dependency parsing based on distributed representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 1234--1244.Google Scholar
- Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 58--68.Google ScholarCross Ref
- Felix Hill, Roi Reichart, and Anna Korhonen. 2014. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. CoRR abs/1408.3456 (2014).Google Scholar
- Harold Hotelling. 1936. Relations between two sets of variates. Biometrika 28, 3/4 (1936), 321--377.Google ScholarCross Ref
- Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics (COLING’12).Google Scholar
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. 2004. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5 (Dec. 2004), 361--397. Google ScholarDigital Library
- Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Bilingual word representations with monolingual quality in mind. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. 151--159.Google Scholar
- Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (Nov. 2008), 2579--2605.Google Scholar
- Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 92--97.Google Scholar
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781.Google Scholar
- Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. CoRR abs/1309.4168 (2013).Google Scholar
- Aditya Mogadala and Achim Rettinger. 2016. Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.Google ScholarCross Ref
- Jerome L. Myers and Arnold Well. 1995. Research Design and Statistical Analysis (1st ed.). Lawrence Erlbaum Associates, Mahwah, NJ.Google Scholar
- Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24, 5 (Aug. 1988), 513--523. Google ScholarDigital Library
- Tianze Shi, Zhiyuan Liu, Yang Liu, and Maosong Sun. 2015. Learning cross-lingual word embeddings via matrix co-factorization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, 567--572.Google Scholar
- Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 455--465.Google Scholar
- James H. Steiger. 1980. Tests for comparing elements of a correlation matrix. Psychol. Bull. 87, 2 (1980), 245.Google ScholarCross Ref
- Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample, and Chris Dyer. 2015. Evaluation of word vector representations by subspace alignment. In Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP’15).Google ScholarCross Ref
- Shyam Upadhyay, Manaal Faruqui, Chris Dyer, and Dan Roth. 2016. Cross-lingual models of word embeddings: An empirical comparison. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1661--1670.Google ScholarCross Ref
- Jakob Uszkoreit, Jay M. Ponte, Ashok C. Popat, and Moshe Dubiner. 2010. Large scale parallel document mining for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10). Association for Computational Linguistics, 1101--1109. Google ScholarDigital Library
- Ivan Vulić and Marie-Francine Moens. 2013. Cross-lingual semantic similarity of words as the similarity of their semantic word responses. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 106--116.Google Scholar
- Ivan Vulić and Marie-Francine Moens. 2015. Bilingual word embeddings from non-parallel document-aligned data applied to bilingual lexicon induction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, 719--725.Google Scholar
- Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL’09). Association for Computational Linguistics, 235--243. Google ScholarDigital Library
- Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL’15).Google ScholarCross Ref
- Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, and Chengqing Zong. 2014. Bilingually-constrained phrase embeddings for machine translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 111--121.Google ScholarCross Ref
- Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun, Tatsuya Izuha, and Jie Hao. 2016. Building earth mover’s distance on bilingual word embeddings for machine translation. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 2870--2876. Google ScholarDigital Library
Index Terms
- Improving Vector Space Word Representations Via Kernel Canonical Correlation Analysis
Recommendations
Improving word representations via global context and multiple word prototypes
ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is ...
Transliteration equivalence using canonical correlation analysis
ECIR'2010: Proceedings of the 32nd European conference on Advances in Information RetrievalWe address the problem of Transliteration Equivalence, i.e. determining whether a pair of words in two different languages (e.g.Auden, ऑडेन) are name transliterations or not. This problem is at the heart of Mining Name Transliterations (MINT) from ...
A study on the innovative model of foreign language teaching in universities using big data corpus
This paper aims to explore unsupervised cross-lingual word representation learning methods with the specific task of acquiring a bilingual translation lexicon on a monolingual corpus. Specifically, an unsupervised cross-lingual word representation co-...
Comments