Abstract
Learning bilingual lexicons from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon: we use Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
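The general idea of linking two monolingual embedding spaces through CCA fitted on a seed lexicon can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses classical CCA (whitening followed by an SVD), and the paper's exact formulation may differ. The embeddings are synthetic (the "target" space is a noisy rotation of the "source" space), and the function names `fit_cca`, `project`, and `induce` are hypothetical.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(x_seed, y_seed, k, reg=1e-8):
    """Fit CCA on seed translation pairs (row i of x_seed translates to row i of y_seed)."""
    mx, my = x_seed.mean(axis=0), y_seed.mean(axis=0)
    xc, yc = x_seed - mx, y_seed - my
    n = xc.shape[0]
    # Regularized covariance and cross-covariance of the seed pairs.
    cxx = xc.T @ xc / n + reg * np.eye(xc.shape[1])
    cyy = yc.T @ yc / n + reg * np.eye(yc.shape[1])
    cxy = xc.T @ yc / n
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    u, _, vt = np.linalg.svd(wx @ cxy @ wy)
    return (mx, wx @ u[:, :k]), (my, wy @ vt.T[:, :k])

def project(emb, model):
    """Map embeddings into the shared space and length-normalize them."""
    mean, proj = model
    z = (emb - mean) @ proj
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def induce(src_emb, tgt_emb, src_model, tgt_model):
    """For each source word, return the index of the most similar target word (cosine)."""
    sims = project(src_emb, src_model) @ project(tgt_emb, tgt_model).T
    return sims.argmax(axis=1)

# Synthetic stand-in for two monolingual embedding spaces (assumption, not real data).
rng = np.random.default_rng(0)
n_words, dim, n_seed = 300, 10, 100
src = rng.normal(size=(n_words, dim))
rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
tgt = src @ rotation + 0.01 * rng.normal(size=(n_words, dim))

# Fit on the seed lexicon only, then induce translations for the held-out words.
src_model, tgt_model = fit_cca(src[:n_seed], tgt[:n_seed], k=dim)
pred = induce(src[n_seed:], tgt, src_model, tgt_model)
gold = np.arange(n_seed, n_words)
accuracy = float((pred == gold).mean())
print(f"held-out precision@1: {accuracy:.3f}")
```

With real data, `src` and `tgt` would be embeddings trained separately on each language's monolingual corpus, and the seed rows would come from the small bilingual lexicon.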
Notes
- 1. Word2vec: https://code.google.com/p/word2vec/.
- 2. We omit the derivation; for details, see Bach and Jordan (2005).
- 3.
- 4.
- 5.
- 6. BilBOWA: https://github.com/gouwsmeister/bilbowa.
References
Munteanu, D.S., Marcu, D.: Improving machine translation performance by exploiting non-parallel corpora. Comput. Linguist. 31, 477–504 (2005)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR Workshop (2013a)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS (2013b)
Mikolov, T., Sutskever, I.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)
Cao, H., Zhao, T., Zhang, S.: A distribution-based model to learn bilingual word embeddings. In: Proceedings of COLING (2016)
Bach, F.R., Jordan, M.I.: A probabilistic interpretation of canonical correlation analysis (2005)
Vulić, I., Moens, M.-F.: A study on bootstrapping bilingual vector spaces from non-parallel data (and nothing else). In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
Gouws, S., Bengio, Y., Corrado, G.: BilBOWA: fast bilingual distributed representations without word alignments. In: JMLR (2015)
Wushouer, M., Ishida, T., Lin, D.: Bilingual dictionary induction as an optimization problem. In: International Conference on Language Resources & Evaluation (2014)
Zhang, M., Peng, H., Liu, Y.: Bilingual lexicon induction from non-parallel data with minimal supervision. In: AAAI (2017)
Haghighi, A., Liang, P., Berg-Kirkpatrick, T.: Learning bilingual lexicons from monolingual corpora. In: ACL (2008)
Shi, T., Liu, Z., Liu, Y.: Learning cross-lingual word embeddings via matrix co-factorization. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (2015)
Vulić, I., Kiela, D., Clark, S.: Multi-modal representations for improved bilingual lexicon learning. In: ACL (2016)
Vulić, I., Korhonen, A.: On the role of seed lexicons in learning bilingual word embeddings. In: ACL (2016)
Vulić, I., Moens, M.-F.: Probabilistic models of cross-lingual semantic similarity in context based on latent cross-lingual concepts induced from comparable data. In: EMNLP (2014)
Gouws, S., Søgaard, A.: Simple task-specific bilingual word embeddings. In: The 2015 Annual Conference of the North American Chapter of the ACL (2015)
Liu, X., Duh, K., Matsumoto, Y.: Topic models + word alignment = a flexible framework for extracting bilingual dictionary from comparable corpus (2013)
Acknowledgments
This work is supported by the Xinjiang Fund under Grant No. 2015KL031, the West Light Foundation of the Chinese Academy of Sciences (No. 2015-XBQN-B-10), the Xinjiang Science and Technology Major Project (No. 2016A03007-3), and the Natural Science Foundation of Xinjiang (No. 2015211B034).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Zhu, S., Li, X., Yang, Y., Wang, L., Mi, C. (2018). Learning Bilingual Lexicon for Low-Resource Language Pairs. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_66
DOI: https://doi.org/10.1007/978-3-319-73618-1_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)