Abstract
A text classification problem in the Kazakh language is examined. Training data for the task in Kazakh is very limited, but ample labeled data is available in Russian. A transform between the two languages' vector spaces is built and used to transfer knowledge from Russian into Kazakh. The resulting classification quality is comparable to that of an approach that employed a sophisticated automatic translation system.
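The cross-lingual transform described above can be illustrated with a minimal sketch in the style of Mikolov et al.'s linear embedding mapping: given pairs of word vectors linked by a bilingual dictionary, a matrix W is fit by least squares so that Russian vectors multiplied by W land near their Kazakh counterparts. The data below is synthetic and the variable names are illustrative assumptions, not the authors' actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pretrained embeddings (synthetic data):
# rows of X are "Russian" word vectors, rows of Z are the vectors of
# their "Kazakh" translations, paired via a small bilingual dictionary.
d = 50          # embedding dimensionality
n_pairs = 200   # bilingual dictionary size
X = rng.normal(size=(n_pairs, d))                       # source vectors
W_true = rng.normal(size=(d, d))                        # hidden transform
Z = X @ W_true + 0.01 * rng.normal(size=(n_pairs, d))   # target vectors

# Least-squares fit of the linear transform W, so that X @ W ≈ Z.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Any new source-language vector can now be mapped into the
# target-language embedding space.
x_new = rng.normal(size=d)
z_pred = x_new @ W
print(z_pred.shape)
```

With the mapping in place, a classifier trained on abundant Russian data can be applied to Kazakh text whose word vectors have been projected through W (or, equivalently, Kazakh vectors can be mapped into the Russian space).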
Acknowledgments
This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Smirnov, A., Mendelev, V. (2017). Applying Word Embeddings to Leverage Knowledge Available in One Language in Order to Solve a Practical Text Classification Problem in Another Language. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2