
Applying Word Embeddings to Leverage Knowledge Available in One Language in Order to Solve a Practical Text Classification Problem in Another Language

  • Conference paper
  • First Online:
Analysis of Images, Social Networks and Texts (AIST 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 661)


Abstract

A text classification problem in the Kazakh language is examined. The amount of training data for the task in Kazakh is very limited, but plenty of labeled data in Russian is available. A transform between the two languages' word vector spaces is built and used to transfer knowledge from Russian into Kazakh. The resulting classification quality is comparable to that of an approach that employs a sophisticated automatic translation system.
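The abstract does not describe the transform in detail. As a point of reference, the sketch below shows one common way such a cross-lingual mapping can be built: a linear transform between monolingual embedding spaces fitted by least squares over a small bilingual seed dictionary, in the spirit of Mikolov et al.'s linear-mapping approach. All function names, dimensions, and the use of NumPy here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): learn a linear map W between
# monolingual word-embedding spaces using a Russian-Kazakh seed dictionary,
# then project vectors from one language into the other's space so that a
# classifier trained on Russian labeled data can be applied to Kazakh text.
import numpy as np


def learn_transform(src_vecs, tgt_vecs):
    """Fit W minimizing ||src_vecs @ W - tgt_vecs||^2 over dictionary pairs.

    src_vecs: (n_pairs, d_src) embeddings of source-language dictionary words
    tgt_vecs: (n_pairs, d_tgt) embeddings of their target-language translations
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W  # shape (d_src, d_tgt)


def map_embedding(vec, W):
    """Project a single source-language word vector into the target space."""
    return vec @ W


if __name__ == "__main__":
    # Illustrative shapes only: ~2000 dictionary pairs, 100-dimensional vectors.
    rng = np.random.default_rng(0)
    kk = rng.normal(size=(2000, 100))   # stand-in Kazakh word embeddings
    ru = rng.normal(size=(2000, 100))   # stand-in aligned Russian translations
    W = learn_transform(kk, ru)
    print(map_embedding(kk[0], W).shape)  # -> (100,)
```

Under these assumptions, Kazakh document representations could be mapped into the Russian embedding space and fed to a classifier trained on the much larger Russian corpus, which is the kind of knowledge transfer the abstract refers to.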



Acknowledgments

This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008.

Author information


Corresponding author

Correspondence to Valentin Mendelev.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Smirnov, A., Mendelev, V. (2017). Applying Word Embeddings to Leverage Knowledge Available in One Language in Order to Solve a Practical Text Classification Problem in Another Language. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_23


  • DOI: https://doi.org/10.1007/978-3-319-52920-2_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science, Computer Science (R0)
