
A survey of word embeddings based on deep learning


Abstract

Word embeddings, which capture lexical semantics in numerical form, are the representational basis for downstream natural language processing tasks, turning the abstract semantic content of words into something machines can compute with. Recently, word embedding approaches based on deep learning have attracted extensive attention and are widely used in many tasks, such as text classification, knowledge mining, question answering, and smart Internet of Things systems. These neural network-based models build on the distributional hypothesis, so that the semantic association between words can be computed efficiently in a low-dimensional space. However, the semantics most models express are constrained by the context distribution of each word in the corpus, while logical and commonsense knowledge remain underexploited. How to use massive multi-source data to better represent natural language and world knowledge therefore still needs to be explored. In this paper, we review recent advances in neural network-based word embeddings and their technical features, summarize the key challenges and existing solutions, and offer an outlook on future research and applications.
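To make the low-dimensional similarity claim concrete, the following minimal sketch scores word similarity and a word analogy with cosine similarity over word vectors. The 4-dimensional vectors here are hand-set toy values for illustration only; real embeddings such as word2vec or GloVe are learned from large corpora and typically have 100-300 dimensions.

```python
import numpy as np

# Toy 4-dimensional embeddings with hand-picked (not learned) values,
# chosen only so the example below behaves like a trained model would.
embeddings = {
    "king":  np.array([0.80, 0.45, 0.10, 0.05]),
    "queen": np.array([0.78, 0.48, 0.90, 0.08]),
    "man":   np.array([0.75, 0.40, 0.05, 0.60]),
    "woman": np.array([0.72, 0.42, 0.88, 0.62]),
}

def cosine_similarity(u, v):
    """Semantic association as the cosine of the angle between vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Word similarity: semantically related words have nearby vectors.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))

# Linguistic regularity: king - man + woman should land near queen.
# (Real analogy benchmarks also exclude the three query words from the
# candidate set; with these toy vectors "queen" wins either way.)
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max(embeddings, key=lambda w: cosine_similarity(target, embeddings[w]))
print(best)  # "queen"
```

With these toy vectors the analogy king - man + woman lands nearest to queen, the kind of linguistic regularity that vector-space embeddings are known to exhibit.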



Acknowledgements

We thank the reviewers for their helpful comments and gratefully acknowledge the valuable contributions of our classmate Qiang Zhou in the preparation of this work. This work is supported by the Fundamental Research Funds for the Central Universities under Grant 2019XDA20.

Author information


Corresponding author

Correspondence to Wenan Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Wang, S., Zhou, W. & Jiang, C. A survey of word embeddings based on deep learning. Computing 102, 717–740 (2020). https://doi.org/10.1007/s00607-019-00768-7

