
A survey of word embeddings based on deep learning


Abstract

Word embeddings, which capture lexical semantics in numerical form, are the representational basis for downstream natural language processing tasks, turning the abstract semantic content of words into something machines can compute with. Recently, word embedding approaches based on deep learning have attracted extensive attention and are widely used in many tasks, such as text classification, knowledge mining, question answering, and smart Internet of Things systems. These neural network-based models build on the distributional hypothesis, so that the semantic association between words can be computed efficiently in a low-dimensional space. However, the semantics most models express are constrained by the context distribution of each word in the corpus, while logical and commonsense knowledge remain underexploited. How to use massive multi-source data to better represent natural language and world knowledge therefore still needs to be explored. In this paper, we review recent advances in neural network-based word embeddings and their technical features, summarize the key challenges and existing solutions, and offer an outlook on future research and applications.
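To make the low-dimensional similarity claim concrete, the following minimal sketch scores word similarity and a word analogy with cosine similarity over word vectors. The 4-dimensional vectors here are hand-set toy values for illustration only; real embeddings such as word2vec or GloVe are learned from large corpora and typically have 100-300 dimensions.

```python
import numpy as np

# Toy 4-dimensional embeddings with hand-picked (not learned) values,
# chosen only so the example below behaves like a trained model would.
embeddings = {
    "king":  np.array([0.80, 0.45, 0.10, 0.05]),
    "queen": np.array([0.78, 0.48, 0.90, 0.08]),
    "man":   np.array([0.75, 0.40, 0.05, 0.60]),
    "woman": np.array([0.72, 0.42, 0.88, 0.62]),
}

def cosine_similarity(u, v):
    """Semantic association as the cosine of the angle between vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Word similarity: semantically related words have nearby vectors.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))

# Linguistic regularity: king - man + woman should land near queen.
# (Real analogy benchmarks also exclude the three query words from the
# candidate set; with these toy vectors "queen" wins either way.)
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max(embeddings, key=lambda w: cosine_similarity(target, embeddings[w]))
print(best)  # "queen"
```

With these toy vectors the analogy king - man + woman lands nearest to queen, the kind of linguistic regularity that vector-space embeddings are known to exhibit.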



Acknowledgements

We thank the reviewers for their helpful comments and gratefully acknowledge the valuable contributions of our classmate Qiang Zhou in the preparation of this work. This work is supported by the Fundamental Research Funds for the Central Universities under Grant 2019XDA20.

Author information


Corresponding author

Correspondence to Wenan Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Wang, S., Zhou, W. & Jiang, C. A survey of word embeddings based on deep learning. Computing 102, 717–740 (2020). https://doi.org/10.1007/s00607-019-00768-7

