Abstract
Most word embedding models suffer from the following problems: (1) models based on bag-of-words contexts completely neglect the structural relations within sentences; (2) each word is assigned a single embedding, so the model cannot discriminate between the senses of polysemous words; (3) word embeddings are easily dominated by the contextual (topical) similarity of sentences rather than their structural similarity. To address these problems, we propose an easy-to-use syntactic word embedding (SWE) representation algorithm. The main steps are: (1) polysemous words are tagged with senses by a tagging algorithm based on latent Dirichlet allocation (LDA); (2) the symbols ‘+’ and ‘−’ are used to mark the directions of dependency relations; (3) stopwords and their dependencies are removed; (4) a dependency-skip step connects indirect dependencies; (5) the resulting dependency-based contexts are fed into a word2vec model. Experimental results show that our model produces high-quality word embeddings on similarity evaluation tasks. Moreover, both semantic and syntactic features can be captured from dependency-based syntactic contexts, which exhibit less topical and more syntactic similarity. We conclude that SWE outperforms single-embedding learning models.
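As a rough illustration of steps (2)-(5), the following Python sketch extracts dependency-based contexts from a toy parse: ‘+’ and ‘−’ mark the direction of each dependency, stopwords and their links are dropped, and a dependency skip reattaches words whose immediate head was removed. The token fields, stopword list, relation labels, and function names are illustrative assumptions, not the authors' implementation; the resulting (word, context) pairs would then be passed to a word2vec-style skip-gram model trained on arbitrary word-context pairs, as in step (5).

```python
# A minimal sketch (not the authors' code) of dependency-based context
# extraction with '+'/'-' direction markers, stopword removal, and dependency skip.

STOPWORDS = {"the", "a", "of", "with"}  # assumed stopword list

# Toy dependency parse: (index, word, head_index, relation); head_index -1 marks the root.
# Sentence: "the scientist discovers a star with the telescope"
PARSE = [
    (0, "the",       1, "det"),
    (1, "scientist", 2, "nsubj"),
    (2, "discovers", -1, "root"),
    (3, "a",         4, "det"),
    (4, "star",      2, "dobj"),
    (5, "with",      2, "prep"),
    (6, "the",       7, "det"),
    (7, "telescope", 5, "pobj"),
]

def resolve_head(idx, tokens):
    """Follow head links upward past stopwords ("dependency skip"), so a word
    whose immediate head is removed attaches to its nearest kept ancestor."""
    head = tokens[idx][2]
    while head != -1 and tokens[head][1] in STOPWORDS:
        head = tokens[head][2]
    return head

def dependency_contexts(tokens):
    """Yield (word, context) pairs: '+' marks the head-to-modifier direction,
    '-' marks the modifier-to-head direction (the exact marking scheme is an
    assumption made for this sketch)."""
    for idx, word, _, rel in tokens:
        if word in STOPWORDS:                 # (3) drop stopwords and their dependencies
            continue
        head = resolve_head(idx, tokens)      # (4) dependency skip over removed stopwords
        if head == -1:
            continue                          # the root has no head context
        head_word = tokens[head][1]
        yield (head_word, word + "+" + rel)   # (2) head sees modifier: '+' direction
        yield (word, head_word + "-" + rel)   # (2) modifier sees head: '-' direction

if __name__ == "__main__":
    for pair in dependency_contexts(PARSE):
        print(pair)
    # e.g. ('discovers', 'scientist+nsubj'), ('scientist', 'discovers-nsubj'),
    #      ('discovers', 'telescope+pobj'), ('telescope', 'discovers-pobj'), ...
```

Note that "telescope" is reattached to "discovers" because its original head "with" is a stopword, which is the effect the dependency-skip step is meant to achieve.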
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 61663041 and 61763041), the Program for Changjiang Scholars and Innovative Research Team in Universities, China (No. IRT_15R40), the Research Fund for the Chunhui Program of Ministry of Education of China (No. Z2014022), the Natural Science Foundation of Qinghai Province, China (No. 2014-ZJ-721), and the Fundamental Research Funds for the Central Universities, China (No. 2017TS045).
About this article
Cite this article
Ye, Z.L., Zhao, H.X. Syntactic word embedding based on dependency syntax and polysemous analysis. Frontiers Inf Technol Electronic Eng 19, 524–535 (2018). https://doi.org/10.1631/FITEE.1601846
Key words
- Dependency-based context
- Polysemous word representation
- Representation learning
- Syntactic word embedding