
Computing Sentence Embedding by Merging Syntactic Parsing Tree and Word Embedding

  • Conference paper
  • Published in: Artificial Intelligence and Security (ICAIS 2020)

Abstract

Recent progress in using deep learning to train word embeddings has motivated us to explore semantic representations of longer texts, such as sentences, paragraphs and chapters. Existing methods typically combine word weights and word vectors to compute sentence embeddings, but in doing so they discard the word order and syntactic structure of the sentence. This paper proposes SynTree-WordVec, a method that derives sentence embeddings by merging word vectors with the syntactic structure produced by the Stanford parser. The experimental results show its potential to overcome the shortcomings of existing methods: compared with the traditional weighting-based sentence embedding method, our method achieves better or comparable performance on various text similarity tasks, especially at low embedding dimensions.

Supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61877031 and No. 61876074.
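
The abstract does not spell out how SynTree-WordVec actually combines the two signals, so the snippet below is only a minimal, hypothetical sketch of the general idea: compose pre-trained word vectors bottom-up over a constituency parse of the kind the Stanford parser emits. The toy vectors, the compose function and the simple averaging rule are illustrative assumptions, not the authors' formula.

```python
# Hypothetical sketch only: the paper's SynTree-WordVec formula is not given
# in the abstract.  We compose toy word vectors bottom-up over a bracketed
# constituency parse (the format the Stanford parser outputs), averaging
# children at each node as a stand-in composition rule.
import numpy as np
from nltk import Tree

DIM = 4
# Toy vectors standing in for pre-trained embeddings such as GloVe.
word_vectors = {
    "the":  np.array([0.1, 0.2, 0.0, 0.3]),
    "cat":  np.array([0.7, 0.1, 0.4, 0.2]),
    "sat":  np.array([0.2, 0.6, 0.5, 0.1]),
    "down": np.array([0.3, 0.3, 0.2, 0.6]),
}

def compose(node):
    """Return a vector for a parse-tree node by merging its children."""
    if isinstance(node, str):                        # leaf: a word token
        return word_vectors.get(node.lower(), np.zeros(DIM))
    child_vecs = [compose(child) for child in node]  # internal node
    return np.mean(child_vecs, axis=0)               # illustrative merge rule

# Bracketed parse for "The cat sat down", as the Stanford parser might produce.
parse = Tree.fromstring("(S (NP (DT The) (NN cat)) (VP (VBD sat) (RP down)))")
print(compose(parse))                                # the sentence embedding
```

In the paper the merge presumably also exploits syntactic labels and word weights; replacing the plain average above with such a weighting would be the natural extension point.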

Author information

Corresponding author: Maosheng Zhong

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Y., Zhong, M., Tao, L., Wu, S. (2020). Computing Sentence Embedding by Merging Syntactic Parsing Tree and Word Embedding. In: Sun, X., Wang, J., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2020. Lecture Notes in Computer Science, vol 12239. Springer, Cham. https://doi.org/10.1007/978-3-030-57884-8_2

  • DOI: https://doi.org/10.1007/978-3-030-57884-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57883-1

  • Online ISBN: 978-3-030-57884-8

  • eBook Packages: Computer Science, Computer Science (R0)
