
Computing Sentence Embedding by Merging Syntactic Parsing Tree and Word Embedding

  • Conference paper
  • Published in: Artificial Intelligence and Security (ICAIS 2020)

Abstract

Recent progress in using deep learning to train word embeddings has motivated us to explore semantic representations of longer texts, such as sentences, paragraphs and chapters. Existing methods typically combine word weights and word vectors to compute sentence embeddings, but in doing so they discard the word order and syntactic structure of the sentence. This paper proposes SynTree-WordVec, a method that derives sentence embeddings by merging word vectors with the syntactic structure produced by the Stanford parser. The experimental results show its potential to overcome the shortcomings of existing methods: compared with the traditional weighting-based sentence embedding method, our method achieves better or comparable performance on various text similarity tasks, especially at low embedding dimensions.

Supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61877031 and No. 61876074.
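
The abstract does not spell out how SynTree-WordVec actually combines the two signals, so the snippet below is only a minimal, hypothetical sketch of the general idea: compose pre-trained word vectors bottom-up over a constituency parse of the kind the Stanford parser emits. The toy vectors, the compose function and the simple averaging rule are illustrative assumptions, not the authors' formula.

```python
# Hypothetical sketch only: the paper's SynTree-WordVec formula is not given
# in the abstract.  We compose toy word vectors bottom-up over a bracketed
# constituency parse (the format the Stanford parser outputs), averaging
# children at each node as a stand-in composition rule.
import numpy as np
from nltk import Tree

DIM = 4
# Toy vectors standing in for pre-trained embeddings such as GloVe.
word_vectors = {
    "the":  np.array([0.1, 0.2, 0.0, 0.3]),
    "cat":  np.array([0.7, 0.1, 0.4, 0.2]),
    "sat":  np.array([0.2, 0.6, 0.5, 0.1]),
    "down": np.array([0.3, 0.3, 0.2, 0.6]),
}

def compose(node):
    """Return a vector for a parse-tree node by merging its children."""
    if isinstance(node, str):                        # leaf: a word token
        return word_vectors.get(node.lower(), np.zeros(DIM))
    child_vecs = [compose(child) for child in node]  # internal node
    return np.mean(child_vecs, axis=0)               # illustrative merge rule

# Bracketed parse for "The cat sat down", as the Stanford parser might produce.
parse = Tree.fromstring("(S (NP (DT The) (NN cat)) (VP (VBD sat) (RP down)))")
print(compose(parse))                                # the sentence embedding
```

In the paper the merge presumably also exploits syntactic labels and word weights; replacing the plain average above with such a weighting would be the natural extension point.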

Author information

Corresponding author: Maosheng Zhong

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Y., Zhong, M., Tao, L., Wu, S. (2020). Computing Sentence Embedding by Merging Syntactic Parsing Tree and Word Embedding. In: Sun, X., Wang, J., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2020. Lecture Notes in Computer Science, vol 12239. Springer, Cham. https://doi.org/10.1007/978-3-030-57884-8_2

  • DOI: https://doi.org/10.1007/978-3-030-57884-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57883-1

  • Online ISBN: 978-3-030-57884-8

  • eBook Packages: Computer Science, Computer Science (R0)
