Encoding Syntactic Knowledge in Neural Networks for Sentiment Classification

Published: 05 June 2017

Abstract

Phrase/sentence representation is one of the most important problems in natural language processing. Many neural network models, such as the Convolutional Neural Network (CNN), Recursive Neural Network (RNN), and Long Short-Term Memory (LSTM), have been proposed to learn phrase/sentence representations; however, rich syntactic knowledge has not been fully explored when composing a longer text from its shorter constituent words. In most traditional models, only word embeddings are used to compose phrase/sentence representations, while the syntactic information of words remains unexploited. In this article, we show that encoding syntactic knowledge (part-of-speech tags) in neural networks can enhance phrase/sentence representation. Specifically, we propose to learn tag-specific composition functions and tag embeddings in recursive neural networks, and to use POS tags to control the gates of tree-structured LSTM networks. We evaluate these models on two benchmark datasets for sentiment classification and demonstrate that improvements can be obtained when such syntactic knowledge is encoded.
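
To make these two ideas concrete, below is a minimal, illustrative sketch in PyTorch. It is not the authors' implementation: the class and parameter names, the choice of framework, the decision to concatenate POS-tag embeddings with the child vectors, and the particular way tag embeddings enter the Tree-LSTM gates are all assumptions made here for illustration.

import torch
import torch.nn as nn

class TagEmbeddedComposition(nn.Module):
    """Recursive composition that concatenates a POS-tag embedding with each
    child vector before applying the composition function."""
    def __init__(self, dim, tag_vocab, tag_dim):
        super().__init__()
        self.tag_emb = nn.Embedding(tag_vocab, tag_dim)
        # A single shared composition; a tag-specific variant would instead
        # select a separate linear map based on the children's tags.
        self.compose = nn.Linear(2 * (dim + tag_dim), dim)

    def forward(self, left_vec, left_tag, right_vec, right_tag):
        left = torch.cat([left_vec, self.tag_emb(left_tag)], dim=-1)
        right = torch.cat([right_vec, self.tag_emb(right_tag)], dim=-1)
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

class TagGatedTreeLSTMCell(nn.Module):
    """Binary Tree-LSTM node whose gates are additionally conditioned on the
    children's POS-tag embeddings (one reading of 'POS tags control the gates')."""
    def __init__(self, dim, tag_vocab, tag_dim):
        super().__init__()
        self.tag_emb = nn.Embedding(tag_vocab, tag_dim)
        in_dim = 2 * dim + 2 * tag_dim      # children's hidden states + tag embeddings
        self.i = nn.Linear(in_dim, dim)     # input gate
        self.o = nn.Linear(in_dim, dim)     # output gate
        self.u = nn.Linear(in_dim, dim)     # candidate cell value
        self.f_l = nn.Linear(in_dim, dim)   # forget gate, left child
        self.f_r = nn.Linear(in_dim, dim)   # forget gate, right child

    def forward(self, h_l, c_l, tag_l, h_r, c_r, tag_r):
        x = torch.cat([h_l, h_r, self.tag_emb(tag_l), self.tag_emb(tag_r)], dim=-1)
        i, o = torch.sigmoid(self.i(x)), torch.sigmoid(self.o(x))
        u = torch.tanh(self.u(x))
        c = i * u + torch.sigmoid(self.f_l(x)) * c_l + torch.sigmoid(self.f_r(x)) * c_r
        return o * torch.tanh(c), c

# Toy usage with random child vectors and arbitrary tag ids (both hypothetical).
comp = TagEmbeddedComposition(dim=8, tag_vocab=45, tag_dim=4)
parent = comp(torch.randn(1, 8), torch.tensor([7]), torch.randn(1, 8), torch.tensor([12]))
cell = TagGatedTreeLSTMCell(dim=8, tag_vocab=45, tag_dim=4)
h, c = cell(torch.randn(1, 8), torch.randn(1, 8), torch.tensor([7]),
            torch.randn(1, 8), torch.randn(1, 8), torch.tensor([12]))

A fully tag-specific variant, as the abstract describes, would replace the shared compose layer with a separate composition matrix selected by the children's tags, and would learn the tag embeddings jointly with the sentiment objective.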

Published In

ACM Transactions on Information Systems, Volume 35, Issue 3
July 2017, 410 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3026478

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2017
Accepted: 01 December 2016
Revised: 01 October 2016
Received: 01 June 2016
Published in TOIS Volume 35, Issue 3

Author Tags

  1. Neural networks
  2. deep learning
  3. long short-term memory
  4. recursive neural network
  5. representation learning
  6. sentiment analysis
  7. sentiment classification

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Basic Research Program (973 Program)
  • National Science Foundation of China
  • Beijing Higher Education Young Elite Teacher Project
