
Combining weighted category-aware contextual information in convolutional neural networks for text classification


Abstract

Convolutional neural networks (CNNs) are widely used in many natural language processing tasks, where they employ convolutional filters to capture useful semantic features of a text. However, a convolutional filter with a small window size cannot capture enough contextual information, while simply enlarging the window size brings problems of data sparsity and an enormous number of parameters. To capture contextual information, we propose using a weighted sum operation to obtain contextual word representations. We present one implicit weighting method and two explicit category-aware weighting methods to assign weights to the contextual information. Experimental results on five text classification datasets show the effectiveness of the proposed methods.
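A minimal sketch of the weighted-sum idea, assuming per-word scalar weights and a fixed context window (the function name, window size, and normalisation are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def contextual_representation(embeddings, weights, window=2):
    """Toy weighted-sum contextual representation: each word's context
    vector is a normalised weighted sum of the embeddings of the words
    in a fixed window around it. All names here are illustrative."""
    n, _ = embeddings.shape
    contextual = np.zeros_like(embeddings)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        w = weights[lo:hi]
        w = w / (w.sum() + 1e-8)               # normalise weights over the window
        contextual[i] = w @ embeddings[lo:hi]  # weighted sum of neighbour embeddings
    return contextual

# Example: 6 words with 4-dimensional embeddings and per-word weights.
emb = np.random.rand(6, 4)
w = np.random.rand(6)
print(contextual_representation(emb, w).shape)  # (6, 4)
```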


Notes

  1. http://cogcomp.cs.illinois.edu/Data/QA/QC/

  2. http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

  3. https://www.cs.cornell.edu/people/pabo/movie-review-data/

  4. http://www.cs.cornell.edu/home/llee/data/search-subj.html

  5. http://www.cs.pitt.edu/mpqa/


Acknowledgements

This article is an extension of the conference paper: Wu, X., Cai, Y., Li, Q., Xu, J., Leung, H.F.: Combining contextual information by self-attention mechanism in convolutional neural networks for text classification. In: International Conference on Web Information Systems Engineering, pp. 453–467. Springer, Cham (2018).

In this article, we make the following contributions beyond the conference paper:

– We conduct several additional experiments to further test the performance of the methods proposed in the conference paper. We find that the weights computed by self-attention can degenerate into unexplainable values and become harmful noise to the model.

– To address the limitations of the self-attention mechanism and to further improve the interpretability and controllability of the model, we propose an explicit category-aware term weighting method that explicitly assigns weights to words and computes the contextual word embedding.

– To further leverage the co-occurrence information between words, we propose a co-occurrence-based weighting method (a toy sketch of these weighting schemes follows this list).

– We conduct several experiments on five short text classification datasets to demonstrate the effectiveness of our newly proposed methods. The results show that our methods outperform several state-of-the-art methods.
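As a rough, hypothetical illustration of the explicit weighting ideas above (not the paper's exact formulas), the sketch below computes a simple category-aware term weight from labelled documents and a PMI-style co-occurrence weight for word pairs; the function names, smoothing constants, and toy data are all assumptions:

```python
import math
from collections import Counter
from itertools import combinations

def category_aware_weight(docs, labels, category):
    """Toy category-aware term weight: terms frequent in `category` but
    rare elsewhere get larger weights (in the spirit of supervised term
    weighting schemes; not the paper's exact formula)."""
    in_cat, out_cat = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (in_cat if label == category else out_cat).update(set(doc))
    return {t: math.log(2.0 + in_cat[t] / (out_cat[t] + 1.0)) for t in in_cat}

def cooccurrence_weight(docs):
    """Toy PMI-style co-occurrence weight for word pairs that appear in
    the same document."""
    word, pair = Counter(), Counter()
    for doc in docs:
        ws = sorted(set(doc))
        word.update(ws)
        pair.update(combinations(ws, 2))
    n = len(docs)
    return {p: math.log((pair[p] / n) / ((word[p[0]] / n) * (word[p[1]] / n)))
            for p in pair}

# Tiny worked example on made-up data.
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
labels = ["pos", "neg", "pos"]
print(category_aware_weight(docs, labels, "pos")["good"])   # ~1.386
print(cooccurrence_weight(docs)[("good", "movie")])          # ~-0.288
```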

This work was supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048, D2182480), the Science and Technology Planning Project of Guangdong Province (No. 2017B050506004), and the Science and Technology Program of Guangzhou (No. 201704030076, 201802010027). The research described in this article has also been supported by a collaborative research grant from the Hong Kong Research Grants Council (Project No. C1031-18G) and a CUHK Direct Grant (Project Code EE16963).

Author information


Correspondence to Yi Cai.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering 2018

Guest Editors: Hakim Hacid, Wojciech Cellary, Hua Wang and Yanchun Zhang


About this article


Cite this article

Wu, X., Cai, Y., Li, Q. et al. Combining weighted category-aware contextual information in convolutional neural networks for text classification. World Wide Web 23, 2815–2834 (2020). https://doi.org/10.1007/s11280-019-00757-y
