Abstract
We propose a hierarchical convolutional attention network using joint Chinese word embedding for text classification. Compared with previous methods, our model has three notable improvements: (i) it considers not only words but also their characters and fine-grained sub-character components; (ii) it employs self-attention mechanisms combined with convolutional feature extraction, enabling it to attend differentially to more and less important content; (iii) it has a hierarchical structure that composes word-level representations into sentence vectors and sentence vectors into a document vector. We demonstrate the effectiveness of our architecture by surpassing the accuracy of the current state-of-the-art on four classification datasets. Visualization of our hierarchical structure illustrates that our model is able to select informative sentences and words in a document.
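The hierarchical attend-then-pool idea in the abstract can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: it omits the convolutional feature extraction and joint character/sub-character embeddings, and the function names, query vectors `u_word`/`u_sent`, and toy dimensions are all invented for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Scaled dot-product self-attention over a sequence X of shape (n, d):
    # each position is re-expressed as a weighted mix of all positions.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # (n, n) pairwise relevance
    return softmax(scores, axis=-1) @ X    # (n, d) contextualized sequence

def attentive_pool(X, u):
    # Collapse a sequence (n, d) into one vector (d,) using a query u:
    # informative elements receive higher weight.
    weights = softmax(X @ u)               # (n,) importance distribution
    return weights @ X

rng = np.random.default_rng(0)
d = 8
doc = rng.normal(size=(3, 5, d))           # toy document: 3 sentences x 5 words
u_word, u_sent = rng.normal(size=d), rng.normal(size=d)

# Word level: self-attention within each sentence, then pool to a sentence vector.
sent_vecs = np.stack([attentive_pool(self_attention(s), u_word) for s in doc])
# Sentence level: the same procedure over sentence vectors yields the document vector.
doc_vec = attentive_pool(self_attention(sent_vecs), u_sent)
```

In a trained model the query vectors and attention projections would be learned parameters, and the resulting `doc_vec` would feed a classification layer.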
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Zhang, K., Wang, S., Li, B., Mei, F., Zhang, J. (2019). Hierarchical Convolutional Attention Networks Using Joint Chinese Word Embedding for Text Classification. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science(), vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_18
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4