Abstract
Research on document classification increasingly employs attention-based deep learning algorithms and has achieved impressive results. Owing to the complexity of documents, classical models, as well as single-attention mechanisms, fail to meet the demand for high-accuracy classification. This paper proposes a method that classifies documents via hierarchical multi-attention networks, which describe a document at both the word-sentence level and the sentence-document level. Furthermore, different attention strategies are applied at different levels, enabling accurate assignment of attention weights: the soft attention mechanism is applied at the word-sentence level, while CNN-attention is applied at the sentence-document level. Owing to this design, the proposed method delivers higher accuracy than other state-of-the-art methods. In addition, visualizations of the attention weights demonstrate the effectiveness of the attention mechanism in distinguishing the importance of individual words and sentences.
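The two pooling operations named in the abstract can be illustrated with a minimal NumPy sketch. This is a hedged illustration only, not the paper's implementation: the parameter shapes, the learned context vector `context`, and the convolution kernel `kernel` are hypothetical placeholders (in the full model they would be trained jointly with the encoders). Soft attention scores each word vector against a context vector and takes a weighted average; CNN-attention derives each sentence's score from a 1-D convolution over neighboring sentence vectors before the same softmax-weighted pooling.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(word_vecs, context):
    """Word-sentence level: pool word vectors into one sentence vector.

    word_vecs: (T, d) encoded words of one sentence
    context:   (d,)   learned word-level context vector (placeholder here)
    """
    scores = word_vecs @ context            # (T,) relevance of each word
    weights = softmax(scores)               # attention weights, sum to 1
    return weights @ word_vecs, weights     # (d,) sentence vector + weights

def cnn_attention(sent_vecs, kernel):
    """Sentence-document level: score each sentence with a 1-D convolution
    over its neighborhood, then pool into one document vector.

    sent_vecs: (S, d) encoded sentences of one document
    kernel:    (k, d) convolution filter (placeholder here)
    """
    k = kernel.shape[0]
    pad = np.pad(sent_vecs, ((k // 2, k - 1 - k // 2), (0, 0)))
    scores = np.array([np.sum(pad[i:i + k] * kernel)
                       for i in range(sent_vecs.shape[0])])
    weights = softmax(scores)
    return weights @ sent_vecs, weights     # (d,) document vector + weights
```

In the hierarchical setup, `soft_attention` would run once per sentence and its outputs would be stacked into `sent_vecs` for `cnn_attention`; the returned weight vectors are what an attention-visualization step would plot.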
Acknowledgements
This work was supported by the National Statistical Science Research Project of China under Grant No. 2016LY98, the Science and Technology Department of Guangdong Province in China under Grant Nos. 2016A010101020, 2016A010101021 and 2016A010101022, the Characteristic Innovation Projects of Guangdong Colleges and Universities (Nos. 2018KTSCX049 and 2018GKTSCX069), and the Bidding Project of the Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies (No. LEC2019ZBKT005).
Cite this article
Huang, Y., Chen, J., Zheng, S. et al. Hierarchical multi-attention networks for document classification. Int. J. Mach. Learn. & Cyber. 12, 1639–1647 (2021). https://doi.org/10.1007/s13042-020-01260-x