Hierarchical multi-attention networks for document classification

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Research on document classification increasingly employs attention-based deep learning algorithms and has achieved impressive results. Owing to the complexity of documents, classical models, as well as single-attention mechanisms, fail to meet the demands of high-accuracy classification. This paper proposes a method that classifies documents via hierarchical multi-attention networks, which describe a document at both the word-sentence level and the sentence-document level. Furthermore, different attention strategies are applied at the two levels, enabling accurate assignment of attention weights: a soft attention mechanism is applied at the word-sentence level, while a CNN-attention is applied at the sentence-document level. Owing to the distinctiveness of the model, the proposed method delivers higher accuracy than other state-of-the-art methods. In addition, visualizations of the attention weights demonstrate the effectiveness of the attention mechanisms in distinguishing the importance of words and sentences.
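The abstract only outlines the architecture, so the following is a minimal sketch (in PyTorch) of the two-level design it describes: a bidirectional GRU with soft (additive) attention pools words into sentence vectors, and a 1-D convolution scores sentences from their local context before a softmax-weighted sum forms the document vector. All names, layer sizes, and the exact CNN-attention formulation here are illustrative assumptions, not the authors' released implementation.

# Illustrative sketch of a hierarchical multi-attention classifier.
# Assumed details: BiGRU encoders, additive soft attention at the word
# level, and a Conv1d-based attention scorer at the sentence level.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Additive (soft) attention pooling over a sequence of hidden states."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                       # h: (batch, seq, dim)
        u = torch.tanh(self.proj(h))            # (batch, seq, dim)
        a = F.softmax(self.context(u), dim=1)   # (batch, seq, 1)
        return (a * h).sum(dim=1)               # (batch, dim)

class HierMultiAttn(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid=50, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_rnn = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.word_attn = SoftAttention(2 * hid)
        self.sent_rnn = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)
        # CNN-attention (assumed form): a 1-D convolution scores each
        # sentence from its local neighbourhood; softmax yields weights.
        self.attn_conv = nn.Conv1d(2 * hid, 1, kernel_size=3, padding=1)
        self.fc = nn.Linear(2 * hid, n_classes)

    def forward(self, docs):                    # docs: (batch, n_sents, n_words)
        b, s, w = docs.shape
        words = self.emb(docs.view(b * s, w))               # (b*s, w, emb)
        h_w, _ = self.word_rnn(words)                       # (b*s, w, 2*hid)
        sent_vecs = self.word_attn(h_w).view(b, s, -1)      # (b, s, 2*hid)
        h_s, _ = self.sent_rnn(sent_vecs)                   # (b, s, 2*hid)
        scores = self.attn_conv(h_s.transpose(1, 2))        # (b, 1, s)
        weights = F.softmax(scores, dim=-1)                 # attention over sentences
        doc_vec = torch.bmm(weights, h_s).squeeze(1)        # (b, 2*hid)
        return self.fc(doc_vec)

model = HierMultiAttn(vocab_size=20000)
logits = model(torch.randint(1, 20000, (8, 10, 30)))  # 8 docs, 10 sents, 30 words
print(logits.shape)  # torch.Size([8, 5])

The convolutional scorer lets each sentence's weight depend on its neighbours, which captures the intuition of applying CNN-attention at the sentence-document level instead of a purely position-independent soft attention.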





Acknowledgements

This work was supported by the National Statistical Science Research Project of China under Grant No. 2016LY98, the Science and Technology Department of Guangdong Province in China under Grant Nos. 2016A010101020, 2016A010101021 and 2016A010101022, the Characteristic Innovation Projects of Guangdong Colleges and Universities (Nos. 2018KTSCX049 and 2018GKTSCX069), and the Bidding Project of the Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies (No. LEC2019ZBKT005).

Author information


Corresponding author

Correspondence to Yun Xue.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Huang, Y., Chen, J., Zheng, S. et al. Hierarchical multi-attention networks for document classification. Int. J. Mach. Learn. & Cyber. 12, 1639–1647 (2021). https://doi.org/10.1007/s13042-020-01260-x



  • DOI: https://doi.org/10.1007/s13042-020-01260-x
