research-article

HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

Published: 05 May 2021

Abstract

Short text classification has been widely explored in news tagging to provide more efficient search strategies and more effective search results for information retrieval. However, most existing studies concentrate on long text classification and deliver unsatisfactory performance on short texts because of semantic sparsity and the scarcity of labeled data. In this article, we propose a novel heterogeneous graph neural network-based method for semi-supervised short text classification that takes full advantage of both limited labeled data and large amounts of unlabeled data through information propagation along the graph. Specifically, we first present a flexible heterogeneous information network (HIN) framework for modeling short texts, which can integrate any type of additional information while capturing its relations to address semantic sparsity. Then, we propose Heterogeneous Graph Attention networks (HGAT) to embed the HIN for short text classification based on a dual-level attention mechanism comprising node-level and type-level attention. To efficiently classify newly arriving texts that do not yet exist in the HIN, we extend HGAT to inductive learning, which avoids retraining the model on the evolving HIN. Extensive experiments on single-label and multi-label classification demonstrate that HGAT significantly outperforms state-of-the-art methods across the benchmark datasets under both transductive and inductive learning.
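The abstract names a dual-level attention mechanism (type-level and node-level) but gives no equations here. The sketch below is a minimal, illustrative reading of that idea for a single target node, not the authors' formulation: the mean-pooled type summaries, the LeakyReLU scoring over concatenated vectors, the ordering (type level first, then node level), and the parameter names `mu` and `nu` are all our assumptions. Learned projection matrices, multi-layer propagation, HIN construction, and the inductive extension are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dual_level_attention(h_v, neighbors_by_type, mu, nu):
    """Aggregate one node's neighbors with type-level, then node-level attention.

    h_v: feature vector of the target node (length d).
    neighbors_by_type: {node_type: [neighbor feature vectors]}, each list non-empty.
    mu: {node_type: type-attention weights of length 2*d} (hypothetical parameters).
    nu: node-attention weights of length 2*d shared by all types (hypothetical).
    """
    types = sorted(neighbors_by_type)

    # Type level: summarize each type by the mean of its neighbors and score it
    # against the target node. `h_v + summary` is list concatenation, [h_v || h_t].
    summaries = {t: mean(neighbors_by_type[t]) for t in types}
    type_scores = [leaky_relu(dot(mu[t], h_v + summaries[t])) for t in types]
    alpha = dict(zip(types, softmax(type_scores)))

    # Node level: score each neighbor, scale by its type weight, then normalize
    # across all neighbors of all types.
    flat = [(t, h_u) for t in types for h_u in neighbors_by_type[t]]
    node_scores = [alpha[t] * leaky_relu(dot(nu, h_v + h_u)) for t, h_u in flat]
    beta = softmax(node_scores)

    # Output: attention-weighted sum of neighbor features.
    out = [sum(b * h_u[i] for b, (_, h_u) in zip(beta, flat)) for i in range(len(h_v))]
    return out, alpha, beta


# Toy example: a short document linked to one topic node and two entity nodes.
h_doc = [1.0, 0.0]
neighbors = {"topic": [[0.9, 0.1]], "entity": [[0.0, 1.0], [0.5, 0.5]]}
mu = {"topic": [1.0, 0.0, 1.0, 0.0], "entity": [0.0, 1.0, 0.0, 1.0]}
nu = [1.0, 0.0, 1.0, 0.0]
out, alpha, beta = dual_level_attention(h_doc, neighbors, mu, nu)
print("type weights:", alpha)
print("node weights:", beta)
print("aggregated:", out)
```

Because the node-level scores are scaled by the type-level weights before normalization in this sketch, a neighbor type judged more relevant to the document (here, the single topic node) can dominate the aggregation even when it contributes fewer neighbors than another type.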




          • Published in

            ACM Transactions on Information Systems, Volume 39, Issue 3
            July 2021
            432 pages
            ISSN:1046-8188
            EISSN:1558-2868
            DOI:10.1145/3450607

            © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 5 May 2021
            • Accepted: 1 February 2021
            • Revised: 1 January 2021
            • Received: 1 May 2020


            Qualifiers

            • research-article
            • Refereed
