Abstract
The real world contains many graphs and networks that are inherently heterogeneous, with various types of relations connecting multiple types of vertices. With the development of information networks, node features can be described by data of different modalities, resulting in multimodal heterogeneous graphs. However, most existing methods can only handle unimodal heterogeneous graphs. Moreover, most existing heterogeneous graph mining methods rely on meta-paths, which must be designed by domain experts. In this paper, we propose a novel multimodal heterogeneous graph attention network (MHGAT) to address these problems. Specifically, we exploit edge-level aggregation to capture graph heterogeneity and adaptively learn more informative representations. Further, we use a modality-level attention mechanism to fuse information from the different modalities. Because plain graph convolutional networks cannot capture higher-order neighborhood information, we use residual connections and dense connections to obtain it. Extensive experimental results show that MHGAT outperforms state-of-the-art baselines on three datasets for node classification, clustering, and visualization tasks.
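The abstract only names MHGAT's components, so the following is not the authors' implementation. It is a minimal sketch, under stated assumptions, of two of the ideas. The modality-level attention assumes the scoring form HAN uses for its semantic-level attention: each modality m gets a score equal to the mean over nodes of q·tanh(W z + b), the scores are softmax-normalised into weights beta, and each node's fused embedding is the beta-weighted sum of its per-modality embeddings. The residual stack applies h ← ReLU(Â h) + h with weight matrices omitted for brevity. The function names and the parameters `w`, `b`, `q`, and `adj_norm` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_attention(
    z: List[List[Vector]],  # z[m][i]: embedding of node i under modality m
    w: Matrix,              # hypothetical shared projection, shape (hidden x dim)
    b: Vector,              # hypothetical projection bias, length hidden
    q: Vector,              # hypothetical modality-level attention vector, length hidden
) -> Tuple[Vector, List[Vector]]:
    """Fuse per-modality node embeddings with one weight per modality.

    Assumed HAN-style scoring: s_m = mean_i q . tanh(W z[m][i] + b),
    beta = softmax(s), fused[i] = sum_m beta[m] * z[m][i].
    """
    scores = []
    for zm in z:
        per_node = [
            sum(qk * math.tanh(hk + bk) for qk, hk, bk in zip(q, matvec(w, zi), b))
            for zi in zm
        ]
        scores.append(sum(per_node) / len(per_node))
    beta = softmax(scores)
    num_nodes, dim = len(z[0]), len(z[0][0])
    fused = [
        [sum(beta[m] * z[m][i][d] for m in range(len(z))) for d in range(dim)]
        for i in range(num_nodes)
    ]
    return beta, fused


def propagate(adj_norm: Matrix, h: List[Vector]) -> List[Vector]:
    """Neighbourhood aggregation: row i becomes sum_j adj_norm[i][j] * h[j]."""
    dim = len(h[0])
    return [
        [sum(a_ij * h[j][d] for j, a_ij in enumerate(row)) for d in range(dim)]
        for row in adj_norm
    ]


def residual_stack(adj_norm: Matrix, h: List[Vector], num_layers: int) -> List[Vector]:
    """Stack layers h <- relu(A h) + h; learnable weight matrices omitted."""
    for _ in range(num_layers):
        p = propagate(adj_norm, h)
        # Residual connection: add each node's layer input back to its output.
        h = [[max(0.0, a) + x for a, x in zip(pr, hr)] for pr, hr in zip(p, h)]
    return h
```

With `q` set to zeros every modality receives the same score, so `beta` is uniform and the fusion reduces to a plain average; a trained `q` would let more informative modalities receive larger weights. In the residual stack, each extra layer reaches one hop further while the added input keeps each node's own features from being washed out. Dense connections, which feed every earlier layer's output into later layers, are a related alternative not shown here.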






Data availability
The datasets that support the findings of this study are available at https://github.com/jiaxiangen/MHGAT.
Acknowledgments
This work was supported in part by Zhejiang NSF Grant No. LY20F020009, China NSF Grants No. 61572266 and No. 61602133, Ningbo NSF Grant No. 202003N4086, and programs sponsored by the K.C. Wong Magna Fund at Ningbo University. (Corresponding author: Yihong Dong.)
Ethics declarations
Conflict of interest
The authors have no competing interests.
About this article
Cite this article
Jia, X., Jiang, M., Dong, Y. et al. Multimodal heterogeneous graph attention network. Neural Comput & Applic 35, 3357–3372 (2023). https://doi.org/10.1007/s00521-022-07862-6
DOI: https://doi.org/10.1007/s00521-022-07862-6