
Multimodal heterogeneous graph attention network

  • Original Article
Neural Computing and Applications

Abstract

The real world contains many graphs and networks that are inherently heterogeneous: multiple types of relations connect multiple types of vertices. With the development of information networks, node features can be described by data of different modalities, giving rise to multimodal heterogeneous graphs. However, most existing methods can handle only unimodal heterogeneous graphs. Moreover, most existing heterogeneous graph mining methods are based on meta-paths, which depend on domain experts to design. In this paper, we propose a novel multimodal heterogeneous graph attention network (MHGAT) to address these problems. Specifically, we use edge-level aggregation to capture graph heterogeneity and adaptively learn more informative representations. We then use a modality-level attention mechanism to fuse information across modalities. Because plain graph convolutional networks cannot capture higher-order neighborhood information, we use residual and dense connections to obtain it. Extensive experiments show that MHGAT outperforms state-of-the-art baselines on three datasets for node classification, clustering, and visualization tasks.
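The abstract does not give MHGAT's equations, so the NumPy sketch below is only a hypothetical illustration of the ingredients it names, built from common choices that are assumptions here rather than the authors' method: mean aggregation over neighbours of each edge type with a relation-specific weight matrix (edge-level aggregation), a residual connection that adds each node's input features back, and HAN-style modality-level attention, where a shared projection `W`, bias `b`, and query vector `q` score each modality and a softmax over modalities gives the fusion weights. All function names, shapes, and the random demo data are hypothetical; dense connections (concatenating the outputs of earlier layers) are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def edge_level_aggregate(h, adjs, weights):
    """Hypothetical edge-level aggregation (an assumption, not MHGAT's exact rule).

    h:       (N, d_in) node features for one modality
    adjs:    list of (N, N) 0/1 adjacency matrices, one per edge type
    weights: list of (d_in, d_out) matrices, one per edge type
    Neighbours are mean-pooled separately per edge type, projected with that
    type's weight matrix, summed over types, and passed through a ReLU.
    """
    out = 0.0
    for a, w in zip(adjs, weights):
        deg = a.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0               # isolated nodes: avoid division by zero
        out = out + (a @ h / deg) @ w     # mean over neighbours of this edge type
    return np.maximum(out, 0.0)           # ReLU


def modality_attention(zs, W, b, q):
    """HAN-style modality-level attention (assumed form).

    zs: list of (N, d) embeddings, one per modality.
    Each modality is scored by the mean over nodes of q^T tanh(z W + b);
    a softmax over modalities gives weights beta used to fuse the embeddings.
    """
    scores = np.array([np.mean(np.tanh(z @ W + b) @ q) for z in zs])
    beta = softmax(scores)
    fused = sum(bm * z for bm, z in zip(beta, zs))
    return fused, beta


# Demo on random data: 5 nodes, 8-dim features, 2 edge types, 3 modalities.
rng = np.random.default_rng(0)
N, d = 5, 8
adjs = [(rng.random((N, N)) < 0.4).astype(float) for _ in range(2)]
modalities = [rng.standard_normal((N, d)) for _ in range(3)]
Ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(2)]

# Residual connection: add each modality's input features to its aggregated output.
zs = [h + edge_level_aggregate(h, adjs, Ws) for h in modalities]
fused, beta = modality_attention(
    zs, rng.standard_normal((d, d)) * 0.1, np.zeros(d), rng.standard_normal(d)
)
print(fused.shape, beta.round(3))
```

The residual term requires `d_in == d_out` so that `h` and the aggregated output can be added; a learned projection of `h` would remove that restriction. The softmax over per-modality scores is what makes the fusion weights `beta` adaptive rather than fixed.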




Data availability

The datasets that support the findings of this study are available at https://github.com/jiaxiangen/MHGAT.

Notes

  1. https://www.imdb.com.

  2. http://deepyeti.ucsd.edu/jianmo/amazon.

  3. https://movie.douban.com.


Acknowledgments

This work was supported in part by Zhejiang NSF Grant No. LY20F020009, China NSF Grants No. 61572266 and No. 61602133, Ningbo NSF Grant No. 202003N4086, and programs sponsored by the K.C. Wong Magna Fund in Ningbo University. (Corresponding author: Yihong Dong.)

Author information

Correspondence to Yihong Dong.

Ethics declarations

Conflict of interest

The authors have no competing interests.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jia, X., Jiang, M., Dong, Y. et al. Multimodal heterogeneous graph attention network. Neural Comput & Applic 35, 3357–3372 (2023). https://doi.org/10.1007/s00521-022-07862-6
