research-article

HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion

Published: 29 June 2024

Abstract

Recent years have witnessed the successful application of knowledge graph techniques to structured data processing, but incorporating knowledge from visual and textual modalities into knowledge graphs has received less attention. To organize such knowledge, Multimodal Knowledge Graphs (MKGs) have been introduced, which combine the structural triplets of traditional Knowledge Graphs (KGs) with entity-related multimodal data (e.g., images and texts). However, MKGs remain difficult to exploit because they are inherently incomplete. Although most existing Multimodal Knowledge Graph Completion (MKGC) approaches can infer missing triplets from available factual triplets and multimodal information, they largely ignore modal conflicts and supervisory effects, and so fail to achieve a comprehensive understanding of entities. To address these issues, we propose a novel Hierarchical Knowledge Alignment (HKA) framework for MKGC. Specifically, a macro-knowledge alignment module captures the global semantic relevance between modalities to deal with modal conflicts in MKGs. In addition, a micro-knowledge alignment module reveals local consistency information more effectively through inter- and intra-modality supervisory effects. A final decision is made by integrating the predictions of the different modalities. Experimental results on three benchmark MKGC tasks demonstrate the effectiveness of the proposed HKA framework.
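The abstract states only that the predictions of the different modalities are integrated into a final decision; it does not say how. As a minimal sketch, assuming each modality (structural, visual, textual) produces one raw score per candidate entity and that integration is a weighted average of per-modality softmax distributions, a late-fusion step might look like the following. The function names, the modality keys, the scores, and the equal weights are illustrative assumptions, not the HKA implementation.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(
    modal_scores: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality distributions over candidate entities.

    Assumption: every modality scores the same candidates in the same order.
    """
    if not modal_scores:
        raise ValueError("at least one modality is required")
    lengths = {len(scores) for scores in modal_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same candidates")
    total_weight = sum(weights[m] for m in modal_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * lengths.pop()
    for modality, scores in modal_scores.items():
        share = weights[modality] / total_weight  # normalized modality weight
        for i, prob in enumerate(softmax(scores)):
            fused[i] += share * prob
    return fused


def rank_candidates(fused: List[float]) -> List[int]:
    """Return candidate indices ordered from most to least likely."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Hypothetical scores for three candidate tail entities of one query (h, r, ?).
modal_scores = {
    "structure": [2.0, 1.0, 0.0],
    "visual": [0.0, 1.0, 0.0],
    "text": [2.0, 0.0, 0.0],
}
weights = {"structure": 1.0, "visual": 1.0, "text": 1.0}

fused = fuse_predictions(modal_scores, weights)
ranking = rank_candidates(fused)
print(ranking)
```

In this toy input the visual scores favor the second candidate while the structural and textual scores favor the first, a simple instance of the modal conflict the abstract mentions. Plain averaging settles such a conflict by majority weight only; it does not model the alignment modules described above.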

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8
August 2024
726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2024
Online AM: 11 May 2024
Accepted: 24 April 2024
Revised: 23 April 2024
Received: 26 November 2023
Published in TOMM Volume 20, Issue 8

Author Tags

  1. Multimodal knowledge graph
  2. knowledge graph completion
  3. knowledge alignment

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • National High Level Hospital Clinical Research Funding
  • Beijing Natural Science Foundation
