survey

A Survey of Multi-modal Knowledge Graphs: Technologies and Trends

Authors:

Pasquale De Meo,

Jia Zhu

ACM Computing Surveys, Volume 56, Issue 11

Article No.: 273, Pages 1 - 41

https://doi.org/10.1145/3656579

Published: 28 June 2024

Abstract

In recent years, Knowledge Graphs (KGs) have played a crucial role in the development of advanced knowledge-intensive applications, such as recommender systems and semantic search. However, the human sensory system is inherently multi-modal: the objects around us are often represented by a combination of multiple signals, such as visual and textual ones. Consequently, Multi-modal Knowledge Graphs (MMKGs), which combine structured knowledge representation with multiple modalities, represent a powerful extension of KGs. MMKGs can handle certain tasks (e.g., visual query answering) and queries that standard KGs cannot process, and they can effectively tackle some standard problems (e.g., entity alignment); nevertheless, a widely accepted definition of MMKG is still lacking. In this survey, we provide a rigorous definition of MMKGs along with a classification scheme based on how existing approaches address four fundamental challenges that are crucial to improving an MMKG: representation, fusion, alignment, and translation. Our classification scheme is flexible: it allows new approaches to be incorporated easily, and it allows two approaches to be compared in terms of how they address any one of these challenges. As the first comprehensive survey of MMKGs, this article aims to inspire researchers in the field of Artificial Intelligence and to provide them with a reference.

References

[1]

Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, and Shih-Fu Chang. 2019. Multi-level multimodal common semantic space for image-phrase grounding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 12476–12486.

[2]

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a Web of open data. In Proceedings of the Semantic Web. Springer, 722–735.

Digital Library

[3]

A. Az, Hhbc Huang, and Hhac Chen. 2019. Multimodal joint learning for personal knowledge base construction from Twitter-based lifelogs. Information Processing & Management 57, 6 (2019), 102148.

[4]

Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 2 (2018), 423–443.

Digital Library

[5]

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.

Digital Library

[6]

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 1247–1250.

Digital Library

[7]

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS’13). (Lake Tahoe, Nevada), Curran Associates Inc., Red Hook, NY, USA, 2787–2795.

[8]

Jake Bouvrie. 2006. Notes on convolutional neural networks. (2006).

[9]

Martin D. Buhmann. 2003. Radial Basis Functions: Theory and Implementations. Vol. 12. Cambridge university press.

[10]

Ermei Cao, Difeng Wang, Jiacheng Huang, and Wei Hu. 2020. Open knowledge enrichment for long-tail entities. In Proceedings of The Web Conference 2020. ACM / IW3C2, Taipei, Taiwan, 384–394.

Digital Library

[11]

Changhao Chen, Stefano Rosa, Yishu Miao, Chris Xiaoxuan Lu, Wei Wu, Andrew Markham, and Niki Trigoni. 2019. Selective sensor fusion for neural visual-inertial odometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Vision Foundation, Long Beach, CA, USA, 10542–10551.

[12]

Liyi Chen, Zhi Li, Yijun Wang, Tong Xu, Zhefeng Wang, and Enhong Chen. 2020. MMEA: Entity alignment for multi-modal knowledge graph. In Proceedings of the International Conference on Knowledge Science, Engineering and Management.Springer, Hangzhou, China, 134–147.

Digital Library

[13]

Xinlei Chen, Abhinav Shrivastava, and Abhinav Gupta. 2013. NEIL: Extracting visual knowledge from Web data. In Proceedings of the IEEE International Conference on Computer Vision. 1409–1416.

Digital Library

[14]

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, ARTICLE (2011), 2493–2537.

Digital Library

[15]

Victor de Boer, Jan Wielemaker, Judith van Gent, Marijke Oosterbroek, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, and Guus Schreiber. 2013. Amsterdam museum linked open data. Semantic Web 4, 3 (2013), 237–243.

[16]

Cheng Deng, Yuting Jia, Hui Xu, Chong Zhang, Jingyao Tang, Luoyi Fu, Weinan Zhang, Haisong Zhang, Xinbing Wang, and Chenghu Zhou. 2021. GAKG: A multimodal geoscience academic knowledge graph. In Proceedings of the ACM International Conference on Information and Knowledge Management. ACM Press, Queensland, Australia, 4445–4454.

Digital Library

[17]

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence.

[18]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Minneapolis, MN, USA, 4171–4186.

[19]

Yang Ding, Jing Yu, Bang Liu, Yue Hu, Mingxin Cui, and Qi Wu. 2022. Mukea: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5089–5098.

[20]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth \(16\times 16\) words: Transformers for image recognition at scale. In International Conference on Learning Representations, Oral.

[21]

Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open information extraction from the web. Communications of the ACM 51, 12 (2008), 68–74.

Digital Library

[22]

Congcong Ge, Xiaoze Liu, Lu Chen, Baihua Zheng, and Yunjun Gao. 2021. LargeEA: Aligning entities for large-scale knowledge graphs. Proceedings of the VLDB Endowment 15, 2 (2021), 237–245.

Digital Library

[23]

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 580–587.

Digital Library

[24]

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (NIPS’14). (Montreal, Canada), MIT Press, Cambridge, MA, USA, 2672–2680.

[25]

Lingbing Guo, Zequn Sun, and Wei Hu. 2019. Learning to exploit long-term relational dependencies in knowledge graphs. In Proceedings of the International Conference on Machine Learning. PMLR, 2505–2514.

[26]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. arXiv:2002.08909. Retrieved from https://arxiv.org/abs/2002.08909

[27]

Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, and Wei Wang. 2019. Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019. ACM, Anchorage, Alaska, USA, 1709–1719.

Digital Library

[28]

F. Maxwell Harper and Joseph A. Konstan. 2015. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems 5, 4 (2015), 1–19.

Digital Library

[29]

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision. 2961–2969.

[30]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.

[31]

K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366.

[32]

Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019. Knowledge graph embedding based question answering. In Proceedings of the ACM International Conference on Web Search and Data Mining. ACM, Melbourne, Australia, 105–113.

Digital Library

[33]

Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 687–696.

[34]

Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.

Digital Library

[35]

Shengbin Jia, Yang Xiang, Xiaojun Chen, Kun Wang, and Shijia E. 2019. Triple trustworthiness measurement for knowledge graph. In Proceedings of the The World Wide Web Conference. ACM Press, San Francisco, CA, USA, 2865–2871.

Digital Library

[36]

Xiaotian Jiang, Quan Wang, and Bin Wang. 2019. Adaptive convolution for multi-relational learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 978–987.

[37]

Amar Viswanathan Kannan, Dmitriy Fradkin, Ioannis Akrotirianakis, Tugba Kulahcioglu, Arquimedes Canedo, Aditi Roy, Shih-Yuan Yu, Malawade Arnav, and Mohammad Abdullah Al Faruque. 2020. Multimodal knowledge graph for deep learning papers and code. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management. 3417–3420.

Digital Library

[38]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence Pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 7871–7880.

[39]

Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li. 2020. A survey on deep learning for named entity recognition. IEEE Transactions on Knowledge and Data Engineering 34, 1 (2020), 50–70.

Digital Library

[40]

Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, and Shih-Fu Chang. 2022. CLIP-Event: Connecting text and images with event structures. In Proceedings of the Internationaò Conference on Computer Vision and Pattern Recognition. IEEE, New Orleans, USA, 16399–16408.

[41]

Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, and Shih-Fu Chang. 2022. CLIP-Event: Connecting text and images with event structures. In Proceedings of the International Conference on Computer Vision and Pattern Recognition. IEEE, New Orleans, USA, 16399–16408.

[42]

Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare R. Voss, Daniel Napierski, and Marjorie Freedman. 2020. GAIA: A fine-grained multimedia knowledge extraction system. In Proceedings of the Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, 77–86.

[43]

Yangning Li, Jiaoyan Chen, Yinghui Li, Yuejia Xiang, Xi Chen, and Hai-Tao Zheng. 2023. Vision, deduction and alignment: An empirical study on multi-modal knowledge graph alignment. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’23). 1–5.

[44]

Ke Liang, Lingyuan Meng, Meng Liu, Yue Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu, and Fuchun Sun. 2022. Reasoning over different types of knowledge graphs: Static, temporal and multi-modal. arXiv preprint arXiv:2212.05767 (2022).

[45]

Ying Lin, Liyuan Liu, Heng Ji, Dong Yu, and Jiawei Han. 2019. Reliability-aware dynamic feature composition for name tagging. In Proceedings of the Conference of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 165–174.

[46]

Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015. Modeling relation paths for representation learning of knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lluís Màrquez, Chris Callison-Burch, and Jian Su (Eds.). Association for Computational Linguistics, Lisbon, Portugal, 705–714.

[47]

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence.

Digital Library

[48]

Fangyu Liu, Muhao Chen, Dan Roth, and Nigel Collier. 2021. Visual pivoting for (unsupervised) entity alignment. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, 4257–4266.

[49]

Fangyu Liu, Rongtian Ye, Xun Wang, and Shuaipeng Li. 2020. HAL: Improved text-image matching by mitigating visual semantic hubs. In Proceedings of AAAI Conference on Artificial Intelligence. AAAI Press, New York, USA, 11563–11571.

[50]

Ye Liu, Hui Li, Alberto Garcia-Duran, Mathias Niepert, Daniel Onoro-Rubio, and David S. Rosenblum. 2019. MMKG: Multi-modal knowledge graphs. In Proceedings of the European Semantic Web Conference. Springer, 459–474.

Digital Library

[51]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from https://arxiv.org/abs/1907.11692

[52]

Zhun Liu, Ying Shen, Varun Bharadhwaj Lakshminarasimhan, Paul Pu Liang, Amir Zadeh, and Louis-Philippe Morency. 2018. Efficient low-rank multimodal fusion with modality-specific factors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Melbourne, Australia, 2247–2256.

[53]

Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440.

[54]

Justin Lovelace, Denis Newman-Griffis, Shikhar Vashishth, Jill Fain Lehman, and Carolyn P. Rosé. 2021. Robust knowledge graph completion with stacked convolutions and a student re-ranking network. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, Virtual Event, 1016–1029.

[55]

Justin Lovelace and Carolyn P. Rosé. 2022. A framework for adapting pre-trained language models to knowledge graph completion. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 5937–5955.

[56]

Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems. Vancouver, BC, Canada, 13–23.

[57]

Yaojie Lu, Qing Liu, Dai Dai, Xinyan Xiao, Hongyu Lin, Xianpei Han, Le Sun, and Hua Wu. 2022. Unified structure generation for universal information extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, 5755–5772.

[58]

Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Taroon Bharti, and Ming Zhou. 2020. Univl: A unified video and language pre-training model for multimodal understanding and generation. arXiv preprint arXiv:2002.06353 (2020).

[59]

Xin Mao, Wenting Wang, Huimin Xu, Yuanbin Wu, and Man Lan. 2020. Relational reflection entity alignment. In Proceedings of the ACM International Conference on Information and Knowledge Management. ACM, Virtual Event, 1095–1104.

Digital Library

[60]

Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: contextualized word vectors. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). (Long Beach, California, USA), Curran Associates Inc., Red Hook, NY, USA, 6297–6308.

[61]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).

[62]

Hatem Mousselly-Sergieh, Teresa Botschen, Iryna Gurevych, and Stefan Roth. 2018. A multimodal translation-based approach for knowledge graph representation learning. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics. 225–234.

[63]

Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. 2019. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the Conference of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4710–4723.

[64]

Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. 2011. Multimodal deep learning. In Proceedings of the ICML.

[65]

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. 2018. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 327–333.

[66]

Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, and Nan Duan. 2021. M3P: Learning universal representations via multitask multilingual multimodal pre-training. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition. Computer Vision Foundation / IEEE, 3977–3986.

[67]

Daniel Oñoro-Rubio, Mathias Niepert, Alberto García-Durán, Roberto González, and Roberto J. López-Sastre. 2017. Answering visual-relational queries in web-extracted knowledge graphs. arXiv:1409.0473. Retrieved from https://arxiv.org/abs/1701.0013

[68]

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.

Digital Library

[69]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP ’14). 1532–1543.

[70]

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 2227–2237.

[71]

Pouya Pezeshkpour, Liyan Chen, and Sameer Singh. 2018. Embedding multimodal relational data for knowledge base completion. arXiv:1409.0473. Retrieved from https://arxiv.org/abs/1701.00133

[72]

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035

[73]

T. Subba Rao and M. M. Gabr. 2012. An Introduction to Bispectral Analysis and Bilinear Time Series Models. Vol. 24. Springer Science and Business Media.

[74]

Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander G. Schwing, and Heng Ji. 2022. MuMuQA: Multimedia multi-hop news question answering via cross-media knowledge extraction and grounding. In Proc. of the AAAI Conference on Artificial Intelligence (AAAI’22). AAAI Press, 11200–11208.

[75]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.

[76]

Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2017. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans.actions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149.

[77]

Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2017. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans.actions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149.

Digital Library

[78]

S. Rendle. 2010. Factorization machines. In Proceedings of the 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010.

Digital Library

[79]

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80. DOI:DOI:

Digital Library

[80]

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In Proceedings of the European Semantic Web Conference. Springer, 593–607.

Digital Library

[81]

Edward W. Schneider. 1973. Course modularization applied: The interface system and its implications for sequence control and data analysis. Behavioral Objectives (1973), 21.

[82]

Ekaterina Shutova, Douwe Kiela, and Jean Maillard. [n. d.]. Black holes and white rabbits: Metaphor identification with visual features. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. The Association for Computational Linguistics, San Diego, 160–170.

[83]

Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[84]

Amit Singhal. 2012. Introducing the knowledge graph: Things, not strings. Official Google Blog 5 (2012), 16. https://www.blog.google/products/search/introducing-knowledgegraph-things-not/

[85]

Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 567–576.

[86]

Fenglong Su, Chengjin Xu, Han Yang, Zhongwu Chen, and Ning Jing. 2023. Neural entity alignment with cross-modal supervision. Information Processing and Management 60, 2 (2023), 103174.

Digital Library

[87]

Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, and Arun Sacheti. 2021. GEM: A general evaluation benchmark for multimodal tasks. In Findings of the Association for Computational Linguistics: (ACL-IJCNLP’21). Association for Computational Linguistics, Online, 2594–2603.

[88]

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web. 697–706.

Digital Library

[89]

Rui Sun, Xuezhi Cao, Yan Zhao, Junchen Wan, Kun Zhou, Fuzheng Zhang, Zhongyuan Wang, and Kai Zheng. 2020. Multi-modal knowledge graphs for recommender systems. In Proceedings of the ACM International Conference on Information and Knowledge Management. ACM, 1405–1414.

Digital Library

[90]

Zequn Sun, Chengming Wang, Wei Hu, Muhao Chen, Jian Dai, Wei Zhang, and Yuzhong Qu. 2020. Knowledge graph alignment network with gated multi-hop neighborhood aggregation. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, New York, NY, 222–229.

[91]

Shaohua Tao, Runhe Qiu, Yuan Ping, and Hui Ma. 2021. Multi-modal knowledge-aware reinforcement learning network for explainable recommendation. Knowledge-Based Systems 227 (2021), 107217.

Digital Library

[92]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, 6000–6010.

[93]

Meng Wang, Guilin Qi, HaoFen Wang, and Qiushuo Zheng. 2019. Richpedia: A comprehensive multi-modal knowledge graph. In Proceedings of the Joint International Semantic Technology Conference. Springer, 130–145.

[94]

Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2021. KEPLER: A unified model for knowledge embedding and pre-trained language representation. Transactions of the Association for Computational Linguistics 9 (2021), 176–194.

[95]

Yuxuan Wang, Yutai Hou, Wanxiang Che, and Ting Liu. 2020. From static to dynamic word representations: A survey. International Journal of Machine Learning and Cybernetics 11, 7 (2020), 1611–1630.

[96]

Youze Wang, Shengsheng Qian, Jun Hu, Quan Fang, and Changsheng Xu. 2020. Fake news detection via knowledge-driven multimodal graph convolutional networks. In Proceedings of the 2020 International Conference on Multimedia Retrieval. 540–547.

Digital Library

[97]

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence.

[98]

W. X. Wilcke, Peter Bloem, Victor de Boer, R. H. van t Veer, and F. A. H. van Harmelen. 2020. End-to-end entity classification on multimodal knowledge graphs. arXiv preprint arXiv:2003.12383 (2020).

[99]

Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. 2015. TransA: An adaptive approach for knowledge graph embedding. arXiv preprint arXiv:1509.05490 (2015).

[100]

Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2016. TransG: A generative model for knowledge graph embedding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 2316–2325.

[101]

Jiawang Xie, Zhenhao Dong, Qinghua Wen, Hongyin Zhu, Hailong Jin, Lei Hou, and Juanzi Li. 2021. Construction of multimodal chinese tourism knowledge graph. In Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators. Springer, Taiyuan, China, 16–29.

[102]

Ruobing Xie, Zhiyuan Liu, Fen Lin, and Leyu Lin. 2018. Does William Shakespeare really write Hamlet? Knowledge representation learning with confidence. In Proceedings of the AAAI Conference on Artificial Intelligence.

[103]

Ruobing Xie, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2017. Image-embodied knowledge representation learning. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI’17). ijcai.org, Melbourne, Australia, 3140–3146.

[104]

Ruobing Xie, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2017. Image-embodied knowledge representation learning. In Proceedings of the International Joint Conference on Artificial Intelligence. ijcai.org, Melbourne, Australia, 3140–3146.

[105]

Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the International Conference on World Wide Web.ACM, Perth, Australia, 1271–1279.

Digital Library

[106]

Guohai Xu, Hehong Chen, Feng-Lin Li, Fu Sun, Yunzhou Shi, Zhixiong Zeng, Wei Zhou, Zhongzhou Zhao, and Ji Zhang. 2021. AliMe MKG: A multi-modal knowledge graph for live-streaming E-commerce. In Proceedings of the ACM International Conference on Information and Knowledge Management. ACM, Queensland, Australia, 4808–4812.

Digital Library

[107]

Yexiang Xue, Yang Yuan, Zhitian Xu, and Ashish Sabharwal. 2018. Expanding holographic embeddings for knowledge completion. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18). (Montréal, Canada), Curran Associates Inc., Red Hook, NY, 4496–4506.

[108]

Bishan Yang, Scott Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the International Conference on Learning Representations (ICLR’15).

[109]

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. KG-BERT: BERT for knowledge graph completion. CoRR abs/1909.03193 (2019). arXiv:1909.03193 http://arxiv.org/abs/1909.03193

[110]

Shih-Yuan Yu, Ahmet Salih Aksakal, Sujit Rokka Chhetri, and Mohammad Abdullah Al Faruque. 2020. Deep Code Curator–code2graph Part-II. Technical Report. Technical Report TR-20-01. Center for Embedded and Cyber-Physical Systems ....

[111]

Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. https://arxiv.org/abs/1409.2329

[112]

Kaisheng Zeng, Chengjiang Li, Lei Hou, Juanzi Li, and Ling Feng. 2021. A comprehensive survey of entity alignment for knowledge graphs. AI Open 2 (2021), 1–13.

[113]

Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, CA, USA, 353–362.

Digital Library

[114]

Huaiwen Zhang, Quan Fang, Shengsheng Qian, and Changsheng Xu. 2019. Multi-modal knowledge-aware event memory network for social media rumor detection. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, Nice, France, 1942–1951.

Digital Library

[115]

Ningyu Zhang, Lei Li, Xiang Chen, Xiaozhuan Liang, Shumin Deng, and Huajun Chen. 2022. Multimodal analogical reasoning over knowledge graphs. arXiv preprint arXiv:2210.00312 (2022).

[116]

Yingying Zhang, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2019. Multi-modal knowledge-aware hierarchical attention network for explainable medical question answering. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, Nice, France, 1089–1097.

[117]

Chaoyu Zhu, Zhihao Yang, Xiaoqiong Xia, Nan Li, Fan Zhong, and Lei Liu. 2022. Multimodal reasoning based on knowledge graph embedding for specific diseases. Bioinformatics 38, 8 (2022), 2235–2245.

[118]

Jia Zhu, Changqin Huang, and Pasquale De Meo. 2023. DFMKE: A dual fusion multi-modal knowledge graph embedding framework for entity alignment. Information Fusion 90 (2023), 111–119.

[119]

Xiangru Zhu, Zhixu Li, Xiaodan Wang, Xueyao Jiang, Penglei Sun, Xuwu Wang, Yanghua Xiao, and Nicholas Jing Yuan. 2024. Multi-modal knowledge graph construction and application: A Survey. IEEE Transactions on Knowledge and Data Engineering 36, 2 (2024), 715–735.

[120]

Yuke Zhu, Ce Zhang, Christopher Ré, and Li Fei-Fei. 2015. Building a large-scale multimodal knowledge base system for answering visual queries. arXiv:1507.05670 [cs.CV].

[121]

Yushan Zhu, Huaixiao Zhao, Wen Zhang, Ganqiang Ye, Hui Chen, Ningyu Zhang, and Huajun Chen. 2021. Knowledge perceived multi-modal pretraining in E-commerce. In Proceedings of the 29th ACM International Conference on Multimedia (MM’21), Virtual Event, China. Association for Computing Machinery, New York, NY, 2744–2752.

Index Terms

A Survey of Multi-modal Knowledge Graphs: Technologies and Trends
1. Hardware
  1. Emerging technologies
    1. Biology-related information processing
      1. Neural systems

Published In

ACM Computing Surveys Volume 56, Issue 11

November 2024

977 pages

EISSN:1557-7341

DOI:10.1145/3613686

Editors:
David Atienza, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Michela Milano, University of Bologna, Italy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2024

Online AM: 10 April 2024

Accepted: 02 April 2024

Revised: 28 February 2024

Received: 15 September 2022

Published in CSUR Volume 56, Issue 11

Qualifiers

Survey

Funding Sources

Research and Demonstration Application of Key Technologies for Personalized Learning Driven by Educational Big Data
National Key R&D Program of China
National Natural Science Foundation of China
