MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning

Applied Intelligence

Abstract

Most knowledge representation learning (KRL) methods use only structured knowledge graphs (KGs), leaving a large amount of multi-modal (textual and visual) knowledge unused. To address this gap, we propose multi-modal knowledge representation learning (MMKRL), a method that exploits multi-source (structured, textual, and visual) knowledge. Instead of simply integrating multi-modal knowledge with structured knowledge in a unified space, we introduce a component alignment scheme and combine it with translation methods to accomplish multi-modal KRL. Specifically, MMKRL first reconstructs multi-source knowledge by summing different plausibility functions and then aligns the multi-source knowledge using specific norm constraints to reduce reconstruction errors. We also adopt an adversarial training strategy to enhance the robustness of MMKRL, an aspect that is rarely considered in existing multi-modal KRL methods. Experimental results show that MMKRL effectively uses multi-modal knowledge to achieve better link prediction and triple classification results than the baselines on two widely used datasets. Further, when relying only on structured knowledge or on limited multi-source knowledge, MMKRL still achieves competitive link prediction results, demonstrating our model’s superiority.
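The ingredients named in the abstract (translation-based plausibility functions that are summed, a norm constraint that aligns embeddings from different sources, and adversarial perturbation) can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the TransE-style L1 score, the four structured/multi-modal pairings, the squared L2 alignment penalty, the sign-based perturbation, and all function names and the `eps` value are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def translation_score(h, r, t):
    # Assumed TransE-style plausibility: L1 norm of h + r - t (lower is more plausible).
    return np.abs(h + r - t).sum(axis=-1)

def summed_score(h_s, t_s, h_m, t_m, r):
    # Assumed "sum of plausibility functions": every pairing of a structured (s)
    # or multi-modal (m) head with a structured or multi-modal tail.
    return (translation_score(h_s, r, t_s)
            + translation_score(h_m, r, t_m)
            + translation_score(h_s, r, t_m)
            + translation_score(h_m, r, t_s))

def alignment_penalty(e_s, e_m):
    # Assumed norm constraint: squared L2 distance between the structured and
    # multi-modal embeddings of the same entity.
    return ((e_s - e_m) ** 2).sum(axis=-1)

def sign_perturb(h, r, t, eps=0.1):
    # Assumed adversarial step: move h by eps along the sign of the gradient of
    # the L1 score with respect to h, which makes the triple look less plausible.
    return h + eps * np.sign(h + r - t)

# Hypothetical 2-dimensional embeddings for one triple (head, relation, tail).
h_s, t_s = np.array([0.5, 0.0]), np.array([1.0, 1.0])
h_m, t_m = np.array([0.4, 0.1]), np.array([1.0, 0.9])
r = np.array([0.0, 1.0])

clean = translation_score(h_s, r, t_s)
adversarial = translation_score(sign_perturb(h_s, r, t_s), r, t_s)
total = summed_score(h_s, t_s, h_m, t_m, r) + alignment_penalty(h_s, h_m)
```

In a training loop, a quantity such as `total` would typically be placed inside a margin-based ranking loss against corrupted triples, and the perturbed embeddings would be scored alongside the clean ones; both of those steps are omitted here.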

Acknowledgments

This work was supported by the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce (2020-KF-9).

Corresponding author

Correspondence to Zejun Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, X., Wang, L., Jiang, Z. et al. MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning. Appl Intell 52, 7480–7497 (2022). https://doi.org/10.1007/s10489-021-02693-9

  • DOI: https://doi.org/10.1007/s10489-021-02693-9
